Welcome to the website of ProBound, a software package for inferring energetic models of sequence recognition from high-throughput sequencing data. Details about the methods can be found in the
bioRxiv preprint.
This website describes how to
Run ProBound
The easiest way to run ProBound is to submit a job to our dedicated compute server. Please complete all the steps below to run your job. A software manual can be found
here. The full underlying source code is available on
GitHub.
Note that this tool is designed for academic, non-commercial use only. We hope you enjoy this service.
For users who need direct access to the ProBound software, please register with
Columbia Tech Ventures. After you agree to a material transfer agreement (MTA), access to the software will be granted for non-commercial academic use.
Apply models using ProBoundTools
Given a ProBound fit file
fit.final.json
generated by the compute server (see
above), our software package ProBoundTools can be used to predict the relative affinities of new sequences. This utility was developed in Java (available for download
here) and has been verified with version 1.8.0_91. A tarball containing the JAR file with all dependencies can be downloaded
here (the underlying source code is also accessible on
GitHub).
After downloading the tarball, ProBoundTools is installed by executing:
tar -xvf ProBoundTools.tar.gz
export PROBOUND_DIR="/path/to/ProBoundTools"
alias proBoundTools='java -cp $PROBOUND_DIR/ProBoundTools/target/ProBound-jar-with-dependencies.jar proBoundTools/App'
Here
/path/to/ProBoundTools
should be replaced with the directory created by
tar
.
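Before the alias is used for the first time, it can help to confirm that the JAR file is where the alias expects it. The following is a minimal sketch, not part of ProBoundTools: check_probound_jar is a hypothetical helper, and the path it tests is taken from the alias definition above.

```shell
# Hypothetical helper (not part of ProBoundTools): confirm that the JAR
# referenced by the proBoundTools alias exists under the given directory.
check_probound_jar() {
  jar="$1/ProBoundTools/target/ProBound-jar-with-dependencies.jar"
  if [ -f "$jar" ]; then
    echo "found: $jar"
  else
    echo "missing: $jar"
    return 1
  fi
}
```

Calling check_probound_jar "$PROBOUND_DIR" after setting the variable reports whether the JAR was found, and java -version shows which Java version is installed.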
Given the sequences of interest
AAAAGACGACTGCGGTCACTGAGGTGTAAA
ACTGTTTGCTCTATGCGGAGGAGCCCCTTA
TTAACTGGGTATAGGGGCGAATATGGCGAC
TTAGCCGGGAGGGGGCGCTCCGTAGTGGAT
ATAGTAGTCGTGCGCCCCCACTGGTGACAA
TGTTCCTTGCTTTTATAAGGTAAATGCAGG
(stored in the file
seq.txt
), the total relative affinity for each sequence is computed using the terminal command
proBoundTools -c 'loadFitLine(fit.final.json).buildConsensusModel().addNScoring().selectBindingMode(1).inputTXT(seq.txt).bindingModeScores(/dev/stdout)'
giving
AAAAGACGACTGCGGTCACTGAGGTGTAAA 3.06461e-05
ACTGTTTGCTCTATGCGGAGGAGCCCCTTA 1.00708e-04
TTAACTGGGTATAGGGGCGAATATGGCGAC 5.30426e-05
TTAGCCGGGAGGGGGCGCTCCGTAGTGGAT 7.04016e-02
ATAGTAGTCGTGCGCCCCCACTGGTGACAA 1.58707e-03
TGTTCCTTGCTTTTATAAGGTAAATGCAGG 2.71934e-06
Here
loadFitLine(fit.final.json)
: Loads the ProBound fit.
buildConsensusModel()
: Reconciles experiment- and round-specific activities and adjusts the position-specific affinity matrix (PSAM) so that the maximum relative affinity for each binding offset is 1.0 (use writeModel(/dev/stdout)
to inspect the updated model).
addNScoring()
: Extends the alphabet to support the wildcard character 'N'.
selectBindingMode(1)
: Selects the second binding mode (PSAM); binding modes are zero-indexed.
inputTXT(seq.txt)
: Specifies where the input sequences are located. All sequences must be of equal length.
bindingModeScores(/dev/stdout)
: Computes the relative affinity for each offset/strand and reports the sum to standard output. The affinities for each offset/strand can instead be reported using bindingModeScores(/dev/stdout,profile)
.
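Because inputTXT requires all sequences to have the same length, the input file can be checked before it is scored. The following is a minimal sketch, not part of ProBoundTools: check_equal_length is a hypothetical helper, and it assumes one sequence per line with no header, as in seq.txt above.

```shell
# Hypothetical helper (not part of ProBoundTools): succeed only if every
# line of the file has the same length as the first line.
check_equal_length() {
  awk '
    BEGIN { bad = 0 }
    NR == 1 { len = length($0) }        # length of the first sequence
    length($0) != len {                 # report each line that differs
      printf "line %d: length %d, expected %d\n", NR, length($0), len
      bad = 1
    }
    END { exit bad }
  ' "$1"
}
```

Running check_equal_length seq.txt prints nothing and returns 0 when the file is consistent. Otherwise it lists the offending lines and returns 1.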
Additional functionality can be listed using
java -cp ProBoundTools.jar proBoundTools/App -help
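The summed affinities written by bindingModeScores(/dev/stdout) can be ranked with standard tools. The following is a minimal sketch: rank_by_affinity is a hypothetical helper, and it assumes that each output line holds a sequence and a score separated by whitespace, as in the example output above.

```shell
# Hypothetical helper (not part of ProBoundTools): sort "sequence score"
# lines by the second column, highest affinity first. The -g option
# compares values as general numbers, so scientific notation is handled.
rank_by_affinity() {
  sort -k2,2gr "$@"
}

# Demonstration on the example output shown above.
rank_by_affinity <<'EOF'
AAAAGACGACTGCGGTCACTGAGGTGTAAA 3.06461e-05
ACTGTTTGCTCTATGCGGAGGAGCCCCTTA 1.00708e-04
TTAACTGGGTATAGGGGCGAATATGGCGAC 5.30426e-05
TTAGCCGGGAGGGGGCGCTCCGTAGTGGAT 7.04016e-02
ATAGTAGTCGTGCGCCCCCACTGGTGACAA 1.58707e-03
TGTTCCTTGCTTTTATAAGGTAAATGCAGG 2.71934e-06
EOF
```

In this demonstration the sequence starting with TTAGCCGGG (7.04016e-02) is listed first. In an interactive shell, the scoring command can be piped directly into rank_by_affinity, assuming the tool writes only the score lines to standard output.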
MotifCentral
A curated set of ProBound-derived transcription factor binding models can be accessed at
MotifCentral.org. Each model has a unique ID number and can be loaded using the ProBoundTools command
loadMotifCentralModel(modelID)
(instead of
loadFitLine(fit.final.json)
).
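As an illustration, the scoring chain from the example above can be assembled with the loader swapped. The following is only a sketch under stated assumptions: motifcentral_chain is a hypothetical helper, the remaining steps are copied unchanged from the fit.final.json example, and the binding-mode index 1 is carried over from that example and may not suit a given MotifCentral model.

```shell
# Hypothetical helper (not part of ProBoundTools): print the command chain
# from the fit.final.json example with loadMotifCentralModel as the loader.
#   $1 = MotifCentral model ID (look it up on MotifCentral.org)
#   $2 = text file with one sequence per line
# Assumption: selectBindingMode(1) is copied from the earlier example and
# may need a different index for a given model.
motifcentral_chain() {
  printf 'loadMotifCentralModel(%s).buildConsensusModel().addNScoring().selectBindingMode(1).inputTXT(%s).bindingModeScores(/dev/stdout)' "$1" "$2"
}
```

With MODEL_ID set to an ID taken from MotifCentral.org, the chain would be passed to the tool as proBoundTools -c "$(motifcentral_chain "$MODEL_ID" seq.txt)".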