Welcome to the website of ProBound, a software package for inferring energetic models of sequence recognition from high-throughput sequencing data. Details about the methods can be found in the bioRxiv preprint.

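This website describes how to:
  • run ProBound on our dedicated compute server,
  • apply the resulting models using ProBoundTools, and
  • access curated models from MotifCentral.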

Run ProBound

The easiest way to run ProBound is to submit a job to our dedicated compute server. Please complete all the steps below to run your job. A software manual can be found here.

Note that this tool is designed for academic, non-commercial use only. We hope you enjoy this service.

Step 1: Please tell us more about yourself
 
Step 2: Please upload a ProBound configuration file
Directions: Create and upload a ProBound configuration file in the JSON format. For instructions on creating a configuration file, please see this manual. For example configuration files that correspond to analyses from the paper, see the table below.

For the simplest use case, fitting a model containing a single sequence-specific binding mode to a single SELEX dataset, the following configuration file can be used:
[   
    {"function": "optimizerSetting", "lambdaL2": 1e-6, "pseudocount": 20, "likelihoodThreshold": 0.0002 },
    {"function": "addTable", "leftFlank":"ACACTCTTTCCCTACACGACGCTCTTCCGATCTTGACGTC", "rightFlank":"GACGTCAGATCGGAAGAGCTCGTATGCCGTCTTCTGCTTG",    
        "variableRegionLength":30, "nColumns":2, "countTableFile":"countTable.0.CTCF_r3.tsv.gz", "inputFileType":"tsv.gz" },
    {"function": "addSELEX" },
    {"function": "addNS" },
    {"function": "addBindingMode", "size": 12, "flankLength": 5},
    {"function": "bindingModeConstraints", "index": 1, "maxFlankLength": -1, "maxSize": 18,
    "fittingStages": [ 
        { "optimizeFlankLength": true         },
        { "optimizeMotifShiftHeuristic": true },  
        { "optimizeSizeHeuristic": true       } ] }
]
        
Here:
  • "leftFlank":"ACACTCTTTCCCTACACGACGCTCTTCCGATCTTGACGTC" and "rightFlank":"GACGTCAGATCGGAAGAGCTCGTATGCCGTCTTCTGCTTG" should be edited to specify the fixed sequences to the left and right of the variable region.
  • "variableRegionLength":30 should be changed to specify the length of the variable region.
  • "nColumns":2 should specify the number of count columns in the count table.
  • An example count table can be found here.
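The values of variableRegionLength and nColumns can be checked against a count table before submitting. The sketch below assumes a gzipped, tab-separated table whose first column is the variable-region sequence and whose remaining columns are counts, with every row laid out the same way; table_dims is a hypothetical helper, not part of ProBound.

```shell
# table_dims FILE: print the variable-region length and the number of count
# columns found on the first line of a gzipped, tab-separated count table.
# Assumed layout (not taken from the ProBound manual):
#   sequence<TAB>count_1<TAB>count_2 ...
table_dims() {
  gzip -dc "$1" | head -n 1 |
    awk -F'\t' '{ printf "variableRegionLength=%d nColumns=%d\n", length($1), NF - 1 }'
}
```

For example, `table_dims countTable.0.CTCF_r3.tsv.gz` prints the two values so they can be compared with the configuration file.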
Common problems:
  • Editing JSON objects manually is challenging and syntax errors are easily introduced. Online JSON validators such as this are useful for identifying such errors.
  • Make sure that the distance specified by flankLength does not exceed the length of the sequences specified in leftFlank and rightFlank.
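Both problems can also be checked locally. The sketch below validates JSON syntax with Python's standard json.tool module (invoked as `python3 -m json.tool`) and compares flankLength with the lengths of the two flank strings. check_config and check_flanks are hypothetical helper names, and the flank values are passed by hand rather than parsed from the configuration file.

```shell
# check_config FILE: report whether FILE parses as JSON, using Python's
# standard-library json.tool module instead of an online validator.
check_config() {
  if python3 -m json.tool "$1" > /dev/null; then
    echo "valid JSON: $1"
  else
    echo "invalid JSON: $1" >&2
    return 1
  fi
}

# check_flanks LEFT RIGHT FLANK_LENGTH: succeed only if FLANK_LENGTH does not
# exceed the length of either flank sequence.
check_flanks() {
  local left="$1" right="$2" flank_length="$3"
  if [ "$flank_length" -gt "${#left}" ] || [ "$flank_length" -gt "${#right}" ]; then
    echo "flankLength=$flank_length exceeds a flank (left=${#left}, right=${#right})" >&2
    return 1
  fi
  echo "ok: flankLength=$flank_length (left=${#left}, right=${#right})"
}
```

For the example configuration, both flanks are 40 nt long, so calling check_flanks with them and flankLength 5 prints `ok: flankLength=5 (left=40, right=40)`.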
 
Step 3: Please upload your data
Directions: Upload all count tables in .tar.gz format. For examples of count tables, please see the table below.
Note: Once submitted, all datasets are sorted by filename in natural order; the first file in this order becomes count table 0, the second count table 1, and so on. For example, the files countTableB.tsv.gz, countTable1.tsv.gz, and countTableA.tsv.gz are assigned to count tables 2, 0, and 1, respectively.
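To preview which index each file will receive, the filenames can be sorted locally. This sketch assumes that GNU sort's version ordering (`sort -V`) matches the server's natural order; it does for the example above, but that is not guaranteed for every edge case. preview_order is a hypothetical helper name.

```shell
# preview_order FILE...: list filenames in version ("natural") order,
# numbered from 0, i.e. the count-table index each file would receive
# if the server's ordering matches GNU sort -V.
preview_order() {
  printf '%s\n' "$@" | sort -V | awk '{ printf "%d\t%s\n", NR - 1, $0 }'
}
```

Running `preview_order countTableB.tsv.gz countTable1.tsv.gz countTableA.tsv.gz` lists countTable1.tsv.gz as 0, countTableA.tsv.gz as 1 and countTableB.tsv.gz as 2. Version ordering also places countTable.9.* before countTable.10.*, which a plain lexical sort would not.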

Common problems:
  • By default, ProBound uses the alphabet ACGT, so the letter N is invalid.
  • The server currently supports count tables with at most 5,000,000 lines.
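Both limits can be checked before uploading. The sketch assumes the sequence is in the first tab-separated column of a gzipped table and flags any sequence containing a character other than uppercase A, C, G or T. check_count_table is a hypothetical helper, and its default limit of 5,000,000 lines is taken from the note above.

```shell
# check_count_table FILE [MAX_LINES]: count lines and sequences containing
# characters other than uppercase A, C, G, T (column 1 assumed to hold the
# sequence). Returns non-zero if any such sequence is found or if the table
# has more than MAX_LINES lines (default 5000000).
check_count_table() {
  local file="$1" max_lines="${2:-5000000}"
  gzip -dc "$file" | awk -F'\t' -v max="$max_lines" '
    $1 !~ /^[ACGT]+$/ { bad++ }
    END {
      printf "lines=%d non_ACGT=%d\n", NR, bad
      exit (bad > 0 || NR > max)
    }'
}
```

It prints a summary such as `lines=2 non_ACGT=0`, so it can be run over every table with `for f in countTable.*.tsv.gz; do check_count_table "$f"; done`.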

Example Datasets

Analysis that produces a CTCF recognition model from the SMiLE-seq data of Isakova et al. (2017).
  • Configuration: builder.singleTF.json
  • Count tables: countTable.0.CTCF_r3.tsv.gz
  • Output: View

Joint analysis that produces a single CTCF consensus model from multiple CTCF SELEX datasets.
  • Configuration: builder.multiTF.json
  • Count tables:
      countTable.0.CTCF_r3.tsv.gz
      countTable.1.CTCF_ESAJ_TAGCGA20NGCT.tsv.gz
  • Output: View

Analysis that produces monomer, dimer, and trimer binding models for Hth/Exd/Ubx from multiple SELEX-seq datasets.
  • Configuration: builder.hthExdUbx.json
  • Count tables:
      countTable.0.UbxIVa-Hth-Exd.30mer1.tsv.gz
      countTable.1.UbxIVa-Exd.16mer1_rep1.tsv.gz
      countTable.2.UbxIVa.16mer1_rep1.tsv.gz
      countTable.3.Exd.tsv.gz
      countTable.4.Hth.16mer2_rep1.tsv.gz
  • Output: View

Analysis that produces a meCpG-aware binding model for ATF4/CEBPγ homodimers and heterodimers from EpiSELEX-seq data.
  • Configuration: builder.epiSELEX.json
  • Count tables:
      countTable.0.run_11_10_15__R1_ATF4_HOMODIMER_80nM_flank_PCR__None.tsv.gz
      countTable.1.run_11_10_15__R1_ATF4_HOMODIMER_80nM_flank_PCR__5mCG.tsv.gz
      countTable.2.run_06_05_17__R1_CEBPg_homo_75nM_lowBand__None.tsv.gz
      countTable.3.run_06_05_17__R1_CEBPg_homo_75nM_lowBand__5mCG.tsv.gz
      countTable.4.run_10_05_17__R1_ATF4-CEBPg_50nM_highBand__None.tsv.gz
      countTable.5.run_10_05_17__R1_ATF4-CEBPg_50nM_highBand__5mCG.tsv.gz
  • Output: View

Analysis producing meCpG-, 5hmC- and 6mA-aware binding models for CEBPγ homodimers from EpiSELEX-seq data.
  • Configuration: builder.multiEpiSELEX.json
  • Count tables:
      countTable.0.run_06_05_17__R1_CEBPg_homo_75nM_lowBand__None.tsv.gz
      countTable.1.run_06_05_17__R1_CEBPg_homo_75nM_lowBand__5mCG.tsv.gz
      countTable.2.run_06_05_17__R1_CEBPg_homo_75nM_lowBand__5hmC.tsv.gz
      countTable.3.run_06_05_17__R1_CEBPg_homo_75nM_lowBand__6mA.tsv.gz
  • Output: View

Analysis producing a Dll KD-model using data from a single selection assay.
  • Configuration: builder.KD-single.json
  • Count tables: countTable.0.20201205_DlldN-12.tsv.gz
  • Output: View

Analysis producing a Dll KD-model using data from multiple selection assays performed at different TF concentrations.
  • Configuration: builder.KD-multi.json
  • Count tables:
      countTable.0.20201205_DlldN-9.tsv.gz
      countTable.1.20201205_DlldN-10.tsv.gz
      countTable.2.20201205_DlldN-11.tsv.gz
      countTable.3.20201205_DlldN-12.tsv.gz
  • Output: View

Analysis producing an RBFOX2 KD-model using multi-concentration RNA Bind-n-Seq data from Dominguez et al. (2018).
  • Configuration: builder.RBP.json
  • Count tables:
      countTable.0.RBFOX2-1nM.tsv.gz
      countTable.1.RBFOX2-4nM.tsv.gz
      countTable.2.RBFOX2-14nM.tsv.gz
      countTable.3.RBFOX2-40nM.tsv.gz
      countTable.4.RBFOX2-121nM.tsv.gz
      countTable.5.RBFOX2-365nM.tsv.gz
      countTable.6.RBFOX2-1100nM.tsv.gz
      countTable.7.RBFOX2-3300nM.tsv.gz
      countTable.8.RBFOX2-9800nM.tsv.gz
  • Output: View

Analysis producing GR and co-factor binding models from ChIP-seq data from Starick et al. (2015).
  • Configuration: builder.ChIP-single.json
  • Count tables: countTable.0.IMR90_GR_chip-seq_rep1.tsv.gz
  • Output: View

Analysis producing a GR binding model and sample-specific activities from multiple ChIP-seq datasets from Polman et al. (2013).
  • Configuration: builder.ChIP-multi.json
  • Count tables:
      countTable.0.GR_30.tsv.gz
      countTable.1.GR_300.tsv.gz
      countTable.2.GR_3000.tsv.gz
  • Output: View

Analysis of the peptide-sequence-dependent kinetics of the tyrosine kinase Src using bacterial display data.
  • Configuration: builder.Kinase.json
  • Count tables:
      countTable.0.200205_Src-Kinase_5m.tsv.gz
      countTable.1.200205_Src-Kinase_20m.tsv.gz
      countTable.2.200205_Src-Kinase_60m.tsv.gz
  • Output: View
Users who need direct access to the ProBound software should register with Columbia Tech Ventures. After agreeing to a material transfer agreement (MTA), they will be granted access to the software for non-commercial academic use.

Apply models using ProBoundTools

Given a ProBound fit file fit.final.json generated by the compute server (see above), our software package ProBoundTools can be used to predict the relative affinities of new sequences. ProBoundTools is written in Java (available here) and has been verified with Java version 1.8.0_91. A tarball containing the JAR file with all dependencies can be downloaded here (the underlying source code is also accessible on GitHub).

After downloading the tarball, ProBoundTools is installed by executing:
tar -xvf ProBoundTools.tar.gz
export PROBOUND_DIR="/path/to/ProBoundTools"
alias proBoundTools='java -cp $PROBOUND_DIR/ProBoundTools/target/ProBound-jar-with-dependencies.jar  proBoundTools/App'
Here /path/to/ProBoundTools should be replaced with the directory created by tar.

Given the sequences of interest
AAAAGACGACTGCGGTCACTGAGGTGTAAA
ACTGTTTGCTCTATGCGGAGGAGCCCCTTA
TTAACTGGGTATAGGGGCGAATATGGCGAC
TTAGCCGGGAGGGGGCGCTCCGTAGTGGAT
ATAGTAGTCGTGCGCCCCCACTGGTGACAA
TGTTCCTTGCTTTTATAAGGTAAATGCAGG
(stored in the file seq.txt), the total relative affinity for each sequence is computed using the terminal command
proBoundTools -c 'loadFitLine(fit.final.json).buildConsensusModel().addNScoring().selectBindingMode(1).inputTXT(seq.txt).bindingModeScores(/dev/stdout)'
giving
AAAAGACGACTGCGGTCACTGAGGTGTAAA  3.06461e-05
ACTGTTTGCTCTATGCGGAGGAGCCCCTTA  1.00708e-04
TTAACTGGGTATAGGGGCGAATATGGCGAC  5.30426e-05
TTAGCCGGGAGGGGGCGCTCCGTAGTGGAT  7.04016e-02
ATAGTAGTCGTGCGCCCCCACTGGTGACAA  1.58707e-03
TGTTCCTTGCTTTTATAAGGTAAATGCAGG  2.71934e-06
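Both the input and the output are plain text, so standard tools are enough to prepare and post-process them. The sketch below writes seq.txt with a here-document and ranks a saved copy of the scores from highest to lowest predicted affinity. It assumes the scores were saved to a file (the name scores.tsv is hypothetical), for example by redirecting the output of the command above with `> scores.tsv`, and that the file contains only the two-column lines shown above.

```shell
# Write the example sequences to seq.txt, one sequence per line.
cat > seq.txt <<'EOF'
AAAAGACGACTGCGGTCACTGAGGTGTAAA
ACTGTTTGCTCTATGCGGAGGAGCCCCTTA
TTAACTGGGTATAGGGGCGAATATGGCGAC
TTAGCCGGGAGGGGGCGCTCCGTAGTGGAT
ATAGTAGTCGTGCGCCCCCACTGGTGACAA
TGTTCCTTGCTTTTATAAGGTAAATGCAGG
EOF

# rank_scores FILE: sort "sequence score" lines by score, highest first.
# sort -g (general numeric) understands scientific notation such as 7.04016e-02.
rank_scores() {
  sort -k2,2gr "$1"
}
```

With the output shown above, `rank_scores scores.tsv` puts TTAGCCGGGAGGGGGCGCTCCGTAGTGGAT (7.04016e-02) first and TGTTCCTTGCTTTTATAAGGTAAATGCAGG (2.71934e-06) last.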
Additional functionality can be listed using
proBoundTools -help

MotifCentral

A curated set of ProBound-derived transcription factor binding models can be accessed at MotifCentral.org. Each model has a unique ID number and can be loaded with the ProBoundTools command loadMotifCentralModel(modelID), used in place of loadFitLine(fit.final.json).
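Because only the loading step changes, the scoring chain used earlier on this page can be reused for either kind of model. The helper below is a hypothetical convenience wrapper, not part of ProBoundTools: it only assembles the string passed to `-c`, and its `mc:ID` prefix is an invented convention for marking a MotifCentral model ID.

```shell
# score_cmd LOADER SEQFILE: print a ProBoundTools -c argument that scores
# SEQFILE with binding mode 1, reusing the command chain from this page.
# LOADER is a fit file (loaded with loadFitLine) or "mc:ID" for a
# MotifCentral model (loaded with loadMotifCentralModel instead).
score_cmd() {
  local loader="$1" seqfile="$2" load
  case "$loader" in
    mc:*) load="loadMotifCentralModel(${loader#mc:})" ;;
    *)    load="loadFitLine($loader)" ;;
  esac
  printf '%s.buildConsensusModel().addNScoring().selectBindingMode(1).inputTXT(%s).bindingModeScores(/dev/stdout)\n' \
    "$load" "$seqfile"
}
```

Usage: `proBoundTools -c "$(score_cmd fit.final.json seq.txt)"` reproduces the earlier command, while `proBoundTools -c "$(score_cmd mc:MODEL_ID seq.txt)"` loads a MotifCentral model, where MODEL_ID stands for an ID looked up on MotifCentral.org.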