ProBound Web Server

Welcome to the website of ProBound, a software tool for inferring energetic models of sequence recognition from high-throughput sequencing data. Details about the methods can be found in the bioRxiv preprint.

This website describes how to

Run ProBound
Apply models using ProBoundTools
View pre-computed models at MotifCentral

Run ProBound

The easiest way to run ProBound is to submit a job to our dedicated compute server. Please complete all the steps below to run your job. A software manual can be found here.

Note that this tool is designed for academic, non-commercial use only. We hope you enjoy this service.

Step 1: Please tell us more about yourself

Step 2: Please upload a ProBound configuration file

Directions: Create and upload a ProBound configuration file in the JSON format. For instructions on creating a configuration file, please see this manual. For example configuration files that correspond to analyses from the paper, see the table below.

For the simplest use case - that of fitting a model containing a single sequence-specific binding mode to a single SELEX dataset - the following configuration file can be used:


[
    {"function": "optimizerSetting", "lambdaL2": 1e-6, "pseudocount": 20, "likelihoodThreshold": 0.0002 },
    {"function": "addTable", "leftFlank":"ACACTCTTTCCCTACACGACGCTCTTCCGATCTTGACGTC", "rightFlank":"GACGTCAGATCGGAAGAGCTCGTATGCCGTCTTCTGCTTG",
        "variableRegionLength":30, "nColumns":2, "countTableFile":"countTable.0.CTCF_r3.tsv.gz", "inputFileType":"tsv.gz" },
    {"function": "addSELEX" },
    {"function": "addNS" },
    {"function": "addBindingMode", "size": 12, "flankLength": 5},
    {"function": "bindingModeConstraints", "index": 1, "maxFlankLength": -1, "maxSize": 18,
        "fittingStages": [
            { "optimizeFlankLength": true         },
            { "optimizeMotifShiftHeuristic": true },
            { "optimizeSizeHeuristic": true       } ] }
]

Here:

"leftFlank":"ACACTCTTTCCCTACACGACGCTCTTCCGATCTTGACGTC" and "rightFlank":"GACGTCAGATCGGAAGAGCTCGTATGCCGTCTTCTGCTTG" should be edited to specify the fixed sequences to the left and right of the variable region.
"variableRegionLength":30 should be changed to specify the length of the variable region.
"nColumns":2 should specify the number of columns.
An example count table can be found here.
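The edits described above can also be made programmatically, which avoids hand-editing the JSON. The following is a minimal sketch, not part of ProBound: the helper name make_single_mode_config and its argument names are hypothetical. The keys and default values are copied from the example configuration above.

```python
import json


def make_single_mode_config(left_flank, right_flank, variable_region_length,
                            n_columns, count_table_file):
    """Build the single-binding-mode SELEX configuration shown above.

    Hypothetical helper: only the four addTable fields discussed in the
    text (plus the count table filename) are exposed as arguments.
    """
    return [
        {"function": "optimizerSetting", "lambdaL2": 1e-6, "pseudocount": 20,
         "likelihoodThreshold": 0.0002},
        {"function": "addTable", "leftFlank": left_flank,
         "rightFlank": right_flank,
         "variableRegionLength": variable_region_length,
         "nColumns": n_columns, "countTableFile": count_table_file,
         "inputFileType": "tsv.gz"},
        {"function": "addSELEX"},
        {"function": "addNS"},
        {"function": "addBindingMode", "size": 12, "flankLength": 5},
        {"function": "bindingModeConstraints", "index": 1,
         "maxFlankLength": -1, "maxSize": 18,
         "fittingStages": [
             {"optimizeFlankLength": True},
             {"optimizeMotifShiftHeuristic": True},
             {"optimizeSizeHeuristic": True}]},
    ]


# Values taken from the example configuration; replace them with your own.
config = make_single_mode_config(
    left_flank="ACACTCTTTCCCTACACGACGCTCTTCCGATCTTGACGTC",
    right_flank="GACGTCAGATCGGAAGAGCTCGTATGCCGTCTTCTGCTTG",
    variable_region_length=30,
    n_columns=2,
    count_table_file="countTable.0.CTCF_r3.tsv.gz",
)

# json.dumps always emits syntactically valid JSON (true/false, quoting).
config_text = json.dumps(config, indent=4)
print(config_text)
```

The printed text can be saved to a file and uploaded as the configuration.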

Common problems:

Editing JSON objects manually is challenging and syntax errors are easily introduced. Online JSON validators such as this are useful for identifying such errors.
Make sure that the distance specified by flankLength does not exceed the length of the sequences specified in leftFlank and rightFlank.
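Both problems can be caught locally before uploading. The sketch below is an illustration rather than a ProBound utility. The function name check_config is hypothetical, and it assumes that every addBindingMode entry applies to every addTable entry, which may be stricter than necessary for multi-table configurations.

```python
import json


def check_config(text):
    """Return a list of problems found in a ProBound configuration string."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as err:
        return [f"JSON syntax error at line {err.lineno}, "
                f"column {err.colno}: {err.msg}"]
    # The example configuration is a list of objects; assume that layout.
    if not isinstance(config, list):
        return ["the top-level JSON value should be a list of objects"]

    entries = [e for e in config if isinstance(e, dict)]
    tables = [e for e in entries if e.get("function") == "addTable"]
    modes = [e for e in entries if e.get("function") == "addBindingMode"]

    problems = []
    for table in tables:
        shortest = min(len(table.get("leftFlank", "")),
                       len(table.get("rightFlank", "")))
        for mode in modes:
            flank_length = mode.get("flankLength", 0)
            if flank_length > shortest:
                problems.append(
                    f"flankLength {flank_length} exceeds the shortest "
                    f"flank ({shortest} letters)")
    return problems


good = ('[{"function": "addTable", "leftFlank": "ACGTACGT", '
        '"rightFlank": "ACGTACGT"}, '
        '{"function": "addBindingMode", "size": 12, "flankLength": 5}]')
too_long = good.replace('"flankLength": 5', '"flankLength": 9')
broken = '[{"function": "addSELEX",}]'  # trailing comma is invalid JSON

for name, text in [("good", good), ("too_long", too_long), ("broken", broken)]:
    print(name, check_config(text))
```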

Step 3: Please upload your data

Directions: Upload all count tables in .tar.gz format. For examples of count tables, please see the table below.

Note: Once submitted, all datasets are sorted by filename using natural order. The first sorted file corresponds to count table 0, the second to count table 1, and so on. For example, files countTableB.tsv.gz, countTable1.tsv.gz, and countTableA.tsv.gz will be assigned to count tables 2, 0 and 1, respectively.
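To preview which count table index each file is likely to receive, the ordering can be reproduced locally. The sketch below uses a common natural-order key (digit runs compared as integers, everything else as text). The server's exact implementation is not documented here, so details such as case handling are assumptions. The function name natural_key is hypothetical.

```python
import re


def natural_key(filename):
    """Split a name into alternating text and integer chunks.

    re.split with a capturing group returns text at even positions and
    digit runs at odd positions, so chunks of the same type are always
    compared with each other.
    """
    parts = re.split(r"(\d+)", filename)
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]


files = ["countTableB.tsv.gz", "countTable1.tsv.gz", "countTableA.tsv.gz"]

# Map each filename to its position in natural order.
assignment = {name: index
              for index, name in enumerate(sorted(files, key=natural_key))}

for name in files:
    print(f"{name} -> count table {assignment[name]}")
```

Naming files with an explicit numeric prefix, as in the example datasets (countTable.0..., countTable.1...), makes the assignment unambiguous.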

Common problems:

By default, ProBound uses the alphabet ACGT and the letter N is therefore invalid.
The server currently supports count tables with at most 5,000,000 lines.
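Both limits can be checked before uploading. The following sketch assumes that each count table is a gzip-compressed, tab-separated file with the sequence in the first column and no header line. That layout is an assumption, so compare it against the example count tables before relying on it. The function name check_count_table is hypothetical.

```python
import gzip
import os
import tempfile

MAX_LINES = 5_000_000  # limit stated above
ALPHABET = set("ACGT")  # default ProBound alphabet stated above


def check_count_table(path, alphabet=ALPHABET, max_lines=MAX_LINES,
                      max_reports=10):
    """Return a list of problems found in a gzipped count table."""
    problems = []
    n_lines = 0
    with gzip.open(path, "rt") as handle:
        for n_lines, line in enumerate(handle, start=1):
            # Assumption: the sequence is the first tab-separated field.
            sequence = line.split("\t", 1)[0].strip()
            invalid = sorted(set(sequence) - alphabet)
            if invalid and len(problems) < max_reports:
                problems.append(
                    f"line {n_lines}: letters {invalid} are not in the alphabet")
    if n_lines > max_lines:
        problems.append(f"{n_lines} lines exceeds the limit of {max_lines}")
    return problems


# Small demonstration on a temporary two-line table (made-up counts).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "countTable.demo.tsv.gz")
    with gzip.open(demo_path, "wt") as out:
        out.write("ACGTACGT\t5\t2\n")
        out.write("ACGNACGT\t1\t0\n")
    demo_problems = check_count_table(demo_path)
    demo_limit = check_count_table(demo_path, max_lines=1)

print(demo_problems)
```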

By clicking this box, I certify that this job is for academic/non-commercial purposes and I agree with the terms of service and privacy policy.

Example Datasets

Description	Configuration	Count tables	Output
Analysis that produces a CTCF recognition model from SMiLE-seq data from Isakova et al. (2017).	builder.singleTF.json	countTable.0.CTCF_r3.tsv.gz	View
Joint analysis that produces a single CTCF consensus model from multiple CTCF SELEX datasets.	builder.multiTF.json	countTable.0.CTCF_r3.tsv.gz countTable.1.CTCF_ESAJ_TAGCGA20NGCT.tsv.gz	View
Analysis that produces monomer, dimer, and trimer binding models for Hth/Exd/Ubx from multiple SELEX-seq datasets.	builder.hthExdUbx.json	countTable.0.UbxIVa-Hth-Exd.30mer1.tsv.gz countTable.1.UbxIVa-Exd.16mer1_rep1.tsv.gz countTable.2.UbxIVa.16mer1_rep1.tsv.gz countTable.3.Exd.tsv.gz countTable.4.Hth.16mer2_rep1.tsv.gz	View
Analysis that produces a meCpG-aware binding model for ATF4/CEBPγ homo- and hetero-dimers from EpiSELEX-seq data.	builder.epiSELEX.json	countTable.0.run_11_10_15__R1_ATF4_HOMODIMER_80nM_flank_PCR__None.tsv.gz countTable.1.run_11_10_15__R1_ATF4_HOMODIMER_80nM_flank_PCR__5mCG.tsv.gz countTable.2.run_06_05_17__R1_CEBPg_homo_75nM_lowBand__None.tsv.gz countTable.3.run_06_05_17__R1_CEBPg_homo_75nM_lowBand__5mCG.tsv.gz countTable.4.run_10_05_17__R1_ATF4-CEBPg_50nM_highBand__None.tsv.gz countTable.5.run_10_05_17__R1_ATF4-CEBPg_50nM_highBand__5mCG.tsv.gz	View
Analysis producing meCpG-, 5hmC- and 6mA-aware binding models for CEBPγ homodimers from EpiSELEX-seq data.	builder.multiEpiSELEX.json	countTable.0.run_06_05_17__R1_CEBPg_homo_75nM_lowBand__None.tsv.gz countTable.1.run_06_05_17__R1_CEBPg_homo_75nM_lowBand__5mCG.tsv.gz countTable.2.run_06_05_17__R1_CEBPg_homo_75nM_lowBand__5hmC.tsv.gz countTable.3.run_06_05_17__R1_CEBPg_homo_75nM_lowBand__6mA.tsv.gz	View
Analysis producing a Dll K_D-model using data from a single selection assay.	builder.KD-single.json	countTable.0.20201205_DlldN-12.tsv.gz	View
Analysis producing a Dll K_D-model using data from multiple selection assays performed at different TF concentrations.	builder.KD-multi.json	countTable.0.20201205_DlldN-9.tsv.gz countTable.1.20201205_DlldN-10.tsv.gz countTable.2.20201205_DlldN-11.tsv.gz countTable.3.20201205_DlldN-12.tsv.gz	View
Analysis producing a RBFOX2 K_D-model using multi-concentration RNA Bind-n-Seq data from Dominguez et al. (2018).	builder.RBP.json	countTable.0.RBFOX2-1nM.tsv.gz countTable.1.RBFOX2-4nM.tsv.gz countTable.2.RBFOX2-14nM.tsv.gz countTable.3.RBFOX2-40nM.tsv.gz countTable.4.RBFOX2-121nM.tsv.gz countTable.5.RBFOX2-365nM.tsv.gz countTable.6.RBFOX2-1100nM.tsv.gz countTable.7.RBFOX2-3300nM.tsv.gz countTable.8.RBFOX2-9800nM.tsv.gz	View
Analysis producing GR and co-factor binding models from ChIP-seq data from Starick et al. (2015).	builder.ChIP-single.json	countTable.0.IMR90_GR_chip-seq_rep1.tsv.gz	View
Analysis producing a GR binding model and sample-specific activities from multiple ChIP-seq datasets from Polman et al. (2013).	builder.ChIP-multi.json	countTable.0.GR_30.tsv.gz countTable.1.GR_300.tsv.gz countTable.2.GR_3000.tsv.gz	View
Analysis of the peptide-sequence dependent kinetics of the tyrosine kinase Src using bacterial display data.	builder.Kinase.json	countTable.0.200205_Src-Kinase_5m.tsv.gz countTable.1.200205_Src-Kinase_20m.tsv.gz countTable.2.200205_Src-Kinase_60m.tsv.gz	View

Users who need direct access to the ProBound software should register with Columbia Tech Ventures. After agreeing to a material transfer agreement (MTA), access to the software will be granted for non-commercial academic use.

Apply models using ProBoundTools

Given a ProBound fit file fit.final.json generated by the compute server (see above), our software package ProBoundTools can be used to predict the relative affinities of new sequences. ProBoundTools was developed in Java (which can be downloaded here) and has been verified with version 1.8.0_91. A tarball containing the JAR file with all dependencies can be downloaded here (the underlying source code is also accessible on GitHub).

After downloading the tarball, ProBoundTools is installed by executing:

tar -xvf ProBoundTools.tar.gz
export PROBOUND_DIR="/path/to/ProBoundTools"
alias proBoundTools='java -cp $PROBOUND_DIR/ProBoundTools/target/ProBound-jar-with-dependencies.jar  proBoundTools/App'

Here /path/to/ProBoundTools should be replaced with the directory created by tar.

Given the sequences of interest


AAAAGACGACTGCGGTCACTGAGGTGTAAA
ACTGTTTGCTCTATGCGGAGGAGCCCCTTA
TTAACTGGGTATAGGGGCGAATATGGCGAC
TTAGCCGGGAGGGGGCGCTCCGTAGTGGAT
ATAGTAGTCGTGCGCCCCCACTGGTGACAA
TGTTCCTTGCTTTTATAAGGTAAATGCAGG

(stored in the file seq.txt), the total relative affinity for each sequence is computed using the terminal command

proBoundTools -c 'loadFitLine(fit.final.json).buildConsensusModel().addNScoring().selectBindingMode(1).inputTXT(seq.txt).bindingModeScores(/dev/stdout)'

giving

AAAAGACGACTGCGGTCACTGAGGTGTAAA  3.06461e-05
ACTGTTTGCTCTATGCGGAGGAGCCCCTTA  1.00708e-04
TTAACTGGGTATAGGGGCGAATATGGCGAC  5.30426e-05
TTAGCCGGGAGGGGGCGCTCCGTAGTGGAT  7.04016e-02
ATAGTAGTCGTGCGCCCCCACTGGTGACAA  1.58707e-03
TGTTCCTTGCTTTTATAAGGTAAATGCAGG  2.71934e-06

Here

loadFitLine(fit.final.json): Loads the ProBound fit.
buildConsensusModel(): Reconciles experiment- and round-specific activities and adjusts the position-specific affinity matrix (PSAM) so that the maximum relative affinity for each binding offset is 1.0 (use writeModel(/dev/stdout) to inspect the updated model).
addNScoring(): Extends the alphabet to support the wildcard character 'N'.
selectBindingMode(1): Selects the second (zero indexed) binding mode (PSAM).
inputTXT(seq.txt): Specifies where the input sequences are located. All sequences must be of equal length.
bindingModeScores(/dev/stdout): Computes the relative affinity for each offset/strand and reports the sum to standard output. The affinities for each offset/strand can instead be reported using bindingModeScores(/dev/stdout,profile).
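The two-column output shown above (sequence, then summed relative affinity) is easy to post-process. The sketch below is an illustration only. It parses a copy of the example output and ranks the sequences from strongest to weakest predicted binder. The function name parse_scores is hypothetical, and the whitespace-separated two-column layout is inferred from the example output rather than from a format specification.

```python
def parse_scores(text):
    """Parse 'SEQUENCE  AFFINITY' lines into (sequence, float) pairs."""
    scores = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or unexpected lines
        scores.append((fields[0], float(fields[1])))
    return scores


# Copied from the example output above.
example_output = """\
AAAAGACGACTGCGGTCACTGAGGTGTAAA  3.06461e-05
ACTGTTTGCTCTATGCGGAGGAGCCCCTTA  1.00708e-04
TTAACTGGGTATAGGGGCGAATATGGCGAC  5.30426e-05
TTAGCCGGGAGGGGGCGCTCCGTAGTGGAT  7.04016e-02
ATAGTAGTCGTGCGCCCCCACTGGTGACAA  1.58707e-03
TGTTCCTTGCTTTTATAAGGTAAATGCAGG  2.71934e-06
"""

# Sort by relative affinity, highest first.
ranked = sorted(parse_scores(example_output),
                key=lambda pair: pair[1], reverse=True)

for sequence, affinity in ranked:
    print(f"{sequence}\t{affinity:.5e}")
```

In practice the text would come from a file written by bindingModeScores instead of the hard-coded string.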

Additional functionality can be listed using


java -cp ProBoundTools.jar proBoundTools/App -help

MotifCentral

A curated set of ProBound-derived transcription factor binding models can be accessed at MotifCentral.org. Each model has a unique ID number and can be loaded using the ProBoundTools command loadMotifCentralModel(modelID) (instead of loadFitLine(fit.final.json)).
