MetaPred2CS Web server

General Information

MetaPred2CS is a meta-predictor specifically designed to predict interactions in prokaryotic two-component systems (TCSs), i.e. the pairing between histidine kinases and response regulators. MetaPred2CS is based on a Support Vector Machine (SVM) that combines six individual sequence-based protein-protein interaction prediction methods: two co-evolution-based methods (in-silico two-hybrid (i2H) and mirror tree (MT)) and four genomic context methods (phylogenetic profiling (PP), gene fusion (GF), gene neighbourhood (GN) and gene operon (GO)).

All methods implemented in MetaPred2CS require a BLAST search to identify proteins homologous to the query proteins. For the i2H and MT methods, the search is performed against UniProtKB, downloaded from UniProt. For the genomic context methods, sequence searches are performed against our local reference genome database, which includes 243 genomes (see Table 1). The genomes and genome annotations were downloaded from the NCBI database, and the operon architecture and transcription units from here.

Lastly, MetaPred2CS has been trained on single-domain histidine kinase and response regulator proteins. For hybrid two-component system proteins, users need to split the protein sequence into its separate domains and submit them separately.

Table 1.

Training & Test - The P+ and P- datasets

The P+ and P- datasets contain 113 interacting and 1134 non-interacting experimentally validated TCS pairs, respectively, and were compiled and manually curated from the current literature. These datasets, and different sub-classes of them, were used to train and test MetaPred2CS using a k-fold cross-validation strategy.
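The k-fold strategy mentioned above can be illustrated with a minimal Python sketch. It is not the server's actual training code: the value k = 5, the stratified round-robin split, the random seed and the function names are all assumptions made for illustration, and the pair identifiers are placeholders rather than real TCS pairs.

```python
import random


def stratified_k_fold(positives, negatives, k=5, seed=0):
    """Split labelled pairs into k folds with a similar P+/P- ratio in each.

    k=5 and the seed are illustrative assumptions; the source does not
    state which k was used.
    """
    rng = random.Random(seed)
    pos = list(positives)
    neg = list(negatives)
    rng.shuffle(pos)
    rng.shuffle(neg)
    folds = [[] for _ in range(k)]
    # Deal items round-robin so every fold gets a near-equal share of each class.
    for i, item in enumerate(pos):
        folds[i % k].append((item, 1))
    for i, item in enumerate(neg):
        folds[i % k].append((item, 0))
    return folds


def train_test_splits(folds):
    """Yield (train, test) pairs; each fold serves as the test set exactly once."""
    for i, test in enumerate(folds):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test


# Placeholder identifiers standing in for the 113 P+ and 1134 P- pairs.
p_plus = [f"pos_{i}" for i in range(113)]
p_minus = [f"neg_{i}" for i in range(1134)]
folds = stratified_k_fold(p_plus, p_minus, k=5)

for train, test in train_test_splits(folds):
    n_pos = sum(label for _, label in test)
    print(len(train), len(test), n_pos)
```

Because P- is roughly ten times larger than P+, keeping the class ratio similar across folds avoids test folds with very few interacting pairs.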

Prediction Parameters

i2H and MT methods

In the second section of the submission form, users have the option of tuning the default values of the six parameters used by the i2H and MT methods (Figure 1).

Figure 1. Parameters of the i2H and mirror tree methods.

The i2H and MT methods require multiple sequence alignments, which are created automatically from the user's query proteins using BLAST and clustalw. For the BLAST search, users can adjust the e-value and the b_value. The remaining BLAST parameters are kept at their defaults, such as the scoring matrix, which is BLOSUM_62. The Min Matches and Max Matches parameters are the minimum and maximum number of species common to the multiple sequence alignments of the two query proteins. These parameters are used to filter the sequences and reduce their number. If the minimum number of common species is not reached, the job is terminated. In this case, users need to re-submit using a lower cut-off value for the Min Matches parameter and/or a higher b_value for the BLAST search. The final two parameters are the correlation function and the substitution matrix. The available options for the correlation function are corr and rankorr, and for the substitution matrix Maxhom_GCG.metric and Maxhom_McLachlan.metric.
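The role of Min Matches and Max Matches can be sketched as follows. This is only an illustration of the filtering idea described above, not the server's implementation: the function name, the use of species names as keys, and the rule of keeping the first Max Matches shared species in the order of the first alignment are all assumptions.

```python
def select_common_species(species_a, species_b, min_matches, max_matches):
    """Return the species shared by two alignments, capped at max_matches.

    species_a and species_b list the source species of the sequences in each
    alignment. Raises ValueError when fewer than min_matches species are
    shared, mirroring the terminated job described in the text.
    """
    in_b = set(species_b)
    common = []
    for species in species_a:
        # Keep the order of alignment A and skip repeated species.
        if species in in_b and species not in common:
            common.append(species)
    if len(common) < min_matches:
        raise ValueError(
            f"only {len(common)} common species; Min Matches is {min_matches}"
        )
    # Assumption: surplus species are simply truncated.
    return common[:max_matches]


hk_species = ["E. coli", "B. subtilis", "P. aeruginosa", "E. coli"]
rr_species = ["P. aeruginosa", "E. coli", "V. cholerae"]
print(select_common_species(hk_species, rr_species, min_matches=2, max_matches=10))
```

In this sketch, a ValueError corresponds to the case where the user would re-submit with a lower Min Matches value or a higher b_value.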

Genomic context methods

In the third section of the submission form, users have the option of tuning the default parameters of the genomic context methods (Figure 2).

Figure 2. Parameters of the genomic context methods.

For Phylogenetic Profiling, users can change the E-value cut-off of the BLASTP search used to detect homologous proteins across the reference genomes. This parameter is very important, as discussed in this work. The second parameter concerns the mutual information used to infer functional linkages between proteins. Likewise, for the Gene Fusion method, users can change the E-value cut-off of the BLASTP search. Moreover, users can tune the Bits cut-off of the local alignments, which are performed for the query proteins with ssearch36 using the Smith-Waterman algorithm to identify fusion events. Finally, for the Gene Neighbourhood and Gene Operon methods, the two tunable parameters are the E-value cut-off of the BLASTP search and the genomic distance cut-off. The latter is important for defining neighbouring genes and has been extensively discussed in the following works: Salgado H. et al., 2000; Strong M. et al., 2003; Ermolaeva MD. et al., 2001; Moreno-HG. et al., 2002; Ross O. et al., 1999, with 200 bp (the default) being the generally accepted cut-off value.
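Two of the quantities mentioned above, the mutual information between phylogenetic profiles and the genomic distance cut-off, can be illustrated with a short Python sketch. It is a simplified illustration rather than the server's code: the binary presence/absence encoding, the 1-based inclusive gene coordinates, the definition of distance as the intergenic gap, and both function names are assumptions. Strand and replicon are ignored.

```python
import math


def mutual_information(profile_a, profile_b):
    """Mutual information (in bits) between two binary presence/absence profiles.

    Each profile has one entry per reference genome: 1 if a homologue was
    detected, 0 otherwise.
    """
    if len(profile_a) != len(profile_b) or not profile_a:
        raise ValueError("profiles must be non-empty and of equal length")
    n = len(profile_a)
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            p_ab = sum(1 for x, y in zip(profile_a, profile_b) if x == a and y == b) / n
            p_a = sum(1 for x in profile_a if x == a) / n
            p_b = sum(1 for y in profile_b if y == b) / n
            if p_ab > 0:  # empty cells contribute nothing to the sum
                mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


def are_neighbours(gene_a, gene_b, max_distance=200):
    """True if the intergenic gap between two genes is at most max_distance bp.

    Genes are (start, end) tuples, 1-based and inclusive, assumed to lie on
    the same replicon. Overlapping genes have a gap of 0.
    """
    (start_a, end_a), (start_b, end_b) = sorted([gene_a, gene_b])
    gap = max(0, start_b - end_a - 1)
    return gap <= max_distance


print(mutual_information([1, 0, 1, 0], [1, 0, 1, 0]))
print(are_neighbours((100, 500), (701, 1200)))
```

Identical profiles with balanced presence and absence give the maximum of one bit, while statistically independent profiles give zero.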