The Struct2Net Server Help Page

Table of Contents

1. Query gene id/name

2. Query sequence

3. Subject sequence

4. Batch query

5. Query options

6. Email

7. Walkthrough

8. Input page

9. Job Status page

10. Fetch job page

11. Results page

Query gene id/name

Type your gene name or gene identifier here to search for protein-protein interactions (PPIs) within the commonly studied organisms human, yeast, and fly. Struct2Net supports a variety of identifiers, each corresponding to a specific protein or protein-coding gene. Currently supported ID types include Ensembl, EntrezGene, RefSeq, UniProtKB, FlyBase, SGD, and gene symbols. For example, some acceptable inputs for the molecular chaperone HSP90 from S. cerevisiae include:

    Symbol:         HSP90
    SGD:            YPL240C
    EntrezGene id:  855836

Struct2Net can also take multiple gene names or ids and find predicted PPIs for each. Entries must be separated by a space. For example, in the query gene id/name textbox:

    CG10014 CG12072 YKL064W

If the subject gene id/name textbox is filled, Struct2Net will determine the probability of interaction between the first entry in the query textbox and the entry in the subject textbox.
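The rules above for the two gene textboxes can be summarized in a short Python sketch. The function name and the returned dictionary are hypothetical and only illustrate the described behavior; the server's actual parsing code is not shown in this help page.

```python
def parse_gene_query(query_text, subject_text=""):
    """Interpret the query and subject textboxes as described above.

    Hypothetical helper for illustration; not the server's own code.
    """
    query_ids = query_text.split()      # entries are separated by spaces
    subject_ids = subject_text.split()
    if not query_ids:
        raise ValueError("at least one query gene id/name is required")
    if subject_ids:
        # With a subject given, only the first query entry is paired with it.
        return {"mode": "pair", "query": query_ids[0], "subject": subject_ids[0]}
    # Without a subject, each query entry gets its own search for interactors.
    return {"mode": "all_interactors", "queries": query_ids}


print(parse_gene_query("CG10014 CG12072 YKL064W"))
print(parse_gene_query("CG10014 CG12072", "YKL064W"))
```

In the second call, CG12072 is ignored because a subject was supplied.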

Query sequence

Type your amino acid sequence here in FASTA format. For example:

    >CG30503
    MLSRAAFLTVLLTLIYASHAAAGLSITVPGTKWCGPGNIAANYDDLGTEREVDTCCR
    AHDNCEEKIPPLEEAFGLRNDGFFPIFSCACESAFRNCLTALRNGHSLALGKIYFNT
    KEVCFGYGHPIVSCQEKQADLFETRCLSYRVDEGQPQRWQFYDLALYTHVSGSEEDSRD

Note that the server uses the description text (the line beginning with ">") when displaying job status information.
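To check a sequence before pasting it in, a small Python sketch such as the following can separate the description line from the sequence. The function name is hypothetical, and the validation is only an approximation of what a server might accept.

```python
def read_fasta_record(text):
    """Return (description, sequence) for a single FASTA record."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("a FASTA record must start with a '>' description line")
    description = lines[0][1:].strip()      # text after '>'
    sequence = "".join(lines[1:]).upper()   # sequence lines joined into one string
    if not sequence.isalpha():
        raise ValueError("sequence must be non-empty and contain only letters")
    return description, sequence


record = """>CG30503
MLSRAAFLTVLLTLIYASHAAAGLSITVPGTKWCGPGNIAANYDDLGTEREVDTCCR
AHDNCEEKIPPLEEAFGLRNDGFFPIFSCACESAFRNCLTALRNGHSLALGKIYFNT
"""
description, sequence = read_fasta_record(record)
print(description, len(sequence))
```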

Subject sequence

If a subject sequence is provided, the server will determine the probability of interaction between the query and subject sequences.

Batch query

Batch queries are supported for users who wish to submit more than one pair of query/subject sequences. This is most useful when selecting the option to thread pairs of sequences onto all templates. If both the query/subject textboxes and the file upload field have been filled, Struct2Net will process the uploaded file. Please clear the file upload if a single query is desired.

Please do not upload files with extra formatting, such as RTF files. The uploaded file must be a plain-text multiFASTA file in the format shown below, with pairs of sequences separated by a blank line. In this example, SEQUENCE_1 and SEQUENCE_2 form one pair, and SEQUENCE_3 and SEQUENCE_4 form another:

    >SEQUENCE_1
    MTEITAAMVKELRESTGAGMMDCKNALSETNGDFDKAVQLLREKGLGKAAKKADRLAAEG
    LVSVKVSDDFTIAAMRPSYLSYEDLDMTFVENEYKALVAELEKENEERRRLKDPNKPEHK
    IPQFASRKQLSDAILKEAEEKIKEELKAQGKPEKIWDNIIPGKMNSFIADNSQLDSKLTL
    MGQFYVMDDKKTVEQVIAEKEKEFGGKIKIVEFICFEVGEGLEKKTEDFAAEVAAQL
    >SEQUENCE_2
    SATVSEINSETDFVAKNDQFIALTKDTTAHIQSNSLQSVEELHSSTINGVKFEEYLKSQI
    ATIGENLVVRRFATLKAGANGVVNGYIHTNGRVGVVIAAACDSAEVASKSRDLLRQICMH
    
    >SEQUENCE_3
    MTEITAAMVKELRESTGAGMMDCKNALSETNGDFDKAVQLLREKGLGKAAKKADRLAAEG
    LVSVKVSDDFTIAAMRPSYLSYEDLDMTFVENEYKALVAELEKENEERRRLKDPNKPEHK
    IPQFASRKQLSDAILKEAEEKIKEELKAQGKPEKIWDNIIPGKMNSFIADNSQLDSKLTL
    MGQFYVMDDKKTVEQVIAEKEKEFGGKIKIVEFICFEVGEGLEKKTEDFAAEVAAQL
    >SEQUENCE_4
    SATVSEINSETDFVAKNDQFIALTKDTTAHIQSNSLQSVEELHSSTINGVKFEEYLKSQI
    ATIGENLVVRRFATLKAGANGVVNGYIHTNGRVGVVIAAACDSAEVASKSRDLLRQICMH
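A batch file can be checked locally before upload. The Python sketch below groups records into pairs using the blank-line rule described above and rejects any block that does not contain exactly two sequences. The function name is hypothetical, and the server's own validation may differ.

```python
def read_sequence_pairs(text):
    """Parse multiFASTA text into a list of (query, subject) pairs.

    Each of query and subject is a (description, sequence) tuple.
    """
    pairs = []
    block = []  # records collected for the current blank-line-delimited block
    # The extra empty string at the end flushes the final block.
    for raw in text.splitlines() + [""]:
        line = raw.strip()
        if line.startswith(">"):
            block.append([line[1:].strip(), ""])
        elif line:
            if not block:
                raise ValueError("sequence data found before any '>' line")
            block[-1][1] += line
        elif block:
            if len(block) != 2:
                raise ValueError(f"expected 2 sequences per block, found {len(block)}")
            pairs.append(tuple(tuple(record) for record in block))
            block = []
    return pairs


batch = ">SEQUENCE_1\nMTEITAAMVK\n>SEQUENCE_2\nSATVSEINSE\n\n>SEQUENCE_3\nMTEITAAMVK\n>SEQUENCE_4\nSATVSEINSE\n"
for query, subject in read_sequence_pairs(batch):
    print(query[0], "vs", subject[0])
```

The sequences in `batch` are shortened versions of the example above, used only to keep the sketch compact.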

Query options

Select which query option you would like to use. Struct2Net provides flexibility by allowing the user to query by sequence or by ID. Querying by sequence offers two approaches: one fast but approximate, the other slow but more accurate.

Query PPIs to best-hit ortholog in yeast, fly, or human (fast):

For the most commonly studied organisms (human, yeast, and fly), we have pre-computed and stored all-vs.-all predictions. If the user desires a quick response of reasonable quality, the server finds the orthologs of the given protein(s) in the stored set of human/yeast/fly proteins, maps the corresponding stored predictions back to the given protein(s), and outputs this result.

Thread sequences onto all templates (slow):

This query option performs a full prediction, involving both a threading algorithm and a machine learning algorithm. Pairs of sequences are threaded onto all templates of a dimer template library. A logistic regression score is then computed for the pair, reflecting their probability of interaction. Since this approach takes some time, the user is emailed a link to the results when the job has completed.
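For readers unfamiliar with logistic regression, the Python sketch below shows how such a score maps a set of numeric features to a value between 0 and 1. The features, weights, and bias are invented placeholders; the actual features derived from threading and the trained parameters used by Struct2Net are not described in this help page.

```python
import math


def logistic_score(features, weights, bias):
    """Generic logistic regression: sigmoid of a weighted sum of features."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


# Placeholder values only; not Struct2Net's real features or parameters.
features = [1.2, -0.4]
weights = [0.8, 1.5]
bias = -0.5

score = logistic_score(features, weights, bias)
print(f"{score:.3f}")
```

A score near 1 corresponds to a high predicted probability of interaction and a score near 0 to a low one.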

Email

Enter your e-mail address here. A link to the results will be emailed as soon as the job is completed. Alternatively, results can be viewed at any time after job completion by providing the corresponding job ID on the Fetch job page. If an incorrect email address is given, there is no way of contacting you when the job has completed. The Struct2Net job ID is included in the subject line of emails sent by the server. Here is an example message header with job ID REUOPO7N:

    from: no-reply@struct2net.csail.mit.edu
    to: anyone@anywhere.edu
    date: Thu, Nov 12, 2008 at 3:54 PM
    subject: Your Struct2Net job REUOPO7N has completed
    mailed-by: struct2net.csail.mit.edu
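If notification emails are processed by a script, the job ID can be pulled out of the subject line. The Python sketch below assumes the subject always follows the wording of the example header above; that wording is the only part taken from this page, and the function name is hypothetical.

```python
import re

# Pattern based on the example subject line shown above.
JOB_SUBJECT = re.compile(r"Your Struct2Net job (\S+) has completed")


def job_id_from_subject(subject):
    """Return the job ID found in a subject line, or None if absent."""
    match = JOB_SUBJECT.search(subject)
    return match.group(1) if match else None


print(job_id_from_subject("Your Struct2Net job REUOPO7N has completed"))  # REUOPO7N
```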

Walkthrough

Input page

In this example, a yeast systematic gene name, "YMR186W", is provided in the query gene text box. If no gene id/name is provided in the subject gene text box, all predicted interactors of "YMR186W" are identified. However, if a gene id/name is supplied in the subject gene text box, the server determines the probability of interaction between the two.

The user can also query for PPIs by supplying either one or a pair of protein sequences in FASTA format. In this example, two protein sequences, 2IBP (chain A) and CIT1, are provided. Struct2Net then determines the probability of interaction using the approach specified by the query option (see description above).

A link to the results is sent to the user-provided email address. Once the job is finished, the user can check the results at any time using either this link or the job ID.

Job Status page

A job status page is shown while the job is still running. Please note that results can be fetched at a later time using the job ID. As soon as the job completes, the browser will redirect to the results.

Fetch job page

Each query submission is associated with a job ID that can be used to fetch the results once the job has completed. The job ID can be found on the Job Status page as well as in the automated e-mails that notify the user when a job has been submitted and when it has completed.

Results page (for pre-computed predictions in S. cerevisiae, D. melanogaster, and H. sapiens)

Legend

1) A summary of the number of predicted PPIs represented in the current results. Here there are 13 predicted interactions in total for the set of query gene identifiers. Also shown is the number of predicted interactions that have been experimentally observed according to the BioGRID repository of interaction data. In this example, 2 predicted PPIs have been observed.

2) A note explaining that rows of predicted PPIs that have also been experimentally observed are color-coded. This makes it easier to tell putative interactions that lack experimental support apart from those that have it.

3) Search results can be downloaded in PSI-MI XML, CSV or TAB-delimited format.

4) Statistics for predicted PPIs are also displayed on a per-gene basis. The total number of predicted PPIs for a query gene is provided, along with the number of those predictions that have been experimentally observed according to BioGRID. These statistics are provided solely as a convenience for users.

5) Annotation data for each query gene. Organism name, functional description, GO annotations, and alternative identifiers are provided here. GO terms are linked to their detailed profiles on the Gene Ontology website and aliases are linked to their corresponding database summary pages.

6) The gene ID of each predicted interactor is given in this column. Currently, predicted interactors from D. melanogaster are linked to FlyBase and interactors from S. cerevisiae are linked to SGD. Notice that the first and second rows in the table of predictions for this query gene are shaded. The shading indicates that the predicted interactions TSA1-TSA2 and TSA1-TSA1 have been experimentally observed, which is also shown in the "In BioGRID?" column.

7) Logistic regression score estimating the probability of an interaction between the query gene and the predicted interactor. For example, the score for query gene TSA1 and predicted interactor TSA2 (YDR453C) is 0.575.

8) Annotation data for each predicted interactor. The description annotates function, biological process, etc.

9) GO terms associated with a predicted interactor. Each term type (e.g. molecular function, cellular component) shows a list of GO annotations describing a predicted interactor. Click to expand and view the list.

10) Aliases for a predicted interactor. Each alias links to its corresponding database summary page (e.g. EntrezGene, Ensembl, FlyBase, SGD, HUGO (HGNC), HPRD).
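Results downloaded in CSV format (item 3) can be filtered with a short script, for example to list predictions that are not yet in BioGRID. The Python sketch below uses assumed column names (query, interactor, score, in_biogrid) and a small made-up sample; only the TSA1-TSA2 score of 0.575 comes from the legend above, and GENE_X is a placeholder. Check the header row of an actual download and adjust the names accordingly.

```python
import csv
import io

# Made-up stand-in for a downloaded CSV file; real column names may differ.
sample = """query,interactor,score,in_biogrid
TSA1,TSA2,0.575,yes
TSA1,GENE_X,0.410,no
"""


def novel_predictions(csv_text, min_score=0.0):
    """Return rows not observed in BioGRID whose score is at least min_score."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [
        row for row in rows
        if row["in_biogrid"].strip().lower() != "yes"
        and float(row["score"]) >= min_score
    ]


for row in novel_predictions(sample, min_score=0.4):
    print(row["query"], row["interactor"], row["score"])
```

For a TAB-delimited download, passing `delimiter="\t"` to `csv.DictReader` should work in the same way.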

Results page (for predictions in other organisms using the Struct2Net algorithm)

Questions or comments? Please contact struct2net@csail.mit.edu