What it doesn't do: BetaWrap uses very little information about the sequences of the known beta-helices. It does not perform sequence comparisons against the known beta-helices or against any other sequences in the PDB (although you do have the option of running an additional search against a profile HMM built from the known sequences; see below). You should definitely run these comparisons yourself for any sequences you are interested in, for example with NCBI's BLAST service. BetaWrap is also not a threading program per se, since it does not compare the sequence to any other candidate template structures. As a result it is much faster than threading programs, but it won't notice if your sequence, which might make a mediocre beta-helix, would in fact make a fantastic transmembrane beta-barrel. For sequences picked out by BetaWrap, you should consider also running a threading or profile-based program (for example 3D-PSSM). But don't be concerned if threading doesn't support BetaWrap's prediction, as long as it doesn't find a highly significant alternative hit: the threading programs we tried performed poorly at recognizing the similarity among many of the known beta-helices.
How well it works: BetaWrap has been shown to distinguish beta-helices from non-beta-helices when run on a non-redundant version of the PDB. In addition, a seven-fold cross-validation indicated that BetaWrap can recognize beta-helices from one family when trained on structures from the other families. This gives us hope that the algorithm can recognize novel beta-helices from their sequence alone. Predicting protein structure in the absence of detectable sequence similarity is still a risky business, however, and we have found sequences in larger databases that score well under the algorithm but are unlikely to have a beta-helical structure. In our experience, most of these likely false positives contain a detectable sequence repeat. Because BetaWrap rewards like-on-like stacking of certain residue types in the core of the beta-helix, it is occasionally fooled by sequences whose repeat lengths and residue patterns line up with the rung template. A very significant score for a protein with a sequence repeat shorter than 40 residues should therefore be treated with some caution. The known right-handed beta-helices do not have detectable repeats at the sequence level. As described below, we offer the option of searching for two families of repeats that have occasionally fooled the algorithm. Reassuringly, both of these families have coiled folds like the right-handed beta-helix: one forms a left-handed beta-helix, the other an alpha-beta coiled fold.
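The caution about short repeats can be made concrete with a crude check. The sketch below is not part of BetaWrap and is not how Pfam detects repeats; it simply shifts a sequence against itself for every period from 4 to 39 residues and reports the period with the highest fraction of identical residues. The helper names `repeat_score` and `find_short_repeat`, the minimum period of 4, and the 0.5 identity threshold are hypothetical choices for illustration. Real hexapeptide and leucine-rich repeats are degenerate, so exact-identity scoring like this will miss many of them; a profile HMM search is far more sensitive.

```python
def repeat_score(seq: str, period: int) -> float:
    """Fraction of positions i where seq[i] equals seq[i + period]."""
    n = len(seq) - period
    if n <= 0:
        return 0.0
    matches = sum(1 for i in range(n) if seq[i] == seq[i + period])
    return matches / n


def find_short_repeat(seq: str, min_period: int = 4, max_period: int = 39,
                      threshold: float = 0.5):
    """Return (period, score) for the most self-identical period below 40 residues.

    Returns None if no period in [min_period, max_period] reaches `threshold`
    (a hypothetical cutoff). Periods are scanned in increasing order and only a
    strictly better score replaces the current best, so multiples of the true
    period, which score just as well on a perfect repeat, do not displace it.
    """
    seq = seq.upper()
    best = None
    for period in range(min_period, min(max_period, len(seq) - 1) + 1):
        score = repeat_score(seq, period)
        if score >= threshold and (best is None or score > best[1]):
            best = (period, score)
    return best


if __name__ == "__main__":
    # A perfect hexapeptide-style repeat (illustrative, not a real protein).
    print(find_short_repeat("LIGDNV" * 10))
    # Twenty distinct residues: no shift lines up, so no repeat is reported.
    print(find_short_repeat("ACDEFGHIKLMNPQRSTVWY"))
```

Here `find_short_repeat("LIGDNV" * 10)` returns `(6, 1.0)` and the second call returns `None`. A hit from a check like this would be a reason to look harder at a high BetaWrap score, not proof that the prediction is wrong.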
Pfam searches: As mentioned above, BetaWrap is occasionally fooled by sequences containing sequence repeats. This has been observed primarily for sequences in two families: the hexapeptide repeat family and the leucine-rich repeat family. The protein family database Pfam provides profile HMMs for both of these families, and we have included the option of searching for these repeats at the bottom of the query form.
The Pfam and HMMER results attach an E-value to each sequence hit. This E-value estimates the number of hits with an equal or better score that would be expected by chance, given the number of query sequences. These E-values are estimated empirically by a calibration process that scores random sequences.
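To make the E-value definition concrete, here is a minimal sketch assuming, as in HMMER 2, that scores of random sequences follow a Gumbel (extreme value) distribution with location `mu` and scale parameter `lam`. Under that assumption the chance of a single comparison scoring at least x is P = 1 - exp(-exp(-lam(x - mu))), and the E-value is n times P, where n is the number of comparisons in the search. The function names, the method-of-moments fit, and the calibration values in the demo are illustrative assumptions; HMMER's own calibration may fit the distribution differently.

```python
import math
import statistics

# Euler-Mascheroni constant, used in the Gumbel mean: mean = mu + gamma / lam
EULER_GAMMA = 0.5772156649015329


def fit_gumbel_moments(scores):
    """Method-of-moments estimate of Gumbel (mu, lam) from random-sequence scores.

    Uses mean = mu + gamma/lam and standard deviation = pi / (lam * sqrt(6)).
    """
    mean = statistics.fmean(scores)
    sd = statistics.stdev(scores)
    lam = math.pi / (sd * math.sqrt(6))
    mu = mean - EULER_GAMMA / lam
    return mu, lam


def evd_pvalue(score, mu, lam):
    """P(S >= score) for one comparison under the fitted Gumbel distribution.

    -expm1(-y) equals 1 - exp(-y) but keeps precision when the p-value is tiny.
    """
    return -math.expm1(-math.exp(-lam * (score - mu)))


def evalue(score, mu, lam, n):
    """Expected number of chance hits scoring >= score over n comparisons."""
    return n * evd_pvalue(score, mu, lam)


if __name__ == "__main__":
    # Stand-in for calibration: evenly spaced Gumbel quantiles with mu=10, lam=0.7
    # play the role of scores from random sequences (hypothetical values).
    true_mu, true_lam, count = 10.0, 0.7, 10000
    random_scores = [
        true_mu - math.log(-math.log((i + 0.5) / count)) / true_lam
        for i in range(count)
    ]
    mu, lam = fit_gumbel_moments(random_scores)
    print(f"fitted mu={mu:.2f}, lam={lam:.3f}")
    print(f"E-value of a score of 25 over 1000 comparisons: {evalue(25.0, mu, lam, 1000):.3g}")
```

The two steps mirror the description above: scoring random sequences fixes `mu` and `lam` once, after which any real hit's score converts directly into an expected number of equally good chance hits.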
Pfam HMM's, Copyright (C) 1996-1999 Pfam Consortium.
HMMER software, Copyright (C) 1992-1998 Washington University School of Medicine.