CatRes
Massachusetts Institute of Technology (MIT)
Computer Science and Artificial Intelligence Laboratory (CSAIL)
Theory of Computation Group (TOC)

Computation and Biology Group (CompBio)

Email queries to bab@mit.edu



We introduce a computational method that predicts and annotates the catalytic residues of a protein using only its sequence. An annotation of an enzyme's catalytic residues describes their specific biochemical roles in the catalyzed reaction. Knowing the chemistry of an enzyme's catalytic residues is essential to understanding its function, yet predicting the residues' locations and specific chemical roles has remained difficult, especially when only the enzyme's sequence is available and no homologous structures are known. Catalytic residues that perform the same biochemical function often have similar chemical environments, and this similarity should be reflected in the sequences surrounding them. To formalize this reasoning, we construct a short sequence profile for each catalytic residue by taking a segment of a multiple sequence alignment built from homologous proteins. We then define a Kullback-Leibler (KL) relative entropy distance between two profiles and show that this purely information-theoretic measure captures even subtle biochemical differences between the catalytic residues' roles. Using the KL distance and a training set of known catalytic residues with their locations and roles, we turn the problem of comparing functional annotations into a problem of comparing sequence profiles. We apply the method to the biologically important glycohydrolase enzyme class, which includes proteins with very different sequences and structures. In a cross-validation test, our approach correctly predicts the locations of 82% of the enzymes' catalytic residues with >99% specificity, and the biochemical roles of 80% of the catalytic residues. Our results compare favorably with those of existing methods, and our method is more broadly applicable because it relies on sequence rather than structural information.
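The abstract does not say exactly how the profiles are built or how the KL distance between them is computed, so the following is only a minimal Python sketch under stated assumptions: a profile is the per-column amino-acid frequency distribution (with add-one pseudocounts) over a window of 2·half_width+1 alignment columns centred on the residue; because KL divergence is asymmetric, the sketch averages both directions per column and sums over the window; and a residue's role is predicted by nearest-neighbour lookup in a labelled training set. All function names, the window size, the pseudocount and padding schemes, and the role labels are hypothetical and are not the authors' implementation.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
UNIFORM = [1.0 / len(AMINO_ACIDS)] * len(AMINO_ACIDS)


def column_distribution(column, pseudocount=1.0):
    """Amino-acid frequencies of one alignment column with additive pseudocounts.
    Gaps ('-') and non-standard letters are ignored (an assumption)."""
    counts = Counter(c for c in column if c in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return [(counts[a] + pseudocount) / total for a in AMINO_ACIDS]


def residue_profile(msa, position, half_width=3, pseudocount=1.0):
    """Hypothetical short profile: one distribution per MSA column in the window
    [position - half_width, position + half_width]. Columns outside the
    alignment are padded with a uniform distribution (an assumption)."""
    length = len(msa[0])
    profile = []
    for j in range(position - half_width, position + half_width + 1):
        if 0 <= j < length:
            profile.append(column_distribution([seq[j] for seq in msa], pseudocount))
        else:
            profile.append(list(UNIFORM))
    return profile


def kl_divergence(p, q):
    """Relative entropy D(p || q) in nats; q must be strictly positive."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def profile_distance(profile_a, profile_b):
    """Column-wise symmetrized KL divergence summed over the window
    (both the symmetrization and the summation are assumptions)."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover windows of the same width")
    return sum(0.5 * (kl_divergence(p, q) + kl_divergence(q, p))
               for p, q in zip(profile_a, profile_b))


def predict_role(query_profile, training):
    """Nearest-neighbour role assignment. `training` is a list of
    (profile, role) pairs for catalytic residues with known roles."""
    distance, role = min((profile_distance(query_profile, prof), label)
                         for prof, label in training)
    return role, distance


if __name__ == "__main__":
    # Toy alignments made up for illustration; the residue of interest is column 2.
    msa_a = ["ACDEG", "ACDEG", "ACEEG"]
    msa_b = ["ACDEG", "ACDQG", "ACDEG"]
    msa_c = ["WWYWW", "WWYWW", "WWFWW"]
    training = [
        (residue_profile(msa_a, 2, half_width=1), "role_A"),
        (residue_profile(msa_c, 2, half_width=1), "role_C"),
    ]
    query = residue_profile(msa_b, 2, half_width=1)
    print(predict_role(query, training))
```

In this toy run the query window from `msa_b` is much closer to the one from `msa_a` than to the tryptophan-rich window from `msa_c`, so the hypothetical label `role_A` is returned. Pseudocounts keep every probability strictly positive, which keeps both directions of the KL divergence finite.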

Supplementary information for our paper, submitted to ISMB 2006, is available here.