Compact Integration of Multi-Network Topology for Functional Analysis of Genes
Computation and Biology Group, MIT CSAIL
Mashup extracts a compact vector representation for each gene or protein that accurately explains the topological patterns of its node across multiple heterogeneous interaction networks. These vector representations, one per gene or protein, can then be plugged directly into off-the-shelf machine learning methods to derive functional insights about genes or proteins.
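As a rough illustration of what "plugging into" a standard method can look like, the sketch below ranks genes by cosine similarity to a query gene. The gene names and three-dimensional vectors are made-up toy values, not Mashup output, and cosine similarity is only one reasonable choice of downstream method, not necessarily the one used in the publication.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)


def nearest_genes(query, embeddings, k=2):
    """Return the k genes most similar to `query`, excluding the query itself."""
    q = embeddings[query]
    scored = [
        (cosine_similarity(q, vec), gene)
        for gene, vec in embeddings.items()
        if gene != query
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [gene for _, gene in scored[:k]]


# Toy stand-ins for gene vectors (hypothetical names and values).
toy_embeddings = {
    "GENE_A": [1.0, 0.0, 0.0],
    "GENE_B": [0.9, 0.1, 0.0],
    "GENE_C": [0.5, 0.5, 0.0],
    "GENE_D": [0.0, 0.0, 1.0],
}

print(nearest_genes("GENE_A", toy_embeddings, k=2))
```

With real representations, the same vectors could instead serve as input features to a classifier for gene function prediction.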
Compact Integration of Multi-Network Topology for Functional Analysis of Genes
Hyunghoon Cho, Bonnie Berger, Jian Peng
Cell Systems 3 (6), 2016 [Link]
A previous version of this work appeared in:
Diffusion Component Analysis: Unraveling Functional Topology in Biological Networks
Hyunghoon Cho, Bonnie Berger, Jian Peng
International Conference on Research in Computational Molecular Biology (pp. 62-64), 2015 [Link]
A MATLAB implementation of Mashup, example data sets (human and yeast), and evaluation code for gene function prediction using Mashup representations can be downloaded from [here].
The following text files contain pre-trained Mashup representations of genes in human and several model organisms. Each row contains the feature vector of a gene. The corresponding list of gene names (one per row) is also provided.
Used in our publication:
Human (Homo sapiens), STRING v9.1 (without text-mining) [vectors] [gene list]
(Note: This gene list mostly consists of gene symbols, but ENSP IDs are used for genes without known symbols. We used BioMart's ID conversion tool to obtain this list.)
Yeast (Saccharomyces cerevisiae), STRING v9.1 (without text-mining) [vectors] [gene list]
Other organisms:
Mouse (Mus musculus), STRING v10 (without text-mining) [vectors] [gene list]
Fruit fly (Drosophila melanogaster), STRING v10 (without text-mining) [vectors] [gene list]
Zebrafish (Danio rerio), STRING v10 (without text-mining) [vectors] [gene list]
Nematode (Caenorhabditis elegans), STRING v10 (without text-mining) [vectors] [gene list]
Note: For these four organisms, we used the default setting of 1000 dimensions. The downstream performance of our framework is quite robust to this parameter, but a different number of dimensions may be more suitable for your application.
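The files above pair up row by row: row i of a vectors file is the feature vector of the gene on row i of the matching gene list. A minimal sketch of reading such a pair in Python follows. It assumes the values in each row are separated by whitespace or commas, which is not stated on this page, so check the downloaded files first. The function name `load_mashup` and the toy file names and contents are hypothetical.

```python
import os
import tempfile


def load_mashup(vectors_path, genes_path):
    """Read a vectors file and its gene list; row i of each describes the same gene."""
    with open(genes_path) as f:
        genes = [line.strip() for line in f if line.strip()]
    with open(vectors_path) as f:
        # Assumption: values are separated by whitespace or commas.
        vectors = [
            [float(x) for x in line.replace(",", " ").split()]
            for line in f
            if line.strip()
        ]
    if len(genes) != len(vectors):
        raise ValueError(
            "row count mismatch: %d genes vs %d vectors" % (len(genes), len(vectors))
        )
    if len({len(v) for v in vectors}) > 1:
        raise ValueError("vectors do not all have the same number of dimensions")
    return genes, vectors


# Demo on tiny made-up files; replace the paths with the downloaded files.
with tempfile.TemporaryDirectory() as tmp:
    vectors_path = os.path.join(tmp, "toy_vectors.txt")
    genes_path = os.path.join(tmp, "toy_genes.txt")
    with open(vectors_path, "w") as f:
        f.write("0.1 0.2 0.3\n0.4 0.5 0.6\n")
    with open(genes_path, "w") as f:
        f.write("GENE_A\nGENE_B\n")
    genes, vectors = load_mashup(vectors_path, genes_path)

print(len(genes), "genes,", len(vectors[0]), "dimensions")
```

Keeping the genes and vectors as two parallel lists, rather than a dictionary, preserves the row order and does not silently drop rows if a name happens to appear more than once.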