LAVA

Massachusetts Institute of Technology (MIT)
Computer Science and Artificial Intelligence Laboratory (CSAIL)
Theory of Computation Group (TOC)
Computation and Biology Group (CompBio)

Email queries: bab@mit.edu



LAVA (Lightweight Assignment of Variant Alleles) is a next-generation sequencing (NGS) genotyping algorithm for a given set of SNP loci. It takes advantage of the fact that inexact matching of mid-size k-mers (with k = 32) can typically identify loci in the human genome uniquely, without full read alignment. LAVA accurately calls the vast majority of SNPs in dbSNP and Affymetrix's Genome-Wide Human SNP Array 6.0, runs 4-9 times faster than a standard NGS genotyping pipeline, and can optionally use as little as 5.2 GB of RAM. As such, LAVA is a scalable computational method for population-level genotyping studies as well as a flexible NGS-based replacement for SNP arrays.
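To make the idea of genotyping by approximate k-mer matching concrete, here is a minimal Python sketch. It is not LAVA's implementation: the function names (`build_index`, `count_support`, `call_genotype`), the in-memory dictionary index, the Hamming-distance-1 lookup, and the allele-fraction thresholds are all assumptions chosen for illustration, and a small k is expected in toy inputs rather than k = 32.

```python
from collections import Counter, defaultdict


def kmers_covering(seq, pos, k):
    """Yield every k-mer of seq that spans index pos."""
    first = max(0, pos - k + 1)
    last = min(pos, len(seq) - k)
    for start in range(first, last + 1):
        yield seq[start:start + k]


def build_index(reference, snps, k):
    """Map each SNP-covering k-mer to the (snp_id, allele) pairs it supports.

    snps is a hypothetical dict: snp_id -> (0-based position, ref base, alt base).
    """
    index = defaultdict(set)
    for snp_id, (pos, ref, alt) in snps.items():
        for allele in (ref, alt):
            haplotype = reference[:pos] + allele + reference[pos + 1:]
            for kmer in kmers_covering(haplotype, pos, k):
                index[kmer].add((snp_id, allele))
    return index


def hamming1_neighbors(kmer):
    """Yield all k-mers that differ from kmer at exactly one position."""
    for i, base in enumerate(kmer):
        for other in "ACGT":
            if other != base:
                yield kmer[:i] + other + kmer[i + 1:]


def count_support(reads, index, k):
    """Count k-mer support per SNP allele, without aligning reads."""
    counts = defaultdict(Counter)
    for read in reads:
        for start in range(len(read) - k + 1):
            kmer = read[start:start + k]
            # Prefer an exact hit; fall back to one-mismatch neighbors.
            hits = set(index.get(kmer, ()))
            if not hits:
                for neighbor in hamming1_neighbors(kmer):
                    hits |= index.get(neighbor, set())
            # Skip ambiguous k-mers that match more than one locus or allele.
            if len(hits) == 1:
                snp_id, allele = next(iter(hits))
                counts[snp_id][allele] += 1
    return counts


def call_genotype(allele_counts, ref, alt, min_support=2, low=0.2, high=0.8):
    """Toy genotype call from the alt-allele fraction (thresholds are assumed)."""
    total = allele_counts[ref] + allele_counts[alt]
    if total < min_support:
        return None  # too little evidence to call
    alt_fraction = allele_counts[alt] / total
    if alt_fraction < low:
        return f"{ref}/{ref}"
    if alt_fraction > high:
        return f"{alt}/{alt}"
    return f"{ref}/{alt}"
```

A typical use would be to call `build_index` once for a SNP panel, pass sequencing reads to `count_support`, and then apply `call_genotype` per locus. Preferring exact hits before trying one-mismatch neighbors matters here because the reference and alternate k-mers of a SNP differ by a single base, so an unconditional one-mismatch lookup would make every covering k-mer ambiguous.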

Source code is available on GitHub.

Ariya Shajii, Deniz Yorukoglu, Yun William Yu, Bonnie Berger; Fast genotyping of known SNPs through approximate k-mer matching. Bioinformatics 2016; 32(17): i538-i544. doi: 10.1093/bioinformatics/btw460

Supplementary Materials


Test Dataset