Massachusetts Institute of Technology (MIT)
Computer Science and Artificial Intelligence Laboratory (CSAIL)
Theory of Computation Group (TOC)
Computation and Biology Group (CompBio)


LAVA (Lightweight Assignment of Variant Alleles) is a genotyping algorithm for next-generation sequencing (NGS) data that targets a given set of SNP loci. It takes advantage of the fact that inexact matching of mid-size k-mers (k = 32) can typically identify loci in the human genome uniquely, without full read alignment. LAVA accurately calls the vast majority of SNPs in dbSNP and on Affymetrix's Genome-Wide Human SNP Array 6.0, while running 4 to 9 times faster than a standard NGS genotyping pipeline and optionally using as little as 5.2 GB of RAM. As such, LAVA represents a scalable computational method for population-level genotyping studies as well as a flexible NGS-based replacement for SNP arrays.
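The idea of genotyping known SNPs by approximate k-mer matching can be illustrated with a small toy sketch in Python. This is not LAVA's implementation: the function names, the input format (flanking sequence, reference base, alternate base, flanking sequence), the one-mismatch neighbor search, and the allele-fraction thresholds are all assumptions made for illustration, and the example uses k = 6 on a made-up locus rather than k = 32 on the human genome.

```python
from collections import defaultdict

BASES = "ACGT"


def kmers(seq, k):
    """Yield every length-k substring of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(snps, k=32):
    """Map each k-mer that covers a SNP to (snp_id, allele).

    snps: snp_id -> (left_flank, ref_base, alt_base, right_flank).
    This input format is hypothetical. K-mers that would map to more
    than one (snp_id, allele) pair are dropped as ambiguous.
    """
    index, ambiguous = {}, set()
    for snp_id, (left, ref, alt, right) in snps.items():
        for allele, base in (("ref", ref), ("alt", alt)):
            key = (snp_id, allele)
            window = left[-(k - 1):] + base + right[:k - 1]
            for kmer in kmers(window, k):
                if kmer in ambiguous:
                    continue
                if index.get(kmer, key) != key:
                    del index[kmer]
                    ambiguous.add(kmer)
                else:
                    index[kmer] = key
    return index


def lookup(kmer, index):
    """Return the (snp_id, allele) for kmer, allowing one substitution.

    An exact hit wins. Otherwise all single-base substitutions are tried
    and the result is accepted only if they agree on a single target.
    """
    if kmer in index:
        return index[kmer]
    hits = set()
    for i, old in enumerate(kmer):
        for base in BASES:
            if base != old:
                hit = index.get(kmer[:i] + base + kmer[i + 1:])
                if hit is not None:
                    hits.add(hit)
    return hits.pop() if len(hits) == 1 else None


def count_support(reads, index, k=32):
    """Count, per SNP, how many reads support the ref and alt alleles."""
    counts = defaultdict(lambda: {"ref": 0, "alt": 0})
    for read in reads:
        votes = {lookup(kmer, index) for kmer in kmers(read, k)}
        votes.discard(None)
        for snp_id, allele in votes:  # each read votes once per allele
            counts[snp_id][allele] += 1
    return dict(counts)


def call_genotype(ref_n, alt_n, low=0.2, high=0.8):
    """Naive call from the alt-allele fraction (thresholds are arbitrary)."""
    total = ref_n + alt_n
    if total == 0:
        return None
    frac = alt_n / total
    return "0/0" if frac < low else "1/1" if frac > high else "0/1"


# Toy locus with a hypothetical ID; the second read carries one error.
snps = {"rs_demo": ("GATTACAGCT", "A", "G", "TCGGATCCAA")}
index = build_index(snps, k=6)
reads = ["CAGCTATCGGA", "TCTATC", "AGCTGTCGG", "CTGTCG"]
support = count_support(reads, index, k=6)["rs_demo"]
print(call_genotype(support["ref"], support["alt"]))  # prints 0/1
```

Because the reference and alternate k-mers at a locus differ only at the SNP position, a read k-mer with a third base at that position lands within one substitution of both alleles and is discarded rather than counted for either.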

Source code is available on GitHub.

Ariya Shajii, Deniz Yorukoglu, Yun William Yu, Bonnie Berger; Fast genotyping of known SNPs through approximate k-mer matching. Bioinformatics 2016; 32(17): i538-i544. doi: 10.1093/bioinformatics/btw460

Supplementary Materials

Test Dataset