Improved Differentially Private GWAS with the Neighbor Method

As genetic data enters the clinic, there is growing interest in harnessing it to understand the relationship between disease and genotype. Allowing users to query these medical databases, however, raises many privacy issues. One framework for addressing them is differential privacy, and various approaches have been suggested for applying it to genome-wide association studies (GWAS). One of the most effective is the neighbor mechanism. This approach to finding significantly correlated SNPs is often powerful, but it runs into two problems: poor scalability (which has led others to implement approximate versions that provide reduced privacy) and inaccuracy when certain parameters are chosen incorrectly. We overcome both problems. First, we introduce a dynamic boundary approach that addresses the accuracy problems. Second, using an approach based on convex analysis, we give an algorithm that calculates the neighbor distance for a given SNP in constant time, removing the major computational bottleneck in the neighbor method. You can download our implementation of this method by clicking here. You can also download the readme file with instructions on use by clicking README.
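For readers new to differential privacy, the sketch below shows one standard way to turn a per-SNP score into a private selection: the exponential mechanism, which picks an item with probability proportional to exp(epsilon * score / (2 * sensitivity)). It is a generic illustration and not the code in PrivPick.py. The function name, the SNP identifiers, the scores, and the sensitivity of 1 are all assumptions made for the example.

```python
import math
import random


def exponential_mechanism(scores, epsilon, sensitivity=1.0, rng=None):
    """Pick one key from `scores` with probability proportional to
    exp(epsilon * score / (2 * sensitivity)).

    `scores` maps a hypothetical SNP identifier to a score; higher is better.
    """
    if rng is None:
        rng = random.Random()

    # Subtract the maximum score before exponentiating. This leaves the
    # sampling probabilities unchanged and avoids floating-point overflow.
    max_score = max(scores.values())
    weights = {
        name: math.exp(epsilon * (score - max_score) / (2.0 * sensitivity))
        for name, score in scores.items()
    }

    # Sample from the weighted distribution by walking the cumulative sum.
    draw = rng.random() * sum(weights.values())
    cumulative = 0.0
    chosen = None
    for name, weight in weights.items():
        chosen = name
        cumulative += weight
        if draw < cumulative:
            break
    return chosen


# Made-up scores for three hypothetical SNPs (assumption, not real data).
example_scores = {"snp_a": 12.0, "snp_b": 3.0, "snp_c": -5.0}
picked = exponential_mechanism(example_scores, epsilon=1.0, rng=random.Random(0))
```

A smaller epsilon gives stronger privacy but makes the choice closer to uniform; a larger epsilon concentrates the choice on the highest-scoring SNP.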

To run PrivPick.py, a user needs Python installed, along with NumPy and SciPy. There are no other dependencies. For details on how to run the program, see the README file.
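Before running the program, it can help to confirm that both packages are importable. The check below is a minimal sketch using only the standard library; the helper name `missing_dependencies` is hypothetical and is not part of PrivPick.py.

```python
import importlib.util


def missing_dependencies(modules=("numpy", "scipy")):
    """Return the names in `modules` that cannot be found for import."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_dependencies()
    if missing:
        print("Missing packages: " + ", ".join(missing))
    else:
        print("NumPy and SciPy are both available.")
```

If a package is reported missing, install it with your usual Python package manager before following the README.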

Any questions or concerns can be directed to seanken at mit dot edu