28C3 - Version 2.3.5

28th Chaos Communication Congress
Behind Enemy Lines

Bastian Greshake
Philipp Bayer
Day: Day 2 - 2011-12-28
Room: Saal 3
Start time: 23:00
Duration: 01:00
ID: 4730
Event type: Lecture
Track: Science
Language used for presentation: English

Crowdsourcing Genome Wide Association Studies

Freeing Genetic Data from Corporate Vaults

Only a few years ago, generating genetic information about individuals was expensive and laborious work. Modern techniques have drastically cut the cost and time needed to gain insight into one's genome, and have ultimately led to the formation of personal genetics companies (23andMe, deCODEme and others) that now offer direct-to-customer genetic testing. With prices for these tests starting at about 100 €, the number of people taking them is on the rise. By now, 23andMe alone has over 100,000 paying customers, more than 60,000 of whom are willing to donate their genetic data and to participate actively in research projects by filling out surveys, for example on their medical histories. This has resulted in a high-quality dataset with genetic information on 60,000 individuals. The best part: the data has already been paid for by the research participants themselves.

Who would not love to get their hands on data like this? Unfortunately, the data sits locked away in corporate vaults, inaccessible to interested (citizen) scientists. But what if we could change this?

We've created openSNP, a central, open-source, free-to-use repository that lets customers of genotyping companies upload their genotyping data and annotate it with phenotypes. openSNP provides its users with the latest scientific research on their genotypes and lets scientists download annotated genotypes to make science more open.

Companies that perform Direct-To-Customer (DTC) genetic tests have now been around for about six years, with 23andMe – founded in 2006 – and deCODEme being two of the oldest companies on the market. Their customers receive a test tube via mail, spit into this tube and send it back to their DTC company to get their genetic information analyzed. The tests performed by DTC companies do not utilize the more famous DNA sequencing, but rely on faster and cheaper DNA microarrays instead.

Microarrays screen for around 1 million genetic markers called Single Nucleotide Polymorphisms (SNPs). A SNP is a genomic variation in which a single base differs at one site between members of a population. Usually a SNP has only two alleles (variants), and the rarer allele occurs with a frequency of at least 1% in the population. Spread over the whole human genome there are around 10 million such variable sites, about 10% of which are covered by DTC companies. Because they differ between individuals, SNPs can be used as markers associated with certain conditions. For example, some SNP variants are associated with an elevated risk of developing breast cancer or Alzheimer’s. Other SNPs can be used to predict how a person metabolizes chemicals or drugs.
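To make the 1% frequency criterion concrete, here is a minimal Python sketch that counts alleles at a single site and reports the frequency of the rarer one. The input format (two-letter genotype calls such as "AG", with "--" for a failed call) and the function names are assumptions made for illustration; they are not taken from openSNP or from any DTC company's software.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Return the relative frequency of each allele at one site.

    `genotypes` is a list of two-letter calls such as "AG" (hypothetical
    input format). Calls containing anything other than A, C, G or T,
    for example "--", are skipped.
    """
    counts = Counter()
    for genotype in genotypes:
        if len(genotype) == 2 and set(genotype) <= set("ACGT"):
            counts.update(genotype)  # counts each of the two alleles
    total = sum(counts.values())
    if total == 0:
        return {}
    return {allele: n / total for allele, n in counts.items()}


def minor_allele_frequency(genotypes):
    """Frequency of the second most common allele, or 0.0 if the site does not vary."""
    freqs = sorted(allele_frequencies(genotypes).values(), reverse=True)
    return freqs[1] if len(freqs) >= 2 else 0.0


def is_common_snp(genotypes, threshold=0.01):
    """True if the rarer allele reaches the 1% frequency mentioned in the text."""
    return minor_allele_frequency(genotypes) >= threshold


# Made-up example: 100 people with usable calls, one failed call.
calls = ["AA"] * 90 + ["AG"] * 9 + ["GG"] * 1 + ["--"]
print(allele_frequencies(calls))
print(is_common_snp(calls))
```

In this made-up sample the G allele accounts for 11 of 200 counted alleles, so the site would pass the 1% threshold.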

23andMe uses the results of consenting customers to perform its own genome-wide association studies (GWAS). These studies check for statistical differences between groups. In a simple example, one could have a group that is known to have Alzheimer’s and a control group that does not. Given enough participants, one can then look for genetic variants that are over- or underrepresented in one of the groups. The variants found by this method can then be used as predictors for Alzheimer’s.
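The case/control comparison described above can be sketched for a single SNP as a chi-square test on a 2x2 table of allele counts. This is only one common way to test for over- or underrepresentation and is not necessarily what 23andMe does; the function name and the counts are made up for illustration. A real study would repeat the test for every SNP and correct for the large number of tests.

```python
import math


def allele_association_test(case_risk, case_other, control_risk, control_other):
    """Chi-square test (1 degree of freedom) on a 2x2 allele count table.

    Rows are cases and controls, columns are the allele of interest and
    the other allele. Returns (chi_square_statistic, p_value).
    """
    a, b, c, d = case_risk, case_other, control_risk, control_other
    n = a + b + c + d
    row_col_product = (a + b) * (c + d) * (a + c) * (b + d)
    if row_col_product == 0:
        # An empty row or column carries no information about association.
        return 0.0, 1.0
    chi2 = n * (a * d - b * c) ** 2 / row_col_product
    # For 1 degree of freedom the upper tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Made-up counts: the allele is seen 60 times among 100 case alleles,
# but only 40 times among 100 control alleles.
chi2, p = allele_association_test(60, 40, 40, 60)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

A small p-value suggests the allele is distributed differently between the two groups; on its own it says nothing about cause.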

We feel that research projects all over the world, and science in general, would benefit from a rich, freely available source of linked genetic data. And although genome-wide association studies need a minimum number of participants to find significant variations, it is not necessary to have 30,000 participants in a study. Many publications report significant results with fewer than 5,000 participants in total. Given the current number of 23andMe customers, only 5% of them (about 5,000 people) would need to freely share their genetic information, together with basic information on some medical conditions or other variations, to reach the critical mass needed for simple association studies! While many people have already started to publish their results on GitHub and similar sites, and movements like DIYBio are starting to take off, there are no real efforts to create a repository that collects this kind of data centrally.

But what if one could create an open platform to collect this kind of linked data? Is it possible to perform crowd-sourced association studies to create new knowledge about our genes? With the creation of openSNP we have tried (and are still trying) to find out.