This guy is the Mark Zuckerberg of open-source genetics


Three years ago, Bastian Greshake spit in a vial and sent it off to personal genomics company 23andMe for analysis. He’d spent years studying the genetics of other organisms, but didn’t know much about his own DNA. He was curious.

“I am a biologist, so it’s even more fun if it’s your own personal data. It’s like performing research on an organism which is really close to your heart: yourself,” he said.

Soon after he got his results, he published his genetic information to the collaboration site GitHub, and then went looking for others who were open-sourcing their DNA. But even when he found the occasional person willing to put his or her genes online for public perusal, the data were hard to use for science. They didn’t come with any phenotypic information, such as eye color, weight, height, or medical history. That information, tied to your genetic data, is what’s really valuable and interesting. By themselves, genes are kind of dull.

What was really needed, Greshake realized, was a social network for DNA, one that would make it easy to upload genetic information and share it with others. “Maybe there are people who are interested in publishing their genetic information on the web to make it available, but those people don’t have the opportunity,” Greshake says he thought at the time.

So he set out to build an open-source website, openSNP, that would make it easy to pull in genetic information from services like 23andMe, plus any other data users wanted to upload. He hoped that one day scientists would be able to use it to do world-changing science.

That hasn’t quite happened yet. Only 1,500 people have joined the platform—too small a group to power scientific studies. In part, that’s because unleashing your genetic information on the open web raises all kinds of scary privacy questions. But Greshake is still adamant that open-source is the way to go.

His vision isn’t completely crazy. Open-source tools have been a boon to the tech industry in many ways. Yahoo, for example, was an early backer of Hadoop, the open-source framework for storing and processing very large data sets, which companies like Facebook, Twitter, and eBay later adopted for their own data infrastructure. Part of the Netflix recommendation engine relies on open-source artificial-intelligence software. GitHub, built around the open-source version-control system Git, is one of the most powerful tools for web developers ever created. In science, open-source tools allow researchers to more easily validate the work of others.

And when it comes to genetics, open-sourcing has potentially life-saving benefits.

“The most attractive thing is that these platforms allow people with their own raw SNP data to look at what science has discovered about a small fraction of their alleles,” geneticist Misha Angrist, a senior fellow in Science & Policy at Duke University’s Social Science Research Institute, told me over e-mail.

SNP (pronounced “snip”) is short for single nucleotide polymorphism—science-speak for a change at a single location on the DNA strand. It’s akin to a letter substitution in a single word in a sentence. Over the years, scientists have found that some SNPs are associated with certain genetic conditions. Businesses like 23andMe were built on that science. Their value came from telling consumers a little bit about the future that was written in their genes.
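To make the letter-substitution analogy concrete, here is a minimal Python sketch. The two sequences and the function name `find_snps` are invented for illustration and are not real genomic data. Real SNPs are defined by variation across a population, so comparing two short strings only shows the basic idea of a one-letter difference at a single position.

```python
def find_snps(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (position, reference_base, sample_base) for each single-letter difference."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be the same length")
    return [
        (i, ref_base, alt_base)
        for i, (ref_base, alt_base) in enumerate(zip(reference, sample))
        if ref_base != alt_base
    ]


# Hypothetical sequences, made up for this example.
reference = "ATGCCGTA"
sample = "ATGCTGTA"

# The two strings differ only at index 4, where C is replaced by T.
print(find_snps(reference, sample))
```

Positions here are zero-based indexes into the string, which is a choice made for this sketch rather than a convention used by any testing service.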

And that’s what happened with Greshake, now a 30-year-old doctoral student in bioinformatics in Germany. When he got the results of his DNA test, he was relieved to find out his genes didn’t seem to predispose him to Alzheimer’s or Parkinson’s disease, two neurological conditions that affect our brains’ ability to remember and control movement.

His news wasn’t all good, though: the test also revealed that he had two copies of several SNPs that some studies had shown were associated with an increased risk of prostate cancer. Because we inherit one copy of each gene from each parent, having two copies meant his father had to carry at least one of them, so Greshake figured that his father was probably also at risk. He used that information to strong-arm his father into going to the doctor. Upon examination, the doctor found a tumor growing in his father’s prostate, which they were able to catch early. “Of course it’s easy to say you should go [to the doctor] anyway,” he says, “but he wouldn’t have gone without the genetic test.”
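The inheritance reasoning here can be sketched in a few lines of Python. This is an illustration only: the function name and the two-letter genotype strings are assumptions made for the example, and the logic ignores complications such as sex chromosomes and new mutations. It shows why two copies of a variant in a child imply that each parent carries at least one copy.

```python
def each_parent_carries(child_genotype: str, allele: str) -> bool:
    """Return True if the child's genotype implies both parents carry `allele`.

    A genotype is written as two letters, one inherited from each parent.
    If both letters match `allele`, each parent must have passed it on.
    """
    if len(child_genotype) != 2:
        raise ValueError("genotype must be exactly two letters")
    return child_genotype.count(allele) == 2


# Hypothetical genotypes at a single SNP, invented for this example.
print(each_parent_carries("AA", "A"))  # two copies: each parent carries A
print(each_parent_carries("AG", "A"))  # one copy: cannot tell which parent
```

Carrying a variant is not the same as developing the disease. The studies mentioned above reported an association with increased risk, which is why the result prompted a checkup and was not a diagnosis.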

Greshake hopes that the information on openSNP can help other people in similar ways. But first, he’ll need to grow his open-source genetic library. And for that to happen, many more people will have to begin gathering their own raw data from services like 23andMe. Last year, the FDA slapped 23andMe with a warning letter ordering it to stop marketing its spit-box test as a health report. But the agency is now easing up on 23andMe, and has begun to allow certain types of genetic tests to be sold without review.

Greshake and other open-source evangelists hope that the availability of simple, cheap genetic testing kits, coupled with open-source platforms like openSNP, Promethease and SNPedia, will spark interest in genetics among the general public—not just for finding hints of horrible disease lurking in the future, but for more pedestrian applications as well. An artist, for example, took Greshake’s openSNP data and turned it into music. No, that’s not world-changing science, but it shows how many ideas can bloom when a closed data universe is shoved out in the open.

“I never would have thought about that,” Greshake said of the artist. “He created new value out of it.”

[Embedded audio: Greshake’s musical genes]

To learn more about commercial genetic testing, check out our stories about genetic social networks and DNA data mining.

Daniela Hernandez is a senior writer at Fusion. She likes science, robots, pugs, and coffee.
