Remember when we all got a little creeped out that genetic testing companies can be forced to turn over your data to law enforcement? Ah, those were simpler times. Police found the Golden State Killer last week thanks in part to a relative's sample in a publicly available DNA database. The same kind that your relatives may already be in.
Here at 23andMe, it's our policy to resist law enforcement inquiries to protect customer privacy. We have never given customer information to law enforcement officials. https://t.co/FkwnJyzyd8
— 23andMe (@23andMe) April 27, 2018
Plenty of people have voluntarily uploaded their DNA to GEDmatch and other databases, often with real names and contact information. It's what you do if you're an adopted kid looking for a long-lost parent, or a genealogy buff curious about whether you have any cousins still living in the old country. GEDmatch requires that you make your DNA data public if you want to use their comparison tools, although you don't have to attach your real name. And they're not the only database that has helped law enforcement track people down without their knowledge.
How DNA Databases Help Track People Down
We don't know exactly what samples or databases were used in the Golden State Killer's case; the Sacramento County District Attorney's office gave very little information and hasn't confirmed any further details. But here are some things that are possible.
Y chromosome data can lead to a good guess at an unknown person's last name.
Cis men typically have an X and a Y chromosome, and cis women two X's. That means the Y chromosome is passed down only along the male line, from father to son. Since last names are also often handed down the same way, in many families you'll share a surname with anybody who shares your Y chromosome.
A 2013 Science paper described how a small amount of Y chromosome data should be enough to identify surnames for an estimated 12 per cent of white males in the US. (That method would find the wrong surname for 5 per cent, and the remaining 83 per cent or so would come back as unknown.) As more people upload their information to public databases, the authors warned, the success rate will only increase.
This is exactly the technique that genealogical consultant Colleen Fitzpatrick used to narrow down a pool of suspects in an Arizona cold case. She seems to have used short tandem repeat (STR) data from the suspect's Y chromosome to search the Family Tree DNA database, and she saw the name Miller in the results.
The police already had a long list of suspects in the Arizona case, but based on that tip they zeroed in on one with the last name Miller. As with the Golden State Killer case, police confirmed the DNA match by obtaining a fresh DNA sample directly from their suspect - the Sacramento office said they got it from something he discarded. (Yes, this is legal, and it can be an item as ordinary as a used drinking straw.)
The authors of the Science paper point out that surname, location, and year of birth are often enough to find an individual in census data.
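To make the surname-guessing idea concrete, here is a minimal sketch in Python. It is not the method from the Science paper or the search Fitzpatrick ran; it only illustrates the general approach of comparing an unknown Y-STR profile against profiles that have surnames attached. The marker names are real Y-STR loci, but the database, the repeat counts, and the `max_mismatches` threshold are all made up for illustration.

```python
from collections import Counter

# Hypothetical toy database: (surname, Y-STR profile) pairs.
# The repeat counts are invented; real databases hold far more markers.
DATABASE = [
    ("Miller", {"DYS19": 14, "DYS390": 24, "DYS391": 11, "DYS393": 13}),
    ("Miller", {"DYS19": 14, "DYS390": 24, "DYS391": 10, "DYS393": 13}),
    ("Jones",  {"DYS19": 15, "DYS390": 22, "DYS391": 10, "DYS393": 12}),
    ("Garcia", {"DYS19": 13, "DYS390": 25, "DYS391": 11, "DYS393": 14}),
]


def mismatches(query, profile):
    """Count query markers whose repeat count differs (or is missing) in profile."""
    return sum(1 for marker, repeats in query.items()
               if profile.get(marker) != repeats)


def guess_surname(query, database, max_mismatches=1):
    """Return the most common surname among close matches, or None if there are none."""
    close = [surname for surname, profile in database
             if mismatches(query, profile) <= max_mismatches]
    if not close:
        return None  # corresponds to the "unknown" outcome
    return Counter(close).most_common(1)[0][0]


unknown = {"DYS19": 14, "DYS390": 24, "DYS391": 11, "DYS393": 13}
print(guess_surname(unknown, DATABASE))
```

A guess like this is only a lead. As in the cases above, it narrows a list of suspects, and a fresh sample is still needed to confirm a match.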
SNP files can find family trees.
When you download your "raw data" after mailing in a 23andMe or Ancestry test, what you get is a list of locations on your genome (called SNPs, for single nucleotide polymorphisms) and two letters indicating your status for each. For example, at a certain SNP you may have inherited an A from one parent and a G from the other.
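If you're curious what such a file looks like to a program, here is a rough sketch of a reader. It assumes a 23andMe-style layout: comment lines starting with "#", then tab-separated columns for SNP ID, chromosome, position, and genotype. Other companies arrange their columns differently, and the SNP IDs and values in the sample below are invented.

```python
import io

# Made-up sample in the assumed layout: rsid, chromosome, position, genotype.
SAMPLE = (
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs0000001\t1\t1000\tAG\n"
    "rs0000002\t1\t2000\tCC\n"
    "rs0000003\t2\t3000\t--\n"
)


def parse_raw_data(handle):
    """Return {rsid: genotype}, skipping comments, blank lines, and no-calls ('--')."""
    genotypes = {}
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        rsid, _chromosome, _position, genotype = line.split("\t")
        if genotype != "--":
            genotypes[rsid] = genotype
    return genotypes


snps = parse_raw_data(io.StringIO(SAMPLE))
print(snps)
```

The "AG" entry is the kind of result described above: an A from one parent and a G from the other.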
Genetic testing sites will have tools to compare your DNA with others in their database, but you can also download your raw data and submit it to other sites, including GEDmatch or Family Tree DNA. (23andMe and Ancestry allow you to download your data, but they don't accept uploads.)
But you don't have to send a spit sample to one of those companies to get a raw data file. The DNA Doe Project describes how they sequenced the whole genome of an unidentified girl from a cold case and used that data to construct a SNP file to upload to GEDmatch. They found someone who shared enough of the same SNPs to probably be a close cousin. That cousin also had an account at Ancestry, where they had filled out a family tree with details of their family members. The tree included an entry for a cousin of the same age as the unidentified girl, and whose death date was listed as "missing -- presumed dead." It was her.
Your DNA Is Not Just Yours
When you send in a spit sample, or upload a raw data file, you may only be thinking about your own privacy. I have nothing to hide, you might tell yourself. Who cares if somebody finds out that I have blue eyes or a predisposition to heart disease?
But half of your DNA belongs to your biological mother, and half to your biological father. Another half - cut a different way - belongs to each of your children. On average, you share half your DNA with a sibling, and a quarter with a half-sibling, grandparent, aunt, uncle, niece or nephew. You share about an eighth with a first cousin, and so on. The more of your extended family members who are into genealogy, the more likely it is that your DNA is already in a public database, contributed by a relative.
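Those fractions follow from simple arithmetic: each generation between you and a relative halves the expected share, and you add up the contributions from each ancestor you have in common. A small sketch of that calculation, with the caveat that these are averages (actual sharing varies from person to person) and that the function and table names are just illustrative:

```python
from fractions import Fraction


def expected_shared(steps, common_ancestors=1):
    """Expected shared fraction: (1/2)^steps for each connecting path.

    steps: parent-child links on the path between the two people.
    common_ancestors: how many ancestors link them (2 for full siblings,
    1 for half-siblings; use 1 for a direct ancestor or descendant).
    """
    return common_ancestors * Fraction(1, 2) ** steps


# (steps, common ancestors) for the relationships mentioned above.
RELATIONSHIPS = {
    "parent or child": (1, 1),
    "full sibling": (2, 2),
    "half-sibling": (2, 1),
    "grandparent": (2, 1),
    "aunt or uncle": (3, 2),
    "first cousin": (4, 2),
}

for name, (steps, ancestors) in RELATIONSHIPS.items():
    print(f"{name}: {expected_shared(steps, ancestors)}")
```

Extending the table one step further, a second cousin (six steps, two shared great-grandparents) works out to about 1/32, which is why even distant relatives in a database can point back to you.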
In the cases we mention here, the breakthrough came when DNA was matched, through a public database, to a person's real name. But your DNA is, in a sense, your most identifying information.
For some cases, it may not matter whether your name is attached. Facebook reportedly spoke with a hospital about exchanging anonymized data. They didn't need names because they had enough information, and good enough algorithms, that they thought they could identify individuals based on everything else. (Facebook doesn't currently collect DNA information, thank god. There is a public DNA project that signs people up using a Facebook app, but they say they don't pass the data to Facebook itself.)
And remember that 2013 study about tracking down people's surnames? They grabbed whole-genome data from a few high-profile people who had made theirs public, and showed that the DNA files were sometimes enough information to track down an individual's full name and date of birth. It may be impossible for DNA to be totally anonymous.
Can You Protect Your Privacy While Using DNA Databases?
If you're very concerned about privacy, you're best off not using any of these databases. But you can't control whether your relatives use them, and you may be looking for a long-lost family member and thus want to be in a database while minimising the risks.
Here are some steps that may help preserve some of your privacy:
- Don't use your real name. This makes genealogy harder for both the cops and legit users, so you could be limiting your ability to find a relative who would recognise your side of the family by your name. But if that's not a concern, you can register for websites under a fake name or with just your initials. (Terms of service permitting, of course.)
- Create a new email address if you want people to be able to contact you. Otherwise, your fake name isn't hiding much.
- Set your data to private, or peek and then delete. This won't help if you're hoping for someone to contact you, but if you just want to see who's already out there, it can give you a snapshot. Upload your data (or opt in to the matching system, depending on your database) and take a look at who shows up. Then get out of there and cover your tracks.
- Download your raw data and delete your account. Sure, it's convenient that DNA services save your information, but you could also download your raw data as soon as it's available. Then, keep that safe and ask the company to permanently delete your account and (if possible) your actual spit sample.
- Consider which company you use. Ancestry handed over data for 31 of 34 law enforcement requests in 2017. 23andMe says that they fight every request, and to date have not handed over anybody's data. I'm not saying I trust 23andMe, but I know where I'd rather mail my spit.