Now, researchers from Harvard University have demonstrated that this veneer of anonymity is easily breached. The team compared demographic data from 579 PGP profiles that contained ZIP codes, full dates of birth, and genders with voter lists and other public records, and also looked for patient names in the files that participants had uploaded to the PGP website. Together, these methods identified 241 participants. Checking the results with administrators at the PGP, the team found that 84 percent of these matches were correct, demonstrating that PGP profiles are vulnerable to re-identification.
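The demographic matching described above can be illustrated with a minimal sketch. It is not the researchers' actual code: the field names, the `link_records` helper, and the sample records are all hypothetical, and it assumes a profile counts as re-identified only when exactly one public record shares its ZIP code, date of birth, and gender.

```python
from collections import defaultdict


def link_records(profiles, public_records):
    """Return {profile id: name} for profiles whose (zip, dob, gender)
    combination matches exactly one public record."""
    # Index the public records by the three demographic fields.
    index = defaultdict(list)
    for rec in public_records:
        index[(rec["zip"], rec["dob"], rec["gender"])].append(rec["name"])

    matches = {}
    for profile in profiles:
        key = (profile["zip"], profile["dob"], profile["gender"])
        candidates = index.get(key, [])
        # A unique candidate is treated as a re-identification;
        # zero or several candidates leave the profile unmatched.
        if len(candidates) == 1:
            matches[profile["id"]] = candidates[0]
    return matches


# Hypothetical data, invented purely for illustration.
profiles = [
    {"id": "p1", "zip": "02138", "dob": "1970-01-15", "gender": "F"},
    {"id": "p2", "zip": "02139", "dob": "1985-06-30", "gender": "M"},
]
voters = [
    {"name": "Alice Example", "zip": "02138", "dob": "1970-01-15", "gender": "F"},
    {"name": "Bob Sample", "zip": "02139", "dob": "1985-06-30", "gender": "M"},
    {"name": "Carl Sample", "zip": "02139", "dob": "1985-06-30", "gender": "M"},
]

matches = link_records(profiles, voters)
```

In this toy data only `p1` is matched, because two voters share the demographics of `p2`. The same logic suggests why coarser fields would offer more protection: the less precise the birth date or ZIP code, the more people share each combination.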
This could be harmful, the study's authors argued, because many participants reveal sensitive personal details, such as predispositions to genetic diseases, that might affect life insurance premiums and claims. The 2008 Genetic Information Nondiscrimination Act covers medical insurance, but not life insurance.
The researchers added that privacy protection could easily be firmed up with little impact on research value if PGP participants included less precise birth date and ZIP code information. They have also developed an editing tool to help people make such changes to their PGP profiles, which cannot otherwise be modified.
Clarification (May 3): The text has been amended to more accurately reflect that a portion of the 241 participants “re-identified” were found using names included in the files they had uploaded to the PGP website. As Jane Yakowitz Bambauer, associate professor of law at the University of Arizona, pointed out on the Info/Law blog, 115 of the 241 were “re-identified” in this way, and 80 of those 115 could not have been found using their demographic data alone. Thus, using demographic data alone, the researchers could only have re-identified 161 of the 579 participants.
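The arithmetic behind the clarification can be checked with a short sketch. The variable names are illustrative; the figures are the ones quoted above.

```python
total_profiles = 579   # profiles with ZIP code, full birth date, and gender
reidentified = 241     # participants identified by either method
via_names = 115        # of those, found through names in uploaded files
names_only = 80        # of the 115, not findable from demographics alone

# Participants reachable using demographic data alone.
demographics_alone = reidentified - names_only

# Shares of the 579 profiles, as percentages rounded to one decimal place.
share_all = round(100 * reidentified / total_profiles, 1)
share_demographics = round(100 * demographics_alone / total_profiles, 1)
```

Here `demographics_alone` works out to 161, or roughly 28 percent of the 579 profiles, compared with roughly 42 percent when both methods are counted.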