COVID, Data, DNA, and the Future
It is a commonplace in the post-modern world that data is “the new oil,” the road to power and wealth. The analogy isn't perfect, but there’s a great deal of truth in it. Enormous, potentially dangerous real-world consequences flow from Big Data for a wide swath of individuals and communities. DNA databases, court proceedings, snapshots of license plates, and even more sensitive data such as medical records and voter registrations: all can be grist to the mill for advertisers, politicians, and criminals.
Technology can of course be beneficial as well as harmful. The tech industry has created many well-paying jobs but has also contributed to a remarkable concentration of money, which is producing widespread complaints that its “moguls” have too much power. As a society, we are clearly struggling with balancing the benefits of the Internet, for example, against the political trainwrecks caused by lies galloping around the world. Scientists have been gathering data about human DNA for decades; that may eventually lead to practical gene therapies and can already identify some predispositions to disease. Getting an early diagnosis for, say, breast cancer is surely a good thing. But the question of who owns your individual genomic data has been debated for years, and aggregated data sets are becoming increasingly open for both use and abuse.
Government Data and Private Profits
The handling of government data about the COVID-19 pandemic provides some examples. In May, Florida fired the person in charge of publicly reporting infection data, according to her account, “for refusing to manipulate data to support the state’s reopening.” In July, the state was accused of “underreporting its rate of positive tests.” Also in July, Georgia was credibly accused of “backdating new cases” to make the state’s current policies seem better, and thus improve the governor’s political prospects. In both states, the intent was clearly to keep power by essentially gaslighting the population with misleading statistics.
So it is hardly surprising that when the Federal government changed its data reporting process in mid-July, suspicions were raised, along with eyebrows and hackles. That information is, of course, aggregate in nature and presumably includes nothing about identifiable individual patients. Dead people do have a way of talking, however, at least to their doctors, who generally report to local authorities, whose numbers fuel independent assessments. So as a coverup, this process change may not be completely successful.
Most published reports about the Administration’s data reporting edict have focused on the President’s re-election prospects, which might improve if COVID cases decline, but other interests are involved too, such as profits.
As Heather Cox Richardson and others have detailed, two different private companies are involved in the new system. One is TeleTracking, which experts complain has “no history with infection preventionists” and “would generally be considered risky” for the assignment. The company sells tracking systems and “custom reporting solutions” and already scored a controversial $10.2 million no-bid deal from the Department of Health and Human Services (HHS) in April, which runs until September.
Peter Thiel’s Palantir
The other company with a new contract is Palantir, co-founded by Peter Thiel, the eccentric billionaire who hopes to live forever and who is known as “the most famous Trump supporter in tech” (though he is now signaling that he may abandon the sinking ship). Palantir has a reputation for being secretive and is widely viewed as terrifying for its ability to collect and analyze data, though we should stress that the HHS data is aggregated well before being reported to the central registry. The company is getting paid at least $17 million for this work, but that’s a drop in the bucket: it has $1.5 billion in government contracts, mostly with the Defense Department.
Even more worrying is what Palantir is now doing in the UK. The company has been hired and has assigned 45 staff to a project “designed to predict surges in NHS demand during the coronavirus crisis.” For this, Palantir is being paid £1. Yes, about one dollar and 26 cents. The project is costing the company about £88,000 ($111,000) a week, estimates the New Statesman, so how does the company intend to turn a profit? Well, according to CNBC:
The NHS health records that Palantir has access to can include a patient’s name, age, address, health conditions, treatments and medicines, allergies, tests, scans, X-Ray results, whether a patient smokes or drinks, and hospital admission and discharge information.
What is Palantir hoping to do with that information? The data is anonymized, of course, but that’s not much help. A 2019 paper showed that 99.98% of Americans can be identified from 15 demographic attributes. A 2015 study showed that “four spatiotemporal points [of credit card data] are enough to uniquely reidentify 90% of individuals.” No wonder a 2018 Bloomberg investigation, focused on Americans, was titled “Palantir Knows Everything About You.” You might conclude that having data on essentially the entire population of Britain would be potentially valuable. Even if you don’t yet know how.
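To see why anonymization is so fragile, here is a minimal simulation of my own (not the cited studies’ methodology): build a synthetic population with a handful of ordinary demographic attributes, then count how many records are unique as the attributes are combined.

```python
import random
from collections import Counter

# A minimal sketch, not the cited studies' methodology: show how quickly
# combinations of ordinary quasi-identifiers make records unique.
random.seed(0)

def person():
    return (
        random.randint(0, 99),       # age
        random.choice("MF"),         # sex
        random.randint(0, 99999),    # 5-digit ZIP code
        random.randint(1, 12),       # birth month
        random.randint(1, 28),       # birth day
    )

population = [person() for _ in range(1_000_000)]

for k in range(1, 6):
    counts = Counter(p[:k] for p in population)
    unique = sum(1 for p in population if counts[p[:k]] == 1)
    print(f"{k} attribute(s): {unique / len(population):.1%} of records unique")
```

With all five attributes, the space of possible combinations dwarfs any real population, so nearly every record is unique; the papers cited above reach the same conclusion with far more rigor and real data.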
Contact Tracing and a Symptom Tracker
In the months before this recent furor, there was much talk about using technology to perform (or at least assist) contact tracing, which is vital to slowing the spread of COVID-19. This is the usually tedious task of finding people who may have been infected by newly identified patients. Apple and Google jumped in to help, essentially by designing software that uses the Bluetooth technology in modern phones; their exposure-notification framework was adopted by several countries, but many others chose to develop their own systems.
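A toy sketch can make the decentralized design concrete. This is heavily simplified (the actual Exposure Notification protocol derives rotating Bluetooth identifiers from daily cryptographic keys rather than exchanging raw random tokens), but the privacy-preserving shape is the same: matching happens on the phone, not on a central server.

```python
import secrets

# A heavily simplified sketch of decentralized exposure notification;
# the real Apple/Google protocol uses key derivation, not raw tokens.
class Phone:
    def __init__(self):
        self.my_tokens = []   # tokens this phone has broadcast
        self.heard = set()    # tokens received from nearby phones

    def broadcast(self):
        token = secrets.token_hex(16)  # fresh random identifier
        self.my_tokens.append(token)
        return token

    def receive(self, token):
        self.heard.add(token)

    def check_exposure(self, published):
        # Matching happens locally; no central record of who met whom.
        return bool(self.heard & published)

alice, bob, carol = Phone(), Phone(), Phone()
bob.receive(alice.broadcast())    # Bob was near Alice
carol.receive(bob.broadcast())    # Carol was near Bob, not Alice

# Alice tests positive and publishes her broadcast tokens.
registry = set(alice.my_tokens)
print(bob.check_exposure(registry))    # True: Bob is notified
print(carol.check_exposure(registry))  # False: Carol was not exposed
```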
Data security and privacy (notably with Google’s app) remain significant issues with contact tracing, and some countries are still discovering security flaws. Also, the lack of public trust continues to hinder the process: in Michigan, for example, two-thirds of the public have privacy concerns; in France an app was downloaded 2 million times but in the first three weeks identified only 14 people who might have been exposed; and in Australia the ratio is even worse. Some epidemiologists complain that data secrecy is “crippling attempts to slow COVID-19’s spread,” but perhaps reverting to an old-fashioned pencil-and-shoe-leather approach is what is called for.
Big Data, AI, and DNA databases certainly can be useful in the COVID crisis. An early peer-reviewed study, published in Nature, identified loss of taste and smell as a potential symptom of COVID-19 by analyzing data from an app that had been downloaded by 2.6 million people. The Covid Symptom Tracker app now has over 4 million users, all over the US, who are voluntary participants in research. Carefully applied artificial intelligence algorithms can help doctors, without pretending to replace them. But technology can only be part of a solution, and its real-world effects must always be kept in mind.
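For the curious, the general shape of such symptom studies looks something like the sketch below: a simple statistical model fit to self-reported symptom flags. Everything here is synthetic and invented for illustration; it is not the Nature study’s actual data or model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic illustration only: fit a logistic regression relating
# self-reported symptoms to test results, then inspect odds ratios.
rng = np.random.default_rng(0)
n = 10_000

# Columns: fever, cough, fatigue, loss_of_taste_smell (0/1 flags).
X = rng.integers(0, 2, size=(n, 4))
# Invented ground truth in which loss of taste/smell is the strongest signal.
logit = -2.0 + 0.4 * X[:, 0] + 0.3 * X[:, 1] + 0.2 * X[:, 2] + 2.0 * X[:, 3]
y = rng.random(n) < 1.0 / (1.0 + np.exp(-logit))

model = LogisticRegression().fit(X, y)
names = ["fever", "cough", "fatigue", "loss_of_taste_smell"]
for name, coef in zip(names, model.coef_[0]):
    print(f"{name:>20}: odds ratio ~ {np.exp(coef):.2f}")
```

On this synthetic data the model recovers loss of taste and smell as the dominant predictor, which is the kind of signal the real study surfaced from millions of app reports.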
Facial Recognition and Predictive Policing
Consider another data sector that has recently been complicated by the Black Lives Matter and related protests: facial recognition. On June 8, IBM quit the business altogether. Two days later, Amazon announced that it was “implementing a one-year moratorium on police use of Amazon’s facial recognition technology.” Microsoft rapidly followed suit, and will refrain “until we have a national law in place, grounded in human rights, that will govern this technology.” Not incidentally, Amazon Rekognition (sic) is notoriously bad at identifying people of color; in one study, for example, it misidentified black women as men 31% of the time. Microsoft and IBM had similar problems. So perhaps public and employee concern was a good excuse to draw back from this particular segment, possibly while returning to the technical drawing board.
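For context, audits like the one cited reduce to a simple computation: compare the classifier’s predictions against ground truth separately for each demographic subgroup. A minimal sketch with hypothetical records, not the actual audit data:

```python
import pandas as pd

# Hypothetical records, not the actual audit data: measure a gender
# classifier's error rate per demographic subgroup.
df = pd.DataFrame({
    "group":     ["darker_f", "darker_f", "darker_f", "lighter_m", "lighter_m"],
    "true_sex":  ["F", "F", "F", "M", "M"],
    "predicted": ["M", "F", "M", "M", "M"],
})

errors = (df["true_sex"] != df["predicted"]).groupby(df["group"]).mean()
print(errors)  # a large gap between groups is the red flag auditors look for
```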
Nevertheless, Microsoft still has many other links to law enforcement. So does Palantir, notably in the dodgy and dystopian area known as “predictive policing.” This concept is fashionable among police departments, and clearly involves artificial intelligence and data analysis, though details are somewhat scarce and the data sets are often secret. These systems are also often racist in application, at least partly because they are built on historical policing data that encodes pre-existing bias: As Yeshimabeit Milner, Executive Director of Data for Black Lives, told Technology Review, “There’s a long history of data being weaponized against Black communities.”
DNA Databases and GINA
When it comes to data and the police, we must not forget the uses and abuses of DNA databases, both governmental and private. Huge collections of individuals’ genetic information have been amassed by police agencies throughout the US and around the world (China being a notable and worrying example), as well as by direct-to-consumer genetic testing companies such as 23andMe and AncestryDNA. Direct-to-consumer DNA databases are credited with solving a number of cases, starting with the “Golden State Killer,” and DNA analysis has exonerated hundreds of wrongfully convicted people.
But errors in DNA analysis — often contamination of samples and lab mistakes — have also harmed growing numbers of innocent people, leading to serious charges and imprisonment. And standard policing practices that jeopardize communities of color can be significantly exacerbated by “genetic surveillance.” California, for example, insists on retaining the DNA of about 750,000 people who have never been convicted of a felony, which means that they and their relatives will be subject to genetic surveillance when police search databases for suspects. In an effort to have those samples and the data derived from them expunged, the Center for Genetics and Society (CGS), the Equal Justice Society (EJS), and an individual plaintiff (me) are currently suing the state. As the San Francisco Chronicle noted in an editorial, “the burden for fixing a problem the state created should rest with the state.”
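For readers unfamiliar with the mechanics of why relatives get swept in: familial searching flags database profiles that share unusually many alleles with a crime-scene sample, which is exactly what parents, children, and siblings tend to do. A toy illustration with invented profiles (not real forensic markers):

```python
# Toy illustration of familial ("partial match") searching; the profiles
# are invented, not real CODIS loci.
def shared_alleles(profile_a, profile_b):
    """Count alleles shared at each locus across two STR-style profiles."""
    return sum(len(set(a) & set(b)) for a, b in zip(profile_a, profile_b))

crime_scene = [(12, 14), (8, 9), (15, 15), (10, 11)]
database = {
    "unrelated person":  [(11, 13), (7, 10), (16, 17), (9, 12)],
    "sibling of source": [(12, 15), (8, 9), (15, 16), (10, 11)],
}

for name, profile in database.items():
    print(f"{name}: {shared_alleles(crime_scene, profile)} shared alleles")
# A high partial-match score puts the sibling, who may never have been
# arrested, on investigators' radar.
```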
We’ve had a couple of decades to grapple with abuses of DNA data technologies, and we’ve only just begun to make headway. Now the COVID crisis is dishing up new data worries — and though they’re not as urgent as the virus itself, they’re worth attending to. All too often, and perhaps particularly under crisis conditions, the temptations of novelty and the attractions of magic tech bullets eclipse sensible questions such as: What is this for? Who benefits? Is this what society needs?
The Genetic Information Nondiscrimination Act of 2008 (GINA) is overdue for an update, which might provide an opportunity to cover more than just genetic data. GINA took more than a decade of legislative effort — the first such bill was introduced in 1995 — before it was passed almost unanimously, though there have since been efforts to weaken it. It will likely be even harder to draft legislation to ensure that all data are deployed not for profit or policing but in the service of public health and social justice. This discussion needs to happen now.
Update July 31:
Several of the topics in this post are the subject of continuing investigative journalism.
- An NPR report, Irregularities In COVID Reporting Contract Award Process Raises New Questions, has much more detail on TeleTracking, its links to the Trump Organization, and the changes in the data being tracked.
- Undark published a deep dive into racial bias, Artificial Intelligence, Health Disparities, and Covid-19, with some indications that there could be a developing conversation on the subject.
- Wired covered issues with donor-conceived pregnancies in There’s No Such Thing as Family Secrets in the Age of 23andMe, citing among others Wendy Kramer, founder of the Donor Sibling Registry, and Australian legal expert and international advocate Sonia Allen.