200 million registered voters exposed due to open AWS repository
A misconfigured database containing sensitive personal information on 198 million American voters was left exposed to the internet for 12 days by a Republican data analysis firm, in the largest known data exposure of its kind.
According to UpGuard Cyber Risk Analyst Chris Vickery, Republican contractors Deep Root Analytics, TargetPoint Consulting, Inc. and Data Trust stored the data on a publicly accessible cloud server owned by Deep Root Analytics.
The names, dates of birth, home addresses, phone numbers, and voter registration details of nearly all of America’s registered voters were exposed, including “modeled” data of voter ethnicities and religions.
The enormous trove of political data, compiled by the RNC and contracting firms after Mitt Romney’s loss in the 2012 presidential election, held around 9.5 billion data points on three out of five Americans, grading the 198 million registered voters on political leanings across forty-eight categories using algorithmic modeling.
Vickery discovered the Amazon Web Services S3 bucket repository on June 12 while searching for misconfigured data sources on behalf of UpGuard’s Cyber Risk Team. The repository had no protections, and anybody with an internet connection could have accessed it by entering the Amazon subdomain “dra-dw.”
The subdomain stood for Deep Root Analytics Data Warehouse. Deep Root Analytics confirmed ownership to UpGuard and secured the bucket on June 14. The warehouse bucket contained file directories for multiple Republican political organizations within 1.1 terabytes of unsecured data, alongside an additional 24 terabytes of data that was secured against public access.
Among the publicly accessible files were a document likely referencing George W. Bush and Karl Rove’s Super PAC, American Crossroads, titled “for_strategy_xroads_updated_FINAL”, and a large stockpile of Reddit posts saved as text.
The warehouse had two folders that stood out to Vickery and the Cyber Risk Team. The first, data_trust, was named in clear reference to Data Trust, the GOP’s private sector data analysis firm created in 2011 and dubbed the “GOP’s exclusive data provider”. It contained a 256 GB folder for the 2008 election and a 233 GB folder for the 2012 election, each holding fifty-one files, one for each state and the District of Columbia.
“Each file, formatted as a comma separated value (.csv), lists an internal, 32-character alphanumeric ‘RNC ID’—such as, for example, 530C2598-6EF4-4A56-9A7X-2FCA466FX2E2—used to uniquely identify every potential voter in the database,” UpGuard journalist Dan O’Sullivan wrote.
The RNC IDs link different data sets together, tying together an incredible amount of personally identifiable data to effectively create a politically motivated profile of virtually every American voter. Vickery and O’Sullivan looked themselves up in this database and found the profiles gleaned from this data to be accurate. Some of the .csv categories in the folder are as follows:
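To make the quoted identifier format concrete, here is a minimal sketch in Python that checks whether a string has the same shape as the example ID. The 8-4-4-4-12 grouping and the uppercase-alphanumeric character set are assumptions inferred solely from the single example quoted above; the name `looks_like_rnc_id` is hypothetical and does not come from UpGuard or the RNC.

```python
import re

# Assumed shape: 32 uppercase alphanumeric characters in hyphen-separated
# groups of 8-4-4-4-12, inferred only from the one example ID in the article.
RNC_ID_PATTERN = re.compile(
    r"[0-9A-Z]{8}-[0-9A-Z]{4}-[0-9A-Z]{4}-[0-9A-Z]{4}-[0-9A-Z]{12}"
)


def looks_like_rnc_id(value: str) -> bool:
    """Return True if value matches the assumed ID shape."""
    return RNC_ID_PATTERN.fullmatch(value) is not None


example = "530C2598-6EF4-4A56-9A7X-2FCA466FX2E2"
print(looks_like_rnc_id(example))
print(len(example.replace("-", "")))  # number of alphanumeric characters
```

Note that the example contains the letter X, so the identifier is not a standard hexadecimal GUID even though it is laid out like one.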
The files list voters’ full names, as well as the “voter’s date of birth, home and mailing addresses, phone number, registered party, self-reported racial demographic, voter registration status, and even whether they are on the federal ‘Do Not Call’ list,” according to the UpGuard story, not to mention contentious data like “modeled ethnicity” and “modeled religion.”
According to Vickery, if the information in the aforementioned categories was available to Data Trust, it appeared to be included. A third folder under the data_trust folder was for the 2016 election, but only contained .csv files for Ohio and Florida, two of the more important swing states in the nation.
The second folder, named target_point, was possibly more damaging to American voters than the information found in data_trust. Target_point is clearly a container for data compiled by TargetPoint Consulting, another Republican contractor hired to develop a voter repository and paid $4.2 million by the RNC.
TargetPoint Consulting is the principal of Needle Drop, a subsidiary of Deep Root Analytics that was created to work with the RNC.
The target_point folder contained 14 .yxdb files, the Alteryx Database format designed for big data analysis. The majority of the files were last updated, according to Vickery, around mid-to-late January 2017, and several files titled “Contact File” included dates signifying their last update.
The “Contact Files” had the same RNC IDs as the data_trust folder, but also included attached names and addresses of all 198 million voters. The ease with which personal information of this unprecedented scope could be downloaded and used is both frightening and a reminder of the precautions necessary when handling personally identifiable data.
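The reason a shared identifier matters can be shown with a short Python sketch: once two files carry the same key, merging them into a single profile per person takes only a few lines. Everything below is illustrative. The rows are entirely made up, and the column names (`rnc_id`, `modeled_score`, `name`, `city`) and the function names are hypothetical, not taken from the exposed files.

```python
import csv
import io

# Entirely fabricated sample data; column names are hypothetical.
MODELED_CSV = """rnc_id,modeled_score
ID-0001,0.72
ID-0002,0.31
"""

CONTACT_CSV = """rnc_id,name,city
ID-0001,Jane Doe,Springfield
ID-0002,John Roe,Riverton
"""


def load_by_id(text):
    """Parse CSV text into a dict of rows keyed by the shared ID column."""
    return {row["rnc_id"]: row for row in csv.DictReader(io.StringIO(text))}


def link_records(*sources):
    """Merge rows from several CSV sources that share the same ID."""
    merged = {}
    for source in sources:
        for key, row in load_by_id(source).items():
            merged.setdefault(key, {}).update(row)
    return merged


profiles = link_records(MODELED_CSV, CONTACT_CSV)
print(profiles["ID-0001"])
```

The sketch uses only the standard library to underline the point: no specialized tooling is needed to combine data sets once they share a unique key.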
Other files likely contained post-election data analytics, with files such as “DRA Post Elect 2016 Reluctant DJT scores 1-6-17.yxdb” containing 69 million rows of data. The conclusion that the analysis was a product of the RNC data team is supported by similar announcements of microtargeting in the past.