Report: Government data mining has gone too far – and AI will make it worse
Federal agencies often collect voluminous amounts of data on Americans to fulfill their missions and better understand the public’s needs.
But a new whitepaper from the Electronic Privacy Information Center argues that increasingly sophisticated and invasive data mining is now widespread throughout government, allowing machines — and not humans — to determine how data is connected and used to draw inferences about people, government policies and programs.
The collection of data on Americans, along with the use of software to analyze and connect this information for policymaking, is a “constitutional minefield, rife with privacy disasters and standing invitations for government overstep and abuse,” wrote Abigail Kunkler, a law fellow at EPIC and author of the whitepaper.
The risk is particularly high when such tools are used to “predict” criminal or illegal behavior, a practice that she argues poses First, Fourth and Fifth Amendment risks. These predictions are unreliable, she claims, given the technological limitations, the lack of meaningful data signals to track, and the human biases that can turn these programs into weapons of oppression.
“For decades, scholars and advocates from all sides have argued that data mining is ineffective to combat illegal activity simply because the data is not there to mine,” Kunkler wrote. “Successful data mining in this context requires a high number of known instances of a particular behavior—in the millions at least—before a pattern can emerge.”
Kunkler argues these worries are more than hypothetical, making direct links between the dangers of aggressive government data mining and the Trump administration’s efforts to merge federal datasets from various agencies to create national databases on voters, U.S. citizens and immigration enforcement.
Further, she notes that the emergence of AI is going to kick these practices into overdrive, leading to agencies making spurious or misleading connections based on a technology that has not proven itself to be reliable.
“Armed with AI, data mining capabilities have escalated data collection, retention, and analysis at an unbelievable pace,” Kunkler wrote. “And alarmingly, the ghost of Total Information Awareness has been revived in the federal government’s reported plans to construct a massive and centralized repository of personal data, which it intends to mine as part of the Administration’s ruthless anti-immigrant and antidemocratic campaign.”
Reforming data mining laws (or passing new ones)
Kunkler argues in favor of legislative reforms to the Federal Agency Data Mining Reporting Act, a 2007 law that was originally intended to limit government data mining and require agencies to publicly disclose the types of data analysis they conduct.
The law currently has no enforcement authority, so agencies face no consequences for failing to publish reports or for providing insufficient information. Agencies that transmit such reports to Congress confidentially do not have to release them publicly, making it harder to grasp the full extent of government data mining operations. Critically, the law only requires reporting for data mining programs that involve “pattern-based queries, searches, or other analyses” looking for predictive patterns or anomalies. Data mining that starts with “an initiating individual” and looks for potential connections or associations does not require public disclosure.
“The distinction allows for invasive searching without particularized suspicion,” Kunkler wrote. “The combination of AI-powered data mining and shrunken costs associated with data collection supercharges the government’s ability to use the ‘surveillance time machine’ and assemble digital dossiers on any given person at any time in their lives.”
Christopher Marcum, a former assistant director for open science and data policy at the White House Office of Science and Technology Policy during the Biden administration, told CyberScoop that while he shares concerns about potential government overreach around data mining, he believes it will take more than updating a two-decade-old transparency law to meaningfully curb the practice.
“I would say that the train has left the station and the [Federal Agency Data Mining Reporting Act] hasn’t had any measurable effect on protecting against mosaic effects in the face of movements towards more blended data, data linkage and of course, AI,” Marcum said.
Instead, he argued that only comprehensive reforms by Congress can stop the government’s steady expansion of massive data collection and analysis. He believes there is support in Congress for stronger action, pointing to bills like the American Privacy Rights Act that could impose broader limits on government activity, even though such efforts have often stalled due to factional disagreements.
Some members of Congress argue that the danger of empowering the government to do whatever it wants with federal datasets is already a real and present threat.
On Thursday, Sens. Alex Padilla, D-Calif., and Dick Durbin, D-Ill., wrote to Attorney General Pam Bondi requesting that the Justice Department brief the Senate Rules and Judiciary Committees about the administration’s ongoing efforts to merge myriad federal and state datasets into a tool meant to check and verify the citizenship of U.S. voters.
That effort has included major technical revamps of the Systematic Alien Verification for Entitlements (SAVE) database managed within U.S. Citizenship and Immigration Services by members of the Department of Government Efficiency, as well as the merging of data culled from the Social Security Administration.
“Put simply, it is neither the Department’s job nor its skillset to micromanage how election officials purge voters from state voter rolls,” Padilla and Durbin wrote.