Newly Declassified Files Detail Massive FBI Data-Mining Project
(WIRED) A fast-growing FBI data-mining system billed as a tool for hunting terrorists is being used in hacker and domestic criminal investigations, and now contains tens of thousands of records from private corporate databases, including car-rental companies, large hotel chains and at least one national department store, declassified documents obtained by Wired.com show.
Headquartered in Crystal City, Virginia, just outside Washington, the FBI’s National Security Branch Analysis Center (NSAC) maintains a hodgepodge of data sets packed with more than 1.5 billion government and private-sector records about citizens and foreigners, the documents show, bringing the government closer than ever to implementing the “Total Information Awareness” system first dreamed up by the Pentagon in the days following the Sept. 11 attacks.
Such a system, if successful, would correlate data from scores of different sources to automatically identify terrorists and other threats before they could strike. The FBI is seeking to quadruple the known staff of the program.
But the proposal has long been criticized by privacy groups as ineffective and invasive. Critics say the new documents show that the government is proceeding with the plan in private, and without sufficient oversight.
“We have a situation where the government is spending fairly large sums of money to use an unproven technology that has a possibility of false positives that would subject innocent Americans to unnecessary scrutiny and impinge on their freedom,” said Kurt Opsahl, a lawyer with the Electronic Frontier Foundation. “Before the NSAC expands its mission, there must be strict oversight from Congress and the public.”
The FBI declined to comment on the program.
Among the data in its archive, the NSAC houses more than 55,000 entries on customers of the Cendant Hotel chain, now known as Wyndham Worldwide, which includes Ramada Inn, Days Inn, Super 8, Howard Johnson and Hawthorn Suites. The entries are for hotel customers whose names matched those on a long list the FBI provided to the company.
The FBI’s Data-Mining Ore
Composed of government information, commercial databases and records acquired in criminal and terrorism probes, the FBI’s National Security Branch Analysis Center is too broad to be considered mission-focused, but still too patchy to be Orwellian. Here’s the data we know about.
• International travel records of citizens and foreigners
• Financial forms filed with the Treasury by banks and casinos
• 55,000 entries on customers of Wyndham Worldwide, which includes Ramada Inn, Days Inn, Super 8, Howard Johnson and Hawthorn Suites
• 730 records from rental-car company Avis
• 165 credit card transaction histories from Sears
• Nearly 200 million records transferred from private data brokers such as Accurint, Acxiom and Choicepoint
• A reverse White Pages with 696 million names and addresses tied to U.S. phone numbers
• Log data on all calls made by federal prison inmates
• A list of all active pilots
• 500,000 names of suspected terrorists from the Unified Terrorist Watch List
• Nearly 3 million records on people cleared to drive hazardous materials on the nation’s highways
• Telephone records and wiretapped conversations captured by FBI investigations
• 17,000 traveler itineraries from the Airlines Reporting Corporation
Another 730 records come from the rental car company Avis, which used to be owned by Cendant. Those records were derived from a one-time search of Avis’s database against the State Department’s old terrorist watch list. An additional 165 entries are credit card transaction histories from the Sears department store chain. Like much of the data used by NSAC, the records were likely retained at the conclusion of an investigation, and added to NSAC for future data mining.
It’s unclear how the FBI got the records. In the past, companies have been known to voluntarily hand over customer data to government data-mining experiments. Notably, in 2002, JetBlue secretly provided a Pentagon contractor with 5 million passenger itineraries, for which it later apologized. But the FBI also has broad authority to demand records under the Patriot Act, using so-called “national security letters,” a kind of self-issued subpoena whose repeated abuse has been uncovered by the Justice Department’s inspector general.
Wyndham Worldwide did not respond to repeated requests for comment. Sears declined comment.
Wired.com’s analysis of more than 800 pages of documents obtained under our Freedom of Information Act request shows the FBI has been continuously expanding the NSAC system and its goals since 2004. By 2008, NSAC comprised 103 full-time employees and contractors, and the FBI was seeking budget approval for another 71 employees, plus more than $8 million for outside contractors to help analyze its growing pool of private and public data.
A long-term planning document from the same year shows the bureau ultimately wants to expand the center to 439 people.
As described in the documents, the system is both a meta-search engine — querying many data sources at once — and a tool that performs pattern and link analysis. The NSAC is an analytic Swiss army knife.
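The documents do not describe how NSAC implements either function, but the two ideas can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the source names, the record fields, the sample values and both function names are hypothetical and are not drawn from the released files.

```python
from collections import defaultdict

# Hypothetical stand-ins for separate data sets; all names and values are invented.
SOURCES = {
    "hotel_stays": [
        {"name": "A. Example", "phone": "555-0100"},
        {"name": "B. Sample", "phone": "555-0101"},
    ],
    "car_rentals": [
        {"name": "A. Example", "phone": "555-0199"},
    ],
    "phone_directory": [
        {"name": "C. Placeholder", "phone": "555-0199"},
    ],
}


def meta_search(name):
    """Run the same name query against every source and pool the hits."""
    hits = []
    for source, records in SOURCES.items():
        for record in records:
            if record["name"] == name:
                hits.append((source, record))
    return hits


def link_analysis():
    """Group names by shared phone number, a crude form of link analysis."""
    names_by_phone = defaultdict(set)
    for records in SOURCES.values():
        for record in records:
            names_by_phone[record["phone"]].add(record["name"])
    # Keep only phone numbers that connect two or more distinct names.
    return {
        phone: sorted(names)
        for phone, names in names_by_phone.items()
        if len(names) > 1
    }
```

In this toy data, one name query returns hits from two different sources, and the link step connects two otherwise unrelated names through a shared phone number.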
The FBI used the system to locate a suspected Al Qaeda operative with expertise in biological agents who was hiding out in Houston. And when law enforcement officials got information suggesting members of a Pakistani terrorist group had obtained jobs as Philadelphia taxi drivers, the NSAC was tapped to help the city’s police force run background checks on Philadelphia cabbies.
(A Jordanian-born Philly cab driver was convicted in 2008 for his part in a plot to attack the Fort Dix army base in New Jersey, but there’s no evidence of a connection between the investigations.)
And when the FBI lost track of terrorism suspects swept up in the evacuation from Hurricane Katrina in 2005, it created a standing order in the system to flag any activity by the missing targets.
Additionally, the FBI shared NSAC data with the Pentagon’s controversial Counter-Intelligence Field Activity office, a secretive domestic-spying unit which collected data on peace groups, including the Quakers, until it was shut down in 2008. But the FBI told lawmakers it would be careful in its interactions with that group.
Conventional criminal cases have also benefited. In a 2004 case against a telemarketing company called Gecko Communications, NSAC used its batch-searching capability to provide prosecutors with detailed information on 192,000 alleged victims of a credit scam.
The feds suspected that Gecko had promised to help the victims improve their credit scores, and then failed to produce results. NSAC automatically analyzed the victims’ credit records to prove their scores hadn’t improved, a task that took two days instead of the four-and-a-half years that the U.S. Attorney’s Office had expected to sink into the job. In December 2006, the owners and seven office managers at the company were sentenced to prison.
The NSAC was born as two separate systems designed to improve information-sharing between government agencies following the Sept. 11 attacks. The Foreign Terrorist Tracking Task Force database has been used to screen flight-school candidates and assist anti-terror investigations. The Investigative Data Warehouse is the more general system, and is the principal element now under expansion.
“The IDW objective was to create a data warehouse that uses certain data elements to provide a single-access repository for information related to issues beyond counterterrorism to include counterintelligence, criminal and cyber investigations,” stated a formerly secret fiscal year 2008 budget request document. “These missions will be refined and expanded as these capabilities are folded into the NSAC.”
When the bureau unified the systems under the NSAC banner in 2007, the move set off alarm bells with lawmakers, who thought it sounded a lot like the Pentagon’s widely-criticized Total Information Awareness project, which had sought to identify terrorist sleeper cells by linking up and searching through U.S. credit card, health and communication databases. The TIA program had moved into the shadows of the intelligence world after Congress voted to revoke most of its funding.
In 2007, Republican congressman James Sensenbrenner asked the Government Accountability Office to look into the NSAC. No report has been made public yet. But the documents obtained by Wired.com show that the FBI has repeatedly downplayed the database’s capabilities when addressing critics in Congress, while simultaneously talking up, in budget documents, the system’s power to spit out the names of newly suspicious persons.
The FBI deflected criticism from a House committee on June 29, 2007, by pointing out a major difference between the NSAC and the shuttered TIA program: The NSAC, the bureau said, is not as open-ended. “A mission is usually begun with a list of names or personal identifiers that have arisen during a threat assessment, preliminary or full investigation,” the unsigned response read. “Those people under investigation are then assessed to determine if they have any association with terrorism or foreign espionage.”
But a formerly secret 2008 funding justification document among the newly released documents suggests the FBI’s pre-crime intentions are much wider than the bureau acknowledged.
The NSAC will also pursue “pattern analysis” as part of its service to the [National Security Branch]. Pattern analysis queries take a predictive model or pattern of behavior and search for that pattern in data sets. The FBI’s efforts to define predictive models … should improve efforts to identify “sleeper cells.”
As an example, the FBI said its sophisticated data queries allowed it to identify 165 licensed helicopter pilots who came from countries of interest, and found that six of those had “derogatory” information about them in the NSAC computers. It sent the leads to FBI field agents in Los Angeles.
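A pattern query of the kind described in the budget document can be sketched as a two-step filter: select the records that fit a profile, then check the matches against a second data set. This is only an illustration under stated assumptions. The records, the country codes, the flagged IDs and the function names are all hypothetical, and nothing here reflects the FBI’s actual queries.

```python
# Hypothetical pilot-license records; every value here is invented.
PILOTS = [
    {"id": 1, "rating": "helicopter", "country": "X"},
    {"id": 2, "rating": "helicopter", "country": "Y"},
    {"id": 3, "rating": "fixed-wing", "country": "X"},
    {"id": 4, "rating": "helicopter", "country": "X"},
]

# Hypothetical inputs to the pattern: a set of countries and a set of
# record IDs that already carry adverse information in another data set.
COUNTRIES_OF_INTEREST = {"X"}
FLAGGED_IDS = {4}


def fits_profile(record):
    """The 'pattern': a helicopter rating plus a listed country."""
    return (
        record["rating"] == "helicopter"
        and record["country"] in COUNTRIES_OF_INTEREST
    )


def pattern_query(records, pattern):
    """Return every record that satisfies the pattern predicate."""
    return [record for record in records if pattern(record)]


# Step 1: find everyone who fits the profile.
matches = pattern_query(PILOTS, fits_profile)
# Step 2: narrow the matches to those flagged in the second data set.
leads = [record for record in matches if record["id"] in FLAGGED_IDS]
```

Note that the query starts from a profile, not from a named suspect, so everyone who fits the profile is examined in step 2 whether or not they are under investigation.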
The FBI also has ambitious plans to expand its data set, the budget request shows. Among the items on its wish list is the database of the Airlines Reporting Corporation, a company that runs a backend system for travel agencies and airlines. A complete database would include billions of Americans’ itineraries, as well as the information they give to travel agencies, such as date of birth, credit card numbers, names of friends and family, e-mail addresses, meal preferences and health information.
So far, the company has given the FBI nearly 17,000 records, which are now part of NSAC. Spokesman Allan Mutén said the company gives the FBI records only when presented with a subpoena or a national security letter — which, he adds, has happened quite a bit. “Nine-eleven was a time and event that piqued the interest of the authorities in airline passenger data,” Mutén said.
The ever-growing size of the database concerns EFF’s Opsahl, who has pieced together the best picture of the FBI’s data mining system through other government FOIA requests.
Opsahl cites an October 2008 National Research Council paper that concluded that data mining is a dangerous and ineffective way to identify potential terrorists, one that will inevitably generate false positives that subject innocent citizens to invasive scrutiny by their government.
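The false-positive concern comes down to base-rate arithmetic: when real targets are rare, even an accurate screen flags far more innocent people than guilty ones. The sketch below works through that arithmetic with made-up numbers. The population size, number of targets, detection rate and false-positive rate are all assumptions chosen for illustration, not figures from the paper or from the FBI documents.

```python
def screening_outcome(population, true_targets, detection_rate, false_positive_rate):
    """Expected results of screening a population for rare targets.

    Returns (true_hits, false_hits, share_of_flags_that_are_real).
    """
    innocents = population - true_targets
    true_hits = detection_rate * true_targets
    false_hits = false_positive_rate * innocents
    share_real = true_hits / (true_hits + false_hits)
    return true_hits, false_hits, share_real


# Hypothetical scenario: 1,000,000 people screened, 10 real targets,
# a screen that catches 90% of targets and wrongly flags 1% of everyone else.
true_hits, false_hits, share_real = screening_outcome(
    population=1_000_000,
    true_targets=10,
    detection_rate=0.9,
    false_positive_rate=0.01,
)
```

Under these assumed numbers the screen would flag roughly 10,000 innocent people to catch nine real targets, so fewer than one flag in a thousand would point at an actual target.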
At the same time, Opsahl admits the NSAC is not at the moment the Orwellian system that TIA would have been.
“This is too massive to be based on a particular query, but too narrow to reflect a policy that they are going to [go] out and collect this kind of data systematically,” Opsahl said.
That could change if the FBI gets its hands on the data sources on its 2008 wish list. That list includes airline manifests sent to the Department of Homeland Security, the national Social Security number database, and the Postal Service’s change-of-address database. There are also 24 additional databases the FBI is seeking, but those names were blacked out in the released data.
Graphic: Wired.com/Dennis Crothers