Hello, welcome to the sixth edition of the Data Liberation Project’s newsletter. Inside: Hazmat incident data liberated, improvements to prior releases, the latest batch of FOIAs, the DLP’s new email discussion list, and other news from the data-FOIA-sphere.
Not long after the previous DLP Dispatch, the Data Liberation Project released another dataset, this one with 600,000+ hazmat incident reports submitted to the US government from the 1970s through the near-present.
Federal law requires the reporting of spills, explosions, and other safety-endangering incidents involving hazardous materials to the Department of Transportation’s Pipeline and Hazardous Materials Safety Administration (PHMSA).
The reports include details about the location of the incident, mode of transportation, parties involved, hazardous materials involved, causes of failure, fatalities, injuries, financial cost, and more.
PHMSA publishes the submitted reports through an online portal. But the portal is brittle and doesn’t provide a straightforward way to download the full set of submitted reports.
So our repository automates the downloading — and updating — of all data available through the portal.
We’re providing bulk, downloadable data in the form of monthly CSV files. And, thanks to recent efforts by Data Liberation Project volunteer Michael Nolan, you can also keep track of newly-available reports via RSS feeds for the nation and for each state.
The DLP’s previously-released datasets also continue to get better(!):
We’ve continued to make the EPA Risk Management Program data more useful and easier to understand. The simplified spreadsheets now include facilities’ industry (NAICS) codes and de-registration dates. The spreadsheets and documentation also now reflect our improved understanding of the database’s various latitude/longitude coordinates, thanks to an email response from EPA.
We’ve expanded the data we’re extracting from Animal Welfare Act inspection PDFs, thanks to efforts by DLP volunteer Gustav Cappaert. Report dates (as distinct from inspection dates) are now in our extracted dataset, and the full text and headings of the reports are on their way. Ben Welsh, co-maintainer of the data pipeline, also added an RSS feed of inspections with “critical” violations, and has been highlighting notable entries.
You can read more about each request via the links above. If you have any questions about them, please do ask.
A few people have asked about communicating with fellow Data Liberation Project contributors, aspiring contributors, and other members of the DLP community. I think it’s a great idea, and have created an email discussion list (hosted by Google Groups) to help facilitate those conversations. Click here to join.
Earlier this year, the Second Circuit published an opinion that is very helpful for FOIA requests seeking data.
Databases often, by necessity, use unique identifiers to cross-reference information in one table with information in another. The particular identifiers don’t matter so much as their consistency across data tables. They could be, and often are, just random (or auto-incremented) numbers and/or letters.
But sometimes database maintainers use IDs that, themselves, carry some meaning. Such is the situation with Immigration and Customs Enforcement’s Enforcement Integrated Database (EID), which connects some tables based on immigrants’ “A-Numbers,” identifiers that ICE considers to be personally-identifiable information and, thus, exempt from disclosure.
In 2018, the American Civil Liberties Union filed a FOIA request with ICE seeking certain slices of the EID. They asked that ICE replace the A-Numbers with anonymized unique IDs, which the agency declined to do. In court, ICE argued that doing so would constitute the creation of new records, and thus was beyond the scope of its FOIA obligations.
The Second Circuit opinion rejects ICE’s argument: “A government agency cannot make an exempt record [...] the sole ‘key’ or ‘code’ necessary to access nonexempt records in a particular manner; itself use the exempt record to obtain non-exempt records in that manner; and then invoke the record’s exempt status to deny the public similar access to the nonexempt records.”
Rather, per the opinion, “FOIA’s broad disclosure policy obligates the agency to substitute a different code in order to afford the public non-exempt records in the same manner as they are available to the agency.”
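To make the idea of substituting "a different code" concrete, here is a minimal Python sketch of how a meaningful identifier can be swapped for a random one while keeping tables linkable. It is purely illustrative and is not how ICE, the ACLU, or the court describe any actual process. The table contents, the `a_number` and `person_id` column names, and the helper functions are all hypothetical.

```python
import secrets


def build_id_map(sensitive_ids):
    """Assign each distinct sensitive ID a random, meaningless replacement."""
    mapping = {}
    used = set()
    for sid in sensitive_ids:
        if sid in mapping:
            continue
        token = secrets.token_hex(8)
        while token in used:  # extremely unlikely, but keep replacements unique
            token = secrets.token_hex(8)
        used.add(token)
        mapping[sid] = token
    return mapping


def anonymize(rows, old_key, new_key, mapping):
    """Return new rows with the sensitive column dropped and new_key added."""
    result = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != old_key}
        new_row[new_key] = mapping[row[old_key]]
        result.append(new_row)
    return result


# Hypothetical tables that cross-reference each other by a sensitive ID.
people = [
    {"a_number": "A000000001", "country": "X"},
    {"a_number": "A000000002", "country": "Y"},
]
encounters = [
    {"a_number": "A000000001", "year": 2016},
    {"a_number": "A000000001", "year": 2017},
    {"a_number": "A000000002", "year": 2017},
]

# One shared mapping, so the same person gets the same replacement everywhere.
id_map = build_id_map(row["a_number"] for row in people + encounters)
anon_people = anonymize(people, "a_number", "person_id", id_map)
anon_encounters = anonymize(encounters, "a_number", "person_id", id_map)
```

Because both tables use the same mapping, a reader of the anonymized output can still tell which encounters belong to which person, without ever seeing the original identifier. The mapping itself would need to stay with whoever holds the sensitive data.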
(Thanks to Shawn Musgrave for bringing the opinion to my attention.)