Open-FF

Short Description of Open-FF


This file was generated on May 17, 2025
from data repository: openFF_data_2025_05_14.

Sponsored by FracTracker Alliance


An open-source project to transform the industry's FracFocus disclosure data into a usable resource.

Data sources

The primary source of data for the Open-FF project is FracFocus, "The national hydraulic fracturing chemical disclosure registry." These data cover most US states where fracking is active; many states require operators to use FracFocus to disclose chemical information. FracFocus began in 2011 and continues to release disclosures regularly.

Open-FF uses the FracFocus bulk download as its primary data set. This set of over 6 million chemical records in over 200,000 disclosures is NOT a database. It contains several formats used by hundreds of different companies, often with poor attention to data completeness, quality, and standardization. Summarizing patterns across the raw collection is very difficult. Open-FF aims to transform this raw collection into data sets with improved standardization and completeness, to facilitate big-picture analysis of the industry's chemical use and disclosure practices.

Further, Open-FF maintains supplementary data sets that are available to interested researchers. For example, the bulk download does not include chemical records released through May 2013, even though the data are available as PDFs. These supplementary resources provide access to the data in the PDFs. Additionally, connections to external lists, such as chemicals of concern, are provided in the data sets.

Chemical Identification

There is no more important piece of a chemical disclosure instrument than clear identification of the chemicals used.

Chemical usage in FracFocus is typically reported by both a chemical name (such as "methanol") and a CAS Registry Number ("67-56-1"). However, many FracFocus disclosures do not provide an unambiguous identity in both fields.

Open-FF evaluates both forms of chemical identity by comparing them to authoritative references (the Chemical Abstracts Service's SciFinder database and EPA's CompTox resource) to generate a "best guess" identity for as many FracFocus records as possible. In addition, Open-FF clearly labels the records in which companies explicitly obscure chemical identity - the so-called Trade Secret claims. Finally, some records are associated with the "systems approach" format.

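As a small illustration of why the CAS number field needs checking at all, the sketch below applies the standard CAS check-digit rule to a reported value. This is not Open-FF's method, which relies on comparison against SciFinder and CompTox; it is only a format sanity check, and the function name `is_valid_casrn` is hypothetical.

```python
import re

# A CAS Registry Number has 2-7 digits, then 2 digits, then 1 check digit.
CAS_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def is_valid_casrn(text: str) -> bool:
    """Return True if text is well formed and its check digit is correct."""
    match = CAS_PATTERN.match(text.strip())
    if match is None:
        return False
    digits = match.group(1) + match.group(2)
    check_digit = int(match.group(3))
    # Weight the digits 1, 2, 3, ... starting from the rightmost one.
    total = sum(int(d) * weight
                for weight, d in enumerate(reversed(digits), start=1))
    return total % 10 == check_digit


print(is_valid_casrn("67-56-1"))      # methanol, as cited above
print(is_valid_casrn("67-56-2"))      # same digits, wrong check digit
print(is_valid_casrn("proprietary"))  # not a CAS number at all
```

A value that passes this check is only plausibly a CAS number; whether it matches the reported chemical name still has to be settled against a reference list.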

Uncovering Chemical Quantity

After chemical identity, how much is used is of critical importance.

FracFocus does not explicitly report chemical quantities in its PDF files. Text on the FracFocus website dwells on the proportion of added chemicals in a typical fracking job. That proportion can be quite small - usually less than 2% - as this figure from the FracFocus website illustrates.

[Figure from the FracFocus website: proportion of added chemicals in a typical fracking job]

However, the actual quantity of the added chemicals can be quite large simply because the whole job is enormous. For example, ethylene glycol (CASRN: 107-21-1) appears on several lists of chemicals of concern and is used in about 40% of fracking jobs.

As part of our effort to explicitly document quantity, we can calculate the mass of all the chemicals if some basic information is present in a disclosure.

The two basic disclosure values - volume of water used as the carrier and the percent of the total mass that the carrier occupies - allow Open-FF to calculate the total mass of the fracking job. From there, the mass of individual records can be calculated using the reported percent of total mass for each chemical.

This method has been used by other researchers and the FracFocus team has acknowledged that it is valid. One important caveat: the percent masses reported in FracFocus are the maximum of a range that a manufacturer reports in MSDS documents. Therefore, the masses Open-FF calculates are maximums.
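The arithmetic described above can be sketched as follows. This is a minimal illustration, not Open-FF's code: the function names are hypothetical, the water volume is assumed to be in US gallons, and the carrier is assumed to be fresh water at about 8.34 pounds per gallon. The input numbers are invented for the example.

```python
# Assumed density of fresh water in pounds per US gallon (not stated in the text).
WATER_LB_PER_GAL = 8.34


def total_job_mass_lb(water_volume_gal: float, carrier_percent: float) -> float:
    """Total job mass from the carrier water volume and its percent of total mass."""
    if not 0 < carrier_percent <= 100:
        raise ValueError("carrier_percent must be in (0, 100]")
    carrier_mass_lb = water_volume_gal * WATER_LB_PER_GAL
    return carrier_mass_lb / (carrier_percent / 100.0)


def chemical_mass_lb(total_mass_lb: float, chemical_percent: float) -> float:
    """Upper-bound mass of one chemical from its reported (maximum) percent."""
    return total_mass_lb * chemical_percent / 100.0


# Hypothetical disclosure: 10 million gallons of water making up 88% of the
# job's mass, with one chemical reported at 0.01% of total mass.
total = total_job_mass_lb(10_000_000, 88.0)
mass = chemical_mass_lb(total, 0.01)
print(f"total job mass: {total:,.0f} lb")
print(f"chemical mass (maximum): {mass:,.0f} lb")
```

In this invented case, a chemical that makes up only one hundredth of a percent of the job still amounts to roughly 9,500 pounds, which is the point the text makes about job size.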

[Figure: calculated mass of ethylene glycol per fracking job, for jobs above 1,000 pounds]

In the graph above, each dot represents the calculated mass of ethylene glycol in a single fracking job where that mass exceeded 1,000 pounds.

In addition to these calculations, recent versions of Open-FF use a FracFocus column named "MassIngredient" that is available for about half of FracFocus records. Open-FF uses these data to validate the calculations described above; in some cases, this column is the only record of mass available.
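One way such a cross-check could look is sketched below. The 10% relative tolerance and the function name `masses_agree` are assumptions made for illustration; the text does not say what criterion Open-FF actually applies.

```python
from typing import Optional


def masses_agree(calculated_lb: Optional[float],
                 reported_lb: Optional[float],
                 rel_tol: float = 0.10) -> Optional[bool]:
    """Compare a calculated mass with a reported MassIngredient value.

    Returns None when either value is missing (no comparison possible),
    otherwise True if they differ by at most rel_tol of the reported value.
    """
    if calculated_lb is None or reported_lb is None:
        return None
    if calculated_lb <= 0 or reported_lb <= 0:
        return False
    return abs(calculated_lb - reported_lb) / reported_lb <= rel_tol


print(masses_agree(9477.0, 9500.0))   # close agreement
print(masses_agree(9477.0, 20000.0))  # large discrepancy
print(masses_agree(None, 9500.0))     # only the reported mass exists
```

The third case corresponds to records where the reported value is the only mass available, so there is nothing to validate against.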

Correcting and consolidating labels

Open-FF also aims to make searching and aggregating data more feasible by cleaning up other data fields. For example, the supplier and oilfield service company Halliburton is represented by more than 80 versions of the company name. Here's a sample of those versions in the Supplier field:

'Halliburton', 'HALLIBURTON', 'HES', 'Halliburton Energy Services, Inc. (HES)', 'Halliburton Energy Services, Inc', 'Hallibrton', 'Hallliburton', 'Halliburton Energy Services', 'Halliburton Energy Services, Inc.', 'HES Multi-Chem', 'Hallibruton Energy Services', 'Hallibuton', 'Halluburton', 'Halliburton, Multi-Chem', 'Halliurton', 'Halliberton', and 'HES Chemicals'.

This lack of standardization makes thorough searches across 6 million records impossible.

Open-FF employs curated translation tables to create new fields from these unstandardized fields. For instance, all of the versions above in Supplier are assigned to the value 'halliburton' in the bgSupplier field. This allows us to produce comprehensive views of the whole data set. The following graph shows the number of records for the biggest users of naphthalene - this time by operating company, another curated field.

[Figure: number of records for the biggest users of naphthalene, by operating company]
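The translation-table idea for the Supplier field can be sketched with a plain dictionary. The table below is a toy stand-in built from a few of the Halliburton variants listed earlier; the real Open-FF tables are curated and their format is not shown here. The helper names `normalize_key` and `bg_supplier` are hypothetical.

```python
# Toy translation table: lowercased raw Supplier value -> curated value.
SUPPLIER_TRANSLATION = {
    "halliburton": "halliburton",
    "hes": "halliburton",
    "halliburton energy services, inc. (hes)": "halliburton",
    "halliburton energy services, inc.": "halliburton",
    "hallibrton": "halliburton",
    "hallliburton": "halliburton",
    "halluburton": "halliburton",
    "hes multi-chem": "halliburton",
}


def normalize_key(raw: str) -> str:
    """Lowercase and collapse whitespace so trivial variants share one key."""
    return " ".join(raw.strip().lower().split())


def bg_supplier(raw: str):
    """Look up the curated supplier name; None means the value is not yet curated."""
    return SUPPLIER_TRANSLATION.get(normalize_key(raw))


for raw in ["HALLIBURTON", " Hallibrton ", "HES Multi-Chem", "Acme Chemical"]:
    print(repr(raw), "->", bg_supplier(raw))
```

An explicit lookup table, rather than fuzzy matching, keeps each assignment reviewable: an unfamiliar value returns None and can be examined before it is added to the table.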

Filtering

Data quality in FracFocus varies widely, so Open-FF assigns flags to questionable records and disclosures. Users of Open-FF products can choose to filter out lower-quality or otherwise problematic data before performing analysis. These flags can also assist researchers interested in patterns of disclosure completeness and lack of transparency.
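Flag-based filtering might be sketched as below. The flag names, record layout, and function name are all invented for illustration; Open-FF's actual flags are defined in its own documentation.

```python
# Hypothetical records, each carrying a set of quality flags.
records = [
    {"id": 1, "mass_lb": 9477.0, "flags": set()},
    {"id": 2, "mass_lb": None, "flags": {"missing_water_volume"}},
    {"id": 3, "mass_lb": 120.0, "flags": {"duplicate_disclosure"}},
    {"id": 4, "mass_lb": 55.0,
     "flags": {"duplicate_disclosure", "missing_water_volume"}},
]


def filter_records(rows, exclude_flags):
    """Keep only rows that carry none of the flags in exclude_flags."""
    exclude = set(exclude_flags)
    return [row for row in rows if not (row["flags"] & exclude)]


# Drop anything flagged as a duplicate before aggregating.
clean = filter_records(records, {"duplicate_disclosure"})
print([row["id"] for row in clean])

# A transparency study might do the opposite and keep only flagged rows.
flagged = [row for row in records if row["flags"]]
print([row["id"] for row in flagged])
```

Keeping the flags on the records, instead of deleting questionable rows up front, lets each analysis choose its own exclusion rules.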

Products

Open-FF is an open-source project. Through early 2023, project code was published on CodeOcean, which allows any user to re-run the code, with or without modification. CodeOcean also certifies that the code is reproducible. More recently, the project has been published on GitHub (openFF), where it includes documentation.

  • Users can explore the data set using the Data Browser. This includes interactive tables that allow easy sorting and searching. For example:
    • The chemical index provides a quick way to scan and search the list of over 1300 chemicals used in FracFocus. This index lets users subset the material based on lists of chemicals of concern and other factors. For each item in this table, the user can view a summary analysis of the chemical. In addition, from this table, a user can download a slice of the data set containing just the records for that chemical.
    • The synonym index helps users identify a CAS number if they know a particular chemical name. The results in this table also point to a summary analysis of each chemical.
    • The operator index summarizes where and when a company is active and its water use through time, along with summaries of chemicals of concern and trade secret designations.
  • Open-FF is currently publishing a periodic summary of all the disclosures published since the last report.

If you are interested in these data, but these products don't quite serve your needs, please contact us. The goal of this project is to make the research of fracking chemicals more accessible and we are interested in adding features to assist analysis.