Data Dictionary




Open-FF

Open-FF Data Dictionary

This file was generated on July 08, 2025
from data repository: openFF_data_2025_07_07.

Description of the contents of the final data files generated by Open-FF from the FracFocus data.

Pulling repo tables from: G:\My Drive\production\repos\openFF_data_2025_07_07\pickles

Acceptable use of FracFocus data

One requirement for using the FracFocus data is stipulated on the FracFocus website:

"Downloaded data may be aggregated or combined with other datasets, but the FracFocus data may not be altered in any way."

Please read the entire "Terms of use" at http://fracfocus.org/data-download.

The work in this project maintains the original FracFocus data as it is reported in the bulk download. The original field names are kept; all of them begin with an upper-case letter, so they can be identified that way. Fields generated by this project or drawn from external data sources begin with a lower-case letter; for example, CASNumber is the original field and bgCAS is the generated field. There are two exceptions: DTXSID and MI_inconsistent begin with an upper-case letter but are NOT original FracFocus fields.
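The naming rule above can be expressed as a small check. This is a minimal sketch that applies only the capitalization convention and the two documented exceptions; the function and constant names are hypothetical illustrations, not part of Open-FF, and the check does not consult any actual list of fields.

```python
# Upper-case field names that are NOT original FracFocus fields
# (the two exceptions noted in this data dictionary).
NON_ORIGINAL_EXCEPTIONS = {"DTXSID", "MI_inconsistent"}


def is_original_fracfocus_field(field_name: str) -> bool:
    """Return True if field_name follows the original-FracFocus naming rule.

    Hypothetical helper: original fields start with an upper-case letter;
    Open-FF-generated or external fields start with a lower-case letter.
    """
    if not field_name or field_name in NON_ORIGINAL_EXCEPTIONS:
        return False
    return field_name[0].isupper()


for name in ["CASNumber", "bgCAS", "DTXSID", "MI_inconsistent", "PercentHFJob"]:
    print(name, is_original_fracfocus_field(name))
```

Only CASNumber and PercentHFJob are reported as original in this example; bgCAS is lower-case, and DTXSID and MI_inconsistent are caught by the exception set.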

In the zipped bulk download from FracFocus, a data dictionary is provided in the 'readme.txt' file. (This zipped download is in the /sources or /data directory, and we rename it 'currentData.zip'.) This file gives some information about many of the fields; however, it is written for the SQL database version of the bulk download, not the CSV version that we use in this project. Further, some important fields are not mentioned in readme.txt; they are described below. In the field descriptions below, we cite the FracFocus text from a June 2021 bulk download.
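To consult the FracFocus readme without unpacking the whole archive, the file can be read directly from the zip. This is a sketch using Python's standard `zipfile` module; the function name is hypothetical, and the member's exact capitalization, folder location, and text encoding inside the archive are assumptions, so the lookup matches the base name case-insensitively and replaces undecodable bytes.

```python
import zipfile


def read_fracfocus_readme(zip_source, member_name="readme.txt"):
    """Return the text of the readme file inside a bulk-download zip.

    zip_source may be a path (e.g. to 'currentData.zip') or a binary file object.
    Hypothetical helper; the archive layout and encoding are assumptions.
    """
    with zipfile.ZipFile(zip_source) as zf:
        for info in zf.infolist():
            # Compare base names only, ignoring any folder prefix in the archive.
            base = info.filename.rsplit("/", 1)[-1]
            if base.lower() == member_name.lower():
                # errors="replace" guards against stray non-UTF-8 bytes.
                return zf.read(info).decode("utf-8", errors="replace")
    raise FileNotFoundError(f"{member_name!r} not found in archive")


# Usage (path is hypothetical; adjust to your /sources or /data directory):
# text = read_fracfocus_readme("sources/currentData.zip")
# print(text[:500])
```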

Descriptions of fields in the output data sets

Explanation of columns in the table below

column	what it is
field Name, [tables]:	The name of the field or column in the data set. Capitalized field names are from the original FracFocus downloaded data; lower-case names are generated by Open-FF or drawn from external data sources. [tables] lists the Open-FF internal tables, used to construct the output data sets, that contain this field.
FracFocus description:	Description of the (original) field given by FracFocus in the bulk download file, readme.txt.
Open-FF description:	Our description of the field
source:	Whether the field is a direct copy of the original FracFocus data, generated by Open-FF, or pulled from an external data set.
Num:	the number of non-empty values in the field
Unique:	the number of unique values (including NaN) in the field
Data_type:	the python/pandas data type for the field
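The Num, Unique, and Data_type columns can be reproduced for any pandas DataFrame. This is a minimal sketch, not Open-FF's own code; the function name is hypothetical, and it assumes "non-empty" means non-null (pandas reads empty CSV cells as NaN by default).

```python
import pandas as pd


def summarize_fields(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: compute Num / Unique / Data_type for each column."""
    rows = []
    for col in df.columns:
        rows.append({
            "fieldName": col,
            # Num: count of non-null values.
            "Num": int(df[col].notna().sum()),
            # Unique: distinct values, with NaN counted as one value.
            "Unique": int(df[col].nunique(dropna=False)),
            # Data_type: the pandas dtype as a string.
            "Data_type": str(df[col].dtype),
        })
    return pd.DataFrame(rows)


example = pd.DataFrame({
    "CASNumber": ["7732-18-5", None, "7732-18-5"],
    "PercentHFJob": [85.0, float("nan"), 10.0],
})
print(summarize_fields(example))
```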

field Name, [tables]	FracFocus description	Open-FF description	source	Num	Unique	Data_type


Carrier detection sets:

Among the filters below, s1 finds the majority of water carriers. However, no single set of criteria identifies the water carrier record(s) in every FracFocus disclosure, so the other filters are used to catch many other disclosure patterns without having to curate each disclosure by hand.

Set name	description	Criteria to be detected
s1	Primary filter; detects most recent disclosures	Only one record whose `Purpose` is "carrier" (or related); `bgCAS` is '7732-18-5'; at least 50% `PercentHFJob`; total % of disclosure is 95% < x < 105%
s2	More than one record as the carrier. This covers, for example, disclosures with two water records (fresh and produced) or where other chemicals are also labeled as part of the carrier. Including all water carrier records is important to avoid underestimating carrier mass.	More than one record whose `Purpose` is "carrier" (or related); at least one `bgCAS` is '7732-18-5'; water records total at least 50% `PercentHFJob`; total % of disclosure is 95% < x < 105%
s3	No carrier records labeled, but a clear water record with a typical percentage	`bgCAS` is '7732-18-5'; at least 40% `PercentHFJob`; `IngredientName` contains the phrase "including mix water"; total % of disclosure is 95% < x < 105%
s4	Like s3, but the CAS number is missing; still an obvious water record	`CASNumber` is empty; at least 60% `PercentHFJob`; `IngredientName` contains the phrase "including mix water"; total % of disclosure is 95% < x < 105%
s5	Like s1, but no carrier records are labeled	`bgCAS` is '7732-18-5'; at least 50% `PercentHFJob`; total % of disclosure is 95% < x < 105%
s6	`CASNumber` missing but a clear carrier label	`bgCAS` is ambiguousID; single record with a carrier `Purpose`; `IngredientName` is either "carrier" (or related) or contains "water"; `TradeName` contains "water"; 50% < `PercentHFJob` < 100%; total % of disclosure is 95% < x < 105%
s7	Like s1, but for "salted" water. Even though the record is labeled with the salt's CAS number, most of its mass is water.	Only one record whose `Purpose` is "carrier" (or related); `bgCAS` is either '7447-40-7' (KCl) or '7647-14-5' (NaCl); at least 50% `PercentHFJob`; total % of disclosure is 95% < x < 105%
s8	Common pattern in older disclosures (including the SkyTruth archive)	`bgCAS` is ambiguousID or '7732-18-5'; `IngredientName` is MISSING; `Purpose` is "unrecorded purpose"; `TradeName` contains either "water" or "brine"; one or two such records per disclosure; 50% < sum of `PercentHFJob` of these records < 100%; total % of disclosure is 95% < x < 105%
s9	Common pattern in older disclosures (including the SkyTruth archive)	`bgCAS` is ambiguousID or '7732-18-5'; `IngredientName` is MISSING; `Purpose` is one of the standard carrier words or phrases; `TradeName` contains either "water" or "brine"; one or two such records per disclosure; 50% < sum of `PercentHFJob` of these records < 100%; total % of disclosure is 95% < x < 105%
s10	A pattern seen in later disclosures: the carrier is reported only in the top part of the systems-approach section, under the "Listed Below" `CASNumber`. The actual `PercentHFJob` value is not reported in the PDF version, but it is in the bulk download.	`CASNumber` is "Listed Below"; record has a carrier `Purpose`; `PercentHFJob` > 50%; `TradeName` contains "water"; total % of disclosure is 95% < x < 105%
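As an illustration of how one of these sets could be applied, here is a pandas sketch of the s1 criteria only. It is not Open-FF's implementation. Several details are assumptions: disclosures are grouped by an `UploadKey` column, the "carrier (or related)" purposes are a short hypothetical set (`CARRIER_PURPOSES`), "at least 50%" is treated as inclusive, and the 95% to 105% total is treated as strict on both ends.

```python
import pandas as pd

# Hypothetical stand-in for Open-FF's list of "carrier (or related)" Purpose values.
CARRIER_PURPOSES = {"carrier", "carrier fluid", "base fluid"}
WATER_CAS = "7732-18-5"


def detect_s1(df: pd.DataFrame, key: str = "UploadKey") -> list:
    """Return disclosure keys meeting the s1 criteria (sketch only)."""
    work = df.copy()
    work["is_carrier"] = (
        work["Purpose"].fillna("").str.strip().str.lower().isin(CARRIER_PURPOSES)
    )

    # Total PercentHFJob per disclosure must lie strictly between 95 and 105.
    total = work.groupby(key)["PercentHFJob"].sum()
    total_ok = total[(total > 95) & (total < 105)].index

    # Exactly one carrier-labeled record per disclosure.
    carriers = work[work["is_carrier"]]
    n_carriers = carriers.groupby(key).size()
    single = n_carriers[n_carriers == 1].index

    # That single carrier record must be water at >= 50% PercentHFJob.
    c = carriers[carriers[key].isin(single)]
    hits = c[(c["bgCAS"] == WATER_CAS) & (c["PercentHFJob"] >= 50)][key]
    return sorted(set(hits) & set(total_ok))
```

Disclosures that fail s1 (for example, those with two carrier records) would then be passed to the other sets; a disclosure with two water carrier records is left for s2 rather than forced into s1.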

Disclosures with problems detected during water carrier identification