This repository was archived by the owner on Sep 14, 2023. It is now read-only.
Proof-of-concept code to turn the FracFocus chemical disclosures into a usable database.
gwallison/FF-POC
README for FF-POC repository and project

This CodeOcean capsule is a proof-of-concept version of code to transform the online chemical disclosure site for hydraulic fracturing, FracFocus.org, into a usable database. The code demonstrates cleaning, filtering, and curating techniques that yield organized data sets and sample analyses from a notoriously messy collection of chemical records. The sample analyses are available in the results section as Jupyter notebooks, and downloadable versions of the final data are also available there. For a majority of the records, the mass of the chemicals is calculated. (The FracFocus data used were downloaded June 25, 2019.)

To be included in the final data sets, fracking events must use water as the carrier, and percentages must be consistent and within tolerance. Chemicals must be identified by a match with an authoritative CAS number or be labeled proprietary. Further, portions of the raw data that are filtered out include:

- fracking events with no chemical records (mostly 2011 to May 2013).
- fracking events with multiple entries (and no indication of which entries are correct).
- chemical records that are identified as redundant within the event.

Finally, I clean up some of the labeling fields by consolidating multiple versions of a single category into one easily searchable name. For instance, I collapse the 80+ versions of the supplier 'Halliburton' to a single name. By removing or cleaning the difficult data from this unique data source, I produce a data set that should facilitate more in-depth analyses of chemical use in the fracking industry.

****** Version explanation ******

Version 5: adjusted the formatting of a number of figures in the Jupyter notebooks to better display the x and y axes and, especially, to improve the log-based displays. Some text in those notebooks was changed to reflect the changes in the figures.

Version 4: added a Jupyter notebook that finds the overlap between the filtered FF data set and the TEDX endocrine disruptor list. The generated list is deposited in the results section, along with an HTML version of the notebook.

Version 3: corrected a mislabeled figure (the first one) in the Summary_of_cleaned_FF_data notebook.
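
Two of the cleaning steps described in this README, checking CAS numbers and collapsing supplier name variants, can be illustrated with a minimal sketch. This is not the capsule's actual code. The function names, the alias table, and the supplier spellings below are hypothetical and chosen only for illustration. The check-digit test is a standard property of CAS Registry Numbers and serves here only as a format screen; the project itself matches chemicals against an authoritative CAS list.

```python
import re

# CAS Registry Numbers have the form NNNNNNN-NN-N (2 to 7 digits, 2 digits, 1 check digit).
CAS_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def is_wellformed_cas(cas: str) -> bool:
    """Return True if `cas` has the CAS format and a correct check digit.

    This only screens out malformed entries; it does not prove that the
    number is assigned to a real substance.
    """
    match = CAS_PATTERN.match(cas.strip())
    if match is None:
        return False
    body = match.group(1) + match.group(2)
    check_digit = int(match.group(3))
    # Weight each body digit by its position counted from the right (1, 2, 3, ...).
    total = sum(pos * int(d) for pos, d in enumerate(reversed(body), start=1))
    return total % 10 == check_digit


def normalize_label(raw: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", raw.lower())
    return " ".join(cleaned.split())


# Hypothetical alias table: normalized spelling -> single searchable name.
# These variants are invented for illustration, not taken from the data.
SUPPLIER_ALIASES = {
    "halliburton": "Halliburton",
    "haliburton": "Halliburton",
    "halliburton energy services": "Halliburton",
    "halliburton energy services inc": "Halliburton",
}


def consolidate_supplier(raw: str) -> str:
    """Map a raw supplier label to its consolidated name, if one is known."""
    return SUPPLIER_ALIASES.get(normalize_label(raw), raw.strip())


if __name__ == "__main__":
    print(is_wellformed_cas("7732-18-5"))   # water
    print(consolidate_supplier("HALLIBURTON Energy Services, Inc."))
```

Normalizing before the lookup keeps the alias table small, since differences in case, punctuation, and spacing never need their own entries. Labels with no known alias pass through unchanged, so nothing is silently relabeled.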