Work in Progress & Future Datasets#

Important

Looking for a specific dataset?

If you need data that’s not in PUDL, open an issue to tell us more about it!

If you’ve already spent a bunch of time wrangling a dataset, we welcome “knowledge contributions” in our pudl-knowledge repository!

If you’re looking to help us integrate a specific dataset into PUDL, find us at office hours and we can talk through next steps.

Work in Progress#

Thanks to a grant from the Alfred P. Sloan Foundation Energy & Environment Program, we have support to integrate the following new datasets between April 2021 and March 2024.

There’s a huge variety and quantity of data about the US electric utility system available to the public. The data we have integrated is just the beginning! Other data we’ve heard demand for are listed below. If you’re interested in using one of them and would like to add it to PUDL, check out our contribution guidelines. If there are other datasets you think we should be looking at integrating, don’t hesitate to open an issue on GitHub requesting the data and explaining why it would be useful.

Census DP1#

The US Census Demographic Profile 1 (DP1) provides Census tract-, county-, and state-level demographic information, along with the geometries defining those areas. We use this information in generating historical utility and balancing authority service territories based on FERC 714 and EIA 861 data. Currently, we are distributing the Census DP1 data as a standalone SQLite database.
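Because the DP1 data ships as a standalone SQLite database, a quick way to get oriented is to list the tables it contains. The snippet below is a minimal sketch using only Python's standard library. The file name `censusdp1tract.sqlite` is an assumption for illustration, so substitute the path of the file you actually downloaded.

```python
import sqlite3
from contextlib import closing


def list_tables(db_path):
    """Return the names of all tables in a SQLite database, sorted by name."""
    # closing() makes sure the connection is closed when the block exits.
    with closing(sqlite3.connect(db_path)) as conn:
        # sqlite_master is SQLite's built-in catalog of schema objects.
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


if __name__ == "__main__":
    # Hypothetical path: point this at your local copy of the database.
    # Note that sqlite3.connect() creates an empty file if the path does not exist.
    for table_name in list_tables("censusdp1tract.sqlite"):
        print(table_name)
```

The table names returned depend on the release you downloaded, so inspect the output rather than relying on any particular name.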

EIA Form 176#

EIA Form 176, also known as the Annual Report of Natural and Supplemental Gas Supply and Disposition, describes the origins, suppliers, and disposition of natural gas on a yearly, state-by-state basis.

FERC EQR#

The FERC Electric Quarterly Reports (EQR), also known as FERC Form 920, include the details of transactions between utilities, and between utilities and merchant generators. They cover ancillary services as well as energy and capacity, time and location of delivery, prices, contract length, etc. They’re one of the few public sources of information about renewable energy power purchase agreements (PPAs). This is a large dataset (~100s of GB) composed of a very large number of relatively clean CSV files, but it requires fuzzy processing to get at some of the interesting attributes that are only indirectly reported.

FERC Form 2#

FERC Form 2 is analogous to FERC Form 1, but it pertains to gas rather than electric utilities. The data paint a detailed picture of the finances of natural gas utilities.

Machine Readable Clean Energy Standards#

Renewable Portfolio Standards (RPS) and Clean Energy Standards (CES) have emerged as some of the primary policy tools for decarbonizing the US electricity supply. Researchers who model future electricity systems need to include these binding regulations as constraints on their models to ensure that the systems they explore are legally compliant. Unfortunately for modelers, RPS and CES regulations vary from state to state. Sometimes there are carve-outs for different types of generation, and sometimes there are different requirements for different types of utilities or distributed resources. Our goal is to compile a programmatically usable database of RPS/CES policies in the US for quick and easy reference by modelers.
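To make "programmatically usable" concrete, here is one way a single policy record might be represented and turned into model constraints. This is only an illustrative sketch: the class names, field names, and `required_mwh` helper are hypothetical and do not reflect any schema PUDL has settled on, and the state code and numbers are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CarveOut:
    """A minimum share of retail sales that must come from one technology."""

    technology: str
    min_fraction: float


@dataclass(frozen=True)
class PortfolioStandard:
    """One state's requirement for one compliance year (hypothetical schema)."""

    state: str
    year: int
    utility_type: str  # requirements can differ by utility type
    target_fraction: float  # required qualifying share of retail sales
    carve_outs: tuple = ()  # zero or more CarveOut records


def required_mwh(policy, retail_sales_mwh):
    """Translate a policy into minimum-MWh constraints for given retail sales."""
    constraints = {"total": policy.target_fraction * retail_sales_mwh}
    for carve_out in policy.carve_outs:
        constraints[carve_out.technology] = carve_out.min_fraction * retail_sales_mwh
    return constraints


# Made-up example: a 50% standard with a 12.5% solar carve-out.
example = PortfolioStandard(
    state="XX",
    year=2030,
    utility_type="investor_owned",
    target_fraction=0.5,
    carve_outs=(CarveOut("solar", 0.125),),
)
print(required_mwh(example, 1000.0))  # {'total': 500.0, 'solar': 125.0}
```

Keeping carve-outs and utility types as explicit fields is what lets a model consume the state-to-state variation described above without special-casing each state.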

Future Data of Interest#

Transmission and Distribution Systems#

In order to run electricity system operations models and cost optimizations, you need some kind of model of the interconnections between generation and loads. There doesn’t appear to be a generally accepted, publicly available set of these network descriptions (yet!).
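As a rough illustration of what such a network description contains at its simplest, the sketch below represents buses as nodes and transmission lines as capacity-labeled edges, then checks which buses a generator can reach. The bus names, capacities, and function names are all made up for this example. Real operations models need far more detail (impedances, voltage levels, line ratings), so treat this as a toy.

```python
from collections import defaultdict, deque


def build_network(lines):
    """Build an undirected adjacency map from (bus_a, bus_b, capacity_mw) tuples."""
    adjacency = defaultdict(dict)
    for bus_a, bus_b, capacity_mw in lines:
        adjacency[bus_a][bus_b] = capacity_mw
        adjacency[bus_b][bus_a] = capacity_mw
    return adjacency


def reachable_buses(adjacency, start):
    """Return the set of buses connected to `start`, found by breadth-first search."""
    seen = {start}
    queue = deque([start])
    while queue:
        bus = queue.popleft()
        # .get() avoids adding unknown buses to the defaultdict as a side effect.
        for neighbor in adjacency.get(bus, {}):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


# Made-up topology: two electrically separate islands.
lines = [
    ("gen_1", "hub", 500.0),
    ("hub", "load_1", 300.0),
    ("gen_2", "load_2", 100.0),
]
network = build_network(lines)
print(sorted(reachable_buses(network, "gen_1")))  # ['gen_1', 'hub', 'load_1']
```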

EIA Water Usage#

EIA Water records water use by thermal generating stations in the US.

MSHA Mines and Production#

The MSHA Mines & Production dataset describes coal production by mine and operating company, along with statistics about labor productivity and safety. This is a smaller dataset (100s of MB) available as relatively clean and well-structured CSV files.