Work in Progress & Future Datasets

Work in Progress

Thanks to a grant from the Alfred P. Sloan Foundation Energy & Environment Program, we have support to integrate the following new datasets between April 2021 and March 2023.

There’s a huge variety and quantity of data about the US electric utility system available to the public. The data we have integrated is just the beginning! Other datasets we’ve heard demand for are listed below. If you’re interested in using one of them and would like to add it to PUDL, check out our contribution guidelines. If there are other datasets you think we should consider integrating, don’t hesitate to open an issue on GitHub requesting the data and explaining why it would be useful.

Census DP1

The US Census Demographic Profile 1 (DP1) provides Census tract, county, and state-level demographic information, along with the geometries defining those areas. We use this information to generate historical utility and balancing authority service territories based on FERC 714 and EIA 861 data. Currently, we distribute the Census DP1 data as a standalone SQLite DB.
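Because the Census DP1 data ships as a standalone SQLite DB, it can be explored with nothing more than Python’s built-in sqlite3 module. The sketch below simply lists the tables in such a file. It assumes nothing about the table names PUDL uses; the filename in the usage note is hypothetical.

```python
from __future__ import annotations

import sqlite3
from contextlib import closing
from pathlib import Path


def list_tables(db_path: str | Path) -> list[str]:
    """Return the sorted names of all tables in a SQLite database file."""
    # Open read-only through a URI so a mistyped path raises an error
    # instead of silently creating a new, empty database.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    # closing() guarantees the connection is closed; a bare
    # "with sqlite3.connect(...)" only commits or rolls back.
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]
```

For example, list_tables("censusdp1tract.sqlite") would print the tables in a local copy of the database, assuming that is what your downloaded file is called.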

EIA Form 861

The EIA Form 861, also known as the Annual Electric Power Industry Report, compiles information on load, generation, capacity, sales, revenues, programs, and more. Right now we’ve got all of Form 861 integrated and are building out our testing and data validation before publishing the data officially.

EIA Form 176

EIA Form 176, also known as the Annual Report of Natural and Supplemental Gas Supply and Disposition, describes the origins, suppliers, and disposition of natural gas on a yearly, state-by-state basis.

FERC Form 714

FERC Form 714 includes hourly loads, reported annually by balancing authorities. This is a modestly sized dataset, in the 100s of MB, distributed as CSV files exported from a Visual FoxPro database. All of the raw tables are being extracted, and a couple of them have been integrated into the transform process, but none are in the PUDL DB yet.
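Hourly load data like this is often summarized before use, for example as each balancing authority’s peak demand. The sketch below assumes a hypothetical long-format CSV with one row per balancing authority per hour and columns named ba_code, utc_datetime, and demand_mw. These are not the actual FERC 714 column names, and the raw tables may need reshaping first.

```python
from __future__ import annotations

import csv
from collections.abc import Iterable


def peak_hourly_demand(lines: Iterable[str]) -> dict[str, float]:
    """Return the highest hourly demand (MW) reported for each balancing authority.

    Assumes hypothetical columns ``ba_code``, ``utc_datetime`` and ``demand_mw``.
    """
    peaks: dict[str, float] = {}
    # DictReader consumes any iterable of lines, so an open file is read
    # row by row rather than loaded into memory all at once.
    for row in csv.DictReader(lines):
        ba = row["ba_code"]
        demand = float(row["demand_mw"])
        # Keep only a running maximum per balancing authority.
        if ba not in peaks or demand > peaks[ba]:
            peaks[ba] = demand
    return peaks
```

To process a file, pass the open handle, e.g. opened with open(path, newline="") for a hypothetical extracted CSV. Streaming the rows keeps memory use proportional to the number of balancing authorities rather than the hundreds of MB of raw data.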


FERC EQR

The FERC Electric Quarterly Reports (EQR), also known as FERC Form 920, include details of transactions between utilities and of transactions between utilities and merchant generators. They cover ancillary services as well as energy and capacity, time and location of delivery, prices, contract length, etc. The EQR is one of the few public sources of information about renewable energy power purchase agreements (PPAs). This is a large dataset (~100s of GB) composed of a very large number of relatively clean CSV files, but it requires fuzzy processing to get at some of the interesting attributes that are only indirectly reported.

FERC Form 2

FERC Form 2 is analogous to FERC Form 1, but it pertains to gas rather than electric utilities. The data paint a detailed picture of the finances of natural gas utilities.

PHMSA Natural Gas Pipelines

The PHMSA Natural Gas Annual Report, published by the Pipeline and Hazardous Materials Safety Administration (part of the US Dept. of Transportation), collects data about natural gas gathering, transmission, and distribution systems (including their age, length, diameter, materials, and carrying capacity). PHMSA also has information about natural gas storage facilities and liquefied natural gas shipping facilities.

Machine Readable Clean Energy Standards

Renewable Portfolio Standards (RPS) and Clean Energy Standards (CES) have emerged as one of the primary policy tools to decarbonize the US electricity supply. Researchers who model future electricity systems need to include these binding regulations as constraints in their models to ensure that the systems they explore are legally compliant. Unfortunately for modelers, RPS and CES regulations vary from state to state. Sometimes there are carve-outs for different types of generation, and sometimes there are different requirements for different types of utilities or for distributed resources. Our goal is to compile a programmatically usable database of RPS/CES policies in the US for quick and easy reference by modelers.
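One way to make RPS/CES policies programmatically usable is to store each state’s annual requirement as a structured record that a model can turn into constraints. The schema below is purely hypothetical: the field names, the treatment of carve-outs as shares of retail sales that count toward the overall target, and the utility_types convention are all assumptions meant to illustrate the idea, not a description of the database’s actual design.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class RpsRequirement:
    """One state's RPS/CES requirement for one compliance year (hypothetical schema)."""

    state: str
    year: int
    # Fraction of retail sales that must come from qualifying resources.
    total_share: float
    # Carve-outs, e.g. {"solar": 0.02}: fractions of retail sales reserved for
    # specific resource types. Assumed here to count toward total_share.
    carve_outs: dict[str, float] = field(default_factory=dict)
    # Utility types the rule binds; an empty tuple means "all utilities" (assumption).
    utility_types: tuple[str, ...] = ()

    def __post_init__(self) -> None:
        # Reject records that cannot be satisfied under the assumptions above.
        if not 0.0 <= self.total_share <= 1.0:
            raise ValueError("total_share must be between 0 and 1")
        if sum(self.carve_outs.values()) > self.total_share:
            raise ValueError("carve-outs exceed the overall requirement")


def required_mwh(req: RpsRequirement, retail_sales_mwh: float) -> dict[str, float]:
    """Convert a requirement into MWh targets usable as model constraints."""
    targets = {"total": req.total_share * retail_sales_mwh}
    for resource, share in req.carve_outs.items():
        targets[resource] = share * retail_sales_mwh
    return targets
```

For a made-up state "XX" with a 50% requirement and a 25% solar carve-out, required_mwh(RpsRequirement("XX", 2030, 0.5, {"solar": 0.25}), 1_000_000) returns {"total": 500000.0, "solar": 250000.0}. Validating records when they are created keeps inconsistent policies from reaching a model as infeasible constraints.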

Future Data of Interest

Transmission and Distribution Systems

In order to run electricity system operations models and cost optimizations, you need some kind of model of the interconnections between generation and loads. There doesn’t appear to be a generally accepted, publicly available set of these network descriptions (yet!).

EIA Water Usage

EIA Water records water use by thermal generating stations in the US.

MSHA Mines and Production

The MSHA Mines & Production dataset describes coal production by mine and operating company along with statistics about labor productivity and safety. This is a smaller dataset (100s of MB) available as relatively clean and well structured CSV files.