Other Data in PUDL

This page describes minor datasets that are included in PUDL, datasets from which we’ve only integrated a small portion of the available data, or datasets that are included with little to no processing, and thus don’t yet have their own dedicated page under Data Sources. Or just data sources for which we haven’t yet compiled a complete description.

Census DP1

The US Census Demographic Profile 1 (DP1) provides Census tract, county, and state-level demographic information, along with the geometries defining those areas. We use this information in generating historical utility and balancing authority service territories based on FERC 714 and EIA 861 data. Currently, we are distributing the Census DP1 data as a standalone SQLite DB which is converted directly from the original geodatabase distributed by the US Census Bureau.

EPA CAMD to EIA Power Sector Data Crosswalk

The original EPA CAMD to EIA crosswalk was published by the US Environmental Protection Agency on GitHub and connects EPA CAMD emissions units (smokestacks) which appear in EPA Hourly Continuous Emission Monitoring System (CEMS) with corresponding EIA plant components reported in EIA Forms 860 and 923 (plant_id_eia, boiler_id, generator_id). This many-to-many connection is necessary because pollutants from various plant parts are collecitvely emitted and measured from one point-source.

The original crosswalk was generated using only 2018 data. However, there is useful information in all years of data, and we augment the crosswalk that they publish on GitHub by running their code against all available later years of data.

Re-running the crosswalk pulls the latest data from the CAMD FACT API which results in some changes to the generator and unit IDs reported on the EPA side of the crosswalk. The changes only result in the addition of new units and generators in the EPA data, with no changes to matches at the plant level (other than identification of new plant-plant matches). We derive sub-plant IDs (subplant_id) from the crosswalk in the table core_epa__assn_eia_epacamd_subplant_ids. Note that these IDs are not necessarily stable across multiple releases of this data, and should not be hard-coded into analyses.

EIA Annual Energy Outlook (AEO)

The EIA’s Annual Energy Outlook underwent a major overhaul in 2024, but we’ve integrated a few key tables from the earlier data. These are just a small subset of the dozens of tables that have historically been part of the AEO. Look for eiaaeo in the table name to find this data.

NREL Annual Technology Baseline (ATB)

NREL publishes Annual Technology Baseline (ATB) data for the Electricity and Transportation sectors. We have integrated the Electricity sector data into the PUDL DB, but haven’t yet fully documented the data source. Look for nrelatb in the table name.

FERC DBF & XBRL Data

FERC publishes Forms 1, 2, 6, and 60 data as VisualFoxPro DBF files (2020 and earlier) and XBRL documents (2021 and later). We distribute these data as standalone SQLite database files which contain all the data from the original FERC filings, but converted to a more easily accessible format. Only a few dozen of the highest priority FERC Form 1 tables have been integrated into the main PUDL database. See the Data Access page for detailed instructions.

FERC Form 2

FERC Form 2 is analogous to FERC Form 1, but reports on the finances of gas, rather than electric utilities. Unfortunately because FERC’s jurisdiction over gas utilities is more limited than for electricity, Form 2 mostly describes interstate gas transmission pipeline companies, and not local gas distribution utilities.

FERC Form 6

FERC Form 6 (Annual Report of Oil Pipeline Companies) is a comprehensive financial and operating report submitted for oil pipelines rate regulation and financial audits.

FERC Form 60

FERC Form 60 (Annual Report of Centralized Service Companies) is a comprehensive financial and operating report submitted for centralized service companies. These are utility subsidaries that provide services to more than one type of utility (electric, gas, or oil pipeline) such that they don’t fit into any of the above Forms 1, 2, or 6.