EPA Hourly Continuous Emission Monitoring System (CEMS)

Source URL


Source Description

US EPA hourly Continuous Emissions Monitoring System (CEMS) data. Hourly CO2, SO2, NOx emissions and gross load.


Coal and high-sulfur fueled plants

Records Liberated

~800 million

Source Format

Comma Separated Value (.csv)

Source Years


Download Size

8990 MB

Years Liberated





Open EPA Hourly Continuous Emission Monitoring System (CEMS) issues

PUDL Database Tables

Clicking on the links will show you a description of the table as well as the names and descriptions of each of its fields. You can access the data via the EPACEMS Intake catalog.

Data Dictionary

Browse Online


Available via PUDL Data Catalog


As defined by the EPA, Continuous Emissions Monitoring Systems (CEMS) are the “total equipment necessary for the determination of a gas or particulate matter concentration or emission rate.” They are used to determine compliance with EPA emissions standards, so each one is associated with a given “smokestack” and identified in the raw data by a corresponding unitid. Because point sources of pollution do not always correspond one-to-one with generation units, the CEMS unitid serves as its own unique grouping. The EPA, in collaboration with the EIA, has developed a crosswalk table that maps the EPA’s unitid onto the EIA’s boiler_id, generator_id, and plant_id_eia. This crosswalk has been integrated into the PUDL SQL database.
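Because the unitid-to-generator relationship is not one-to-one, joining hourly CEMS records to EIA identifiers can multiply rows. The sketch below illustrates this with pandas on toy data. The column names (plant_id_epa, co2_mass_tons, operating_hour) and the values are hypothetical and are not necessarily the names used in PUDL's crosswalk or CEMS tables.

```python
import pandas as pd

# Toy hourly CEMS records keyed by a hypothetical EPA plant ID and unitid.
cems = pd.DataFrame(
    {
        "plant_id_epa": [3, 3, 3],
        "unitid": ["1", "1", "2"],
        "operating_hour": [0, 1, 0],
        "co2_mass_tons": [120.0, 118.5, 95.0],
    }
)

# Toy crosswalk: unitid "1" links to two generators, so one smokestack
# maps to more than one generation unit.
crosswalk = pd.DataFrame(
    {
        "plant_id_epa": [3, 3, 3],
        "unitid": ["1", "1", "2"],
        "plant_id_eia": [3, 3, 3],
        "boiler_id": ["1", "1", "2"],
        "generator_id": ["1A", "1B", "2"],
    }
)

# A left merge keeps every CEMS record and attaches all matching EIA IDs.
linked = cems.merge(crosswalk, on=["plant_id_epa", "unitid"], how="left")

# Each unitid "1" hour now appears once per linked generator, so summing
# co2_mass_tons over `linked` would double count those emissions.
print(linked[["unitid", "operating_hour", "generator_id", "co2_mass_tons"]])
```

One way to avoid double counting is to aggregate emissions at the unitid level before joining, or to allocate them across linked generators using some explicit weighting; which approach fits depends on the analysis.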

The EPA Clean Air Markets Division (CAMD) has collected emissions data from CEMS units since 1995. The CEMS data include hourly SO2, CO2, and NOx emissions and gross load.
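Hourly emissions and gross load are often rolled up to annual totals and an emissions intensity. The following is a minimal pandas sketch on made-up data. The column names are hypothetical, and it assumes that gross load is reported as an average MW rate over the hours a unit actually operated, so energy in MWh is load multiplied by operating time.

```python
import pandas as pd

# Hypothetical hourly records for one unit; column names are illustrative only.
hourly = pd.DataFrame(
    {
        "operating_datetime": pd.to_datetime(
            ["2020-01-01 00:00", "2020-01-01 01:00", "2021-01-01 00:00"]
        ),
        "unitid": ["1", "1", "1"],
        "operating_time_hours": [1.0, 0.5, 1.0],  # fraction of the hour the unit ran
        "gross_load_mw": [100.0, 50.0, 80.0],
        "co2_mass_tons": [110.0, 60.0, 88.0],
        "so2_mass_lbs": [200.0, 90.0, 150.0],
        "nox_mass_lbs": [150.0, 70.0, 120.0],
    }
)

# Assumption: gross load is an average MW rate over the operating time,
# so energy (MWh) = MW * hours operated.
hourly["gross_load_mwh"] = hourly["gross_load_mw"] * hourly["operating_time_hours"]

# Sum energy and emissions per unit and calendar year.
annual = (
    hourly.assign(year=hourly["operating_datetime"].dt.year)
    .groupby(["unitid", "year"], as_index=False)[
        ["gross_load_mwh", "co2_mass_tons", "so2_mass_lbs", "nox_mass_lbs"]
    ]
    .sum()
)

# Emissions intensity: tons of CO2 per MWh of gross generation.
annual["co2_tons_per_mwh"] = annual["co2_mass_tons"] / annual["gross_load_mwh"]
print(annual)
```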

Download the following files for further context:

How much of the data is accessible through PUDL?

All of it! Data is currently available via the PUDL Data Catalog.

Thanks to Karl Dunkle Werner for contributing much of the EPA CEMS Hourly ETL code!

Who is required to install CEMS and report to EPA?

Part 75 of Title 40 of the Code of Federal Regulations (CFR), the backbone of the Acid Rain Program established under Title IV of the Clean Air Act, requires coal-fired and other solid-fuel combustion units (see §72.2) to install and operate CEMS (see §§75.2, 72.6). Certain gas- and oil-fired units burning low-sulfur fuels (see §72.2) may seek an exemption or use alternative means of monitoring their emissions (see §§75.23, 75.48, 75.66). Once CEMS are installed, Part 75 requires hourly data recording, including during startup, shutdown, and malfunctions, as well as quarterly data reporting to the EPA. The regulation also details the protocol for missing data calculations and for backup monitoring when a CEMS fails (see §§75.31-75.37).

A plain English explanation of the requirements of Part 75 is available in section 2.0, “Overview of Part 75 Monitoring Requirements.”

What does the original data look like?

EPA CAMD publishes the CEMS data in an online data portal. The files are available in a prepackaged format, accessible via a user interface or an FTP site, with each downloadable zip file containing one year of data.
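Since each download is a zip archive, you can read its CSV members directly without extracting them to disk. The sketch below uses only zipfile and pandas. The member file names, column names, and the helper read_year_archive are hypothetical; the real archive layout and headers may differ, so the example builds a tiny in-memory archive to stay runnable.

```python
import io
import zipfile

import pandas as pd


def read_year_archive(zip_source) -> pd.DataFrame:
    """Concatenate every CSV inside one (hypothetical) yearly CEMS zip archive."""
    frames = []
    with zipfile.ZipFile(zip_source) as archive:
        for name in sorted(archive.namelist()):
            if not name.lower().endswith(".csv"):
                continue  # skip readme files or other non-CSV members
            with archive.open(name) as member:
                frames.append(pd.read_csv(member))
    return pd.concat(frames, ignore_index=True)


# Build a tiny in-memory archive so the example runs without a download.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("2020al01.csv", "STATE,UNITID,CO2_MASS\nAL,1,10.5\n")
    archive.writestr("2020al02.csv", "STATE,UNITID,CO2_MASS\nAL,1,11.0\n")
    archive.writestr("readme.txt", "not data")
buffer.seek(0)

df = read_year_archive(buffer)
print(df)
```

For a full year of real data, this loads everything into memory at once, which is one reason PUDL converts the raw files to a columnar format instead.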

Notable Irregularities

CEMS is by far the largest dataset in PUDL at the moment, with hourly records for thousands of plants spanning decades. Note that running the ETL process on the full dataset can easily take all day. PUDL also provides a script that converts the raw EPA CEMS data into Apache Parquet files, which can be read and queried very efficiently with Dask. Check out the EPA CEMS example notebook in our pudl-examples repository on GitHub for pointers on how to access this big dataset efficiently using Dask.

PUDL Data Transformations

The PUDL transformation process cleans the input data so that it is adjusted for uniformity, corrected for errors, and ready for bulk programmatic use.

To see the transformations applied to the data in each table, read the docstrings of the respective transform functions in pudl.transform.epacems.
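As an illustration of the kind of uniformity adjustment a transform step can make, here is a sketch that turns a separate date and hour into a single UTC timestamp. It is not PUDL's code. The raw column names (op_date, op_hour, utc_offset_hours) are hypothetical, and it assumes the raw hours are recorded in local standard time with no daylight saving shifts, which makes the conversion a fixed per-plant offset.

```python
import pandas as pd

# Hypothetical raw columns: op_date, op_hour (0-23) in local standard time,
# plus an assumed per-plant UTC offset in hours.
raw = pd.DataFrame(
    {
        "op_date": ["2020-07-01", "2020-07-01"],
        "op_hour": [0, 23],
        "utc_offset_hours": [-7, -5],  # e.g. Mountain and Central standard time
    }
)

# Combine the date and the hour into one naive local-standard-time timestamp.
local_standard = pd.to_datetime(raw["op_date"]) + pd.to_timedelta(raw["op_hour"], unit="h")

# Standard time uses the same offset all year, so subtracting it yields UTC
# without any DST ambiguity; then mark the result as timezone-aware UTC.
raw["operating_datetime_utc"] = (
    local_standard - pd.to_timedelta(raw["utc_offset_hours"], unit="h")
).dt.tz_localize("UTC")

print(raw)
```

Storing a single timezone-aware UTC column lets hourly records from plants in different time zones line up on the same clock.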