pudl.extract.epacems module

Retrieve data from EPA CEMS hourly zipped CSVs.

This module pulls data from EPA’s published CSV files.

pudl.extract.epacems.extract(epacems_years, states, data_dir)

Coordinate the extraction of EPA CEMS hourly DataFrames.

Parameters
  • epacems_years (list) – The years of CEMS data to extract, as 4-digit integers.

  • states (list) – The states whose CEMS data we want to extract, indicated by 2-letter US state codes.

  • data_dir (path-like) – Path to the top directory of the PUDL datastore.

Yields

dict – A dictionary with a single key: the name of an EPA CEMS tabular data resource, of the form “hourly_emissions_epacems_YEAR_STATE”, where YEAR is a 4-digit number and STATE is a lowercase 2-letter US state code. The value is a pandas.DataFrame containing all the raw EPA CEMS hourly emissions data for the indicated state and year.
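The shape of the yielded dictionaries can be illustrated with a minimal, standard-library-only sketch. The helper names resource_name and iter_resources, the load callback, and the year-then-state iteration order are all assumptions made for illustration; they are not part of PUDL, and the real function reads the data from the datastore under data_dir.

```python
from itertools import product


def resource_name(year, state):
    """Build a key of the documented form (hypothetical helper, not in PUDL)."""
    return f"hourly_emissions_epacems_{year}_{state.lower()}"


def iter_resources(epacems_years, states, load=lambda year, state: None):
    """Yield one single-key dict per (year, state) pair.

    `load` is a placeholder for whatever actually reads the data; the
    iteration order (years outer, states inner) is an assumption.
    """
    for year, state in product(epacems_years, states):
        yield {resource_name(year, state): load(year, state)}


# Collect only the keys to show the naming pattern.
keys = [next(iter(d)) for d in iter_resources([2017, 2018], ["CO", "TX"])]
print(keys[0])  # hourly_emissions_epacems_2017_co
```

Because each dictionary holds a single state-year, a caller can process one DataFrame at a time rather than holding every requested state and year in memory at once.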

pudl.extract.epacems.read_cems_csv(filename)

Read a CEMS CSV file, compressed or not, into a pandas.DataFrame.

Note that some columns are not read. See pudl.constants.epacems_columns_to_ignore. Data types for the columns are specified in pudl.constants.epacems_csv_dtypes and names of the output columns are set by pudl.constants.epacems_rename_dict.

Parameters

filename (str) – The name of the file to be read.

Returns

A DataFrame containing the contents of the CSV file.

Return type

pandas.DataFrame
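The three steps named in the read_cems_csv description (skip ignored columns, apply fixed dtypes, rename the output columns) can be sketched with plain pandas. This is not PUDL's implementation: the three constants below are small hypothetical stand-ins for pudl.constants.epacems_columns_to_ignore, epacems_csv_dtypes, and epacems_rename_dict, and the column names are invented for the example.

```python
import io

import pandas as pd

# Hypothetical stand-ins for the pudl.constants mappings; the real ones differ.
COLUMNS_TO_IGNORE = {"FAC_ID", "UNIT_ID"}
CSV_DTYPES = {"STATE": str, "ORISPL_CODE": int, "GLOAD (MW)": float}
RENAME_DICT = {
    "STATE": "state",
    "ORISPL_CODE": "plant_id_eia",
    "GLOAD (MW)": "gross_load_mw",
}


def read_csv_sketch(filepath_or_buffer):
    """Read a CSV, skipping ignored columns, then rename what remains."""
    df = pd.read_csv(
        filepath_or_buffer,
        # A callable usecols keeps every column not in the ignore set.
        usecols=lambda col: col not in COLUMNS_TO_IGNORE,
        dtype=CSV_DTYPES,
    )
    return df.rename(columns=RENAME_DICT)


# An in-memory CSV stands in for a real EPA CEMS file.
raw = io.StringIO(
    "STATE,ORISPL_CODE,GLOAD (MW),FAC_ID,UNIT_ID\n"
    "CO,469,120.5,1,2\n"
)
df = read_csv_sketch(raw)
print(list(df.columns))
```

When given a file path, pandas.read_csv infers compression from the file extension by default, which is one way a single reader can handle both zipped and plain CSV files.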