pudl.extract.epacems

Retrieve data from EPA CEMS hourly zipped CSVs.

This module pulls data from EPA’s published CSV files.

Module Contents

Classes

EpaCemsPartition

Represents an EpaCems partition that identifies a unique resource file.

EpaCemsDatastore

Helper class to extract EpaCems resources from a datastore.

Functions

extract(epacems_years, states, ds: pudl.workspace.datastore.Datastore)

Coordinate the extraction of EPA CEMS hourly DataFrames.

Attributes

logger

RENAME_DICT

A dictionary containing EPA CEMS column names (keys) and replacement names to use when reading those columns into PUDL (values).

IGNORE_COLS

The set of EPA CEMS columns to ignore when reading data.

pudl.extract.epacems.logger

pudl.extract.epacems.RENAME_DICT

A dictionary containing EPA CEMS column names (keys) and replacement names to use when reading those columns into PUDL (values).

Type

dict
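A mapping like RENAME_DICT is typically applied with pandas `DataFrame.rename(columns=...)`. The following is a minimal sketch. The three entries in `EXAMPLE_RENAME_DICT` are an illustrative subset chosen for this example and are not guaranteed to match the real contents of RENAME_DICT.

```python
import pandas as pd

# Hypothetical subset of a RENAME_DICT-style mapping: raw EPA CEMS header -> PUDL name.
# The real RENAME_DICT in pudl.extract.epacems may contain different entries.
EXAMPLE_RENAME_DICT = {
    "ORISPL_CODE": "plant_id_eia",
    "UNITID": "unitid",
    "GLOAD (MW)": "gross_load_mw",
}

raw = pd.DataFrame({"ORISPL_CODE": [3], "UNITID": ["1"], "GLOAD (MW)": [150.0]})

# DataFrame.rename leaves any column that is not a key in the mapping unchanged.
renamed = raw.rename(columns=EXAMPLE_RENAME_DICT)
print(list(renamed.columns))  # ['plant_id_eia', 'unitid', 'gross_load_mw']
```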

pudl.extract.epacems.IGNORE_COLS

The set of EPA CEMS columns to ignore when reading data.

Type

set
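A set of ignored columns can be enforced at read time by passing a callable as `usecols` to `pandas.read_csv`, so those columns are never loaded. This is a sketch under that assumption. `EXAMPLE_IGNORE_COLS` and the CSV text are invented for the example and do not reproduce the real IGNORE_COLS.

```python
import io

import pandas as pd

# Hypothetical stand-in for IGNORE_COLS; the real set may list other columns.
EXAMPLE_IGNORE_COLS = {"FACILITY_NAME", "SO2_RATE (lbs/mmBtu)"}

csv_text = (
    "STATE,FACILITY_NAME,ORISPL_CODE,SO2_RATE (lbs/mmBtu)\n"
    "AL,Example Plant,3,0.1\n"
)

# With a callable usecols, read_csv keeps only the columns for which it returns True.
df = pd.read_csv(
    io.StringIO(csv_text),
    usecols=lambda col: col not in EXAMPLE_IGNORE_COLS,
)
print(list(df.columns))  # ['STATE', 'ORISPL_CODE']
```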

class pudl.extract.epacems.EpaCemsPartition

Bases: NamedTuple

Represents an EpaCems partition that identifies a unique resource file.

year: str
state: str
get_key(self)

Returns a hashable key for use with EpaCemsDatastore.

get_filters(self)

Returns the filters for retrieving the given partition's resource from the Datastore.

get_monthly_file(self, month: int) → pathlib.Path

Returns the filename (without suffix) that contains the monthly data.
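A minimal sketch of how a partition like this could be written as a `typing.NamedTuple` with the three methods listed above. The class name `ExamplePartition`, the upper/lower-casing of the state, the filter dictionary, and the `<year><state><two-digit month>` file-name pattern are all assumptions for illustration, not the confirmed behavior of EpaCemsPartition.

```python
from pathlib import Path
from typing import NamedTuple


class ExamplePartition(NamedTuple):
    """Hypothetical stand-in for EpaCemsPartition; method bodies are assumptions."""

    year: str
    state: str

    def get_key(self):
        # A plain tuple is hashable, so it can key a dict or cache of loaded resources.
        return (self.year, self.state.upper())

    def get_filters(self):
        # Assumed filter shape for a Datastore lookup; field names are illustrative.
        return {"year": int(self.year), "state": self.state.lower()}

    def get_monthly_file(self, month: int) -> Path:
        # Assumed naming scheme: <year><state><two-digit month>, no suffix.
        return Path(f"{self.year}{self.state.lower()}{month:02}")


p = ExamplePartition(year="2019", state="AL")
print(p.get_key())            # ('2019', 'AL')
print(p.get_filters())        # {'year': 2019, 'state': 'al'}
print(p.get_monthly_file(3))  # 2019al03
```

Because a NamedTuple is immutable, a partition value can be passed around and compared safely; `get_key` then gives a normalized tuple that is convenient as a dictionary key.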

class pudl.extract.epacems.EpaCemsDatastore(datastore: pudl.workspace.datastore.Datastore)

Helper class to extract EpaCems resources from a datastore.

EpaCems resources are identified by a year and a state. Each of these zip files contains monthly zip files, which in turn contain CSV files. This class implements a get_data_frame method that concatenates the tables for a given state and year across all months.
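To make the zip-within-a-zip layout concrete, here is a standard-library sketch that builds a tiny in-memory yearly archive and then reads every CSV inside each monthly zip. The file names, columns, and values are invented for the demonstration; this is not the EpaCemsDatastore implementation, which obtains its archives from a Datastore.

```python
import csv
import io
import zipfile


def build_example_archive() -> bytes:
    """Build a hypothetical yearly archive holding two monthly zips, each with one CSV."""
    outer_buf = io.BytesIO()
    with zipfile.ZipFile(outer_buf, "w") as outer:
        for month in (1, 2):
            inner_buf = io.BytesIO()
            with zipfile.ZipFile(inner_buf, "w") as inner:
                inner.writestr(f"2019al{month:02}.csv", f"STATE,MONTH\nAL,{month}\n")
            outer.writestr(f"2019al{month:02}.zip", inner_buf.getvalue())
    return outer_buf.getvalue()


def iter_monthly_rows(archive_bytes: bytes):
    """Yield CSV rows from every CSV inside every inner zip of the outer archive."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as outer:
        for inner_name in sorted(outer.namelist()):
            # Each member of the outer archive is itself a zip file.
            with zipfile.ZipFile(io.BytesIO(outer.read(inner_name))) as inner:
                for csv_name in inner.namelist():
                    with inner.open(csv_name) as raw:
                        text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                        yield from csv.DictReader(text)


rows = list(iter_monthly_rows(build_example_archive()))
print(rows)  # [{'STATE': 'AL', 'MONTH': '1'}, {'STATE': 'AL', 'MONTH': '2'}]
```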

get_data_frame(self, partition: EpaCemsPartition) → pandas.DataFrame

Constructs a DataFrame holding the data for a given (year, state) partition.
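The month-by-month concatenation can be sketched with `pandas.concat`. The column names and values are placeholders, and whether the real method resets the index in this way is an assumption.

```python
import pandas as pd

# Hypothetical per-month DataFrames, standing in for the parsed monthly CSVs.
monthly_frames = [
    pd.DataFrame({"state": ["AL"], "month": [m], "gross_load_mw": [100.0 + m]})
    for m in range(1, 4)
]

# Stack the months into one (year, state) table with a fresh 0..n-1 index.
year_state_df = pd.concat(monthly_frames, ignore_index=True)
print(year_state_df.shape)  # (3, 3)
```

Without `ignore_index=True`, every row would keep its per-month index label (all `0` here), which makes later positional lookups ambiguous.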

_csv_to_dataframe(self, csv_file) → pandas.DataFrame

Convert a CEMS CSV file into a pandas.DataFrame.

Note that some columns are not read. See pudl.constants.epacems_columns_to_ignore. Data types for the columns are specified in pudl.constants.epacems_csv_dtypes and names of the output columns are set by pudl.constants.epacems_rename_dict.

Parameters

csv_file (file-like object) – The data to be read.

Returns

A DataFrame containing the contents of the CSV file.

Return type

pandas.DataFrame
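The read, filter, type, and rename sequence described above can be sketched in a single `pandas.read_csv` call followed by a rename. The function name `csv_to_dataframe`, its parameters, and the example columns and dtypes are hypothetical; the real method draws these settings from the constants named in its docstring rather than from arguments.

```python
import io

import pandas as pd


def csv_to_dataframe(csv_file, ignore_cols, dtypes, rename):
    """Hypothetical sketch of _csv_to_dataframe: skip, type, then rename columns."""
    df = pd.read_csv(
        csv_file,
        usecols=lambda col: col not in ignore_cols,  # never load ignored columns
        dtype=dtypes,  # keys are the raw CSV header names, before renaming
    )
    return df.rename(columns=rename)


csv_file = io.StringIO(
    "STATE,FACILITY_NAME,ORISPL_CODE,UNITID,GLOAD (MW)\n"
    "AL,Example Plant,3,1,150\n"
)
df = csv_to_dataframe(
    csv_file,
    ignore_cols={"FACILITY_NAME"},
    dtypes={"STATE": "category", "ORISPL_CODE": "int32", "UNITID": str, "GLOAD (MW)": "float64"},
    rename={"STATE": "state", "ORISPL_CODE": "plant_id_eia", "UNITID": "unitid", "GLOAD (MW)": "gross_load_mw"},
)
print(df.dtypes)
```

Applying dtypes at read time, keyed by the raw headers, avoids a second conversion pass; renaming afterwards keeps the dtype mapping and the rename mapping independent of each other.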

pudl.extract.epacems.extract(epacems_years, states, ds: pudl.workspace.datastore.Datastore)

Coordinate the extraction of EPA CEMS hourly DataFrames.

Parameters
  • epacems_years (list) – The years of CEMS data to extract, as 4-digit integers.

  • states (list) – The states whose CEMS data we want to extract, indicated by 2-letter US state codes.

  • ds (Datastore) – An initialized datastore.

Yields

pandas.DataFrame – A single state-year of EPA CEMS hourly emissions data.
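Because extract yields one state-year at a time, callers can process partitions lazily instead of holding every DataFrame in memory. The generator pattern is sketched below. `load_partition` and `extract_sketch` are hypothetical stand-ins, and the year-outer, state-inner loop order is an assumption rather than the documented iteration order.

```python
from typing import Iterator, List

import pandas as pd


def load_partition(year: int, state: str) -> pd.DataFrame:
    """Hypothetical loader standing in for EpaCemsDatastore.get_data_frame."""
    return pd.DataFrame({"year": [year], "state": [state], "gross_load_mw": [0.0]})


def extract_sketch(years: List[int], states: List[str]) -> Iterator[pd.DataFrame]:
    """Yield one DataFrame per (year, state) pair, loading each only when requested."""
    for year in years:
        for state in states:
            yield load_partition(year, state)


for df in extract_sketch([2019, 2020], ["AL", "CO"]):
    print(df.loc[0, "year"], df.loc[0, "state"])
```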