pudl.extract.epacems

Retrieve data from EPA CEMS hourly zipped CSVs.

This module pulls data from EPA's published CSV files.

Module Contents

Classes

EpaCemsPartition

Represents an EpaCems partition that identifies a unique resource file.

EpaCemsDatastore

Helper class to extract EpaCems resources from the datastore.

Functions

extract(epacems_settings: pudl.settings.EpaCemsSettings, ds: pudl.workspace.datastore.Datastore)

Coordinate the extraction of EPA CEMS hourly DataFrames.

Attributes

logger

RENAME_DICT

A dictionary containing EPA CEMS column names (keys) and replacement names to use when reading those columns into PUDL (values).

IGNORE_COLS

The set of EPA CEMS columns to ignore when reading data.

pudl.extract.epacems.logger

pudl.extract.epacems.RENAME_DICT

A dictionary containing EPA CEMS column names (keys) and replacement names to use when reading those columns into PUDL (values).

Type

dict

pudl.extract.epacems.IGNORE_COLS

The set of EPA CEMS columns to ignore when reading data.

Type

set

class pudl.extract.epacems.EpaCemsPartition

Bases: NamedTuple

Represents an EpaCems partition that identifies a unique resource file.

year: str

state: str

get_key(self)

Returns a hashable key for use with EpaCemsDatastore.

get_filters(self)

Returns filters for retrieving the given partition's resource from the Datastore.

get_monthly_file(self, month: int) → pathlib.Path

Returns the filename (without suffix) that contains the monthly data.
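The partition interface above can be sketched as a typing.NamedTuple. This is a minimal illustration, not PUDL's implementation: the lowercasing of the state code, the filter key names, and the `<year><state><MM>` monthly file naming are all assumptions made for the example.

```python
from pathlib import Path
from typing import NamedTuple


class EpaCemsPartition(NamedTuple):
    """One (year, state) slice of EPA CEMS data; illustrative sketch only."""

    year: str
    state: str

    def get_key(self) -> tuple:
        # A tuple of strings is hashable, so it can key a dict or cache.
        return (self.year, self.state.lower())

    def get_filters(self) -> dict:
        # Assumed filter names; the real Datastore may expect different ones.
        return {"year": self.year, "state": self.state.lower()}

    def get_monthly_file(self, month: int) -> Path:
        # Assumed naming scheme <year><state><MM>, returned without a suffix.
        return Path(f"{self.year}{self.state.lower()}{month:02d}")


p = EpaCemsPartition(year="2019", state="CA")
print(p.get_key())                                # ('2019', 'ca')
print(p.get_monthly_file(3).with_suffix(".zip"))  # 2019ca03.zip
```

Returning the stem without a suffix lets a caller derive both the inner `.zip` and `.csv` member names from one value via `Path.with_suffix`.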

class pudl.extract.epacems.EpaCemsDatastore(datastore: pudl.workspace.datastore.Datastore)

Helper class to extract EpaCems resources from the datastore.

EpaCems resources are identified by a year and a state. Each of these zip files contains monthly zip files, which in turn contain CSV files. This class implements a get_data_frame method that concatenates the tables for a given state and year across all months.

get_data_frame(self, partition: EpaCemsPartition) → pandas.DataFrame

Constructs a dataframe holding data for a given (year, state) partition.
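The nested layout described for this class (a year/state zip holding monthly zips, each holding a CSV) can be read with the standard zipfile module. The sketch below is hypothetical: the function name `read_nested_zip`, the `<stem>.zip` / `<stem>.csv` member naming, and the toy `hour,value` columns are assumptions for illustration, and the archive is built in memory so the example runs without any real CEMS data.

```python
import io
import zipfile

import pandas as pd


def read_nested_zip(outer: zipfile.ZipFile, monthly_stems: list[str]) -> pd.DataFrame:
    """Concatenate CSVs stored inside monthly zips inside one outer zip (sketch)."""
    frames = []
    for stem in monthly_stems:
        # Read the inner monthly zip fully into memory, then open it as a ZipFile.
        inner_bytes = outer.read(f"{stem}.zip")
        with zipfile.ZipFile(io.BytesIO(inner_bytes)) as inner:
            with inner.open(f"{stem}.csv") as csv_file:
                frames.append(pd.read_csv(csv_file))
    # ignore_index gives the combined frame a fresh 0..n-1 index.
    return pd.concat(frames, ignore_index=True)


def _zip_bytes(members: dict[str, bytes]) -> bytes:
    # Helper that builds an in-memory zip archive for the demo below.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in members.items():
            zf.writestr(name, data)
    return buf.getvalue()


# Build a fake year/state archive holding two monthly zips.
stems = ["2019ca01", "2019ca02"]
outer_bytes = _zip_bytes({
    f"{s}.zip": _zip_bytes({f"{s}.csv": f"hour,value\n0,{i}\n1,{i + 10}\n".encode()})
    for i, s in enumerate(stems)
})
with zipfile.ZipFile(io.BytesIO(outer_bytes)) as outer:
    df = read_nested_zip(outer, stems)
print(len(df))  # 4
```

A real reader would loop over months 1 through 12 for the partition rather than take an explicit list of stems.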

_csv_to_dataframe(self, csv_file) → pandas.DataFrame

Convert a CEMS csv file into a pandas.DataFrame.

Parameters

csv_file (file-like object) – the CSV data to be read

Returns

A DataFrame containing the contents of the CSV file.
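One way to combine RENAME_DICT and IGNORE_COLS when reading a single CSV is to pass a callable `usecols` to pandas.read_csv and then rename the surviving columns. This is a sketch, not the module's code: the dictionary and set below hold a few illustrative stand-in entries, not the actual contents of RENAME_DICT or IGNORE_COLS.

```python
import io

import pandas as pd

# Illustrative stand-ins only; not the module's actual RENAME_DICT / IGNORE_COLS.
RENAME_DICT = {"ORISPL_CODE": "plant_id_eia", "UNITID": "unitid"}
IGNORE_COLS = {"FACILITY_NAME"}


def csv_to_dataframe(csv_file) -> pd.DataFrame:
    """Read a CEMS-style CSV, dropping ignored columns and renaming the rest."""
    return pd.read_csv(
        csv_file,
        index_col=False,
        # A callable usecols is evaluated per column name; keep names not ignored.
        usecols=lambda col: col not in IGNORE_COLS,
        low_memory=False,
    ).rename(columns=RENAME_DICT)


raw = io.BytesIO(b"FACILITY_NAME,ORISPL_CODE,UNITID\nPlant A,3,1\n")
df = csv_to_dataframe(raw)
print(list(df.columns))  # ['plant_id_eia', 'unitid']
```

Filtering in `usecols` means ignored columns are never materialized, which is cheaper than reading everything and dropping columns afterward.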

pudl.extract.epacems.extract(epacems_settings: pudl.settings.EpaCemsSettings, ds: pudl.workspace.datastore.Datastore)

Coordinate the extraction of EPA CEMS hourly DataFrames.

Parameters
  • epacems_settings – Object containing validated settings relevant to EPA CEMS. Contains the years and states to be loaded into PUDL.

  • ds (Datastore) – Initialized datastore

Yields

pandas.DataFrame – A single state-year of EPA CEMS hourly emissions data.
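Because extract yields one state-year at a time, it behaves as a generator over the years and states in the settings. The sketch below shows that pattern under stated assumptions: `CemsSettings` is a hypothetical stand-in for EpaCemsSettings, `load_partition` is a hypothetical callable standing in for the datastore lookup, and iterating years in the outer loop is a choice made here, not a documented ordering.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator

import pandas as pd


@dataclass
class CemsSettings:
    """Hypothetical stand-in for EpaCemsSettings: just years and states."""

    years: list = field(default_factory=list)
    states: list = field(default_factory=list)


def extract_sketch(
    settings: CemsSettings,
    load_partition: Callable[[str, str], pd.DataFrame],
) -> Iterator[pd.DataFrame]:
    """Yield one DataFrame per (year, state) pair, loading each lazily."""
    for year in settings.years:
        for state in settings.states:
            # Nothing is loaded until the consumer asks for the next item,
            # so the caller controls how many state-years are held at once.
            yield load_partition(str(year), state)


def fake_loader(year: str, state: str) -> pd.DataFrame:
    # Hypothetical loader returning a one-row frame tagged with its partition.
    return pd.DataFrame({"year": [year], "state": [state]})


settings = CemsSettings(years=[2018, 2019], states=["CA", "TX"])
frames = list(extract_sketch(settings, fake_loader))
print(len(frames))  # 4
```

In practice a consumer would process and write each yielded frame before pulling the next one, rather than collecting them all into a list as the demo does.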