pudl.extract.epacems
Retrieve data from EPA CEMS hourly zipped CSVs.
This module pulls data from EPA’s published CSV files.
Module Contents
Classes
EpaCemsPartition – Represents an EpaCems partition that identifies a unique resource file.
EpaCemsDatastore – Helper class to extract EpaCems resources from the datastore.
Functions
extract(epacems_settings, ds) – Coordinate the extraction of EPA CEMS hourly DataFrames.
Attributes
RENAME_DICT – A dictionary containing EPA CEMS column names (keys) and replacement names to use in PUDL (values).
IGNORE_COLS – The set of EPA CEMS columns to ignore when reading data.
- pudl.extract.epacems.RENAME_DICT
A dictionary containing EPA CEMS column names (keys) and replacement names to use when reading those columns into PUDL (values).
- Type
dict
- pudl.extract.epacems.IGNORE_COLS
The set of EPA CEMS columns to ignore when reading data.
- Type
set
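The two attributes work together when a raw CEMS table is read: columns listed in IGNORE_COLS are discarded and the remaining columns are renamed through RENAME_DICT. The sketch below illustrates that interplay on an in-memory DataFrame. It is not the module's code: the column names in both stand-ins are hypothetical, and applying the ignore step before the rename is an assumption.

```python
import pandas as pd

# Hypothetical stand-ins for RENAME_DICT and IGNORE_COLS; the real
# module defines its own, longer mappings.
RENAME_DICT = {
    "STATE": "state",
    "OP_DATE": "op_date",
    "GLOAD (MW)": "gross_load_mw",
}
IGNORE_COLS = {"FACILITY_NAME", "SO2_RATE (lbs/mmBtu)"}


def apply_column_rules(raw: pd.DataFrame) -> pd.DataFrame:
    """Drop ignored columns, then rename the rest to PUDL-style names."""
    # errors="ignore" tolerates files that lack some of the ignored columns.
    kept = raw.drop(columns=list(IGNORE_COLS), errors="ignore")
    return kept.rename(columns=RENAME_DICT)


raw = pd.DataFrame(
    {
        "STATE": ["ID"],
        "FACILITY_NAME": ["Example Plant"],
        "OP_DATE": ["01-01-2019"],
        "GLOAD (MW)": [12.5],
    }
)
print(list(apply_column_rules(raw).columns))
# ['state', 'op_date', 'gross_load_mw']
```

Keeping the ignore list as a set makes membership checks cheap, and dropping before renaming means the ignore list can be written in terms of the raw EPA column names.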
- class pudl.extract.epacems.EpaCemsPartition
Bases: NamedTuple
Represents an EpaCems partition that identifies a unique resource file.
- get_monthly_file(self, month: int) → pathlib.Path
Returns the filename (without suffix) that contains the data for the given month.
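Because the class derives from NamedTuple, a partition is an immutable, hashable tuple of its identifying fields, which makes it usable as a dictionary key or cache key. The sketch below assumes the fields are a year and a state (the identifiers named in the EpaCemsDatastore description) and uses a hypothetical filename pattern for get_monthly_file; the real naming scheme is not documented on this page.

```python
from pathlib import Path
from typing import NamedTuple


class EpaCemsPartition(NamedTuple):
    """Sketch of a (year, state) partition; the field names are assumptions."""

    year: int
    state: str

    def get_monthly_file(self, month: int) -> Path:
        # Hypothetical naming scheme: year, lowercase state, zero-padded month.
        return Path(f"{self.year}{self.state.lower()}{month:02}")


partition = EpaCemsPartition(year=2019, state="ID")
print(partition.get_monthly_file(3))  # 2019id03
```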
- class pudl.extract.epacems.EpaCemsDatastore(datastore: pudl.workspace.datastore.Datastore)
Helper class to extract EpaCems resources from the datastore.
EpaCems resources are identified by a year and a state. Each resource is a zip file containing monthly zip files, which in turn contain CSV files. This class implements a get_data_frame method that concatenates the tables for a given state and year across all months.
- get_data_frame(self, partition: EpaCemsPartition) → pandas.DataFrame
Construct a DataFrame holding the data for a given (year, state) partition.
- _csv_to_dataframe(self, csv_file) → pandas.DataFrame
Convert a CEMS CSV file into a pandas.DataFrame.
- Parameters
csv_file (file-like object) – Data to be read.
- Returns
A DataFrame containing the contents of the CSV file.
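The nested layout described for EpaCemsDatastore (a yearly state archive holding monthly zip files, each holding a CSV) can be unpacked with the standard zipfile module. The sketch below is a hypothetical, datastore-free illustration: it takes the outer archive as raw bytes, assumes each inner zip holds exactly one CSV, and uses a plain pandas.read_csv call as a stand-in for _csv_to_dataframe.

```python
import io
import zipfile

import pandas as pd


def csv_to_dataframe(csv_file) -> pd.DataFrame:
    """Stand-in for _csv_to_dataframe: read one CSV from a binary file object."""
    return pd.read_csv(csv_file)


def read_state_year(outer_zip_bytes: bytes) -> pd.DataFrame:
    """Concatenate every monthly CSV inside a nested (year, state) archive.

    Hypothetical helper: assumes the outer zip holds one inner zip per
    month and that each inner zip holds exactly one CSV file.
    """
    frames = []
    with zipfile.ZipFile(io.BytesIO(outer_zip_bytes)) as outer:
        for inner_name in sorted(outer.namelist()):
            # Buffer the inner archive so ZipFile gets a seekable object.
            inner_bytes = outer.read(inner_name)
            with zipfile.ZipFile(io.BytesIO(inner_bytes)) as inner:
                csv_name = inner.namelist()[0]
                with inner.open(csv_name) as csv_file:
                    frames.append(csv_to_dataframe(csv_file))
    # Stack the monthly tables into one state-year table.
    return pd.concat(frames, ignore_index=True)


def zip_bytes(members: dict) -> bytes:
    """Build an in-memory zip archive from a {name: bytes} mapping."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in members.items():
            zf.writestr(name, data)
    return buf.getvalue()


january = zip_bytes({"2019id01.csv": b"STATE,GLOAD (MW)\nID,10\n"})
february = zip_bytes({"2019id02.csv": b"STATE,GLOAD (MW)\nID,20\n"})
archive = zip_bytes({"2019id01.zip": january, "2019id02.zip": february})
df = read_state_year(archive)
print(df["GLOAD (MW)"].tolist())  # [10, 20]
```

Sorting the inner member names keeps the months in calendar order, provided the names use zero-padded month numbers as in this example.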
- pudl.extract.epacems.extract(epacems_settings: pudl.settings.EpaCemsSettings, ds: pudl.workspace.datastore.Datastore)
Coordinate the extraction of EPA CEMS hourly DataFrames.
- Parameters
epacems_settings – Object containing validated settings relevant to EPA CEMS. Contains the years and states to be loaded into PUDL.
ds (Datastore) – Initialized datastore.
- Yields
pandas.DataFrame – A single state-year of EPA CEMS hourly emissions data.
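Because extract yields rather than returns, each state-year DataFrame is produced only when the caller requests it. A minimal sketch of that generator pattern follows, assuming a hypothetical settings object with only years and states fields and a hypothetical loader callable in place of the real Datastore; the year-then-state loop order is also an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, List

import pandas as pd


@dataclass
class CemsSettings:
    """Hypothetical stand-in for EpaCemsSettings: only years and states."""

    years: List[int] = field(default_factory=list)
    states: List[str] = field(default_factory=list)


def extract_sketch(
    settings: CemsSettings,
    load_partition: Callable[[int, str], pd.DataFrame],
) -> Iterator[pd.DataFrame]:
    """Yield one DataFrame per (year, state) pair, on demand."""
    for year in settings.years:
        for state in settings.states:
            # Each partition is loaded only when the caller asks for the
            # next item; the generator keeps no reference to earlier ones.
            yield load_partition(year, state)


def fake_loader(year: int, state: str) -> pd.DataFrame:
    """Hypothetical loader returning a one-row frame per partition."""
    return pd.DataFrame({"year": [year], "state": [state]})


settings = CemsSettings(years=[2018, 2019], states=["ID", "ME"])
states = [df["state"].iloc[0] for df in extract_sketch(settings, fake_loader)]
print(states)  # ['ID', 'ME', 'ID', 'ME']
```

Yielding per partition lets a downstream transform-and-load step process hourly data one state-year at a time instead of materializing every year and state at once.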