pudl.extract.epacems
Retrieve data from EPA CEMS hourly zipped CSVs.
This module pulls data from EPA’s published CSV files.
Module Contents
Classes
- EpaCemsPartition: Represents EpaCems partition identifying unique resource file.
- EpaCemsDatastore: Helper class to extract EpaCems resources from datastore.
Functions
- extract(year, state, ds): Coordinate the extraction of EPA CEMS hourly DataFrames.
Attributes
- RENAME_DICT: A dictionary containing EPA CEMS column names (keys) and replacement names to use when reading those columns into PUDL (values).
- IGNORE_COLS: The set of EPA CEMS columns to ignore when reading data.
- pudl.extract.epacems.RENAME_DICT
A dictionary containing EPA CEMS column names (keys) and replacement names to use when reading those columns into PUDL (values).
- Type: dict
- pudl.extract.epacems.IGNORE_COLS
The set of EPA CEMS columns to ignore when reading data.
- Type: set
- class pudl.extract.epacems.EpaCemsPartition
Bases: NamedTuple
Represents EpaCems partition identifying unique resource file.
- get_monthly_file(month: int) -> pathlib.Path
Returns the filename (without suffix) that contains the monthly data.
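To illustrate how a NamedTuple partition can carry a filename helper, here is a minimal sketch. The class name `ExamplePartition`, its `year` and `state` fields, and the filename pattern are all assumptions made for illustration; the real `EpaCemsPartition` may name its monthly files differently.

```python
from pathlib import Path
from typing import NamedTuple


class ExamplePartition(NamedTuple):
    """Hypothetical stand-in for EpaCemsPartition (fields are assumed)."""

    year: int
    state: str

    def get_monthly_file(self, month: int) -> Path:
        # Assumed naming scheme, for illustration only: year, lowercase
        # state, zero-padded month, and no file suffix.
        return Path(f"{self.year}{self.state.lower()}{month:02d}")


part = ExamplePartition(year=2019, state="CO")
march_file = part.get_monthly_file(3)
print(march_file)
```

Because the partition is a NamedTuple, it is immutable and hashable, so it can be used directly as a dictionary key or compared against a plain `(year, state)` tuple.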
- class pudl.extract.epacems.EpaCemsDatastore(datastore: pudl.workspace.datastore.Datastore)
Helper class to extract EpaCems resources from datastore.
EpaCems resources are identified by a year and a state. Each of these zip files contains monthly zip files, which in turn contain CSV files. This class implements the get_data_frame method, which concatenates the tables for a given state and year across all months.
- get_data_frame(partition: EpaCemsPartition) -> pandas.DataFrame
Constructs a DataFrame holding the data for a given (year, state) partition.
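The concatenation across months described for this class can be sketched with `pandas.concat`. This is only an illustration under stated assumptions: the helper name `concat_monthly_frames` and the column names are hypothetical, and the real method also handles opening the nested zip files, which is omitted here.

```python
import pandas as pd


def concat_monthly_frames(monthly_frames) -> pd.DataFrame:
    """Hypothetical helper: stack per-month tables into one table."""
    # Stack the monthly tables row-wise and rebuild a clean 0..n-1 index.
    return pd.concat(monthly_frames, ignore_index=True)


# Made-up monthly tables standing in for the parsed monthly CSV files.
jan = pd.DataFrame({"month": [1, 1], "gross_load_mw": [10.0, 12.0]})
feb = pd.DataFrame({"month": [2], "gross_load_mw": [11.0]})
combined = concat_monthly_frames([jan, feb])
```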
- _csv_to_dataframe(csv_file) -> pandas.DataFrame
Convert a CEMS CSV file into a pandas.DataFrame.
- Parameters:
csv_file (file-like object) – data to be read
- Returns:
A DataFrame containing the contents of the CSV file.
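One plausible way to combine a rename dictionary and an ignore set while reading a CSV is sketched below. The column names, the `EXAMPLE_RENAME` and `EXAMPLE_IGNORE` constants, and the function `csv_to_dataframe_sketch` are invented for illustration; they are not the contents of `RENAME_DICT` or `IGNORE_COLS`, and the actual method may pass different options to the reader.

```python
import io

import pandas as pd

# Hypothetical stand-ins for RENAME_DICT and IGNORE_COLS.
EXAMPLE_RENAME = {
    "STATE": "state",
    "OP_DATE": "op_date",
    "GLOAD (MW)": "gross_load_mw",
}
EXAMPLE_IGNORE = {"FACILITY_NAME"}


def csv_to_dataframe_sketch(csv_file) -> pd.DataFrame:
    """Read a CSV, skipping ignored columns and renaming the rest."""
    # A callable `usecols` keeps only columns for which it returns True.
    df = pd.read_csv(csv_file, usecols=lambda col: col not in EXAMPLE_IGNORE)
    return df.rename(columns=EXAMPLE_RENAME)


# A made-up one-row CSV, wrapped as a file-like object.
sample = io.StringIO(
    "STATE,FACILITY_NAME,OP_DATE,GLOAD (MW)\n"
    "CO,Example Plant,01-01-2019,42.0\n"
)
df = csv_to_dataframe_sketch(sample)
```

Filtering columns at read time, rather than dropping them afterwards, avoids loading unused data into memory, which matters for large hourly files.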
- pudl.extract.epacems.extract(year: int, state: str, ds: pudl.workspace.datastore.Datastore)
Coordinate the extraction of EPA CEMS hourly DataFrames.
- Parameters:
year – report year of the data to extract
state – state of the data to extract
ds – Initialized datastore
- Yields:
pandas.DataFrame – A single state-year of EPA CEMS hourly emissions data.
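Because the function yields rather than returns, callers consume it by iterating. The sketch below shows that pattern with a stand-in generator; `extract_sketch`, the `frames` dictionary that plays the role of the datastore, and the column name are all hypothetical and do not reflect the PUDL Datastore API.

```python
from typing import Iterator

import pandas as pd


def extract_sketch(year: int, state: str, frames: dict) -> Iterator[pd.DataFrame]:
    """Hypothetical stand-in that mimics a generator-style extract function."""
    # `frames` is a plain dict keyed by (year, state), not a real Datastore.
    yield frames[(year, state)]


fake_store = {(2019, "CO"): pd.DataFrame({"gross_load_mw": [42.0, 40.5]})}

# Iterate over the generator to receive each yielded state-year table.
for state_year_df in extract_sketch(2019, "CO", fake_store):
    total = state_year_df["gross_load_mw"].sum()
```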