pudl.transform.epacems module
Routines specific to cleaning up EPA CEMS hourly data.
-
pudl.transform.epacems.add_facility_id_unit_id_epa(df)
Harmonize columns that are added later.
The datapackage validation checks for consistent column names, but the facility_id and unit_id_epa columns aren’t present in data from before August 2008, so this function adds them.
- Parameters
df (pandas.DataFrame) – A CEMS dataframe
- Returns
The same DataFrame, guaranteed to have integer facility_id and unit_id_epa columns.
- Return type
pandas.DataFrame
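A minimal sketch of the behavior described above, not the module's actual implementation. It assumes that missing IDs should be represented with pandas' nullable Int64 dtype; the function name add_id_columns_sketch is hypothetical.

```python
import pandas as pd


def add_id_columns_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df that always has integer facility_id and unit_id_epa."""
    out = df.copy()
    for col in ("facility_id", "unit_id_epa"):
        if col not in out.columns:
            # Older files lack the column entirely, so fill it with missing values.
            out[col] = pd.array([pd.NA] * len(out), dtype="Int64")
        else:
            # Assumption: the nullable Int64 dtype is an acceptable "int" column.
            out[col] = out[col].astype("Int64")
    return out
```

Working on a copy leaves the caller's frame untouched, which is a design choice of this sketch rather than something the documentation above states.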
-
pudl.transform.epacems.correct_gross_load_mw(df)
Fix values of gross load that are wrong by orders of magnitude.
- Parameters
df (pandas.DataFrame) – A CEMS dataframe
- Returns
The same DataFrame with corrected gross load values.
- Return type
pandas.DataFrame
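One way such a correction could work is sketched below. The documentation above does not give the rule, so the 2000 MW plausibility threshold, the factor of 1000 (as if the value had been reported in kW), and the gross_load_mw column name are all illustrative assumptions.

```python
import pandas as pd

# Both constants are assumptions made for this sketch only.
ASSUMED_MAX_PLAUSIBLE_MW = 2000
ASSUMED_SCALE_FACTOR = 1000


def correct_gross_load_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with implausibly large gross loads scaled down."""
    out = df.copy()
    # Missing values compare as False here, so they are left untouched.
    too_big = out["gross_load_mw"] > ASSUMED_MAX_PLAUSIBLE_MW
    out.loc[too_big, "gross_load_mw"] = (
        out.loc[too_big, "gross_load_mw"] / ASSUMED_SCALE_FACTOR
    )
    return out
```

For example, under these assumptions a reported value of 750000.0 becomes 750.0, while 500.0 is left alone.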
-
pudl.transform.epacems.fix_up_dates(df, plant_utc_offset)
Fix the dates for the CEMS data.
- Parameters
df (pandas.DataFrame) – A CEMS hourly dataframe for one year-month-state
plant_utc_offset (pandas.DataFrame) – A dataframe of plants’ timezones
- Returns
The same data, with an op_datetime_utc column added and the op_date and op_hour columns removed.
- Return type
pandas.DataFrame
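The transformation described above can be sketched roughly as follows. The column names op_date, op_hour, and op_datetime_utc come from the description; the plant_id_eia join key, the utc_offset timedelta column, and the month-day-year date format are assumptions made for illustration and may not match the real tables.

```python
import pandas as pd


def fix_up_dates_sketch(df: pd.DataFrame, plant_utc_offset: pd.DataFrame) -> pd.DataFrame:
    """Combine op_date and op_hour into a single UTC timestamp column."""
    # Assumption: both frames share a plant_id_eia column, and plant_utc_offset
    # carries a timedelta column named utc_offset (e.g. -5 hours).
    merged = df.merge(plant_utc_offset, how="left", on="plant_id_eia")
    # Assumption: op_date is a string such as "01-31-2019" and op_hour is 0-23.
    local_time = pd.to_datetime(merged["op_date"], format="%m-%d-%Y") + pd.to_timedelta(
        merged["op_hour"], unit="h"
    )
    # Local time minus the plant's UTC offset gives UTC.
    merged["op_datetime_utc"] = local_time - merged["utc_offset"]
    return merged.drop(columns=["op_date", "op_hour", "utc_offset"])
```

Because the timezone lookup depends on the plant ID, any step that changes plant IDs has to run before this one.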
-
pudl.transform.epacems.harmonize_eia_epa_orispl(df)
Harmonize the ORISPL code to match the EIA data – NOT YET IMPLEMENTED.
The EIA plant IDs and CEMS ORISPL codes almost match, but not quite. See https://www.epa.gov/sites/production/files/2018-02/documents/egrid2016_technicalsupportdocument_0.pdf#page=104 for an example.
Note that this transformation needs to be run before fix_up_dates, because fix_up_dates uses the plant ID to look up timezones.
- Parameters
df (pandas.DataFrame) – A CEMS hourly dataframe for one year-month-state
- Returns
The same data, with the ORISPL plant codes corrected to match the EIA plant IDs.
- Return type
pandas.DataFrame
Todo
Actually implement the function…