pudl.transform.epacems module

Routines specific to cleaning up EPA CEMS hourly data.

pudl.transform.epacems.add_facility_id_unit_id_epa(df)

Harmonize columns that were only added to the CEMS data later.

The datapackage validation checks for consistent column names across all years, but these two columns are not present in data before August 2008, so this function adds them.

Parameters

df (pandas.DataFrame) – A CEMS dataframe

Returns

The same DataFrame, guaranteed to have integer facility_id and unit_id_epa columns.

Return type

pandas.DataFrame
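A minimal sketch of the behavior described above, assuming that missing values are represented with pandas' nullable Int64 dtype so that both columns can be integer-typed even when they contain no data. The function name add_facility_id_unit_id_epa_sketch is hypothetical; this is not the PUDL implementation.

```python
import pandas as pd


def add_facility_id_unit_id_epa_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical sketch: ensure integer facility_id and unit_id_epa columns exist."""
    df = df.copy()  # leave the caller's DataFrame untouched
    for col in ("facility_id", "unit_id_epa"):
        if col not in df.columns:
            # Older data lacks the column entirely: fill it with nullable NA values.
            df[col] = pd.array([pd.NA] * len(df), dtype="Int64")
        else:
            # Assumption: nullable Int64 keeps missing values without falling back to float.
            df[col] = df[col].astype("Int64")
    return df
```

Passing a frame that has neither column returns the same rows with two all-NA Int64 columns, so every year ends up with the same schema.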

pudl.transform.epacems.correct_gross_load_mw(df)

Fix values of gross load that are wrong by orders of magnitude.

Parameters

df (pandas.DataFrame) – A CEMS dataframe

Returns

The same DataFrame with corrected gross load values.

Return type

pandas.DataFrame
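One way such a correction could work is sketched below, assuming that the bad values are inflated by a fixed factor and can be detected with a plausibility threshold. Both max_plausible_mw and scale are hypothetical parameters chosen for illustration, not values taken from PUDL.

```python
import pandas as pd


def correct_gross_load_mw_sketch(
    df: pd.DataFrame,
    max_plausible_mw: float = 2000.0,  # hypothetical upper bound for one unit's load
    scale: float = 1000.0,  # hypothetical size of the order-of-magnitude error
) -> pd.DataFrame:
    df = df.copy()
    # Flag loads too large to be plausible for a single unit.
    too_big = df["gross_load_mw"] > max_plausible_mw
    # Rescale only the flagged rows; all other values are left untouched.
    df.loc[too_big, "gross_load_mw"] = df.loc[too_big, "gross_load_mw"] / scale
    return df
```

A single threshold keeps the rule easy to audit, at the cost of missing errors that still land below it.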

pudl.transform.epacems.fix_up_dates(df, plant_utc_offset)

Fix the dates for the CEMS data by replacing the local op_date and op_hour columns with a single UTC timestamp.

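Parameters

df (pandas.DataFrame) – A CEMS hourly dataframe
plant_utc_offset – Each plant's offset from UTC, looked up by plant ID and used to convert local operating times to UTC

Returns

The same data, with an op_datetime_utc column added and the op_date and op_hour columns removed.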

Return type

pandas.DataFrame
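A minimal sketch of the date conversion, assuming (hypothetically) that plant_utc_offset is a DataFrame with plant_id and utc_offset_hours columns, that df has a matching plant_id column, and that op_hour is an integer hour of the local day. The actual PUDL column names and lookup structure may differ.

```python
import pandas as pd


def fix_up_dates_sketch(df: pd.DataFrame, plant_utc_offset: pd.DataFrame) -> pd.DataFrame:
    # Attach each plant's offset (hypothetical columns: plant_id, utc_offset_hours).
    out = df.merge(plant_utc_offset, on="plant_id", how="left", validate="many_to_one")
    # Build a naive local timestamp from the operating date plus the hour of day.
    local = pd.to_datetime(out["op_date"]) + pd.to_timedelta(out["op_hour"], unit="h")
    # local = UTC + offset, so UTC = local - offset (e.g. -5 h for US Eastern Standard Time).
    utc = local - pd.to_timedelta(out["utc_offset_hours"], unit="h")
    out["op_datetime_utc"] = utc.dt.tz_localize("UTC")
    # Drop the local date/hour columns and the helper offset column.
    return out.drop(columns=["op_date", "op_hour", "utc_offset_hours"])
```

With an offset of -5, midnight local time on 2018-01-01 becomes 2018-01-01 05:00 UTC. Plants missing from the lookup get NaT rather than a silently wrong timestamp.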

pudl.transform.epacems.harmonize_eia_epa_orispl(df)

Harmonize the ORISPL code to match the EIA data – NOT YET IMPLEMENTED.

The EIA plant IDs and CEMS ORISPL codes match for most plants, but not for all of them. See https://www.epa.gov/sites/production/files/2018-02/documents/egrid2016_technicalsupportdocument_0.pdf#page=104 for an example.

Note that this transformation needs to be run before fix_up_dates, because fix_up_dates uses the plant ID to look up timezones.

Parameters

df (pandas.DataFrame) – A CEMS hourly dataframe for one year-month-state

Returns

The same data, with the ORISPL plant codes corrected to match the EIA plant IDs.

Return type

pandas.DataFrame

Todo

Actually implement the function…
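Because the function is not yet implemented, the following is only a hypothetical sketch of one possible approach: remapping ORISPL codes through a crosswalk from ORISPL code to EIA plant ID while leaving codes that already match unchanged. The crosswalk argument and the orispl_code column name are assumptions, and any real mapping would need to come from a source such as the eGRID technical support document linked above.

```python
import pandas as pd


def harmonize_eia_epa_orispl_sketch(
    df: pd.DataFrame, crosswalk: dict, col: str = "orispl_code"
) -> pd.DataFrame:
    """Hypothetical: replace mismatched ORISPL codes with the matching EIA plant IDs."""
    df = df.copy()
    # Series.replace with a dict leaves values that are not keys unchanged,
    # so plants whose ORISPL code already equals the EIA plant ID are untouched.
    df[col] = df[col].replace(crosswalk)
    return df
```

Only the exceptions need to be listed in the crosswalk, which keeps it small when most codes already agree.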

pudl.transform.epacems.transform(epacems_raw_dfs, datapkg_dir)

Transform EPA CEMS hourly data for use in datapackage export.

Todo

Incomplete docstring.