pudl.analysis.epacamd_eia

Helper functions for filtering the EPA CAMD crosswalk table.

These filters were originally designed to be applied to the crosswalk before subplant_id values are assigned, so that subplant_ids are generated only for records that appear in EPA CAMD.

Usage Example:

    epacems = pudl.output.epacems.epacems(states=['ID'], years=[2020])  # subset for test
    core_epa__assn_eia_epacamd = pudl_out.epacamd_eia()
    filtered_crosswalk = filter_crosswalk(core_epa__assn_eia_epacamd, epacems)
    crosswalk_with_subplant_ids = pudl.etl.make_subplant_ids(filtered_crosswalk)

Module Contents

Functions

_get_unique_keys(→ pandas.DataFrame)

Get unique unit IDs from CEMS data.

filter_crosswalk_by_epacems(→ pandas.DataFrame)

Inner join unique CEMS units with the core_epa__assn_eia_epacamd crosswalk.

filter_out_boiler_rows(→ pandas.DataFrame)

Remove rows that represent graph edges between generators and boilers.

filter_crosswalk(→ pandas.DataFrame)

Remove unmapped crosswalk rows or duplicates due to many-to-many (m2m) boiler relationships.

pudl.analysis.epacamd_eia._get_unique_keys(epacems: pandas.DataFrame | dask.dataframe.DataFrame) → pandas.DataFrame

Get unique unit IDs from CEMS data.

Parameters:

epacems (Union[pd.DataFrame, dd.DataFrame]) – epacems dataset from pudl.output.epacems.epacems

Returns:

unique keys from the epacems dataset

Return type:

pd.DataFrame
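
As a rough illustration of what "unique unit IDs" means here, the sketch below deduplicates a small, made-up CEMS-like frame on the two key columns that filter_crosswalk requires (plant_id_eia and emissions_unit_id_epa). The helper name get_unique_keys_sketch and the sample values are hypothetical, and the real _get_unique_keys may differ, for example in how it handles Dask inputs.

```python
import pandas as pd

# Key columns assumed from the filter_crosswalk parameter description.
KEY_COLS = ["plant_id_eia", "emissions_unit_id_epa"]


def get_unique_keys_sketch(epacems: pd.DataFrame) -> pd.DataFrame:
    """Return one row per (plant, emissions unit) pair. Hypothetical helper."""
    return epacems[KEY_COLS].drop_duplicates().reset_index(drop=True)


# Made-up hourly records: unit "1" of plant 10 reports twice.
epacems = pd.DataFrame(
    {
        "plant_id_eia": [10, 10, 10, 20],
        "emissions_unit_id_epa": ["1", "1", "2", "A"],
        "gross_load_mw": [5.0, 6.0, 7.0, 8.0],  # illustrative data column
    }
)
unique_keys = get_unique_keys_sketch(epacems)
```

Only the key columns survive, so the result stays small even when the emissions data has many hourly rows per unit.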

pudl.analysis.epacamd_eia.filter_crosswalk_by_epacems(crosswalk: pandas.DataFrame, epacems: pandas.DataFrame | dask.dataframe.DataFrame) → pandas.DataFrame

Inner join unique CEMS units with the core_epa__assn_eia_epacamd crosswalk.

This is essentially an empirical filter on EPA units. Instead of filtering by construction/retirement dates in the crosswalk (thus assuming they are accurate), use the presence/absence of CEMS data to filter the units.

Parameters:
  • crosswalk (pd.DataFrame) – core_epa__assn_eia_epacamd crosswalk

  • epacems (Union[pd.DataFrame, dd.DataFrame]) – epacems dataset from pudl.output.epacems.epacems; its unique unit IDs are obtained via _get_unique_keys

Returns:

The inner join of the core_epa__assn_eia_epacamd crosswalk and unique epacems units. Adds the global ID column unit_id_epa.
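
The empirical filter described above can be pictured as a plain pandas inner merge on the unit keys. This is a minimal sketch with made-up data, not the module's actual implementation; the generator_id column name and all values are assumptions for illustration.

```python
import pandas as pd

KEY_COLS = ["plant_id_eia", "emissions_unit_id_epa"]

# Made-up crosswalk: plant 30 never appears in the CEMS keys below.
crosswalk = pd.DataFrame(
    {
        "plant_id_eia": [10, 10, 30],
        "emissions_unit_id_epa": ["1", "2", "X"],
        "generator_id": ["G1", "G2", "G9"],  # assumed column name
    }
)

# Made-up unique CEMS keys: unit "1" of plant 10 and unit "A" of plant 20.
unique_epacems_ids = pd.DataFrame(
    {"plant_id_eia": [10, 20], "emissions_unit_id_epa": ["1", "A"]}
)

# The inner join keeps only crosswalk rows whose unit reported to CEMS,
# and only CEMS units that the crosswalk knows about.
key_map = unique_epacems_ids.merge(crosswalk, on=KEY_COLS, how="inner")
```

Because the join is inner, a unit is dropped whenever it is missing from either side, regardless of what construction or retirement dates the crosswalk records for it.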

pudl.analysis.epacamd_eia.filter_out_boiler_rows(crosswalk: pandas.DataFrame) → pandas.DataFrame

Remove rows that represent graph edges between generators and boilers.

Parameters:

crosswalk (pd.DataFrame) – core_epa__assn_eia_epacamd crosswalk

Returns:

The core_epa__assn_eia_epacamd crosswalk with boiler rows (many/one-to-many) removed.

Return type:

pd.DataFrame
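
One way to picture boiler rows: when a generator is linked to several boilers, the same unit-to-generator edge can appear once per boiler. The sketch below collapses those repeats with drop_duplicates. It is an assumption about the general approach rather than the module's code, and the generator_id and boiler_id column names and sample values are hypothetical.

```python
import pandas as pd

# Made-up crosswalk: generator G1 is linked to two boilers, so the
# (plant, unit, generator) edge appears twice.
crosswalk = pd.DataFrame(
    {
        "plant_id_eia": [10, 10, 10],
        "emissions_unit_id_epa": ["1", "1", "2"],
        "generator_id": ["G1", "G1", "G2"],  # assumed column name
        "boiler_id": ["B1", "B2", "B3"],  # assumed column name
    }
)

# Keep one row per unit-generator edge; the first boiler seen is retained
# and the extra boiler rows are discarded.
edge_cols = ["plant_id_eia", "emissions_unit_id_epa", "generator_id"]
without_boiler_rows = crosswalk.drop_duplicates(subset=edge_cols)
```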

pudl.analysis.epacamd_eia.filter_crosswalk(crosswalk: pandas.DataFrame, epacems: pandas.DataFrame | dask.dataframe.DataFrame) → pandas.DataFrame

Remove unmapped crosswalk rows or duplicates due to many-to-many (m2m) boiler relationships.

Parameters:
  • crosswalk (pd.DataFrame) – The core_epa__assn_eia_epacamd crosswalk.

  • epacems (Union[pd.DataFrame, dd.DataFrame]) – Emissions data. Must contain columns named ["plant_id_eia", "emissions_unit_id_epa"]

Returns:

A filtered copy of core_epa__assn_eia_epacamd crosswalk

Return type:

pd.DataFrame
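
To show how the two filtering ideas might combine, here is a self-contained sketch that checks for the required key columns, collapses duplicate edges, and then inner-joins against the units present in the emissions data. The function name filter_crosswalk_sketch, the step order, the generator_id column, and the data are all assumptions; the real filter_crosswalk may order or implement these steps differently.

```python
import pandas as pd

KEY_COLS = ["plant_id_eia", "emissions_unit_id_epa"]


def filter_crosswalk_sketch(
    crosswalk: pd.DataFrame, epacems: pd.DataFrame
) -> pd.DataFrame:
    """Hypothetical stand-in that chains the two filtering ideas."""
    # Fail early if the emissions data lacks the required key columns.
    missing = set(KEY_COLS) - set(epacems.columns)
    if missing:
        raise ValueError(f"epacems is missing columns: {sorted(missing)}")
    # Collapse duplicate unit-generator edges (generator_id is assumed).
    deduped = crosswalk.drop_duplicates(subset=[*KEY_COLS, "generator_id"])
    # Keep only units that appear in the emissions data.
    unique_keys = epacems[KEY_COLS].drop_duplicates()
    return unique_keys.merge(deduped, on=KEY_COLS, how="inner")


# Made-up inputs: one duplicated edge and one plant absent from CEMS.
crosswalk = pd.DataFrame(
    {
        "plant_id_eia": [10, 10, 30],
        "emissions_unit_id_epa": ["1", "1", "X"],
        "generator_id": ["G1", "G1", "G9"],
        "boiler_id": ["B1", "B2", "B9"],
    }
)
epacems = pd.DataFrame(
    {"plant_id_eia": [10, 10], "emissions_unit_id_epa": ["1", "1"]}
)
filtered = filter_crosswalk_sketch(crosswalk, epacems)
```

The sketch returns a new frame and leaves both inputs unmodified, in line with the "filtered copy" wording above.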