pudl.analysis.epacamd_eia
Helper functions for filtering the EPA CAMD crosswalk table.
This filtering was originally designed to be applied to the crosswalk before assigning subplant_id values, so that subplant_ids are generated only for records that show up in EPA CAMD.
Usage Example:
    epacems = pudl.output.epacems.epacems(states=['ID'], years=[2020])  # subset for test
    epacamd_eia = pudl_out.epacamd_eia()
    filtered_crosswalk = filter_crosswalk(epacamd_eia, epacems)
    crosswalk_with_subplant_ids = pudl.etl.make_subplant_ids(filtered_crosswalk)
Module Contents
Functions
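| _get_unique_keys(epacems) | Get unique unit IDs from CEMS data. |
| filter_crosswalk_by_epacems(crosswalk, epacems) | Inner join unique CEMS units with the epacamd_eia crosswalk. |
| filter_out_boiler_rows(crosswalk) | Remove rows that represent graph edges between generators and boilers. |
| filter_crosswalk(crosswalk, epacems) | Remove unmapped crosswalk rows or duplicates due to many-to-many boiler relationships. |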
- pudl.analysis.epacamd_eia._get_unique_keys(epacems: pd.DataFrame | dd.DataFrame) → pandas.DataFrame
Get unique unit IDs from CEMS data.
- Parameters:
epacems (Union[pd.DataFrame, dd.DataFrame]) – epacems dataset from pudl.output.epacems.epacems
- Returns:
unique keys from the epacems dataset
- Return type:
pd.DataFrame
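A minimal sketch of how unique unit keys could be extracted from a pandas input, assuming the key columns are plant_id_eia and emissions_unit_id_epa (the columns that filter_crosswalk requires of its epacems argument). The function name unique_unit_keys and the sample data are hypothetical, and the actual implementation may differ. A dask input would also need a .compute() call to produce a pandas DataFrame.

```python
import pandas as pd

# Assumed key columns, taken from the filter_crosswalk parameter description.
KEY_COLS = ["plant_id_eia", "emissions_unit_id_epa"]


def unique_unit_keys(epacems: pd.DataFrame) -> pd.DataFrame:
    """Return one row per (plant, emissions unit) pair found in the CEMS data."""
    return epacems[KEY_COLS].drop_duplicates().reset_index(drop=True)


# Made-up hourly records: unit "1" at plant 10 reports twice.
epacems_sample = pd.DataFrame(
    {
        "plant_id_eia": [10, 10, 10, 20],
        "emissions_unit_id_epa": ["1", "1", "2", "A"],
        "gross_load_mw": [5.0, 6.0, 7.0, 8.0],
    }
)
unique_keys = unique_unit_keys(epacems_sample)
```

Only the two key columns survive, so the result stays small even when the emissions data spans many hours.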
- pudl.analysis.epacamd_eia.filter_crosswalk_by_epacems(crosswalk: pandas.DataFrame, epacems: pd.DataFrame | dd.DataFrame) → pandas.DataFrame
Inner join unique CEMS units with the epacamd_eia crosswalk.
This is essentially an empirical filter on EPA units. Instead of filtering by construction/retirement dates in the crosswalk (thus assuming they are accurate), use the presence/absence of CEMS data to filter the units.
- Parameters:
crosswalk (pd.DataFrame) – epacamd_eia crosswalk
epacems (Union[pd.DataFrame, dd.DataFrame]) – epacems dataset whose unique unit IDs (see _get_unique_keys) are joined to the crosswalk
- Returns:
The inner join of the epacamd_eia crosswalk and unique epacems units. Adds the global ID column unit_id_epa.
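The empirical filter described above can be sketched with a plain pandas inner join. The join keys (plant_id_eia, emissions_unit_id_epa) are an assumption based on the columns filter_crosswalk requires, and the sample tables are made up. The sketch does not cover the unit_id_epa column mentioned in the return description.

```python
import pandas as pd

KEY_COLS = ["plant_id_eia", "emissions_unit_id_epa"]  # assumed join keys

# Hypothetical crosswalk: plant 30 has no CEMS data and should be dropped.
crosswalk_sample = pd.DataFrame(
    {
        "plant_id_eia": [10, 10, 30],
        "emissions_unit_id_epa": ["1", "2", "X"],
        "generator_id": ["G1", "G2", "G9"],
    }
)
# Hypothetical unique CEMS units: plant 20 is absent from the crosswalk.
cems_units = pd.DataFrame(
    {
        "plant_id_eia": [10, 10, 20],
        "emissions_unit_id_epa": ["1", "2", "A"],
    }
)

# The inner join keeps only units present in both tables.
joined = cems_units.merge(crosswalk_sample, on=KEY_COLS, how="inner")
```

Units that never reported CEMS data drop out regardless of the construction or retirement dates recorded in the crosswalk.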
- pudl.analysis.epacamd_eia.filter_out_boiler_rows(crosswalk: pandas.DataFrame) → pandas.DataFrame
Remove rows that represent graph edges between generators and boilers.
- Parameters:
crosswalk (pd.DataFrame) – epacamd_eia crosswalk
- Returns:
The epacamd_eia crosswalk with boiler rows (many-to-many or one-to-many relationships) removed.
- Return type:
pd.DataFrame
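One way such boiler rows could be collapsed is to keep a single row per unit-generator pair. This is a sketch under assumptions, not the function's confirmed logic: the column names generator_id and boiler_id, the choice of deduplication subset, and the sample data are all hypothetical.

```python
import pandas as pd

# Hypothetical crosswalk: generator G1 is linked to two boilers, so the
# (plant, unit, generator) relationship is repeated once per boiler.
crosswalk_sample = pd.DataFrame(
    {
        "plant_id_eia": [10, 10, 10],
        "emissions_unit_id_epa": ["1", "1", "2"],
        "generator_id": ["G1", "G1", "G2"],
        "boiler_id": ["B1", "B2", "B3"],
    }
)

# Keep one row per unit-generator pair, ignoring which boiler produced it.
unit_generator_cols = ["plant_id_eia", "emissions_unit_id_epa", "generator_id"]
without_boiler_rows = crosswalk_sample.drop_duplicates(subset=unit_generator_cols)
```

With the default keep="first", the first boiler listed for each pair is the one retained in the remaining row.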
- pudl.analysis.epacamd_eia.filter_crosswalk(crosswalk: pandas.DataFrame, epacems: pd.DataFrame | dd.DataFrame) → pandas.DataFrame
Remove unmapped crosswalk rows or duplicates due to many-to-many boiler relationships.
- Parameters:
crosswalk (pd.DataFrame) – The epacamd_eia crosswalk.
epacems (Union[pd.DataFrame, dd.DataFrame]) – Emissions data. Must contain columns named ["plant_id_eia", "emissions_unit_id_epa"].
- Returns:
A filtered copy of the epacamd_eia crosswalk.
- Return type:
pd.DataFrame