pudl.analysis.record_linkage.classify_plants_ferc1

Scikit-Learn classification pipeline for identifying related FERC 1 plant records.

Sadly, FERC doesn't provide any kind of real IDs for the plants that report to it. All we have is their names (freeform strings) and the data reported alongside them. This is often enough information to recognize which records ought to be associated with each other from year to year to create a continuous time series. However, we want to do that programmatically, which means using some clustering / categorization tools from scikit-learn.
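As a rough illustration of the idea (not the pipeline this module actually implements), the sketch below groups freeform plant names with off-the-shelf scikit-learn tools: character n-gram TF-IDF features clustered by cosine distance with DBSCAN. The sample records, the report_year and plant_name columns, the plant_group label, and the eps threshold are all assumptions made for the example.

```python
import pandas as pd
from sklearn.cluster import DBSCAN
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical records: the same two plants reported under slightly
# different freeform names in different years.
records = pd.DataFrame(
    {
        "report_year": [2019, 2020, 2021, 2019, 2020, 2021],
        "plant_name": [
            "big bend",
            "big bend station",
            "big bend",
            "cholla",
            "cholla unit 1",
            "cholla",
        ],
    }
)

# Character n-grams tolerate small spelling and suffix differences.
vectorizer = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 3))
name_features = vectorizer.fit_transform(records["plant_name"])

# min_samples=1 means every record joins some cluster; eps is an
# illustrative cosine-distance threshold, not a tuned value.
clusterer = DBSCAN(eps=0.5, min_samples=1, metric="cosine")
records["plant_group"] = clusterer.fit_predict(name_features)

print(records.sort_values(["plant_group", "report_year"]))
```

Records that share a plant_group value are candidates for the same multi-year time series. A real pipeline would also use the numeric data reported alongside the names and would need to keep two records from the same year out of the same group.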

Attributes

Functions

plants_steam_validate_ids(→ pandas.DataFrame)

Tests that each plant_id_ferc1 time series includes one record per year.

merge_steam_fuel_dfs(→ pandas.DataFrame)

Merge steam plant and fuel DataFrames to prepare inputs for FERC plant matching.

ferc_to_ferc(→ pandas.DataFrame)

Assign IDs to the large steam plants.

Module Contents

pudl.analysis.record_linkage.classify_plants_ferc1.logger
pudl.analysis.record_linkage.classify_plants_ferc1._FUEL_COLS = ['coal_fraction_mmbtu', 'gas_fraction_mmbtu', 'nuclear_fraction_mmbtu', 'oil_fraction_mmbtu',...
pudl.analysis.record_linkage.classify_plants_ferc1.ferc_dataframe_embedder
pudl.analysis.record_linkage.classify_plants_ferc1.plants_steam_validate_ids(ferc_to_ferc_tracker: pudl.analysis.ml_tools.experiment_tracking.ExperimentTracker, ferc1_steam_df: pandas.DataFrame, label_df: pandas.DataFrame) → pandas.DataFrame

Tests that each plant_id_ferc1 time series includes one record per year.

Parameters:
  • ferc1_steam_df – A DataFrame of the data from the FERC 1 Steam table.

  • label_df – A DataFrame containing a column of newly assigned plant labels.

Returns:

The input dataframe, to enable method chaining.
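The one-record-per-year check can be pictured with a small pandas sketch. This is only an illustration under assumed inputs, not the body of plants_steam_validate_ids: the helper name count_duplicate_plant_years and the report_year column are hypothetical, while plant_id_ferc1 is the ID named above.

```python
import pandas as pd


def count_duplicate_plant_years(
    steam_df: pd.DataFrame, id_col: str = "plant_id_ferc1"
) -> int:
    """Count (plant ID, year) pairs that appear more than once.

    Hypothetical helper; assumes a ``report_year`` column exists.
    """
    records_per_plant_year = steam_df.groupby([id_col, "report_year"]).size()
    return int((records_per_plant_year > 1).sum())


# Hypothetical labeled data: plant 2 has two records for 2020.
steam = pd.DataFrame(
    {
        "plant_id_ferc1": [1, 1, 1, 2, 2, 2],
        "report_year": [2019, 2020, 2021, 2019, 2020, 2020],
    }
)

n_bad = count_duplicate_plant_years(steam)
print(f"{n_bad} plant-year pair(s) have more than one record")
```

A count of zero would mean every assigned ID has at most one record in each year.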

pudl.analysis.record_linkage.classify_plants_ferc1.merge_steam_fuel_dfs(ferc1_steam_df: pandas.DataFrame, fuel_fractions: pandas.DataFrame) → pandas.DataFrame

Merge steam plant and fuel DataFrames to prepare inputs for FERC plant matching.
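A minimal sketch of this kind of merge is shown below. The join keys (report_year, utility_id_ferc1, plant_name_ferc1), the capacity_mw column, and the sample values are assumptions chosen for illustration. Only the fuel fraction column names come from _FUEL_COLS, and the real function may join on different keys or post-process the result.

```python
import pandas as pd

# Hypothetical steam plant records.
steam = pd.DataFrame(
    {
        "report_year": [2020, 2020],
        "utility_id_ferc1": [1, 2],
        "plant_name_ferc1": ["big bend", "cholla"],
        "capacity_mw": [400.0, 1000.0],
    }
)

# Hypothetical fuel fractions; only one of the two plants has fuel data.
fuel = pd.DataFrame(
    {
        "report_year": [2020],
        "utility_id_ferc1": [1],
        "plant_name_ferc1": ["big bend"],
        "coal_fraction_mmbtu": [0.9],
        "gas_fraction_mmbtu": [0.1],
    }
)

# Assumed join keys. A left join keeps every steam record, leaving NaN
# fuel fractions where no fuel data matched.
keys = ["report_year", "utility_id_ferc1", "plant_name_ferc1"]
merged = steam.merge(fuel, on=keys, how="left")
print(merged)
```

Using a left join here is a design choice for the sketch: it keeps plants without fuel data available for matching rather than silently dropping them.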

pudl.analysis.record_linkage.classify_plants_ferc1.ferc_to_ferc(experiment_tracker: pudl.analysis.ml_tools.experiment_tracking.ExperimentTracker, core_ferc1__yearly_steam_plants_sched402: pandas.DataFrame, out_ferc1__yearly_steam_plants_fuel_by_plant_sched402: pandas.DataFrame) → pandas.DataFrame

Assign IDs to the large steam plants.