pudl.analysis.record_linkage.classify_plants_ferc1
Scikit-Learn classification pipeline for identifying related FERC 1 plant records.
Unfortunately, FERC does not provide any real IDs for the plants that report to it. All we have are the plant names (freeform strings) and the data reported alongside them. This is often enough information to recognize which records ought to be associated with each other from year to year to create a continuous time series. However, we want to make those associations programmatically, which means using clustering and categorization tools from scikit-learn.
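To make the idea of linking records by name concrete, here is a minimal sketch using scikit-learn. It is not the pipeline this module implements: the column names `report_year` and `plant_name_ferc1`, the invented plant names, the character n-gram features, and the distance threshold are all assumptions chosen for illustration.

```python
import pandas as pd
from sklearn.cluster import AgglomerativeClustering
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy records: two invented plant names, each reported in two years.
# The column names are assumptions, not taken from the real table schema.
records = pd.DataFrame(
    {
        "report_year": [2019, 2020, 2019, 2020],
        "plant_name_ferc1": ["muddy fork", "muddy fork", "east ridge", "east ridge"],
    }
)

# Turn the freeform names into character n-gram TF-IDF vectors so that
# similar spellings end up close together in feature space.
vectorizer = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 3))
name_vectors = vectorizer.fit_transform(records["plant_name_ferc1"]).toarray()

# Merge records whose vectors lie within an (arbitrary) distance threshold.
# Leaving n_clusters unset lets the threshold decide how many groups form.
clusterer = AgglomerativeClustering(
    n_clusters=None, distance_threshold=1.0, linkage="average"
)
records["cluster_label"] = clusterer.fit_predict(name_vectors)
```

Each resulting `cluster_label` groups records that plausibly describe the same plant across years. A real matcher would also need to use the reported data alongside the names, as the description above notes.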
Attributes
| _FUEL_COLS |
Functions
| plants_steam_validate_ids | Tests that plant_id_ferc1 timeseries includes one record per year. |
| merge_steam_fuel_dfs | Merge steam plants and fuel dfs to prepare inputs for ferc plant matching. |
| ferc_to_ferc | Assign IDs to the large steam plants. |
Module Contents
- pudl.analysis.record_linkage.classify_plants_ferc1._FUEL_COLS = ['coal_fraction_mmbtu', 'gas_fraction_mmbtu', 'nuclear_fraction_mmbtu', 'oil_fraction_mmbtu',...
- pudl.analysis.record_linkage.classify_plants_ferc1.plants_steam_validate_ids(ferc_to_ferc_tracker: pudl.analysis.ml_tools.experiment_tracking.ExperimentTracker, ferc1_steam_df: pandas.DataFrame, label_df: pandas.DataFrame) → pandas.DataFrame
Tests that plant_id_ferc1 timeseries includes one record per year.
- Parameters:
ferc1_steam_df – A DataFrame of the data from the FERC 1 Steam table.
label_df – A DataFrame containing a column of newly assigned plant labels.
- Returns:
The input dataframe, to enable method chaining.
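One way to read the check described above is "no plant ID should have more than one record in any single year". The sketch below illustrates that reading with pandas. The helper name `find_year_duplicates` and the `report_year` column are hypothetical, and the real function may perform the check and report violations differently.

```python
import pandas as pd


def find_year_duplicates(steam_df: pd.DataFrame) -> pd.DataFrame:
    """Return (plant_id_ferc1, report_year) pairs that occur more than once.

    Hypothetical helper for illustration; assumes a ``report_year`` column.
    """
    counts = (
        steam_df.groupby(["plant_id_ferc1", "report_year"])
        .size()
        .reset_index(name="n_records")
    )
    # Keep only the plant/year combinations with duplicate records.
    return counts[counts["n_records"] > 1].reset_index(drop=True)


# Toy data: plant 2 has two records for 2019, which violates the expectation.
steam = pd.DataFrame(
    {
        "plant_id_ferc1": [1, 1, 2, 2],
        "report_year": [2019, 2020, 2019, 2019],
    }
)
dupes = find_year_duplicates(steam)
```

An empty result would mean every assigned ID forms a time series with at most one record per year.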
- pudl.analysis.record_linkage.classify_plants_ferc1.merge_steam_fuel_dfs(ferc1_steam_df: pandas.DataFrame, fuel_fractions: pandas.DataFrame) → pandas.DataFrame
Merge steam plants and fuel dfs to prepare inputs for ferc plant matching.
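As a rough illustration of this kind of merge, the sketch below left-joins fuel fractions onto steam plant records with pandas. The join keys, the `capacity_mw` column, the toy values, and the choice to fill missing fractions with zero are all assumptions; only the fuel fraction column names come from `_FUEL_COLS`.

```python
import pandas as pd

# Assumed join keys; the real function may join on different columns.
KEY_COLS = ["utility_id_ferc1", "plant_name_ferc1", "report_year"]

# Toy steam plant records with one hypothetical data column.
steam = pd.DataFrame(
    {
        "utility_id_ferc1": [10, 10],
        "plant_name_ferc1": ["muddy fork", "east ridge"],
        "report_year": [2020, 2020],
        "capacity_mw": [500.0, 250.0],
    }
)

# Toy fuel fractions, available for only one of the two plants.
fuel_fractions = pd.DataFrame(
    {
        "utility_id_ferc1": [10],
        "plant_name_ferc1": ["muddy fork"],
        "report_year": [2020],
        "coal_fraction_mmbtu": [0.9],
        "gas_fraction_mmbtu": [0.1],
    }
)

# A left merge keeps every steam record, even those without fuel data.
merged = steam.merge(fuel_fractions, on=KEY_COLS, how="left")

# Assumption: treat missing fuel fractions as zero so downstream
# feature extraction never sees NaN values.
fuel_cols = ["coal_fraction_mmbtu", "gas_fraction_mmbtu"]
merged[fuel_cols] = merged[fuel_cols].fillna(0.0)
```

The left join is chosen here so that the output has exactly one row per input steam record, which keeps the merged frame aligned with the records that need IDs.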
- pudl.analysis.record_linkage.classify_plants_ferc1.ferc_to_ferc(experiment_tracker: pudl.analysis.ml_tools.experiment_tracking.ExperimentTracker, core_ferc1__yearly_steam_plants_sched402: pandas.DataFrame, out_ferc1__yearly_steam_plants_fuel_by_plant_sched402: pandas.DataFrame) → pandas.DataFrame
Assign IDs to the large steam plants.