pudl.analysis.allocate_net_gen

Allocate data from generation_fuel_eia923 table to generator level.

Net electricity generation and fuel consumption are reported in multiple ways in the EIA-923. The generation_fuel_eia923 table reports both generation and fuel consumption, and breaks them down by plant, prime mover, and energy source. In parallel, the generation_eia923 table reports generation by generator, and the boiler_fuel_eia923 table reports fuel consumption by boiler.

The generation_fuel_eia923 table is more complete, but the generation_eia923 and boiler_fuel_eia923 tables are more granular. The generation_eia923 table includes only ~55% of the total MWh reported in the generation_fuel_eia923 table.

This module estimates the net electricity generation and fuel consumption attributable to individual generators based on the more expansive reporting of the data in the generation_fuel_eia923 table. The main coordinating functions here are allocate_gen_fuel_by_generator_energy_source() and aggregate_gen_fuel_by_generator().

The algorithm we’re using assumes:

  • The generation_eia923 table is the authoritative source of information about how much generation is attributable to an individual generator, if it reports in that table.

  • The generation_fuel_eia923 table is the authoritative source of information about how much generation and fuel consumption is attributable to an entire plant.

  • The generators_eia860 table provides an exhaustive list of all generators whose generation is being reported in the generation_fuel_eia923 table.

We allocate the net generation reported in the generation_fuel_eia923 table on the basis of plant, prime mover, and fuel type among the generators in each plant that have matching fuel types. Generation is allocated proportional to reported generation if it’s available, and proportional to each generator’s capacity if generation is not available.

In more detail: within each year of data, we split the plants into three groups:

  • Plants where ALL generators report in the more granular generation_eia923 table.

  • Plants where NONE of the generators report in the generation_eia923 table.

  • Plants where only SOME of the generators report in the generation_eia923 table.
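A minimal sketch of this three-way split, assuming a generator-level frame with a hypothetical boolean in_gen_tbl column that marks generators with a generation_eia923 record. The column names and the classify_plants helper are illustrative, not PUDL's own code:

```python
import pandas as pd

# Hypothetical generator-level records for one report year.
gens = pd.DataFrame({
    "report_date": ["2020-01-01"] * 5,
    "plant_id_eia": [1, 1, 2, 2, 3],
    "generator_id": ["a", "b", "a", "b", "a"],
    "in_gen_tbl": [True, True, True, False, False],
})


def classify_plants(gens: pd.DataFrame) -> pd.DataFrame:
    """Label each plant-year as 'all', 'some', or 'none' (illustrative helper)."""
    out = (
        gens.groupby(["report_date", "plant_id_eia"])["in_gen_tbl"]
        .agg(all_report="all", any_report="any")
        .reset_index()
    )
    # Default to the hybrid case, then overwrite the two pure cases.
    out["plant_gen_reporting"] = "some"
    out.loc[out["all_report"], "plant_gen_reporting"] = "all"
    out.loc[~out["any_report"], "plant_gen_reporting"] = "none"
    return out.drop(columns=["all_report", "any_report"])


print(classify_plants(gens))
```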

In plant-years where ALL generators report more granular generation, the total net generation reported in the generation_fuel_eia923 table is allocated in proportion to the generation each generator reported in the generation_eia923 table. We do this instead of using net_generation_mwh from generation_eia923 because there are some small discrepancies between the total amounts of generation reported in these two tables.
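For a single plant with one prime mover and fuel, the ALL case amounts to rescaling the generation_eia923 values so that they sum to the generation_fuel_eia923 total. The sketch below uses made-up numbers and an illustrative net_generation_mwh_gen_tbl column name:

```python
import pandas as pd

# Hypothetical plant where every generator reports to generation_eia923.
# The generator-level values sum to 95 MWh, but generation_fuel_eia923 reports 100 MWh.
gen = pd.DataFrame({
    "generator_id": ["a", "b"],
    "net_generation_mwh_gen_tbl": [60.0, 35.0],
})
gf_total_mwh = 100.0

# Each generator's share of the generation_eia923 total...
frac = gen["net_generation_mwh_gen_tbl"] / gen["net_generation_mwh_gen_tbl"].sum()
# ...is applied to the generation_fuel_eia923 total, so the allocation sums to 100 MWh.
gen["net_generation_mwh"] = frac * gf_total_mwh
print(gen)
```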

In plant-years where NONE of the generators report more granular generation, we create a generator record for each associated fuel type. Those records are merged with the generation_fuel_eia923 table on plant, prime mover code, and fuel type. Each group of plant, prime mover, and fuel will have some amount of reported net generation associated with it, and one or more generators. The net generation is allocated among the generators within the group in proportion to their capacity. Then the allocated net generation is summed up by generator.
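The NONE case can be sketched in pandas as a merge on plant, prime mover, and fuel, a capacity share within each group, and a sum by generator. The toy data and column names are assumptions, and this is not PUDL's implementation:

```python
import pandas as pd

# Hypothetical stacked generators: one row per generator and energy source.
stacked_gens = pd.DataFrame({
    "plant_id_eia": [7, 7, 7],
    "generator_id": ["g1", "g2", "g2"],
    "prime_mover_code": ["GT", "GT", "GT"],
    "energy_source_code": ["NG", "NG", "DFO"],
    "capacity_mw": [50.0, 150.0, 150.0],
})
# Hypothetical generation_fuel_eia923 records for the same plant.
gf = pd.DataFrame({
    "plant_id_eia": [7, 7],
    "prime_mover_code": ["GT", "GT"],
    "energy_source_code": ["NG", "DFO"],
    "net_generation_mwh_gf_tbl": [400.0, 20.0],
})

idx_pm_esc = ["plant_id_eia", "prime_mover_code", "energy_source_code"]
merged = stacked_gens.merge(gf, on=idx_pm_esc, how="left")

# Capacity share of each generator within its plant/prime mover/fuel group.
group_cap = merged.groupby(idx_pm_esc)["capacity_mw"].transform("sum")
merged["frac"] = merged["capacity_mw"] / group_cap
merged["net_generation_mwh"] = merged["frac"] * merged["net_generation_mwh_gf_tbl"]

# Sum the allocated generation up to the generator level.
by_gen = merged.groupby(["plant_id_eia", "generator_id"], as_index=False)[
    "net_generation_mwh"
].sum()
print(by_gen)
```

Here g1 gets a quarter of the NG generation (100 MWh), while g2 gets the other three quarters plus all of the DFO generation (320 MWh).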

In the hybrid case, where only SOME of a plant’s generators report the more granular generation data, we use a combination of the two allocation methods described above. First, the total generation reported across a plant in the generation_fuel_eia923 table is allocated between the two categories of generators (those that report fine-grained generation, and those that don’t) in direct proportion to the fraction of the plant’s generation that is reported in the generation_eia923 table, relative to the total generation reported in the generation_fuel_eia923 table. Then, within each category, generation is allocated using the corresponding method above: in proportion to reported generation for the generators that report it, and in proportion to capacity for those that don’t.
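A sketch of the two-step hybrid allocation for one plant-year with a single prime mover and fuel, using made-up data and illustrative column names. Capping the reporting share at 1 is an assumption of this sketch, not documented PUDL behavior:

```python
import pandas as pd

# Hypothetical plant-year with one reporting and two non-reporting generators.
gens = pd.DataFrame({
    "generator_id": ["r1", "n1", "n2"],
    "net_generation_mwh_gen_tbl": [300.0, None, None],  # only r1 reports to generation_eia923
    "capacity_mw": [100.0, 20.0, 60.0],
})
gf_plant_total_mwh = 500.0  # plant total from generation_fuel_eia923

reports = gens["net_generation_mwh_gen_tbl"].notna()
gen_tbl_total = gens.loc[reports, "net_generation_mwh_gen_tbl"].sum()

# Step 1: split the plant total between the two categories of generators.
# (Capping the share at 1 is an assumption of this sketch.)
share_reporting = min(gen_tbl_total / gf_plant_total_mwh, 1.0)
pool_reporting = share_reporting * gf_plant_total_mwh
pool_other = gf_plant_total_mwh - pool_reporting

# Step 2: allocate within each category using the matching method.
gens["net_generation_mwh"] = 0.0
gens.loc[reports, "net_generation_mwh"] = (
    pool_reporting * gens.loc[reports, "net_generation_mwh_gen_tbl"] / gen_tbl_total
)
gens.loc[~reports, "net_generation_mwh"] = (
    pool_other * gens.loc[~reports, "capacity_mw"] / gens.loc[~reports, "capacity_mw"].sum()
)
print(gens[["generator_id", "net_generation_mwh"]])
```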

Note that this methodology does not distinguish between primary and secondary energy_source_codes for generators. Net generation is associated with every generator in a plant that shares the prime_mover_code and fuel of a generation_fuel_eia923 record. Because generation is then allocated in proportion to capacity, if generators in the same plant do not report detailed generation, have the same prime_mover_code, and use the same fuels, but have very different capacity factors in reality, this methodology will allocate generation such that they end up with very similar capacity factors. We imagine this is an uncommon scenario.
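The capacity factor effect follows from the arithmetic: a generator with capacity c_i in a group with total capacity C receives G·c_i/C of the group’s generation G, so its capacity factor over H hours is G/(C·H), independent of c_i. A quick numeric check with hypothetical values:

```python
# Two hypothetical non-reporting generators sharing one plant/prime mover/fuel group.
capacities_mw = {"g1": 50.0, "g2": 150.0}
group_net_gen_mwh = 175_200.0  # annual generation_fuel_eia923 total for the group
hours_in_year = 8760

total_cap = sum(capacities_mw.values())
capacity_factors = {}
for gen_id, cap in capacities_mw.items():
    allocated = group_net_gen_mwh * cap / total_cap  # capacity-proportional allocation
    capacity_factors[gen_id] = allocated / (cap * hours_in_year)

print(capacity_factors)
```

Both generators end up with a capacity factor of 0.1, whatever their true operating patterns were.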

This methodology has several potential flaws and drawbacks. Because there is no indicator of what portion of a generator’s net generation comes from each of its energy_source_codes, we associate the net generation equally among them. In effect, if a plant had multiple generators with the same prime_mover_code but opposite primary and secondary fuels (e.g. gen 1 has a primary fuel of ‘NG’ and secondary fuel of ‘DFO’, while gen 2 has a primary fuel of ‘DFO’ and a secondary fuel of ‘NG’), the methodology associates the generation_fuel_eia923 records in the same way with both generators. However, the allocated net generation will still be proportional to each generator’s net generation (if it’s reported) or capacity (if generation is not reported).

Module Contents

Functions

allocate_gen_fuel_by_generator_energy_source(pudl_out[, drop_interim_cols])

Allocate net gen from gen_fuel table to the generator/energy_source_code level.

aggregate_gen_fuel_by_generator(→ pandas.DataFrame)

Aggregate gen fuel data columns to generators.

scale_allocated_net_gen_by_ownership(→ pandas.DataFrame)

Scale allocated net gen at the generator/energy_source_code level by ownership.

agg_by_generator(→ pandas.DataFrame)

Aggregate the allocated gen fuel data to the generator level.

stack_generators(gens[, cat_col, stacked_col])

Stack the generator table with a set of columns.

associate_generator_tables(gf, gen, gens)

Associate the three tables needed to assign net gen to generators.

remove_retired_generators(gen_assoc)

Remove the retired generators.

_associate_unconnected_records(eia_generators_merged)

Associate unassociated gen_fuel table records on idx_pm.

prep_alloction_fraction(gen_assoc)

Make flags and aggregations to prepare for calc_allocation_fraction().

calc_allocation_fraction(gen_pm_fuel[, drop_interim_cols])

Make frac column to allocate net gen from the generation fuel table.

_test_frac(gen_pm_fuel)

_test_gen_pm_fuel_output(gen_pm_fuel, gf, gen)

_test_gen_fuel_allocation(gen, gen_pm_fuel[, ratio])

Attributes

logger

IDX_GENS

Id columns for generators.

IDX_PM_ESC

Id columns for plant, prime mover & fuel type records.

IDX_ESC

pudl.analysis.allocate_net_gen.logger

pudl.analysis.allocate_net_gen.IDX_GENS = ['report_date', 'plant_id_eia', 'generator_id']

Id columns for generators.

pudl.analysis.allocate_net_gen.IDX_PM_ESC = ['report_date', 'plant_id_eia', 'prime_mover_code', 'energy_source_code']

Id columns for plant, prime mover & fuel type records.

pudl.analysis.allocate_net_gen.IDX_ESC = ['report_date', 'plant_id_eia', 'energy_source_code']

pudl.analysis.allocate_net_gen.allocate_gen_fuel_by_generator_energy_source(pudl_out, drop_interim_cols=True)

Allocate net gen from gen_fuel table to the generator/energy_source_code level.

Three main steps here:
  • grab the three input tables from pudl_out with only the needed columns

  • associate generation_fuel_eia923 table data w/ generators

  • allocate generation_fuel_eia923 table data proportionally

The association process happens via associate_generator_tables().

The allocation process (via calc_allocation_fraction()) entails generating a fraction for each record within an IDX_PM_ESC group. We have two data points for generating this fraction: the net generation in the generation_eia923 table and the capacity from the generators_eia860 table. The end result is a frac column which is unique for each generator/prime_mover/fuel record and is used to allocate the associated net generation from the generation_fuel_eia923 table.

Parameters:
  • pudl_out (pudl.output.pudltabl.PudlTabl) – An object used to create the tables for EIA and FERC Form 1 analysis.

  • drop_interim_cols (boolean) – True/False flag for dropping interim columns which are used to generate the net_generation_mwh column (mostly the frac column and the net generation reported in the original generation_eia923 and generation_fuel_eia923 tables). These columns are useful for debugging. Default is True, which drops the columns.
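A sketch of how a frac column might be checked and applied, assuming the frac values have already been computed. The data are invented, and this is an illustration rather than the code in calc_allocation_fraction():

```python
import pandas as pd

IDX_PM_ESC = ["report_date", "plant_id_eia", "prime_mover_code", "energy_source_code"]

# Hypothetical associated records: one row per generator/prime mover/fuel.
gen_pm_fuel = pd.DataFrame({
    "report_date": ["2020-01-01"] * 3,
    "plant_id_eia": [3, 3, 3],
    "prime_mover_code": ["ST", "ST", "ST"],
    "energy_source_code": ["BIT", "BIT", "NG"],
    "generator_id": ["1", "2", "1"],
    "frac": [0.4, 0.6, 1.0],
    "net_generation_mwh_gf_tbl": [1000.0, 1000.0, 50.0],
})

# Fracs within each IDX_PM_ESC group should sum to 1, so that no generation
# is lost or double counted.
frac_sums = gen_pm_fuel.groupby(IDX_PM_ESC)["frac"].sum()
assert ((frac_sums - 1.0).abs() < 1e-6).all(), frac_sums

# Allocate the group-level generation_fuel_eia923 value to each record.
gen_pm_fuel["net_generation_mwh"] = gen_pm_fuel["frac"] * gen_pm_fuel["net_generation_mwh_gf_tbl"]
print(gen_pm_fuel[["generator_id", "energy_source_code", "net_generation_mwh"]])
```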

pudl.analysis.allocate_net_gen.aggregate_gen_fuel_by_generator(pudl_out, gen_pm_fuel: pandas.DataFrame) → pandas.DataFrame

Aggregate gen fuel data columns to generators.

The generation_fuel_eia923 table includes net generation and fuel consumption data at the plant/fuel type/prime mover level. The most granular level of plants that PUDL typically uses is at the plant/generator level. This function takes the plant/energy source code/prime mover level allocation, aggregates it to the generator level and then denormalizes it to make it more structurally in-line with the original generation_eia923 table (see pudl.output.eia923.denorm_generation_eia923()).

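Parameters:
  • pudl_out (pudl.output.pudltabl.PudlTabl) – An object used to create the tables for EIA and FERC Form 1 analysis.

  • gen_pm_fuel (pandas.DataFrame) – table of allocated generation at the generator/prime mover/fuel type level. Result of allocate_gen_fuel_by_generator_energy_source().

Returns:

table with columns IDX_GENS and net generation and fuel consumption aggregated to the level of IDX_GENS.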
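Collapsing the prime mover/fuel dimension is a grouped sum over IDX_GENS. This sketch mirrors the default by_cols and sum_cols of agg_by_generator() on invented data; it omits the denormalization step described above and is not the module’s code:

```python
import pandas as pd

IDX_GENS = ["report_date", "plant_id_eia", "generator_id"]

# Hypothetical allocated records at the generator/prime mover/fuel level.
gen_pm_fuel = pd.DataFrame({
    "report_date": ["2020-01-01"] * 3,
    "plant_id_eia": [3, 3, 3],
    "generator_id": ["1", "1", "2"],
    "prime_mover_code": ["ST", "ST", "ST"],
    "energy_source_code": ["BIT", "NG", "BIT"],
    "net_generation_mwh": [400.0, 50.0, 600.0],
    "fuel_consumed_mmbtu": [4200.0, 520.0, 6300.0],
})

# Sum across prime movers and fuels within each generator.
gen_level = gen_pm_fuel.groupby(IDX_GENS, as_index=False)[
    ["net_generation_mwh", "fuel_consumed_mmbtu"]
].sum()
print(gen_level)
```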

pudl.analysis.allocate_net_gen.scale_allocated_net_gen_by_ownership(gen_pm_fuel: pandas.DataFrame, gens: pandas.DataFrame, own_eia860: pandas.DataFrame) → pandas.DataFrame

Scale allocated net gen at the generator/energy_source_code level by ownership.

It can be helpful to have a table of net generation and fuel consumption at the generator/fuel-type level (i.e. the result of allocate_gen_fuel_by_generator_energy_source()) to be associated and scaled with all of the owners of those generators. This allows the aggregation of fuel use to the utility level.

Scaling generators with their owners’ ownership fraction is currently possible via pudl.analysis.plant_parts_eia.MakeMegaGenTbl. This function uses the allocated net generation at the generator/fuel-type level, merges that with a generators table to ensure all necessary columns are available, and then feeds that table into the EIA Plant-parts’ scale_by_ownership().

Parameters:
  • gen_pm_fuel – table of allocated generation at the generator/prime mover/fuel type level. Result of allocate_gen_fuel_by_generator_energy_source().

  • gens – generators_eia860 table with columns: IDX_GENS, capacity_mw and utility_id_eia.

  • own_eia860 – ownership_eia860 table.
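A simplified sketch of the idea behind ownership scaling: merge ownership records onto the allocated table and multiply by each owner’s share. The owner_utility_id_eia and fraction_owned column names are assumptions, and this is not scale_by_ownership() itself:

```python
import pandas as pd

IDX_GENS = ["report_date", "plant_id_eia", "generator_id"]

# Hypothetical allocated generation for one generator that burns two fuels.
gen_pm_fuel = pd.DataFrame({
    "report_date": ["2020-01-01"] * 2,
    "plant_id_eia": [3, 3],
    "generator_id": ["1", "1"],
    "energy_source_code": ["BIT", "NG"],
    "net_generation_mwh": [400.0, 50.0],
})
# Hypothetical ownership records; owner_utility_id_eia and fraction_owned are assumed names.
own = pd.DataFrame({
    "report_date": ["2020-01-01"] * 2,
    "plant_id_eia": [3, 3],
    "generator_id": ["1", "1"],
    "owner_utility_id_eia": [101, 202],
    "fraction_owned": [0.25, 0.75],
})

# One row per generator/fuel/owner, with generation scaled by the owner's share.
scaled = gen_pm_fuel.merge(own, on=IDX_GENS, how="left")
scaled["net_generation_mwh"] = scaled["net_generation_mwh"] * scaled["fraction_owned"]
by_owner = scaled.groupby("owner_utility_id_eia")["net_generation_mwh"].sum()
print(by_owner)
```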

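pudl.analysis.allocate_net_gen.agg_by_generator(gen_pm_fuel: pandas.DataFrame, by_cols: list[str] = IDX_GENS, sum_cols: list[str] = ['net_generation_mwh', 'fuel_consumed_mmbtu']) → pandas.DataFrame

Aggregate the allocated gen fuel data to the generator level.

Parameters:
  • gen_pm_fuel (pandas.DataFrame) – table of allocated generation at the generator/prime mover/fuel type level. Result of allocate_gen_fuel_by_generator_energy_source().

  • by_cols (list[str]) – columns to group by. Default is IDX_GENS.

  • sum_cols (list[str]) – columns to sum within each group. Default is ['net_generation_mwh', 'fuel_consumed_mmbtu'].

pudl.analysis.allocate_net_gen.stack_generators(gens, cat_col='energy_source_code_num', stacked_col='energy_source_code')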

Stack the generator table with a set of columns.

Parameters:
  • gens (pandas.DataFrame) – generators_eia860 table with columns: IDX_GENS and all of the energy_source_code columns

  • cat_col (string) – name of the category column, which will hold the names of the stacked energy_source_code columns

  • stacked_col (string) – name of the column which will hold the stacked energy_source_code values

Returns:

a dataframe with these columns: idx_stack, cat_col, stacked_col

Return type:

pandas.DataFrame
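The stacking can be sketched with pandas.DataFrame.melt, assuming the fuel columns are named energy_source_code_1, energy_source_code_2, and so on. stack_generators_sketch is a hypothetical stand-in, not the module’s function:

```python
import pandas as pd

IDX_GENS = ["report_date", "plant_id_eia", "generator_id"]


def stack_generators_sketch(gens, cat_col="energy_source_code_num", stacked_col="energy_source_code"):
    """Melt the energy_source_code_N columns into one long column (illustrative only)."""
    esc_cols = [c for c in gens.columns if c.startswith("energy_source_code_")]
    stacked = gens.melt(
        id_vars=IDX_GENS, value_vars=esc_cols, var_name=cat_col, value_name=stacked_col
    )
    # Generators list different numbers of fuels; drop the empty slots.
    return stacked.dropna(subset=[stacked_col]).reset_index(drop=True)


gens = pd.DataFrame({
    "report_date": ["2020-01-01"] * 2,
    "plant_id_eia": [5, 5],
    "generator_id": ["1", "2"],
    "energy_source_code_1": ["NG", "DFO"],
    "energy_source_code_2": ["DFO", None],
})
print(stack_generators_sketch(gens))
```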

pudl.analysis.allocate_net_gen.associate_generator_tables(gf, gen, gens)

Associate the three tables needed to assign net gen to generators.

Parameters:
  • gf (pandas.DataFrame) – generation_fuel_eia923 table with columns: IDX_PM_ESC, net_generation_mwh and fuel_consumed_mmbtu.

  • gen (pandas.DataFrame) – generation_eia923 table with columns: IDX_GENS and net_generation_mwh.

  • gens (pandas.DataFrame) – generators_eia860 table with columns: IDX_GENS and all of the energy_source_code columns

TODO: Convert these groupby/merges into transforms.
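One piece of the association, sketched on assumed toy data: an outer merge of the stacked generators with the generation_fuel_eia923 records on IDX_PM_ESC, using the merge indicator to find records that match no generator (the kind of records _associate_unconnected_records() deals with). The generation_eia923 merge on IDX_GENS is omitted, and this is not the module’s code:

```python
import pandas as pd

IDX_PM_ESC = ["report_date", "plant_id_eia", "prime_mover_code", "energy_source_code"]

# Hypothetical stacked generators (one row per generator/fuel) and gf records.
stacked_gens = pd.DataFrame({
    "report_date": ["2020-01-01"] * 2,
    "plant_id_eia": [5, 5],
    "generator_id": ["1", "1"],
    "prime_mover_code": ["CT", "CT"],
    "energy_source_code": ["NG", "DFO"],
})
gf = pd.DataFrame({
    "report_date": ["2020-01-01"] * 2,
    "plant_id_eia": [5, 5],
    "prime_mover_code": ["CT", "CA"],
    "energy_source_code": ["NG", "NG"],
    "net_generation_mwh": [900.0, 450.0],
})

# Outer merge keeps gf records that match no generator; the indicator column flags them.
assoc = stacked_gens.merge(gf, on=IDX_PM_ESC, how="outer", indicator=True)
unconnected_gf = assoc[assoc["_merge"] == "right_only"]
print(unconnected_gf[IDX_PM_ESC + ["net_generation_mwh"]])
```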

pudl.analysis.allocate_net_gen.remove_retired_generators(gen_assoc)

Remove the retired generators.

We don’t want to associate net generation with generators that are retired (or proposed, or in any operational_status other than existing).

We do want to keep the generators that retire mid-year and have generator-specific data in the generation_eia923 table. Removing the generators that retire mid-report-year and don’t report to the generation_eia923 table is not an ideal assumption, but for now we remove them. A better strategy would allocate only a portion of the generation to them based on their operational months (or do the allocation on a monthly basis).

Parameters:

gen_assoc (pandas.DataFrame) – table of generators with stacked fuel types and broadcasted net generation data from the generation_eia923 and generation_fuel_eia923 tables. Output of associate_generator_tables().
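A sketch of the filter described above, using hypothetical operational_status, retirement_date, and in_gen_tbl columns and treating a retirement date in the report year as a mid-year retirement (an assumption of this sketch):

```python
import pandas as pd

# Hypothetical associated generators; column names are assumptions for this sketch.
gen_assoc = pd.DataFrame({
    "report_date": pd.to_datetime(["2020-01-01"] * 4),
    "generator_id": ["a", "b", "c", "d"],
    "operational_status": ["existing", "retired", "retired", "proposed"],
    "retirement_date": pd.to_datetime([None, "2020-06-30", "2020-06-30", None]),
    "in_gen_tbl": [False, True, False, False],
})

# Treat a retirement date within the report year as a mid-year retirement.
retired_mid_year = (gen_assoc["operational_status"] == "retired") & (
    gen_assoc["retirement_date"].dt.year == gen_assoc["report_date"].dt.year
)
# Keep existing generators, plus mid-year retirees that have generation_eia923 data.
keep = (gen_assoc["operational_status"] == "existing") | (
    retired_mid_year & gen_assoc["in_gen_tbl"]
)
kept = gen_assoc.loc[keep, "generator_id"].tolist()
print(kept)
```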

pudl.analysis.allocate_net_gen._associate_unconnected_records(eia_generators_merged: pandas.DataFrame)

Associate unassociated gen_fuel table records on idx_pm.

There is a subset of generation_fuel_eia923 records that do not merge onto the stacked generator table on IDX_PM_ESC. These records generally don’t match the set of prime movers and fuel types in the stacked generator table. This function associates those straggling, unconnected records by merging them with the stacked generators on the prime mover only.

Parameters:

eia_generators_merged
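A sketch of the prime-mover-only fallback on invented data. The KER record matches no generator fuel, so it is merged onto every generator with the same plant, date, and prime mover; idx_pm here is a local, hypothetical list of key columns:

```python
import pandas as pd

idx_pm = ["report_date", "plant_id_eia", "prime_mover_code"]

# Hypothetical stacked generators at one plant.
stacked_gens = pd.DataFrame({
    "report_date": ["2020-01-01"] * 2,
    "plant_id_eia": [8, 8],
    "generator_id": ["1", "2"],
    "prime_mover_code": ["GT", "GT"],
    "energy_source_code": ["NG", "NG"],
})
# A hypothetical gf record whose fuel (KER) is not listed for any generator.
unconnected_gf = pd.DataFrame({
    "report_date": ["2020-01-01"],
    "plant_id_eia": [8],
    "prime_mover_code": ["GT"],
    "energy_source_code": ["KER"],
    "net_generation_mwh": [12.0],
})

# Match on prime mover only, keeping the gf fuel code. The 12 MWh is repeated on
# each row here; a later allocation fraction would split it among the generators.
connected = unconnected_gf.merge(
    stacked_gens.drop(columns="energy_source_code").drop_duplicates(),
    on=idx_pm,
    how="inner",
)
print(connected[["generator_id", "energy_source_code", "net_generation_mwh"]])
```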

pudl.analysis.allocate_net_gen.prep_alloction_fraction(gen_assoc)

Make flags and aggregations to prepare for calc_allocation_fraction().

In calc_allocation_fraction(), the generators are broken out into types; see the calc_allocation_fraction() docs for details. This function adds flags for splitting the generators. It also adds the aggregations used to calculate the allocation fractions.

pudl.analysis.allocate_net_gen.calc_allocation_fraction(gen_pm_fuel, drop_interim_cols=True)

Make frac column to allocate net gen from the generation fuel table.

There are three main types of generators:
  • “all gen”: generators of plants which fully report to the generation_eia923 table.

  • “some gen”: generators of plants which partially report to the generation_eia923 table.

  • “gf only”: generators of plants which do not report at all to the generation_eia923 table.

Each different type of generator needs to be treated slightly differently, but all will end up with a frac column that can be used to allocate the net_generation_mwh_gf_tbl.

Parameters:
  • gen_pm_fuel (pandas.DataFrame) – output of prep_alloction_fraction().

  • drop_interim_cols (boolean) – True/False flag for dropping interim columns which are used to generate the frac column (mostly interim frac columns and totals of net generation from various groupings of generators). These columns are useful for debugging. Default is True, which drops the columns.

pudl.analysis.allocate_net_gen._test_frac(gen_pm_fuel)
pudl.analysis.allocate_net_gen._test_gen_pm_fuel_output(gen_pm_fuel, gf, gen)
pudl.analysis.allocate_net_gen._test_gen_fuel_allocation(gen, gen_pm_fuel, ratio=0.05)