pudl.analysis.allocate_net_gen

Allocate data from generation_fuel_eia923 table to generator level.

Net electricity generation and fuel consumption are reported in multiple ways in the EIA 923. The generation_fuel_eia923 table reports both generation and fuel consumption, and breaks them down by plant, prime mover, and fuel. In parallel, the generation_eia923 table reports generation by generator, and the boiler_fuel_eia923 table reports fuel consumption by boiler.

The generation_fuel_eia923 table is more complete, but the generation_eia923 + boiler_fuel_eia923 tables are more granular. The generation_eia923 table includes only ~55% of the total MWhs reported in the generation_fuel_eia923 table.

This module estimates the net electricity generation and fuel consumption attributable to individual generators based on the more expansive reporting of the data in the generation_fuel_eia923 table. The main coordinating function here is pudl.analysis.allocate_net_gen.allocate_gen_fuel_by_gen().

The algorithm we’re using assumes:

  • The generation_eia923 table is the authoritative source of information about how much generation is attributable to an individual generator, if it reports in that table.

  • The generation_fuel_eia923 table is the authoritative source of information about how much generation and fuel consumption is attributable to an entire plant.

  • The generators_eia860 table provides an exhaustive list of all generators whose generation is being reported in the generation_fuel_eia923 table.

We allocate the net generation reported in the generation_fuel_eia923 table on the basis of plant, prime mover, and fuel type among the generators in each plant that have matching fuel types. Generation is allocated proportional to reported generation if it’s available, and proportional to each generator’s capacity if generation is not available.

In more detail: within each year of data, we split the plants into three groups:

  • Plants where ALL generators report in the more granular generation_eia923 table.

  • Plants where NONE of the generators report in the generation_eia923 table.

  • Plants where only SOME of the generators report in the generation_eia923 table.

In plant-years where ALL generators report more granular generation, the total net generation reported in the generation_fuel_eia923 table is allocated in proportion to the generation each generator reported in the generation_eia923 table. We do this instead of using net_generation_mwh from generation_eia923 because there are some small discrepancies between the total amounts of generation reported in these two tables.
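As a rough illustration of the ALL case, the sketch below scales a plant total from generation_fuel_eia923 by each generator's share of the generation reported in generation_eia923. The data values and the net_generation_mwh_allocated column name are made up for the example; this is not the module's actual implementation.

```python
import pandas as pd

# Hypothetical plant-year in which both generators report in generation_eia923.
gen = pd.DataFrame({
    "plant_id_eia": [1, 1],
    "generator_id": ["A", "B"],
    "net_generation_mwh": [60.0, 40.0],
})
# Hypothetical plant total from generation_fuel_eia923. It differs slightly
# from the 100 MWh that the generators report themselves.
gf_total_mwh = 102.0

# Each generator's share of the plant's generator-level reported generation.
plant_total = gen.groupby("plant_id_eia")["net_generation_mwh"].transform("sum")
frac = gen["net_generation_mwh"] / plant_total

# Allocate the generation_fuel_eia923 total using those shares.
gen["net_generation_mwh_allocated"] = frac * gf_total_mwh
```

The allocated values sum to the generation_fuel_eia923 total rather than to the generator-reported total, which is the point of allocating instead of using the reported values directly.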

In plant-years where NONE of the generators report more granular generation, we create a generator record for each associated fuel type. Those records are merged with the generation_fuel_eia923 table on plant, prime mover code, and fuel type. Each group of plant, prime mover, and fuel will have some amount of reported net generation associated with it, and one or more generators. The net generation is allocated among the generators within the group in proportion to their capacity. Then the allocated net generation is summed up by generator.
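The NONE case can be sketched in a few lines of pandas. The example below assumes a generators table that has already been stacked to one row per generator and fuel type; the data values and the allocated_mwh column name are hypothetical, and the real module handles many more edge cases.

```python
import pandas as pd

idx = ["plant_id_eia", "prime_mover_code", "energy_source_code"]

# Hypothetical stacked generators: one record per generator and fuel type.
gens_stacked = pd.DataFrame({
    "plant_id_eia": [1, 1, 1],
    "generator_id": ["A", "B", "B"],
    "prime_mover_code": ["GT", "GT", "GT"],
    "energy_source_code": ["NG", "NG", "DFO"],
    "capacity_mw": [100.0, 300.0, 300.0],
})
# Hypothetical generation_fuel_eia923 records for the same plant.
gf = pd.DataFrame({
    "plant_id_eia": [1, 1],
    "prime_mover_code": ["GT", "GT"],
    "energy_source_code": ["NG", "DFO"],
    "net_generation_mwh": [800.0, 50.0],
})

# Merge on plant, prime mover, and fuel, then allocate by capacity share
# within each of those groups.
merged = gens_stacked.merge(gf, on=idx, how="left")
group_capacity = merged.groupby(idx)["capacity_mw"].transform("sum")
merged["frac"] = merged["capacity_mw"] / group_capacity
merged["allocated_mwh"] = merged["frac"] * merged["net_generation_mwh"]

# Sum the allocated generation back up to the generator level.
by_gen = merged.groupby(["plant_id_eia", "generator_id"])["allocated_mwh"].sum()
```

Here generator B receives three quarters of the NG generation plus all of the DFO generation, because it is the only generator associated with DFO.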

In the hybrid case, where only SOME of a plant’s generators report the more granular generation data, we use a combination of the two allocation methods described above. First, the total generation reported across a plant in the generation_fuel_eia923 table is allocated between the two categories of generators (those that report fine-grained generation, and those that don’t) in direct proportion to the fraction of the plant’s generation which is reported in the generation_eia923 table, relative to the total generation reported in the generation_fuel_eia923 table.
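A worked example of the hybrid split, in plain Python with made-up numbers. It assumes the generator-level reported generation does not exceed the plant total from generation_fuel_eia923; how the module treats other situations is not covered here.

```python
# Hypothetical plant: generator A reports in generation_eia923, B and C do not.
gf_total_mwh = 1000.0                    # plant total, generation_fuel_eia923
reported_mwh = {"A": 600.0}              # generator-level reports
capacity_mw = {"B": 30.0, "C": 10.0}     # capacities of non-reporting generators

# Fraction of the plant total that the reporting generators account for.
reported_share = sum(reported_mwh.values()) / gf_total_mwh

# Split the plant total between the two categories of generators.
pool_reporting = gf_total_mwh * reported_share
pool_nonreporting = gf_total_mwh * (1 - reported_share)

allocated = {}
# Reporting generators: proportional to their reported generation.
for gen_id, mwh in reported_mwh.items():
    allocated[gen_id] = pool_reporting * mwh / sum(reported_mwh.values())
# Non-reporting generators: proportional to their capacity.
for gen_id, mw in capacity_mw.items():
    allocated[gen_id] = pool_nonreporting * mw / sum(capacity_mw.values())
```

With these numbers, A is allocated about 600 MWh, and the remaining 400 MWh is split 3:1 between B and C.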

Note that this methodology does not distinguish between primary and secondary energy_sources for generators. It associates portions of net generation with each generator without regard to that ranking. In addition, if generators in the same plant do not report detailed generation, have the same prime_mover_code, and use the same fuels, but have very different capacity factors in reality, this methodology will allocate generation such that they end up with very similar capacity factors. We imagine this is an uncommon scenario.

This methodology has several potential flaws and drawbacks. Because there is no indicator of what portion of the net generation is attributable to each of a generator’s energy_source_codes, we associate the net generation equally among them. In effect, if a plant had multiple generators with the same prime_mover_code but opposite primary and secondary fuels (e.g. gen 1 has a primary fuel of ‘NG’ and secondary fuel of ‘DFO’, while gen 2 has a primary fuel of ‘DFO’ and a secondary fuel of ‘NG’), the methodology associates the generation_fuel_eia923 records similarly across these two generators. However, the allocated net generation will still be proportional to each generator’s net generation (if it’s reported) or capacity (if generation is not reported).

Module Contents

Functions

allocate_gen_fuel_by_gen(pudl_out)

Allocate gen fuel data columns to generators.

allocate_gen_fuel_by_gen_pm_fuel(gf, gen, gens, drop_interim_cols=True)

Proportionally allocate net gen from gen_fuel table to generators.

agg_by_generator(gen_pm_fuel)

Aggregate the allocated gen fuel data to the generator level.

stack_generators(gens, cat_col='energy_source_code_num', stacked_col='energy_source_code')

Stack the generator table with a set of columns.

associate_generator_tables(gf, gen, gens)

Associate the three tables needed to assign net gen to generators.

remove_retired_generators(gen_assoc)

Remove the retired generators.

_associate_unconnected_records(eia_generators_merged)

Associate unassociated gen_fuel table records on idx_pm.

_associate_energy_source_only(gen_assoc, gf)

Associate the records w/o prime movers with fuel cost.

_associate_energy_source_only_wo_matching_energy_source(gen_assoc)

Associate the missing-pm records that don't have matching fuel types.

prep_alloction_fraction(gen_assoc)

Make flags and aggregations to prepare for calc_allocation_fraction().

calc_allocation_fraction(gen_pm_fuel, drop_interim_cols=True)

Make frac column to allocate net gen from the generation fuel table.

_test_frac(gen_pm_fuel)

_test_gen_pm_fuel_output(gen_pm_fuel, gf, gen)

_test_gen_fuel_allocation(gen, gen_allocated, ratio=0.05)

Attributes

logger

IDX_GENS

Id columns for generators.

IDX_PM_FUEL

Id columns for plant, prime mover & fuel type records.

IDX_FUEL

DATA_COLS

Data columns from generation_fuel_eia923 that are being allocated.

pudl.analysis.allocate_net_gen.logger
pudl.analysis.allocate_net_gen.IDX_GENS = ['plant_id_eia', 'generator_id', 'report_date']

Id columns for generators.

pudl.analysis.allocate_net_gen.IDX_PM_FUEL = ['plant_id_eia', 'prime_mover_code', 'energy_source_code', 'report_date']

Id columns for plant, prime mover & fuel type records.

pudl.analysis.allocate_net_gen.IDX_FUEL = ['report_date', 'plant_id_eia', 'energy_source_code']
pudl.analysis.allocate_net_gen.DATA_COLS = ['net_generation_mwh', 'fuel_consumed_mmbtu']

Data columns from generation_fuel_eia923 that are being allocated.

pudl.analysis.allocate_net_gen.allocate_gen_fuel_by_gen(pudl_out)

Allocate gen fuel data columns to generators.

The generation_fuel_eia923 table includes net generation and fuel consumption data at the plant/fuel type/prime mover level. The most granular level of plants that PUDL typically uses is at the plant/generator level. This method converts the generation_fuel_eia923 table to the level of plant/generators.

Parameters

pudl_out (pudl.output.pudltabl.PudlTabl) – An object used to create the tables for EIA and FERC Form 1 analysis.

Returns

table with columns IDX_GENS and DATA_COLS. The DATA_COLS will be scaled to the level of the IDX_GENS.

Return type

pandas.DataFrame

pudl.analysis.allocate_net_gen.allocate_gen_fuel_by_gen_pm_fuel(gf, gen, gens, drop_interim_cols=True)

Proportionally allocate net gen from gen_fuel table to generators.

Two main steps here:
  • associate generation_fuel_eia923 table data w/ generators

  • allocate generation_fuel_eia923 table data proportionally

The association process happens via associate_generator_tables().

The allocation process (via calc_allocation_fraction()) entails generating a fraction for each record within an IDX_PM_FUEL group. We have two data points for generating this fraction: the net generation in the generation_eia923 table and the capacity from the generators_eia860 table. The end result is a frac column which is unique for each generator/prime_mover/fuel record and is used to allocate the associated net generation from the generation_fuel_eia923 table.

Parameters
  • gf (pandas.DataFrame) – generation_fuel_eia923 table with columns: IDX_PM_FUEL and net_generation_mwh and fuel_consumed_mmbtu.

  • gen (pandas.DataFrame) – generation_eia923 table with columns: IDX_GENS and net_generation_mwh.

  • gens (pandas.DataFrame) – generators_eia860 table with cols: IDX_GENS, capacity_mw, prime_mover_code, and all of the energy_source_code columns

  • drop_interim_cols (boolean) – True/False flag for dropping interim columns which are used to generate the net_generation_mwh column (they are mostly the frac column and net generation reported in the original generation_eia923 and generation_fuel_eia923 tables) that are useful for debugging. Default is True, which will drop the columns.

Returns

pandas.DataFrame

pudl.analysis.allocate_net_gen.agg_by_generator(gen_pm_fuel)

Aggregate the allocated gen fuel data to the generator level.

Parameters

gen_pm_fuel (pandas.DataFrame) – result of allocate_gen_fuel_by_gen_pm_fuel()
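A minimal sketch of this kind of aggregation, assuming the input has one row per generator and fuel type with already-allocated DATA_COLS. The sample data is hypothetical, and the real function may do more than a plain group-and-sum.

```python
import pandas as pd

# Column lists copied from the module attributes documented above.
IDX_GENS = ["plant_id_eia", "generator_id", "report_date"]
DATA_COLS = ["net_generation_mwh", "fuel_consumed_mmbtu"]

# Hypothetical allocated records at the generator / fuel level.
gen_pm_fuel = pd.DataFrame({
    "plant_id_eia": [1, 1, 1],
    "generator_id": ["A", "B", "B"],
    "report_date": pd.to_datetime(["2019-01-01"] * 3),
    "energy_source_code": ["NG", "NG", "DFO"],
    "net_generation_mwh": [200.0, 600.0, 50.0],
    "fuel_consumed_mmbtu": [2000.0, 6000.0, 550.0],
})

# Collapse the fuel dimension by summing the data columns per generator.
by_generator = gen_pm_fuel.groupby(IDX_GENS, as_index=False)[DATA_COLS].sum()
```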

pudl.analysis.allocate_net_gen.stack_generators(gens, cat_col='energy_source_code_num', stacked_col='energy_source_code')

Stack the generator table with a set of columns.

Parameters
  • gens (pandas.DataFrame) – generators_eia860 table with cols: IDX_GENS and all of the energy_source_code columns

  • cat_col (string) – name of category column which will end up having the column names of cols_to_stack

  • stacked_col (string) – name of column which will end up with the stacked data from cols_to_stack

Returns

a dataframe with these columns: idx_stack, cat_col, stacked_col

Return type

pandas.DataFrame
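The stacking step can be approximated with pandas.DataFrame.melt. In the sketch below, the energy_source_code_1 and energy_source_code_2 column names and the sample data are assumptions made for illustration; the actual function may select and clean columns differently.

```python
import pandas as pd

# Hypothetical wide generators table with two fuel columns per generator.
gens = pd.DataFrame({
    "plant_id_eia": [1, 1],
    "generator_id": ["A", "B"],
    "report_date": pd.to_datetime(["2019-01-01", "2019-01-01"]),
    "energy_source_code_1": ["NG", "DFO"],
    "energy_source_code_2": ["DFO", None],
})
idx_stack = ["plant_id_eia", "generator_id", "report_date"]

# Turn the fuel columns into rows: cat_col holds the original column name,
# stacked_col holds the fuel code itself. Missing fuels are dropped.
stacked = (
    gens.melt(
        id_vars=idx_stack,
        var_name="energy_source_code_num",
        value_name="energy_source_code",
    )
    .dropna(subset=["energy_source_code"])
    .reset_index(drop=True)
)
```

Generator A ends up with two records (NG and DFO) and generator B with one (DFO).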

pudl.analysis.allocate_net_gen.associate_generator_tables(gf, gen, gens)

Associate the three tables needed to assign net gen to generators.

Parameters
  • gf (pandas.DataFrame) – generation_fuel_eia923 table with columns: IDX_PM_FUEL and net_generation_mwh and fuel_consumed_mmbtu.

  • gen (pandas.DataFrame) – generation_eia923 table with columns: IDX_GENS and net_generation_mwh.

  • gens (pandas.DataFrame) – generators_eia860 table with cols: IDX_GENS and all of the energy_source_code columns

TODO: Convert these groupby/merges into transforms.

pudl.analysis.allocate_net_gen.remove_retired_generators(gen_assoc)

Remove the retired generators.

We don’t want to associate net generation to generators that are retired (or proposed! or any other operational_status besides existing).

We do want to keep the generators that retire mid-year and have generator specific data from the generation_eia923 table. Removing the generators that retire mid-report year and don’t report to the generation_eia923 table is not exactly a great assumption. For now, we are removing them. We should employ a strategy that allocates only a portion of the generation to them based on their operational months (or by doing the allocation on a monthly basis).

Parameters

gen_assoc (pandas.DataFrame) – table of generators with stacked fuel types and broadcasted net generation data from the generation_eia923 and generation_fuel_eia923 tables. Output of associate_generator_tables().
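The filtering rule described above might look roughly like the following. The net_generation_mwh_g_tbl column name and the operational_status values are assumptions made for this sketch, not confirmed details of the module.

```python
import pandas as pd

# Hypothetical associated table; net_generation_mwh_g_tbl stands in for the
# generator-level generation broadcast from generation_eia923.
gen_assoc = pd.DataFrame({
    "generator_id": ["A", "B", "C", "D"],
    "operational_status": ["existing", "retired", "retired", "proposed"],
    "net_generation_mwh_g_tbl": [100.0, 20.0, None, None],
})

# Keep existing generators ...
is_existing = gen_assoc["operational_status"] == "existing"
# ... plus retired generators that still have generator-specific data.
retired_with_data = (
    (gen_assoc["operational_status"] == "retired")
    & gen_assoc["net_generation_mwh_g_tbl"].notna()
)
kept = gen_assoc[is_existing | retired_with_data]
```

Generators C (retired, no data) and D (proposed) are dropped, while B is kept because it retired mid-year but still reported generation.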

pudl.analysis.allocate_net_gen._associate_unconnected_records(eia_generators_merged)

Associate unassociated gen_fuel table records on idx_pm.

There is a subset of generation_fuel_eia923 records which do not merge onto the stacked generator table on IDX_PM_FUEL. These records generally don’t match with the set of prime movers and fuel types in the stacked generator table. In this method, we associate those straggler, unconnected records by merging these records with the stacked generators on the prime mover only.

Parameters

eia_generators_merged (pandas.DataFrame) –
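One way to picture this fallback is a two-pass merge: first on plant, prime mover, and fuel, and then, for records that found no match, on plant and prime mover only. The sketch below uses hypothetical data and omits report_date for brevity; it illustrates the idea and is not the function's actual code.

```python
import pandas as pd

idx_pm_fuel = ["plant_id_eia", "prime_mover_code", "energy_source_code"]
idx_pm = ["plant_id_eia", "prime_mover_code"]

# Hypothetical stacked generators and generation_fuel_eia923 records.
gens_stacked = pd.DataFrame({
    "plant_id_eia": [1, 1],
    "generator_id": ["A", "B"],
    "prime_mover_code": ["GT", "ST"],
    "energy_source_code": ["NG", "BIT"],
})
gf = pd.DataFrame({
    "plant_id_eia": [1, 1],
    "prime_mover_code": ["GT", "GT"],
    "energy_source_code": ["NG", "KER"],  # KER matches no generator's fuels
    "net_generation_mwh": [500.0, 10.0],
})

# First pass: merge on plant, prime mover, and fuel, flagging non-matches.
merged = gf.merge(gens_stacked, on=idx_pm_fuel, how="left", indicator=True)
connected = merged[merged["_merge"] == "both"].drop(columns="_merge")
unconnected = merged.loc[merged["_merge"] == "left_only", gf.columns]

# Second pass: associate the stragglers on plant and prime mover only.
rescued = unconnected.merge(
    gens_stacked.drop(columns="energy_source_code").drop_duplicates(),
    on=idx_pm,
    how="left",
)
```

The KER record has no fuel match, so it is attached to generator A, the only generator in the plant with a GT prime mover.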

pudl.analysis.allocate_net_gen._associate_energy_source_only(gen_assoc, gf)

Associate the records w/o prime movers with fuel cost.

The 2001 and 2002 generation fuel tables do not include any prime mover codes. Because of this, we need to associate these records via their fuel types.

Note: 2001 and 2002 eia years are not currently integrated into PUDL.

pudl.analysis.allocate_net_gen._associate_energy_source_only_wo_matching_energy_source(gen_assoc)

Associate the missing-pm records that don’t have matching fuel types.

There are some generation fuel table records which don’t associate with any of the energy_source_codes reported for the generators. For these records, we need to take a step back and associate them with the full plant.

pudl.analysis.allocate_net_gen.prep_alloction_fraction(gen_assoc)

Make flags and aggregations to prepare for calc_allocation_fraction().

In calc_allocation_fraction(), we will break the generators out into four types (see the calc_allocation_fraction() docs for details). This function adds the flags for splitting the generators, as well as the aggregations used in that calculation.

pudl.analysis.allocate_net_gen.calc_allocation_fraction(gen_pm_fuel, drop_interim_cols=True)

Make frac column to allocate net gen from the generation fuel table.

There are four main types of generators:
  • “all gen”: generators of plants which fully report to the generation_eia923 table.

  • “some gen”: generators of plants which partially report to the generation_eia923 table.

  • “gf only”: generators of plants which do not report at all to the generation_eia923 table.

  • “no pm”: generators that have missing prime movers.

Each different type of generator needs to be treated slightly differently, but all will end up with a frac column that can be used to allocate the net_generation_mwh_gf_tbl.

Parameters
  • gen_pm_fuel (pandas.DataFrame) – output of prep_alloction_fraction().

  • drop_interim_cols (boolean) – True/False flag for dropping interim columns which are used to generate the frac column (they are mostly interim frac columns and totals of net generation from various groupings of generators) that are useful for debugging. Default is True.

pudl.analysis.allocate_net_gen._test_frac(gen_pm_fuel)
pudl.analysis.allocate_net_gen._test_gen_pm_fuel_output(gen_pm_fuel, gf, gen)
pudl.analysis.allocate_net_gen._test_gen_fuel_allocation(gen, gen_allocated, ratio=0.05)