pudl.analysis.allocate_net_gen
Allocate data from generation_fuel_eia923 table to generator level.
Net electricity generation and fuel consumption are reported in multiple ways in the EIA 923. The generation_fuel_eia923 table reports both generation and fuel consumption, and breaks them down by plant, prime mover, and energy source. In parallel, the generation_eia923 table reports generation by generator, and the boiler_fuel_eia923 table reports fuel consumption by boiler.
The generation_fuel_eia923 table is more complete, but the generation_eia923 + boiler_fuel_eia923 tables are more granular. The generation_eia923 table includes only ~55% of the total MWhs reported in the generation_fuel_eia923 table.
This module estimates the net electricity generation and fuel consumption attributable to individual generators based on the more expansive reporting of the data in the generation_fuel_eia923 table. The main coordinating functions here are allocate_gen_fuel_by_generator_energy_source() and aggregate_gen_fuel_by_generator().
The algorithm we’re using assumes:
The generation_eia923 table is the authoritative source of information about how much generation is attributable to an individual generator, if it reports in that table.
The generation_fuel_eia923 table is the authoritative source of information about how much generation and fuel consumption is attributable to an entire plant.
The generators_eia860 table provides an exhaustive list of all generators whose generation is being reported in the generation_fuel_eia923 table.
We allocate the net generation reported in the generation_fuel_eia923 table on the basis of plant, prime mover, and fuel type among the generators in each plant that have matching fuel types. Generation is allocated proportional to reported generation if it’s available, and proportional to each generator’s capacity if generation is not available.
In more detail: within each year of data, we split the plants into three groups:
Plants where ALL generators report in the more granular generation_eia923 table.
Plants where NONE of the generators report in the generation_eia923 table.
Plants where only SOME of the generators report in the generation_eia923 table.
In plant-years where ALL generators report more granular generation, the total net generation reported in the generation_fuel_eia923 table is allocated in proportion to the generation each generator reported in the generation_eia923 table. We do this instead of using net_generation_mwh from generation_eia923 because there are some small discrepancies between the total amounts of generation reported in these two tables.
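The proportional allocation for the ALL case can be sketched as follows. This is a minimal illustration on toy data, not the module's implementation: the function name is hypothetical, and the column name net_generation_mwh_g_tbl (generation reported in generation_eia923) is an assumption.

```python
import pandas as pd


def allocate_by_reported_generation(gens: pd.DataFrame, plant_total_mwh: float) -> pd.DataFrame:
    """Split a plant-level total among generators in proportion to reported generation.

    Hypothetical helper; ``net_generation_mwh_g_tbl`` is an assumed column name.
    """
    out = gens.copy()
    # Each generator's share of the generation reported at the generator level.
    out["frac"] = out["net_generation_mwh_g_tbl"] / out["net_generation_mwh_g_tbl"].sum()
    # Apply that share to the (slightly different) plant total from generation_fuel_eia923.
    out["net_generation_mwh"] = out["frac"] * plant_total_mwh
    return out


toy_gens = pd.DataFrame(
    {"generator_id": ["1", "2"], "net_generation_mwh_g_tbl": [300.0, 100.0]}
)
allocated = allocate_by_reported_generation(toy_gens, plant_total_mwh=420.0)
```

The allocated values always sum to the plant total, which is how the small discrepancies between the two tables are resolved in favor of generation_fuel_eia923.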
In plant-years where NONE of the generators report more granular generation, we create a generator record for each associated fuel type. Those records are merged with the generation_fuel_eia923 table on plant, prime mover code, and fuel type. Each group of plant, prime mover, and fuel will have some amount of reported net generation associated with it, and one or more generators. The net generation is allocated among the generators within the group in proportion to their capacity. Then the allocated net generation is summed up by generator.
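A minimal sketch of the capacity-based allocation for the NONE case, assuming pandas and toy data. The column name net_generation_mwh_gf_tbl and the toy values are illustrative assumptions; the module's own merge and aggregation steps may differ.

```python
import pandas as pd

IDX_PM_ESC = ["report_date", "plant_id_eia", "prime_mover_code", "energy_source_code"]
IDX_GENS = ["report_date", "plant_id_eia", "generator_id"]

# One record per generator per associated fuel type (toy data).
gen_fuel_records = pd.DataFrame(
    {
        "report_date": ["2020-01-01"] * 3,
        "plant_id_eia": [1, 1, 1],
        "generator_id": ["A", "B", "A"],
        "prime_mover_code": ["GT", "GT", "GT"],
        "energy_source_code": ["NG", "NG", "DFO"],
        "capacity_mw": [30.0, 10.0, 30.0],
    }
)
# Plant / prime mover / fuel totals, as in generation_fuel_eia923 (toy data).
gf = pd.DataFrame(
    {
        "report_date": ["2020-01-01"] * 2,
        "plant_id_eia": [1, 1],
        "prime_mover_code": ["GT", "GT"],
        "energy_source_code": ["NG", "DFO"],
        "net_generation_mwh_gf_tbl": [400.0, 20.0],
    }
)

merged = gen_fuel_records.merge(gf, on=IDX_PM_ESC, how="left")
# Total capacity within each plant / prime mover / fuel group.
group_capacity = merged.groupby(IDX_PM_ESC)["capacity_mw"].transform("sum")
# Allocate the group's generation in proportion to each generator's capacity.
merged["net_generation_mwh"] = (
    merged["net_generation_mwh_gf_tbl"] * merged["capacity_mw"] / group_capacity
)
# Sum the per-fuel allocations up to the generator level.
by_generator = merged.groupby(IDX_GENS, as_index=False)["net_generation_mwh"].sum()
```

Here generator A receives 300 MWh from the NG group plus all 20 MWh from the DFO group, and generator B receives 100 MWh.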
In the hybrid case, where only SOME of a plant’s generators report the more granular generation data, we use a combination of the two allocation methods described above. First, the total generation reported across a plant in the generation_fuel_eia923 table is allocated between the two categories of generators (those that report fine-grained generation, and those that don’t) in direct proportion to the fraction of the plant’s generation which is reported in the generation_eia923 table, relative to the total generation reported in the generation_fuel_eia923 table.
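The first step of the hybrid case, splitting a plant total between reporting and non-reporting generators, might look like the sketch below. The function name is hypothetical and edge cases (for example, reported generation exceeding the plant total) are not handled here.

```python
from typing import Tuple


def split_plant_total(gf_total_mwh: float, gen_reported_mwh: float) -> Tuple[float, float]:
    """Split a plant total between reporting and non-reporting generators.

    Hypothetical helper: ``gf_total_mwh`` is the plant total from
    generation_fuel_eia923 and ``gen_reported_mwh`` is the sum reported by the
    plant's generators in generation_eia923.
    """
    if gf_total_mwh == 0:
        return 0.0, 0.0
    # Share of the plant's generation covered by generator-level reporting.
    reporting_share = gen_reported_mwh / gf_total_mwh
    to_reporting = gf_total_mwh * reporting_share
    to_non_reporting = gf_total_mwh - to_reporting
    return to_reporting, to_non_reporting


to_reporting, to_non_reporting = split_plant_total(1000.0, 250.0)
```

Each portion can then be allocated within its category: by reported generation for the reporting generators, and by capacity for the rest.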
Note that this methodology does not distinguish between primary and secondary energy_sources for generators. If multiple generators in the same plant do not report detailed generation, have the same prime_mover_code, and use the same fuels, but have very different capacity factors in reality, this methodology will allocate generation such that they end up with very similar capacity factors. We imagine this is an uncommon scenario.
This methodology has several potential flaws and drawbacks. Because there is no indicator of what portion of a generator’s net generation is attributable to each of its energy_source_codes, we associate the net generation equally among them. In effect, if a plant has multiple generators with the same prime_mover_code but opposite primary and secondary fuels (e.g. gen 1 has a primary fuel of ‘NG’ and a secondary fuel of ‘DFO’, while gen 2 has a primary fuel of ‘DFO’ and a secondary fuel of ‘NG’), the methodology associates the generation_fuel_eia923 records similarly across these two generators. However, the allocated net generation will still be proportional to each generator’s net generation (if it’s reported) or capacity (if generation is not reported).
Module Contents
Functions
allocate_gen_fuel_by_generator_energy_source() | Allocate net gen from gen_fuel table to the generator/energy_source_code level.
aggregate_gen_fuel_by_generator() | Aggregate gen fuel data columns to generators.
scale_allocated_net_gen_by_ownership() | Scale allocated net gen at the generator/energy_source_code level by ownership.
agg_by_generator() | Aggregate the allocated gen fuel data to the generator level.
stack_generators() | Stack the generator table with a set of columns.
associate_generator_tables() | Associate the three tables needed to assign net gen to generators.
remove_retired_generators() | Remove the retired generators.
_associate_unconnected_records() | Associate unassociated gen_fuel table records on idx_pm.
prep_alloction_fraction() | Make flags and aggregations to prepare for calc_allocation_fraction().
calc_allocation_fraction() | Make frac column to allocate net gen from the generation fuel table.
Attributes
IDX_GENS | Id columns for generators.
IDX_PM_ESC | Id columns for plant, prime mover & fuel type records.
IDX_ESC |
- pudl.analysis.allocate_net_gen.IDX_GENS = ['report_date', 'plant_id_eia', 'generator_id']
Id columns for generators.
- pudl.analysis.allocate_net_gen.IDX_PM_ESC = ['report_date', 'plant_id_eia', 'prime_mover_code', 'energy_source_code']
Id columns for plant, prime mover & fuel type records.
- pudl.analysis.allocate_net_gen.IDX_ESC = ['report_date', 'plant_id_eia', 'energy_source_code']
- pudl.analysis.allocate_net_gen.allocate_gen_fuel_by_generator_energy_source(pudl_out, drop_interim_cols=True)
Allocate net gen from gen_fuel table to the generator/energy_source_code level.
- Three main steps here:
grab the three input tables from pudl_out with only the needed columns
associate generation_fuel_eia923 table data w/ generators
allocate generation_fuel_eia923 table data proportionally
The association process happens via associate_generator_tables().
The allocation process (via calc_allocation_fraction()) entails generating a fraction for each record within an IDX_PM_ESC group. We have two data points for generating this fraction: the net generation in the generation_eia923 table and the capacity from the generators_eia860 table. The end result is a frac column which is unique for each generator/prime_mover/fuel record and is used to allocate the associated net generation from the generation_fuel_eia923 table.
- Parameters
pudl_out (pudl.output.pudltabl.PudlTabl) – An object used to create the tables for EIA and FERC Form 1 analysis.
drop_interim_cols (boolean) – True/False flag for dropping interim columns which are used to generate the net_generation_mwh column (mostly the frac column and the net generation reported in the original generation_eia923 and generation_fuel_eia923 tables) and which are useful for debugging. Default is True, which drops the columns.
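The frac column described above can be sketched with pandas group transforms. This is an illustration under stated assumptions, not the body of calc_allocation_fraction(): the helper name add_frac and the column names net_generation_mwh_g_tbl and reports_gen are hypothetical, and the sketch only falls back to capacity when no generator in the group reports generation.

```python
import pandas as pd

IDX_PM_ESC = ["report_date", "plant_id_eia", "prime_mover_code", "energy_source_code"]


def add_frac(gen_assoc: pd.DataFrame) -> pd.DataFrame:
    """Add a frac column within each IDX_PM_ESC group (hypothetical helper)."""
    out = gen_assoc.copy()
    out["reports_gen"] = out["net_generation_mwh_g_tbl"].notna()
    grouped = out.groupby(IDX_PM_ESC)
    # True for every row of a group in which all generators report generation.
    group_reports = grouped["reports_gen"].transform("all")
    gen_frac = out["net_generation_mwh_g_tbl"] / grouped["net_generation_mwh_g_tbl"].transform("sum")
    cap_frac = out["capacity_mw"] / grouped["capacity_mw"].transform("sum")
    # Use reported generation where available, otherwise fall back to capacity.
    out["frac"] = gen_frac.where(group_reports, cap_frac)
    out["net_generation_mwh"] = out["frac"] * out["net_generation_mwh_gf_tbl"]
    return out


toy = pd.DataFrame(
    {
        "report_date": ["2020-01-01"] * 4,
        "plant_id_eia": [1, 1, 1, 1],
        "generator_id": ["A", "B", "C", "D"],
        "prime_mover_code": ["GT"] * 4,
        "energy_source_code": ["NG", "NG", "DFO", "DFO"],
        "capacity_mw": [50.0, 50.0, 10.0, 30.0],
        "net_generation_mwh_g_tbl": [300.0, 100.0, None, None],
        "net_generation_mwh_gf_tbl": [420.0, 420.0, 40.0, 40.0],
    }
)
with_frac = add_frac(toy)
```

Within every group the frac values sum to one, so the allocated net generation adds back up to the generation_fuel_eia923 total for that group.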
- pudl.analysis.allocate_net_gen.aggregate_gen_fuel_by_generator(pudl_out, gen_pm_fuel: pandas.DataFrame) → pandas.DataFrame
Aggregate gen fuel data columns to generators.
The generation_fuel_eia923 table includes net generation and fuel consumption data at the plant/fuel type/prime mover level. The most granular level of data that PUDL typically uses is the plant/generator level. This function takes the plant/energy source code/prime mover level allocation, aggregates it to the generator level, and then denormalizes it to bring it structurally in line with the original generation_eia923 table (see pudl.output.eia923.denorm_generation_eia923()).
- Parameters
pudl_out (pudl.output.pudltabl.PudlTabl) – An object used to create the tables for EIA and FERC Form 1 analysis.
gen_pm_fuel – table of allocated generation at the generator/prime mover/fuel type level. Result of allocate_gen_fuel_by_generator_energy_source().
- Returns
table with columns IDX_GENS and net generation and fuel consumption scaled to the level of the IDX_GENS.
- pudl.analysis.allocate_net_gen.scale_allocated_net_gen_by_ownership(gen_pm_fuel: pandas.DataFrame, gens: pandas.DataFrame, own_eia860: pandas.DataFrame) → pandas.DataFrame
Scale allocated net gen at the generator/energy_source_code level by ownership.
It can be helpful to have a table of net generation and fuel consumption at the generator/fuel-type level (i.e. the result of allocate_gen_fuel_by_generator_energy_source()) associated and scaled with all of the owners of those generators. This allows the aggregation of fuel use to the utility level.
Scaling generators with their owners’ ownership fraction is currently possible via pudl.analysis.plant_parts_eia.MakeMegaGenTbl. This function uses the allocated net generation at the generator/fuel-type level, merges that with a generators table to ensure all necessary columns are available, and then feeds that table into the EIA Plant-parts’ scale_by_ownership().
- Parameters
gen_pm_fuel – table of allocated generation at the generator/prime mover/fuel type level. Result of allocate_gen_fuel_by_generator_energy_source().
gens – generators_eia860 table with cols: IDX_GENS, capacity_mw and utility_id_eia
own_eia860 – ownership_eia860 table.
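The idea of scaling by ownership can be illustrated with a plain merge and multiply. This sketch does not use MakeMegaGenTbl or scale_by_ownership(); the helper name and the ownership column names fraction_owned and owner_utility_id_eia are assumptions made for illustration.

```python
import pandas as pd

IDX_GENS = ["report_date", "plant_id_eia", "generator_id"]
DATA_COLS = ["net_generation_mwh", "fuel_consumed_mmbtu"]


def scale_by_fraction_owned(gen_pm_fuel: pd.DataFrame, own: pd.DataFrame) -> pd.DataFrame:
    """Duplicate each generator record per owner and scale by ownership share.

    Hypothetical helper; ``fraction_owned`` is an assumed column name.
    """
    scaled = gen_pm_fuel.merge(own, on=IDX_GENS, how="left")
    # Multiply every data column row-wise by the owner's share.
    scaled[DATA_COLS] = scaled[DATA_COLS].multiply(scaled["fraction_owned"], axis="index")
    return scaled


toy_alloc = pd.DataFrame(
    {
        "report_date": ["2020-01-01"],
        "plant_id_eia": [1],
        "generator_id": ["A"],
        "energy_source_code": ["NG"],
        "net_generation_mwh": [100.0],
        "fuel_consumed_mmbtu": [800.0],
    }
)
toy_own = pd.DataFrame(
    {
        "report_date": ["2020-01-01"] * 2,
        "plant_id_eia": [1, 1],
        "generator_id": ["A", "A"],
        "owner_utility_id_eia": [10, 20],
        "fraction_owned": [0.75, 0.25],
    }
)
by_owner = scale_by_fraction_owned(toy_alloc, toy_own)
```

Grouping the result by owner and summing the data columns would then give fuel use and generation at the utility level.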
- pudl.analysis.allocate_net_gen.agg_by_generator(gen_pm_fuel: pandas.DataFrame, by_cols: List[str] = IDX_GENS, sum_cols: List[str] = ['net_generation_mwh', 'fuel_consumed_mmbtu']) → pandas.DataFrame
Aggregate the allocated gen fuel data to the generator level.
- Parameters
gen_pm_fuel – result of allocate_gen_fuel_by_generator_energy_source()
by_cols – list of columns to use as the pandas.groupby arg by
sum_cols – Data columns from gen_pm_fuel that are being aggregated via a pandas.groupby.sum().
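A minimal sketch of this kind of aggregation, assuming pandas and toy data. The helper name is hypothetical and the real function may treat missing values differently.

```python
from typing import List, Optional

import pandas as pd

IDX_GENS = ["report_date", "plant_id_eia", "generator_id"]


def agg_by_generator_sketch(
    gen_pm_fuel: pd.DataFrame,
    by_cols: Optional[List[str]] = None,
    sum_cols: Optional[List[str]] = None,
) -> pd.DataFrame:
    """Sum the data columns of a generator/fuel table up to the generator level."""
    by_cols = by_cols or IDX_GENS
    sum_cols = sum_cols or ["net_generation_mwh", "fuel_consumed_mmbtu"]
    return gen_pm_fuel.groupby(by_cols, as_index=False)[sum_cols].sum()


toy_pm_fuel = pd.DataFrame(
    {
        "report_date": ["2020-01-01"] * 3,
        "plant_id_eia": [1, 1, 1],
        "generator_id": ["A", "A", "B"],
        "energy_source_code": ["NG", "DFO", "NG"],
        "net_generation_mwh": [300.0, 20.0, 100.0],
        "fuel_consumed_mmbtu": [2400.0, 160.0, 800.0],
    }
)
gen_level = agg_by_generator_sketch(toy_pm_fuel)
```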
- pudl.analysis.allocate_net_gen.stack_generators(gens, cat_col='energy_source_code_num', stacked_col='energy_source_code')
Stack the generator table with a set of columns.
- Parameters
gens (pandas.DataFrame) – generators_eia860 table with cols: IDX_GENS and all of the energy_source_code columns
cat_col (string) – name of category column which will end up having the column names of cols_to_stack
stacked_col (string) – name of column which will end up with the stacked data from cols_to_stack
- Returns
a dataframe with these columns: idx_stack, cat_col, stacked_col
- Return type
pandas.DataFrame
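The stacking step resembles a wide-to-long reshape. The sketch below uses pandas.DataFrame.melt on toy data; the helper name and the wide column names energy_source_code_1 and energy_source_code_2 are assumptions, and the real function may be implemented differently.

```python
import pandas as pd

IDX_GENS = ["report_date", "plant_id_eia", "generator_id"]


def stack_generators_sketch(
    gens: pd.DataFrame,
    cat_col: str = "energy_source_code_num",
    stacked_col: str = "energy_source_code",
) -> pd.DataFrame:
    """Reshape one-row-per-generator data into one row per generator and fuel."""
    # Assumed naming: the wide fuel columns all start with "energy_source_code".
    esc_cols = [col for col in gens.columns if col.startswith("energy_source_code")]
    stacked = gens.melt(
        id_vars=IDX_GENS, value_vars=esc_cols, var_name=cat_col, value_name=stacked_col
    )
    # Generators without a secondary fuel produce empty rows, which are dropped.
    return stacked.dropna(subset=[stacked_col]).reset_index(drop=True)


toy_wide = pd.DataFrame(
    {
        "report_date": ["2020-01-01"] * 2,
        "plant_id_eia": [1, 1],
        "generator_id": ["A", "B"],
        "energy_source_code_1": ["NG", "NG"],
        "energy_source_code_2": ["DFO", None],
    }
)
stacked_gens = stack_generators_sketch(toy_wide)
```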
- pudl.analysis.allocate_net_gen.associate_generator_tables(gf, gen, gens)
Associate the three tables needed to assign net gen to generators.
- Parameters
gf (pandas.DataFrame) – generation_fuel_eia923 table with columns: IDX_PM_ESC and net_generation_mwh and fuel_consumed_mmbtu.
gen (pandas.DataFrame) – generation_eia923 table with columns: IDX_GENS and net_generation_mwh.
gens (pandas.DataFrame) – generators_eia860 table with cols: IDX_GENS and all of the energy_source_code columns
TODO: Convert these groupby/merges into transforms.
- pudl.analysis.allocate_net_gen.remove_retired_generators(gen_assoc)
Remove the retired generators.
We don’t want to associate net generation to generators that are retired (or proposed! or any other operational_status besides existing).
We do want to keep the generators that retire mid-year and have generator specific data from the generation_eia923 table. Removing the generators that retire mid-report year and don’t report to the generation_eia923 table is not exactly a great assumption. For now, we are removing them. We should employ a strategy that allocates only a portion of the generation to them based on their operational months (or by doing the allocation on a monthly basis).
- Parameters
gen_assoc (pandas.DataFrame) – table of generators with stacked fuel types and broadcasted net generation data from the generation_eia923 and generation_fuel_eia923 tables. Output of associate_generator_tables().
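The filtering rule described above could be sketched as a boolean mask. The helper name and the column names retirement_date and net_generation_mwh_g_tbl are assumptions, as are the status values other than existing; the real function may use different criteria.

```python
import pandas as pd


def remove_retired_sketch(gen_assoc: pd.DataFrame) -> pd.DataFrame:
    """Keep existing generators plus mid-year retirees that reported generation."""
    existing = gen_assoc["operational_status"] == "existing"
    retired_mid_year_with_data = (
        (gen_assoc["operational_status"] == "retired")
        # Retired during the report year (NaT never matches).
        & (gen_assoc["retirement_date"].dt.year == gen_assoc["report_date"].dt.year)
        # Has generator-specific data from generation_eia923.
        & gen_assoc["net_generation_mwh_g_tbl"].notna()
    )
    return gen_assoc[existing | retired_mid_year_with_data]


toy_assoc = pd.DataFrame(
    {
        "report_date": pd.to_datetime(["2020-01-01"] * 4),
        "generator_id": ["A", "B", "C", "D"],
        "operational_status": ["existing", "retired", "retired", "proposed"],
        "retirement_date": pd.to_datetime([None, "2020-06-01", "2020-06-01", None]),
        "net_generation_mwh_g_tbl": [100.0, 50.0, None, None],
    }
)
kept = remove_retired_sketch(toy_assoc)
```

Generator C retires mid-year but has no generator-level data, so it is dropped along with the proposed generator D.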
- pudl.analysis.allocate_net_gen._associate_unconnected_records(eia_generators_merged: pandas.DataFrame)
Associate unassociated gen_fuel table records on idx_pm.
There is a subset of generation_fuel_eia923 records which do not merge onto the stacked generator table on IDX_PM_ESC. These records generally don’t match with the set of prime movers and fuel types in the stacked generator table. In this method, we associate those straggler, unconnected records by merging them with the stacked generators on the prime mover only.
- Parameters
eia_generators_merged –
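One way to picture this two-pass association is an outer merge with an indicator column, followed by a second merge on the prime mover keys only. The helper name, the IDX_PM key list, and the column name net_generation_mwh_gf_tbl are assumptions for this sketch; the real method operates on an already merged table and may differ in detail.

```python
import pandas as pd

IDX_PM_ESC = ["report_date", "plant_id_eia", "prime_mover_code", "energy_source_code"]
IDX_PM = ["report_date", "plant_id_eia", "prime_mover_code"]  # assumed key list


def associate_on_prime_mover(stacked_gens: pd.DataFrame, gf: pd.DataFrame) -> pd.DataFrame:
    """Merge on IDX_PM_ESC, then attach leftover gf records on prime mover only."""
    merged = stacked_gens.merge(gf, on=IDX_PM_ESC, how="outer", indicator=True)
    connected = merged[merged["_merge"] != "right_only"].drop(columns="_merge")
    # gf records whose fuel did not match any generator in the first pass.
    unconnected = merged.loc[merged["_merge"] == "right_only", list(gf.columns)]
    gens_by_pm = stacked_gens.drop(columns="energy_source_code").drop_duplicates()
    rescued = gens_by_pm.merge(unconnected, on=IDX_PM, how="inner")
    return pd.concat([connected, rescued], ignore_index=True)


toy_stacked = pd.DataFrame(
    {
        "report_date": ["2020-01-01"] * 2,
        "plant_id_eia": [1, 1],
        "generator_id": ["A", "B"],
        "prime_mover_code": ["GT", "GT"],
        "energy_source_code": ["NG", "NG"],
    }
)
toy_gf = pd.DataFrame(
    {
        "report_date": ["2020-01-01"] * 2,
        "plant_id_eia": [1, 1],
        "prime_mover_code": ["GT", "GT"],
        "energy_source_code": ["NG", "DFO"],
        "net_generation_mwh_gf_tbl": [400.0, 20.0],
    }
)
associated = associate_on_prime_mover(toy_stacked, toy_gf)
```

In this toy case the DFO record matches no generator on fuel, so it is attached to both GT generators and can be allocated between them afterwards.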
- pudl.analysis.allocate_net_gen.prep_alloction_fraction(gen_assoc)
Make flags and aggregations to prepare for calc_allocation_fraction().
In calc_allocation_fraction(), we will break the generators out into four types - see calc_allocation_fraction() docs for details. This function adds flags for splitting the generators. It also adds
- pudl.analysis.allocate_net_gen.calc_allocation_fraction(gen_pm_fuel, drop_interim_cols=True)
Make frac column to allocate net gen from the generation fuel table.
- There are three main types of generators:
“all gen”: generators of plants which fully report to the generation_eia923 table.
“some gen”: generators of plants which partially report to the generation_eia923 table.
“gf only”: generators of plants which do not report at all to the generation_eia923 table.
Each different type of generator needs to be treated slightly differently, but all will end up with a frac column that can be used to allocate the net_generation_mwh_gf_tbl.
- Parameters
gen_pm_fuel (pandas.DataFrame) – output of prep_alloction_fraction().
drop_interim_cols (boolean) – True/False flag for dropping interim columns which are used to generate the frac column (mostly interim frac columns and totals of net generation from various groupings of generators) and which are useful for debugging. Default is True.
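The three plant types can be flagged with group-wise boolean reductions. This is a sketch under assumptions: the helper name, the plant key list, and the column names net_generation_mwh_g_tbl, reports_gen, and plant_type are hypothetical and are not the flags the module itself creates.

```python
import numpy as np
import pandas as pd

IDX_PLANT = ["report_date", "plant_id_eia"]  # assumed plant-year keys


def flag_plant_reporting(gens: pd.DataFrame) -> pd.DataFrame:
    """Label each generator by how completely its plant reports generation."""
    out = gens.copy()
    out["reports_gen"] = out["net_generation_mwh_g_tbl"].notna()
    grouped = out.groupby(IDX_PLANT)["reports_gen"]
    all_report = grouped.transform("all")
    any_report = grouped.transform("any")
    # The first matching condition wins, so "all gen" takes precedence over "some gen".
    out["plant_type"] = np.select(
        [all_report, any_report], ["all gen", "some gen"], default="gf only"
    )
    return out


toy_plants = pd.DataFrame(
    {
        "report_date": ["2020-01-01"] * 6,
        "plant_id_eia": [1, 1, 2, 2, 3, 3],
        "generator_id": ["A", "B", "A", "B", "A", "B"],
        "net_generation_mwh_g_tbl": [300.0, 100.0, 50.0, None, None, None],
    }
)
flagged = flag_plant_reporting(toy_plants)
```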