pudl.output.eia860

Functions for pulling data primarily from the EIA’s Form 860.

Module Contents

Functions

utilities_eia860(pudl_engine, start_date=None, end_date=None)

Pull all fields from the EIA860 Utilities table.

plants_eia860(pudl_engine, start_date=None, end_date=None)

Pull all fields from the EIA Plants tables.

plants_utils_eia860(pudl_engine, start_date=None, end_date=None)

Create a dataframe of plant and utility IDs and names from EIA 860.

generators_eia860(pudl_engine: sqlalchemy.engine.Engine, start_date=None, end_date=None, unit_ids: bool = False, fill_tech_desc: bool = True) → pandas.DataFrame

Pull all fields reported in the generators_eia860 table.

fill_generator_technology_description(gens_df: pandas.DataFrame) → pandas.DataFrame

Fill in missing technology_description based on generator and energy source.

boiler_generator_assn_eia860(pudl_engine, start_date=None, end_date=None)

Pull all fields from the EIA 860 boiler generator association table.

ownership_eia860(pudl_engine, start_date=None, end_date=None)

Pull a useful set of fields related to the ownership_eia860 table.

assign_unit_ids(gens_df)

Group generators into operational units using various heuristics.

fill_unit_ids(gens_df)

Back and forward fill Unit IDs for each plant / gen combination.

max_unit_id_by_plant(gens_df)

Identify the largest unit ID associated with each plant so we don't overlap.

_append_masked_units(gens_df, row_mask, unit_ids, on)

Replace rows with new PUDL Unit IDs in the original dataframe.

assign_single_gen_unit_ids(gens_df, prime_mover_codes, fuel_type_code_pudl=None, label_prefix='single')

Assign a unique PUDL Unit ID to each generator of a given prime mover type.

assign_cc_unit_ids(gens_df)

Assign PUDL Unit IDs for combined cycle generation units.

assign_prime_fuel_unit_ids(gens_df, prime_mover_code, fuel_type_code_pudl)

Assign a PUDL Unit ID to all generators with a given prime mover and fuel.

Attributes

logger

pudl.output.eia860.logger

pudl.output.eia860.utilities_eia860(pudl_engine, start_date=None, end_date=None)

Pull all fields from the EIA860 Utilities table.

Parameters
  • pudl_engine (sqlalchemy.engine.Engine) – SQLAlchemy connection engine for the PUDL DB.

  • start_date (date-like) – date-like object, including a string of the form ‘YYYY-MM-DD’ which will be used to specify the date range of records to be pulled. Dates are inclusive.

  • end_date (date-like) – date-like object, including a string of the form ‘YYYY-MM-DD’ which will be used to specify the date range of records to be pulled. Dates are inclusive.

Returns

A DataFrame containing all the fields of the EIA 860 Utilities table.

Return type

pandas.DataFrame
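The inclusive date range shared by the start_date and end_date parameters can be illustrated with a minimal pandas sketch. The helper filter_by_date and the sample data are hypothetical; the real functions take a pudl_engine and query the PUDL DB, so this shows only the inclusive semantics, not how PUDL performs the selection.

```python
import pandas as pd


def filter_by_date(df, start_date=None, end_date=None, date_col="report_date"):
    """Keep rows whose date falls within [start_date, end_date], inclusive.

    Hypothetical helper for illustration only; not part of pudl.output.eia860.
    """
    mask = pd.Series(True, index=df.index)
    if start_date is not None:
        mask &= df[date_col] >= pd.to_datetime(start_date)
    if end_date is not None:
        mask &= df[date_col] <= pd.to_datetime(end_date)
    return df[mask]


# Made-up records, one per report year.
utils = pd.DataFrame({
    "report_date": pd.to_datetime(["2017-01-01", "2018-01-01", "2019-01-01"]),
    "utility_id_eia": [1, 1, 1],
})

# Both endpoints are kept because the comparison uses >= and <=.
subset = filter_by_date(utils, start_date="2018-01-01", end_date="2019-01-01")
```

Passing None for either bound leaves that side of the range open, mirroring the default values in the signatures above.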

pudl.output.eia860.plants_eia860(pudl_engine, start_date=None, end_date=None)

Pull all fields from the EIA Plants tables.

Parameters
  • pudl_engine (sqlalchemy.engine.Engine) – SQLAlchemy connection engine for the PUDL DB.

  • start_date (date-like) – date-like object, including a string of the form ‘YYYY-MM-DD’ which will be used to specify the date range of records to be pulled. Dates are inclusive.

  • end_date (date-like) – date-like object, including a string of the form ‘YYYY-MM-DD’ which will be used to specify the date range of records to be pulled. Dates are inclusive.

Returns

A DataFrame containing all the fields of the EIA 860 Plants table.

Return type

pandas.DataFrame

pudl.output.eia860.plants_utils_eia860(pudl_engine, start_date=None, end_date=None)

Create a dataframe of plant and utility IDs and names from EIA 860.

Returns a pandas dataframe with the following columns:

  • report_date (in which data was reported)

  • plant_name_eia (from EIA entity)

  • plant_id_eia (from EIA entity)

  • plant_id_pudl

  • utility_id_eia (from EIA860)

  • utility_name_eia (from EIA860)

  • utility_id_pudl

Parameters
  • pudl_engine (sqlalchemy.engine.Engine) – SQLAlchemy connection engine for the PUDL DB.

  • start_date (date-like) – date-like object, including a string of the form ‘YYYY-MM-DD’ which will be used to specify the date range of records to be pulled. Dates are inclusive.

  • end_date (date-like) – date-like object, including a string of the form ‘YYYY-MM-DD’ which will be used to specify the date range of records to be pulled. Dates are inclusive.

Returns

A DataFrame containing plant and utility IDs and names from EIA 860.

Return type

pandas.DataFrame

pudl.output.eia860.generators_eia860(pudl_engine: sqlalchemy.engine.Engine, start_date=None, end_date=None, unit_ids: bool = False, fill_tech_desc: bool = True) → pandas.DataFrame

Pull all fields reported in the generators_eia860 table.

Merge in other useful fields including the latitude & longitude of the plant that the generators are part of, canonical plant & operator names and the PUDL IDs of the plant and operator, for merging with other PUDL data sources.

Fill in data for adjacent years if requested, but never fill in earlier than the earliest working year of data for EIA923, and never add more than one year after the reported data (since there should be at most a one-year lag between EIA923 and EIA860 reporting).

This also fills the technology_description field according to matching energy_source_code_1 values. It will only do so if the energy_source_code_1 is consistent throughout years for a given plant.

Parameters
  • pudl_engine – SQLAlchemy connection engine for the PUDL DB.

  • start_date (date-like) – date-like object, including a string of the form ‘YYYY-MM-DD’ which will be used to specify the date range of records to be pulled. Dates are inclusive.

  • end_date (date-like) – date-like object, including a string of the form ‘YYYY-MM-DD’ which will be used to specify the date range of records to be pulled. Dates are inclusive.

  • unit_ids – If True, use several heuristics to assign individual generators to functional units. EXPERIMENTAL.

  • fill_tech_desc – If True, backfill the technology_description field to years earlier than 2013 based on plant and energy_source_code_1 and fill in technologies with only one matching code.

Returns

A DataFrame containing all the fields of the EIA 860 Generators table.

pudl.output.eia860.fill_generator_technology_description(gens_df: pandas.DataFrame) → pandas.DataFrame

Fill in missing technology_description based on generator and energy source.

Prior to 2014, the EIA 860 did not report technology_description. This function backfills those early years within groups defined by plant_id_eia, generator_id and energy_source_code_1. Some remaining missing values are then filled in using the consistent, unique mappings that are observed between energy_source_code_1 and technology_description across all years and generators.

As a result, more than 95% of all generator records end up having a technology_description associated with them.

Parameters

gens_df – A generators_eia860 dataframe containing at least the columns report_date, plant_id_eia, generator_id, energy_source_code_1, and technology_description.

Returns

A copy of the input dataframe, with technology_description filled in.
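The two-step fill described above can be sketched roughly as follows. This is an illustrative approximation under stated assumptions, not the function's actual implementation: the sample records are made up, and the use of a groupby backfill followed by a one-to-one mapping is inferred from the description.

```python
import pandas as pd

# Made-up generator records; plant 5 has an energy source with ambiguous technology.
gens = pd.DataFrame({
    "report_date": pd.to_datetime(
        ["2012-01-01", "2013-01-01", "2014-01-01", "2014-01-01",
         "2014-01-01", "2014-01-01", "2014-01-01"]
    ),
    "plant_id_eia": [1, 1, 1, 2, 3, 4, 5],
    "generator_id": ["a", "a", "a", "b", "c", "d", "e"],
    "energy_source_code_1": ["NUC", "NUC", "NUC", "NUC", "NG", "NG", "NG"],
    "technology_description": [
        None, None, "Nuclear", None,
        "Natural Gas Fired Combustion Turbine",
        "Natural Gas Fired Combined Cycle",
        None,
    ],
})

out = gens.sort_values("report_date").copy()

# Step 1: backfill early years within plant / generator / energy source groups.
group_cols = ["plant_id_eia", "generator_id", "energy_source_code_1"]
out["technology_description"] = (
    out.groupby(group_cols)["technology_description"].bfill()
)

# Step 2: find energy source codes that map to exactly one technology,
# and use that mapping to fill whatever is still missing.
pairs = (
    out[["energy_source_code_1", "technology_description"]]
    .dropna()
    .drop_duplicates()
)
unique_map = (
    pairs.drop_duplicates("energy_source_code_1", keep=False)
    .set_index("energy_source_code_1")["technology_description"]
)
out["technology_description"] = out["technology_description"].fillna(
    out["energy_source_code_1"].map(unique_map)
)
```

In this sample, every NUC record ends up labeled "Nuclear", while the NG record at plant 5 stays missing because NG is associated with two different technologies.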

pudl.output.eia860.boiler_generator_assn_eia860(pudl_engine, start_date=None, end_date=None)

Pull all fields from the EIA 860 boiler generator association table.

Parameters
  • pudl_engine (sqlalchemy.engine.Engine) – SQLAlchemy connection engine for the PUDL DB.

  • start_date (date-like) – date-like object, including a string of the form ‘YYYY-MM-DD’ which will be used to specify the date range of records to be pulled. Dates are inclusive.

  • end_date (date-like) – date-like object, including a string of the form ‘YYYY-MM-DD’ which will be used to specify the date range of records to be pulled. Dates are inclusive.

Returns

A DataFrame containing all the fields from the EIA 860 boiler generator association table.

Return type

pandas.DataFrame

pudl.output.eia860.ownership_eia860(pudl_engine, start_date=None, end_date=None)

Pull a useful set of fields related to the ownership_eia860 table.

Parameters
  • pudl_engine (sqlalchemy.engine.Engine) – SQLAlchemy connection engine for the PUDL DB.

  • start_date (date-like) – date-like object, including a string of the form ‘YYYY-MM-DD’ which will be used to specify the date range of records to be pulled. Dates are inclusive.

  • end_date (date-like) – date-like object, including a string of the form ‘YYYY-MM-DD’ which will be used to specify the date range of records to be pulled. Dates are inclusive.

Returns

A DataFrame containing a useful set of fields related to the EIA 860 Ownership table.

Return type

pandas.DataFrame

pudl.output.eia860.assign_unit_ids(gens_df)

Group generators into operational units using various heuristics.

Splits a few columns off from the big generator dataframe and uses several heuristic functions to fill in missing unit_id_pudl values beyond those that are generated in the boiler generator association process. Then merges the new unit ID values back in to the generators dataframe.

Parameters

gens_df (pandas.DataFrame) – An EIA generator table. Must contain at least the columns: report_date, plant_id_eia, generator_id, unit_id_pudl, bga_source, fuel_type_code_pudl, prime_mover_code.

Returns

Returned dataframe should only vary from the input in that some NA values in the unit_id_pudl and bga_source columns have been filled in with real values.

Return type

pandas.DataFrame

Raises
  • ValueError – If the input dataframe is missing required columns.

  • ValueError – If any generator is associated with more than one unit_id_pudl.

  • AssertionError – If row or column indices are changed.

  • AssertionError – If pre-existing unit_id_pudl or bga_source values are altered.

  • AssertionError – If contents of any other columns are altered at all.

pudl.output.eia860.fill_unit_ids(gens_df)

Back and forward fill Unit IDs for each plant / gen combination.

This routine assumes that the mapping of generators to units is constant over time, and extends those mappings into years where no boilers have been reported – since in the BGA we can only connect generators to each other if they are both connected to a boiler.

Prior to 2014, combined cycle units didn’t report any “boilers” but in later years, they have been given “boilers” that correspond to their generators, so that all of their fuel consumption is recorded alongside that of other types of generators.

The bga_source field is set to “bfill_units” for those that were backfilled, and “ffill_units” for those that were forward filled.

Note: We could back/forward fill the boiler IDs prior to the BGA process and we ought to get consistent units across all the years that are the same as what we fill in here. We could also back/forward fill boiler IDs and Unit IDs after the fact, and we should get the same result. This will address many currently “boilerless” CCNG units that use generator ID as boiler ID in the later years. We could try to apply this more generally, but in cases of generator IDs that haven’t been used as boiler IDs, it would break the foreign key relationship with the boiler table, unless we added them there too, which seems like too much deep muddling.

Parameters

gens_df (pandas.DataFrame) – A generators_eia860 dataframe, which must contain columns: report_date, plant_id_eia, generator_id, unit_id_pudl, bga_source.

Returns

A dataframe with the same columns as the input dataframe, but having some NA values filled in for both the unit_id_pudl and bga_source columns.

Return type

pandas.DataFrame
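The back and forward fill with bga_source labeling can be sketched in pandas as shown here. This is a minimal illustration under assumptions: the sample data is made up, "reported" is a placeholder rather than a real bga_source value, and doing the backfill before the forward fill is an assumed ordering.

```python
import pandas as pd

nan = float("nan")
gens = pd.DataFrame({
    "report_date": pd.to_datetime(
        ["2012-01-01", "2013-01-01", "2014-01-01", "2015-01-01"]
    ),
    "plant_id_eia": [1, 1, 1, 1],
    "generator_id": ["g1", "g1", "g1", "g1"],
    "unit_id_pudl": [nan, nan, 1.0, nan],
    # "reported" is a placeholder label, not a real bga_source value.
    "bga_source": [None, None, "reported", None],
})

key = ["plant_id_eia", "generator_id"]
gens = gens.sort_values(key + ["report_date"])

# Backfill within each plant / generator, labeling only newly filled rows.
bfilled = gens.groupby(key)["unit_id_pudl"].bfill()
was_bfilled = gens["unit_id_pudl"].isna() & bfilled.notna()
gens.loc[was_bfilled, "bga_source"] = "bfill_units"
gens["unit_id_pudl"] = bfilled

# Forward fill whatever is still missing, again labeling newly filled rows.
ffilled = gens.groupby(key)["unit_id_pudl"].ffill()
was_ffilled = gens["unit_id_pudl"].isna() & ffilled.notna()
gens.loc[was_ffilled, "bga_source"] = "ffill_units"
gens["unit_id_pudl"] = ffilled
```

Here the 2012 and 2013 records are labeled "bfill_units", the 2015 record is labeled "ffill_units", and the originally reported 2014 record keeps its own label.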

pudl.output.eia860.max_unit_id_by_plant(gens_df)

Identify the largest unit ID associated with each plant so we don’t overlap.

The PUDL Unit IDs are sequentially assigned integers. To assign a new ID, we need to know the largest existing Unit ID within a plant. This function calculates that largest existing ID, or uses zero, if no Unit IDs are set within the plant.

Note that this calculation depends on having all of the pre-existing generators and units still available in the dataframe!

Parameters

gens_df (pandas.DataFrame) – A generators_eia860 dataframe containing at least the columns plant_id_eia and unit_id_pudl.

Returns

A dataframe having two columns, plant_id_eia and max_unit_id_pudl, in which each row should be unique.

Return type

pandas.DataFrame
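As a rough illustration of the calculation described above, the per-plant maximum with a zero fallback can be computed with a groupby. The sample data is made up, and this sketch is not necessarily how the function is written.

```python
import pandas as pd

nan = float("nan")
gens = pd.DataFrame({
    "plant_id_eia": [1, 1, 2, 3, 3],
    "unit_id_pudl": [1.0, 3.0, nan, 2.0, nan],
})

max_ids = (
    gens.groupby("plant_id_eia")["unit_id_pudl"]
    .max()              # NaN for plants with no Unit IDs at all
    .fillna(0)          # use zero when no Unit IDs are set within the plant
    .astype(int)
    .rename("max_unit_id_pudl")
    .reset_index()
)
```

Plant 2 has no Unit IDs, so its maximum is 0 and the first newly assigned ID there would be 1.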

pudl.output.eia860._append_masked_units(gens_df, row_mask, unit_ids, on)

Replace rows with new PUDL Unit IDs in the original dataframe.

Merges the newly assigned Unit IDs found in unit_ids into the gens_df dataframe, but only for those rows which are selected by the boolean row_mask. Merges using the column or columns specified by on. This operation should only result in changes to the values of unit_id_pudl and bga_source in the output dataframe. All of gens_df, unit_ids and row_mask must be similarly indexed for this to work.

Parameters
  • gens_df (pandas.DataFrame) – A gens_eia860-based dataframe.

  • row_mask (boolean mask) – A boolean array indicating which records in gens_df should be replaced using values from unit_ids.

  • unit_ids (pandas.DataFrame) – A dataframe containing newly assigned unit_id_pudl values to be integrated into gens_df.

  • on (str or list) – Column or list of columns to merge on.

Return type

pandas.DataFrame
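One way to picture the masked merge is the sketch below. The function append_masked_units_sketch is a hypothetical stand-in written from the description above, not the private function itself, and it assumes gens_df has a default unnamed index.

```python
import pandas as pd


def append_masked_units_sketch(gens_df, row_mask, unit_ids, on):
    """Hypothetical stand-in: merge new IDs into only the masked rows."""
    cols_to_replace = ["unit_id_pudl", "bga_source"]
    replaced = (
        gens_df.loc[row_mask]
        .drop(columns=cols_to_replace)
        .reset_index()                      # keep the original row labels
        .merge(unit_ids, on=on, how="left")
        .set_index("index")
    )
    # Recombine untouched and replaced rows in the original order.
    out = pd.concat(
        [gens_df.loc[~row_mask], replaced[gens_df.columns]]
    ).sort_index()
    out.index.name = gens_df.index.name
    return out


nan = float("nan")
gens = pd.DataFrame({
    "plant_id_eia": [1, 1, 1],
    "generator_id": ["a", "b", "c"],
    "unit_id_pudl": [1.0, nan, nan],
    "bga_source": ["reported", None, None],  # placeholder label
})
row_mask = gens["unit_id_pudl"].isna()
unit_ids = pd.DataFrame({
    "plant_id_eia": [1, 1],
    "generator_id": ["b", "c"],
    "unit_id_pudl": [2.0, 3.0],
    "bga_source": ["single_gt", "single_gt"],
})

out = append_masked_units_sketch(
    gens, row_mask, unit_ids, on=["plant_id_eia", "generator_id"]
)
```

Only unit_id_pudl and bga_source differ between gens and out; the index, column order, and all other values are unchanged.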

pudl.output.eia860.assign_single_gen_unit_ids(gens_df, prime_mover_codes, fuel_type_code_pudl=None, label_prefix='single')

Assign a unique PUDL Unit ID to each generator of a given prime mover type.

Calculate the maximum pre-existing PUDL Unit ID within each plant, and assign each as-yet unidentified distinct generator within each plant an incrementing integer unit_id_pudl, beginning with 1 + the previous maximum unit_id_pudl found in that plant. Mark that generator with a label in the bga_source column consisting of label_prefix + the prime mover code.

If fuel_type_code_pudl is not None, then only assign new Unit IDs to those generators having the specified fuel type code, and use that fuel type code as the label prefix, e.g. “coal_st” for a coal-fired steam turbine.

Only generators having NA unit_id_pudl will be assigned a new ID.

Parameters
  • gens_df (pandas.DataFrame) – A collection of EIA generator records. Must include the plant_id_eia, generator_id, prime_mover_code and unit_id_pudl columns.

  • prime_mover_codes (list) – List of prime mover codes for which we are attempting to assign simple Unit IDs.

  • fuel_type_code_pudl (str, None) – If not None, then limit the records assigned a unit_id to those that have the specified fuel_type_code_pudl (e.g. “coal”, “gas”, “oil”, “nuclear”)

  • label_prefix (str) – String to use in labeling records as to how their unit_id_pudl was set. Will be concatenated with the prime mover code.

Returns

A new dataframe with the same rows and columns as were passed in, but with the unit_id_pudl and bga_source columns updated to reflect the newly assigned Unit IDs.

Return type

pandas.DataFrame
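The ID assignment described above might look roughly like this in pandas. It is a sketch under assumptions: the data is made up, "reported" is a placeholder label, and joining the prefix and lower-cased prime mover code with an underscore is inferred from the “coal_st” example rather than confirmed.

```python
import pandas as pd

nan = float("nan")
gens = pd.DataFrame({
    "report_date": pd.to_datetime(
        ["2018-01-01", "2018-01-01", "2018-01-01", "2019-01-01"]
    ),
    "plant_id_eia": [1, 1, 1, 1],
    "generator_id": ["a", "b", "c", "c"],
    "prime_mover_code": ["GT", "GT", "GT", "GT"],
    "unit_id_pudl": [2.0, nan, nan, nan],
    "bga_source": ["reported", None, None, None],  # placeholder label
})
prime_mover_codes = ["GT"]
label_prefix = "single"

# Largest existing Unit ID per plant, or zero if none are set.
max_id = gens.groupby("plant_id_eia")["unit_id_pudl"].max().fillna(0)

# Distinct generators that still lack a Unit ID.
needs_id = gens["unit_id_pudl"].isna() & gens["prime_mover_code"].isin(prime_mover_codes)
key = ["plant_id_eia", "generator_id", "prime_mover_code"]
new_units = gens.loc[needs_id, key].drop_duplicates()

# Incrementing IDs within each plant, starting at 1 + the previous maximum.
new_units["unit_id_pudl"] = (
    new_units.groupby("plant_id_eia").cumcount()
    + 1
    + new_units["plant_id_eia"].map(max_id)
)
# Assumed label format: prefix, underscore, lower-cased prime mover code.
new_units["bga_source"] = label_prefix + "_" + new_units["prime_mover_code"].str.lower()

# Merge back, filling only the NA values.
out = gens.merge(new_units, on=key, how="left", suffixes=("", "_new"))
out["unit_id_pudl"] = out["unit_id_pudl"].fillna(out["unit_id_pudl_new"])
out["bga_source"] = out["bga_source"].fillna(out["bga_source_new"])
out = out.drop(columns=["unit_id_pudl_new", "bga_source_new"])
```

Generator c appears in two years but receives a single ID, since IDs are assigned per distinct generator rather than per record.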

pudl.output.eia860.assign_cc_unit_ids(gens_df)

Assign PUDL Unit IDs for combined cycle generation units.

This applies only to combined cycle units reported as a combination of CT and CA prime movers. All CT and CA generators within a plant that do not already have a unit_id_pudl assigned will be given the same unit ID. The bga_source column is set to one of several flags indicating what type of arrangement was found:

  • orphan_ct (zero CA gens, 1+ CT gens)

  • orphan_ca (zero CT gens, 1+ CA gens)

  • one_ct_one_ca_inferred (1 CT, 1 CA)

  • one_ct_many_ca_inferred (1 CT, 2+ CA)

  • many_ct_one_ca_inferred (2+ CT, 1 CA)

  • many_ct_many_ca_inferred (2+ CT, 2+ CA)

Orphaned generators are still assigned a unit_id_pudl so that they can potentially be associated with other generators in the same unit across years. It’s likely that these orphans are a result of mislabeled or missing generators. Note that as generators are added or removed over time, the flags associated with each generator may change, even though it remains part of the same inferred unit.

Returns

pandas.DataFrame
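The choice of bga_source flag from the CT and CA generator counts can be sketched as a small pure-Python function. The name cc_flag is hypothetical, and the sketch assumes that "many" means two or more generators so that the categories do not overlap.

```python
def cc_flag(n_ct: int, n_ca: int) -> str:
    """Hypothetical helper: pick a flag from counts of CT and CA generators.

    Assumes "one" means exactly 1 and "many" means 2 or more.
    """
    if n_ct < 0 or n_ca < 0 or (n_ct == 0 and n_ca == 0):
        raise ValueError("Need at least one CT or CA generator.")
    if n_ca == 0:
        return "orphan_ct"
    if n_ct == 0:
        return "orphan_ca"
    ct_part = "one" if n_ct == 1 else "many"
    ca_part = "one" if n_ca == 1 else "many"
    return f"{ct_part}_ct_{ca_part}_ca_inferred"


# Flags for a few plant configurations, keyed by (CT count, CA count).
flags = {counts: cc_flag(*counts) for counts in [(2, 0), (0, 1), (1, 1), (1, 3), (2, 1), (3, 2)]}
```

Because the flag depends only on the current counts, adding or retiring a generator changes the flag without changing which unit the remaining generators belong to.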

pudl.output.eia860.assign_prime_fuel_unit_ids(gens_df, prime_mover_code, fuel_type_code_pudl)

Assign a PUDL Unit ID to all generators with a given prime mover and fuel.

Within each plant, assign a Unit ID to all generators that don’t have one, and that share the same fuel_type_code_pudl and prime_mover_code. This is especially useful for differentiating between different types of steam turbine generators, as there are so many different kinds of steam turbines, and the only characteristic we have to differentiate between them in this context is the fuel they consume. E.g. nuclear, geothermal, solar thermal, natural gas, diesel, and coal can all run steam turbines, but it doesn’t make sense to lump those turbines together into a single unit just because they are located at the same plant.

This routine only assigns a PUDL Unit ID to generators that have a consistently reported value of fuel_type_code_pudl across all of the years of data in gens_df. This consistency is important because otherwise the prime-fuel based unit assignment could put the same generator into different units in different years, which is currently not compatible with our concept of “units.”

Parameters
  • gens_df (pandas.DataFrame) – A collection of EIA generator records. Must include the plant_id_eia, generator_id, prime_mover_code and unit_id_pudl columns.

  • prime_mover_code (str) – The prime mover code for which we are attempting to assign Unit IDs.

  • fuel_type_code_pudl (str) – Limit the records assigned a unit_id to those that have the specified fuel_type_code_pudl (e.g. “coal”, “gas”, “oil”, “nuclear”)

Return type

pandas.DataFrame
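The fuel consistency requirement can be illustrated with a short pandas sketch that keeps only generators reporting a single fuel_type_code_pudl across all years. The data is made up, and the use of a grouped nunique is an assumption about how such a check could be done, not a description of the function's internals.

```python
import pandas as pd

gens = pd.DataFrame({
    "report_date": pd.to_datetime(
        ["2018-01-01", "2019-01-01", "2018-01-01", "2019-01-01"]
    ),
    "plant_id_eia": [1, 1, 1, 1],
    "generator_id": ["a", "a", "b", "b"],
    "prime_mover_code": ["ST", "ST", "ST", "ST"],
    "fuel_type_code_pudl": ["coal", "coal", "coal", "gas"],
})

# Number of distinct fuel types each generator reports across all years.
n_fuels = gens.groupby(["plant_id_eia", "generator_id"])[
    "fuel_type_code_pudl"
].transform("nunique")

# Only generators with exactly one fuel type are candidates for a unit ID.
consistent = gens[n_fuels == 1]
```

Generator b switches from coal to gas, so it is excluded; assigning it by fuel would place it in different units in different years.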