pudl.output.eia923#

Functions for pulling EIA 923 data out of the PUDL DB.

Module Contents#

Functions#

generation_fuel_eia923(pudl_engine[, freq, ...])

Pull records from the generation_fuel_eia923 table in a given date range.

generation_fuel_all_eia923(→ pandas.DataFrame)

Combine nuclear and non-nuclear generation fuel tables into a single output.

fuel_receipts_costs_eia923(→ pandas.DataFrame)

Pull records from the fuel_receipts_costs_eia923 table in a given date range.

boiler_fuel_eia923(pudl_engine[, freq, start_date, ...])

Pull records from the boiler_fuel_eia923 table in a given date range.

generation_eia923(pudl_engine[, freq, start_date, ...])

Pull records from the generation_eia923 table in a given date range.

denorm_generation_eia923(g_df, pudl_engine, ...)

Denormalize generation_eia923 table.

get_fuel_cost_avg_bulk_elec(→ pandas.DataFrame)

Get state-level average fuel costs from EIA's bulk electricity data.

_impute_via_bulk_elec(→ pandas.DataFrame)

Attributes#

pudl.output.eia923.logger[source]#
pudl.output.eia923.generation_fuel_eia923(pudl_engine, freq: Literal[AS, MS, None] = None, start_date: str | date | datetime | pd.Timestamp = None, end_date: str | date | datetime | pd.Timestamp = None, nuclear: bool = False)[source]#

Pull records from the generation_fuel_eia923 table in a given date range.

Optionally, aggregate the records over some timescale – monthly, yearly, quarterly, etc. as well as by fuel type within a plant.

If the records are not being aggregated, all of the database fields are available. If they’re being aggregated, then we preserve the following fields. Per-unit values are re-calculated based on the aggregated totals. Totals are summed across whatever time range is being used, within a given plant and fuel type.

  • plant_id_eia

  • report_date

  • fuel_type_code_pudl

  • fuel_consumed_units

  • fuel_consumed_for_electricity_units

  • fuel_mmbtu_per_unit

  • fuel_consumed_mmbtu

  • fuel_consumed_for_electricity_mmbtu

  • net_generation_mwh

In addition, plant and utility names and IDs are pulled in from the EIA 860 tables.

Parameters:
  • pudl_engine – SQLAlchemy connection engine for the PUDL DB.

  • freq – a pandas timeseries offset alias (either “MS” or “AS”) or None.

  • start_date – date-like object, including a string of the form ‘YYYY-MM-DD’ which will be used to specify the date range of records to be pulled. Dates are inclusive.

  • end_date – date-like object, including a string of the form ‘YYYY-MM-DD’ which will be used to specify the date range of records to be pulled. Dates are inclusive.

  • nuclear – If True, return generation_fuel_nuclear_eia923 table.

Returns:

A DataFrame containing records from the EIA 923 Generation Fuel table.
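The aggregation behavior described above (totals summed, per-unit values re-calculated from the summed totals) can be illustrated with a minimal pandas sketch. It does not call the PUDL function or touch the database: the toy values and the helper name aggregate_annually are hypothetical, and only the column names come from the field list above.

```python
import pandas as pd

# Toy monthly records; all values are made up for illustration.
gf = pd.DataFrame(
    {
        "plant_id_eia": [1, 1, 1, 1],
        "report_date": pd.to_datetime(
            ["2020-01-01", "2020-02-01", "2020-01-01", "2021-01-01"]
        ),
        "fuel_type_code_pudl": ["coal", "coal", "gas", "coal"],
        "fuel_consumed_units": [100.0, 300.0, 50.0, 200.0],
        "fuel_consumed_mmbtu": [2000.0, 6600.0, 50.0, 4200.0],
        "net_generation_mwh": [200.0, 650.0, 5.0, 400.0],
    }
)


def aggregate_annually(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: sum totals per plant, fuel type and year."""
    by = ["plant_id_eia", "fuel_type_code_pudl", "report_date"]
    totals = ["fuel_consumed_units", "fuel_consumed_mmbtu", "net_generation_mwh"]
    # Snap every report_date to the start of its year before grouping.
    out = df.assign(report_date=df["report_date"].dt.to_period("Y").dt.start_time)
    out = out.groupby(by, as_index=False)[totals].sum()
    # Per-unit value is re-calculated from the summed totals,
    # not averaged across the monthly per-unit values.
    out["fuel_mmbtu_per_unit"] = (
        out["fuel_consumed_mmbtu"] / out["fuel_consumed_units"]
    )
    return out


annual = aggregate_annually(gf)
```

For the 2020 coal records in this toy data, the re-calculated heat content is 8600 / 400 = 21.5 mmbtu per unit, whereas a plain mean of the two monthly values (20 and 22) would give 21.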

pudl.output.eia923.generation_fuel_all_eia923(gf: pandas.DataFrame, gfn: pandas.DataFrame) → pandas.DataFrame[source]#

Combine nuclear and non-nuclear generation fuel tables into a single output.

The nuclear and non-nuclear generation fuel data are reported at different granularities. For non-nuclear generation, each row is a unique combination of date, plant ID, prime mover, and fuel type. Nuclear generation is additionally split out by nuclear_unit_id (which happens to be the same as generator_id).

This function aggregates the nuclear data across all nuclear units within a plant so that it is structurally the same as the non-nuclear data and can be treated identically in subsequent analyses. Then the nuclear and non-nuclear data are concatenated into a single dataframe and returned.

Parameters:
  • gf – non-nuclear generation fuel dataframe.

  • gfn – nuclear generation fuel dataframe.
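The two steps described above (collapse the nuclear data across units, then concatenate) might look roughly like the following sketch. It is not the PUDL implementation: the function name combine_generation_fuel, the toy values, and the column names prime_mover_code and energy_source_code are assumptions made for illustration.

```python
import pandas as pd

# Assumed key columns: date, plant ID, prime mover, and fuel type.
KEYS = ["report_date", "plant_id_eia", "prime_mover_code", "energy_source_code"]

# Non-nuclear data: one row per key combination (toy values).
gf = pd.DataFrame(
    {
        "report_date": pd.to_datetime(["2020-01-01"]),
        "plant_id_eia": [1],
        "prime_mover_code": ["ST"],
        "energy_source_code": ["BIT"],
        "net_generation_mwh": [900.0],
    }
)

# Nuclear data: additionally split out by nuclear_unit_id (toy values).
gfn = pd.DataFrame(
    {
        "report_date": pd.to_datetime(["2020-01-01", "2020-01-01"]),
        "plant_id_eia": [2, 2],
        "nuclear_unit_id": ["1", "2"],
        "prime_mover_code": ["ST", "ST"],
        "energy_source_code": ["NUC", "NUC"],
        "net_generation_mwh": [700.0, 800.0],
    }
)


def combine_generation_fuel(gf: pd.DataFrame, gfn: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: sum nuclear units per plant, then concatenate."""
    gfn_agg = (
        gfn.drop(columns=["nuclear_unit_id"])
        .groupby(KEYS, as_index=False)
        .sum()
    )
    return pd.concat([gf, gfn_agg], ignore_index=True)


combined = combine_generation_fuel(gf, gfn)
```

After aggregation the nuclear rows share the same key columns as the non-nuclear rows, so the combined frame can be grouped or filtered without special-casing nuclear plants.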

pudl.output.eia923.fuel_receipts_costs_eia923(pudl_engine, freq: Literal[AS, MS, None] = None, start_date: str | date | datetime | pd.Timestamp = None, end_date: str | date | datetime | pd.Timestamp = None, fill: bool = False, roll: bool = False) → pandas.DataFrame[source]#

Pull records from the fuel_receipts_costs_eia923 table in a given date range.

Optionally, aggregate the records at a monthly or longer timescale, as well as by fuel type within a plant, by setting freq to something other than the default None value.

If the records are not being aggregated, then all of the fields found in the PUDL database are available. If they are being aggregated, then the following fields are preserved, and appropriately summed or re-calculated based on the specified aggregation. In both cases, new total values are calculated, for total fuel heat content and total fuel cost.

  • plant_id_eia

  • report_date

  • fuel_type_code_pudl (formerly energy_source_simple)

  • fuel_received_units (sum)

  • fuel_cost_per_mmbtu (weighted average)

  • total_fuel_cost (sum)

  • fuel_consumed_mmbtu (sum)

  • fuel_mmbtu_per_unit (weighted average)

  • sulfur_content_pct (weighted average)

  • ash_content_pct (weighted average)

  • moisture_content_pct (weighted average)

  • mercury_content_ppm (weighted average)

  • chlorine_content_ppm (weighted average)

In addition, plant and utility names and IDs are pulled in from the EIA 860 tables.

Optionally fill in missing fuel costs based on monthly state averages from the EIA’s bulk electricity data, and/or use a rolling average to fill in gaps in the fuel costs. These behaviors are controlled by the fill and roll parameters.

Parameters:
  • pudl_engine – SQLAlchemy connection engine for the PUDL DB.

  • freq – a pandas timeseries offset alias (“MS” or “AS”) or None. The original data is reported monthly.

  • start_date – date-like object, including a string of the form ‘YYYY-MM-DD’ which will be used to specify the date range of records to be pulled. Dates are inclusive.

  • end_date – date-like object, including a string of the form ‘YYYY-MM-DD’ which will be used to specify the date range of records to be pulled. Dates are inclusive.

  • fill – if set to True, fill in missing coal, gas and oil fuel cost per mmbtu from EIA’s bulk data. This fills with monthly state-level averages.

  • roll – if set to True, apply a rolling average to a subset of the output table’s columns (currently only ‘fuel_cost_per_mmbtu’ for the frc table).

Returns:

A DataFrame containing records from the EIA 923 Fuel Receipts and Costs table.
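As an illustration of the summed and weighted-average fields listed above, the sketch below aggregates two toy fuel deliveries. The choice of weights is an assumption, not something this page specifies: heat content per unit is weighted by units received, and cost per mmbtu is weighted by total heat content. Both fall out naturally from dividing one summed total by another.

```python
import pandas as pd

# Two toy deliveries to the same plant in the same month (made-up values).
frc = pd.DataFrame(
    {
        "plant_id_eia": [1, 1],
        "report_date": pd.to_datetime(["2020-01-01", "2020-01-01"]),
        "fuel_type_code_pudl": ["coal", "coal"],
        "fuel_received_units": [100.0, 300.0],
        "fuel_mmbtu_per_unit": [20.0, 24.0],
        "fuel_cost_per_mmbtu": [2.0, 3.0],
    }
)

# Totals calculated per record: total heat content and total fuel cost.
frc["fuel_consumed_mmbtu"] = frc["fuel_received_units"] * frc["fuel_mmbtu_per_unit"]
frc["total_fuel_cost"] = frc["fuel_consumed_mmbtu"] * frc["fuel_cost_per_mmbtu"]

by = ["plant_id_eia", "report_date", "fuel_type_code_pudl"]
sums = ["fuel_received_units", "fuel_consumed_mmbtu", "total_fuel_cost"]
agg = frc.groupby(by, as_index=False)[sums].sum()

# Weighted averages recovered as ratios of the summed totals
# (assumed weighting; see the note above the code).
agg["fuel_mmbtu_per_unit"] = agg["fuel_consumed_mmbtu"] / agg["fuel_received_units"]
agg["fuel_cost_per_mmbtu"] = agg["total_fuel_cost"] / agg["fuel_consumed_mmbtu"]
```

Here the aggregated heat content is 9200 / 400 = 23.0 mmbtu per unit and the aggregated cost is 25600 / 9200, roughly 2.78 per mmbtu. Both lean toward the larger delivery rather than sitting at the simple midpoint.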

pudl.output.eia923.boiler_fuel_eia923(pudl_engine, freq=None, start_date=None, end_date=None)[source]#

Pull records from the boiler_fuel_eia923 table in a given date range.

Optionally, aggregate the records over some timescale – monthly, yearly, quarterly, etc. as well as by fuel type within a plant.

If the records are not being aggregated, all of the database fields are available. If they’re being aggregated, then we preserve the following fields. Per-unit values are re-calculated based on the aggregated totals. Totals are summed across whatever time range is being used, within a given plant and fuel type.

  • fuel_consumed_units (sum)

  • fuel_mmbtu_per_unit (weighted average)

  • fuel_consumed_mmbtu (sum)

  • sulfur_content_pct (weighted average)

  • ash_content_pct (weighted average)

In addition, plant and utility names and IDs are pulled in from the EIA 860 tables.

Parameters:
  • pudl_engine (sqlalchemy.engine.Engine) – SQLAlchemy connection engine for the PUDL DB.

  • freq (str) – a pandas timeseries offset alias. The original data is reported monthly, so the best time frequencies to use here are probably month start (freq=’MS’) and year start (freq=’YS’).

  • start_date (date-like) – date-like object, including a string of the form ‘YYYY-MM-DD’ which will be used to specify the date range of records to be pulled. Dates are inclusive.

  • end_date (date-like) – date-like object, including a string of the form ‘YYYY-MM-DD’ which will be used to specify the date range of records to be pulled. Dates are inclusive.

Returns:

A DataFrame containing all records from the EIA 923 Boiler Fuel table.

Return type:

pandas.DataFrame

pudl.output.eia923.generation_eia923(pudl_engine, freq=None, start_date=None, end_date=None)[source]#

Pull records from the generation_eia923 table in a given date range.

Parameters:
  • pudl_engine (sqlalchemy.engine.Engine) – SQLAlchemy connection engine for the PUDL DB.

  • freq (str) – a pandas timeseries offset alias. The original data is reported monthly, so the best time frequencies to use here are probably month start (freq=’MS’) and year start (freq=’YS’).

  • start_date (date-like) – date-like object, including a string of the form ‘YYYY-MM-DD’ which will be used to specify the date range of records to be pulled. Dates are inclusive.

  • end_date (date-like) – date-like object, including a string of the form ‘YYYY-MM-DD’ which will be used to specify the date range of records to be pulled. Dates are inclusive.

Returns:

A DataFrame containing all records from the EIA 923 Generation table.

Return type:

pandas.DataFrame

pudl.output.eia923.denorm_generation_eia923(g_df, pudl_engine, start_date, end_date)[source]#

Denormalize generation_eia923 table.

Parameters:
  • g_df (pandas.DataFrame) – generation_eia923 table. Should have columns: [“plant_id_eia”, “generator_id”, “report_date”, “net_generation_mwh”]

  • pudl_engine (sqlalchemy.engine.Engine) – SQLAlchemy connection engine for the PUDL DB.

  • start_date (date-like) – date-like object, including a string of the form ‘YYYY-MM-DD’ which will be used to specify the date range of records to be pulled. Dates are inclusive.

  • end_date (date-like) – date-like object, including a string of the form ‘YYYY-MM-DD’ which will be used to specify the date range of records to be pulled. Dates are inclusive.

pudl.output.eia923.get_fuel_cost_avg_bulk_elec(pudl_engine: sqlalchemy.engine.Engine) → pandas.DataFrame[source]#

Get state-level average fuel costs from EIA’s bulk electricity data.

This table is intended for use in fuel_receipts_costs_eia923() as a drop-in replacement for a previous process that fetched data from the unreliable EIA API.

Parameters:

pudl_engine – SQLAlchemy connection engine for the PUDL DB.

Returns:

A dataframe containing state-level monthly fuel costs. The table contains the following columns, some of which are reference columns: ‘report_date’, ‘fuel_cost_per_mmbtu’, ‘state’, ‘fuel_type_code_pudl’

Return type:

pandas.DataFrame
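One way a table with these columns could be used to fill missing fuel costs is sketched below. This is an assumption about the general approach, not the code behind fuel_receipts_costs_eia923(): the toy values, the presence of a state column on the receipts data, and the _state_avg suffix are all hypothetical.

```python
import numpy as np
import pandas as pd

keys = ["report_date", "state", "fuel_type_code_pudl"]

# Toy fuel receipts with one missing cost (made-up values).
frc = pd.DataFrame(
    {
        "report_date": pd.to_datetime(["2020-01-01", "2020-01-01"]),
        "state": ["CO", "CO"],
        "fuel_type_code_pudl": ["coal", "gas"],
        "fuel_cost_per_mmbtu": [2.0, np.nan],
    }
)

# Toy state-level monthly averages shaped like the table described above.
state_avg = pd.DataFrame(
    {
        "report_date": pd.to_datetime(["2020-01-01", "2020-01-01"]),
        "state": ["CO", "CO"],
        "fuel_type_code_pudl": ["coal", "gas"],
        "fuel_cost_per_mmbtu": [2.5, 3.1],
    }
)

# Left-merge the averages onto the receipts, then fill only the gaps,
# leaving reported costs untouched.
merged = frc.merge(state_avg, on=keys, how="left", suffixes=("", "_state_avg"))
merged["fuel_cost_per_mmbtu"] = merged["fuel_cost_per_mmbtu"].fillna(
    merged["fuel_cost_per_mmbtu_state_avg"]
)
filled = merged.drop(columns=["fuel_cost_per_mmbtu_state_avg"])
```

The reported coal cost (2.0) is kept as-is, while the missing gas cost takes the state average (3.1).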

pudl.output.eia923._impute_via_bulk_elec(frc_df: pandas.DataFrame, pudl_engine: sqlalchemy.engine.Engine) → pandas.DataFrame[source]#