pudl.output.eia923#

Functions for pulling EIA 923 data out of the PUDL DB.

Module Contents#

Functions#

generation_fuel_eia923(pudl_engine[, freq, ...])

Pull records from the generation_fuel_eia923 table in a given date range.

generation_fuel_all_eia923(→ pandas.DataFrame)

Combine nuclear and non-nuclear generation fuel tables into a single output.

fuel_receipts_costs_eia923(→ pandas.DataFrame)

Pull records from the fuel_receipts_costs_eia923 table in a given date range.

boiler_fuel_eia923(pudl_engine[, freq, start_date, ...])

Pull records from the boiler_fuel_eia923 table in a given date range.

generation_eia923(pudl_engine[, freq, start_date, ...])

Pull records from the generation_eia923 table in a given date range.

denorm_generation_eia923(g_df, pudl_engine, ...)

Denormalize generation_eia923 table.

make_url_cat_eiaapi(category_id)

Generate a url for a category from EIA's API.

make_url_series_eiaapi(series_id)

Generate a url for a series from EIA's API.

_check_eia_api_key()

get_response(url)

Get a response from the API's url.

grab_fuel_state_monthly(cat_id)

Grab an API response for monthly fuel costs for one fuel category.

convert_cost_json_to_df(response_fuel_state_annual)

Convert a fuel-type/state response into a clean dataframe.

get_fuel_cost_avg_eiaapi(fuel_cost_cat_ids)

Get a dataframe of state-level average fuel costs from EIA's API.

Attributes#

logger

BASE_URL_EIA

FUEL_TYPE_EIAAPI_MAP

FUEL_COST_CATEGORIES_EIAAPI

The category ids for fuel costs by fuel for electricity for coal, gas and oil.

pudl.output.eia923.logger[source]#
pudl.output.eia923.BASE_URL_EIA = https://api.eia.gov/[source]#
pudl.output.eia923.FUEL_TYPE_EIAAPI_MAP[source]#
pudl.output.eia923.FUEL_COST_CATEGORIES_EIAAPI = [41696, 41762, 41740][source]#

The category ids for fuel costs by fuel for electricity for coal, gas and oil.

Each category id is a piece of a query to EIA’s API. Each query here contains a set of state-level child series which contain fuel cost data.

See EIA’s query browse here:
pudl.output.eia923.generation_fuel_eia923(pudl_engine, freq: Literal[AS, MS, None] = None, start_date: str | date | datetime | pd.Timestamp = None, end_date: str | date | datetime | pd.Timestamp = None, nuclear: bool = False)[source]#

Pull records from the generation_fuel_eia923 table in a given date range.

Optionally, aggregate the records over some timescale – monthly, yearly, quarterly, etc. as well as by fuel type within a plant.

If the records are not being aggregated, all of the database fields are available. If they’re being aggregated, then we preserve the following fields. Per-unit values are re-calculated based on the aggregated totals. Totals are summed across whatever time range is being used, within a given plant and fuel type.

  • plant_id_eia

  • report_date

  • fuel_type_code_pudl

  • fuel_consumed_units

  • fuel_consumed_for_electricity_units

  • fuel_mmbtu_per_unit

  • fuel_consumed_mmbtu

  • fuel_consumed_for_electricity_mmbtu

  • net_generation_mwh

In addition, plant and utility names and IDs are pulled in from the EIA 860 tables.

Parameters:
  • pudl_engine – SQLAlchemy connection engine for the PUDL DB.

  • freq – a pandas timeseries offset alias (either “MS” or “AS”) or None.

  • start_date – date-like object, including a string of the form ‘YYYY-MM-DD’ which will be used to specify the date range of records to be pulled. Dates are inclusive.

  • end_date – date-like object, including a string of the form ‘YYYY-MM-DD’ which will be used to specify the date range of records to be pulled. Dates are inclusive.

  • nuclear – If True, return generation_fuel_nuclear_eia923 table.

Returns:

A DataFrame containing records from the EIA 923 Generation Fuel table.
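
As a rough illustration of the aggregation described for generation_fuel_eia923, the sketch below uses made-up records and plain pandas rather than the PUDL database. It sums the totals within each plant, year, and fuel type, then re-calculates fuel_mmbtu_per_unit from the summed totals. The report_year column and every value are assumptions made for the example, not output from the function itself.

```python
import pandas as pd

# Made-up monthly records for one plant and one fuel type.
gf = pd.DataFrame(
    {
        "plant_id_eia": [1, 1, 1],
        "report_date": pd.to_datetime(["2020-01-01", "2020-02-01", "2021-01-01"]),
        "fuel_type_code_pudl": ["coal", "coal", "coal"],
        "fuel_consumed_units": [100.0, 300.0, 50.0],
        "fuel_consumed_mmbtu": [2000.0, 6600.0, 1000.0],
        "net_generation_mwh": [200.0, 650.0, 90.0],
    }
)

# Sum the totals within each plant, year, and fuel type.
# report_year is a helper column invented for this sketch.
annual = (
    gf.assign(report_year=gf["report_date"].dt.year)
    .groupby(["plant_id_eia", "report_year", "fuel_type_code_pudl"], as_index=False)[
        ["fuel_consumed_units", "fuel_consumed_mmbtu", "net_generation_mwh"]
    ]
    .sum()
)

# Re-calculate the per-unit value from the aggregated totals
# instead of averaging the monthly per-unit values.
annual["fuel_mmbtu_per_unit"] = (
    annual["fuel_consumed_mmbtu"] / annual["fuel_consumed_units"]
)
```

Dividing the summed heat content by the summed units keeps the per-unit value consistent with the totals, which a simple mean of monthly per-unit values would not guarantee.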

pudl.output.eia923.generation_fuel_all_eia923(gf: pandas.DataFrame, gfn: pandas.DataFrame) pandas.DataFrame[source]#

Combine nuclear and non-nuclear generation fuel tables into a single output.

The nuclear and non-nuclear generation fuel data are reported at different granularities. For non-nuclear generation, each row is a unique combination of date, plant ID, prime mover, and fuel type. Nuclear generation is additionally split out by nuclear_unit_id (which happens to be the same as generator_id).

This function aggregates the nuclear data across all nuclear units within a plant so that it is structurally the same as the non-nuclear data and can be treated identically in subsequent analyses. Then the nuclear and non-nuclear data are concatenated into a single dataframe and returned.

Parameters:
  • gf – non-nuclear generation fuel dataframe.

  • gfn – nuclear generation fuel dataframe.
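
The two steps described for generation_fuel_all_eia923 (collapse the nuclear units, then concatenate) can be sketched with toy data as follows. The column names prime_mover_code and energy_source_code, and all of the values, are assumptions for illustration; the real tables carry more columns and the real function may handle them differently.

```python
import pandas as pd

# Toy non-nuclear table: one row per date, plant, prime mover, and fuel type.
gf = pd.DataFrame(
    {
        "report_date": pd.to_datetime(["2020-01-01"]),
        "plant_id_eia": [10],
        "prime_mover_code": ["ST"],  # assumed column name
        "energy_source_code": ["BIT"],  # assumed column name
        "net_generation_mwh": [500.0],
    }
)

# Toy nuclear table: additionally split out by nuclear_unit_id.
gfn = pd.DataFrame(
    {
        "report_date": pd.to_datetime(["2020-01-01", "2020-01-01"]),
        "plant_id_eia": [20, 20],
        "nuclear_unit_id": ["1", "2"],
        "prime_mover_code": ["ST", "ST"],
        "energy_source_code": ["NUC", "NUC"],
        "net_generation_mwh": [700.0, 800.0],
    }
)

# Aggregate across nuclear units so the nuclear rows have the same
# granularity as the non-nuclear rows.
keys = ["report_date", "plant_id_eia", "prime_mover_code", "energy_source_code"]
gfn_by_plant = gfn.groupby(keys, as_index=False)[["net_generation_mwh"]].sum()

# Concatenate the two structurally identical tables.
combined = pd.concat([gf, gfn_by_plant], ignore_index=True)
```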

pudl.output.eia923.fuel_receipts_costs_eia923(pudl_engine, freq: Literal[AS, MS, None] = None, start_date: str | date | datetime | pd.Timestamp = None, end_date: str | date | datetime | pd.Timestamp = None, fill: bool = False, roll: bool = False) pandas.DataFrame[source]#

Pull records from the fuel_receipts_costs_eia923 table in a given date range.

Optionally, aggregate the records at a monthly or longer timescale, as well as by fuel type within a plant, by setting freq to something other than the default None value.

If the records are not being aggregated, then all of the fields found in the PUDL database are available. If they are being aggregated, then the following fields are preserved, and appropriately summed or re-calculated based on the specified aggregation. In both cases, new total values are calculated, for total fuel heat content and total fuel cost.

  • plant_id_eia

  • report_date

  • fuel_type_code_pudl (formerly energy_source_simple)

  • fuel_received_units (sum)

  • fuel_cost_per_mmbtu (weighted average)

  • total_fuel_cost (sum)

  • fuel_consumed_mmbtu (sum)

  • fuel_mmbtu_per_unit (weighted average)

  • sulfur_content_pct (weighted average)

  • ash_content_pct (weighted average)

  • moisture_content_pct (weighted average)

  • mercury_content_ppm (weighted average)

  • chlorine_content_ppm (weighted average)

In addition, plant and utility names and IDs are pulled in from the EIA 860 tables.

Optionally fill in missing fuel costs based on monthly state averages which are pulled from the EIA’s open data API, and/or use a rolling average to fill in gaps in the fuel costs. These behaviors are controlled by the fill and roll parameters. If you set fill=True you need to ensure that you have stored your API key in an environment variable named API_KEY_EIA. You can register for a free EIA API key here:

https://www.eia.gov/opendata/register.php
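
The idea behind filling from state averages can be sketched in plain Python. This is only an illustration under stated assumptions: the records, the state field on each record, and the lookup keyed on (state, report_date, fuel_type_code_pudl) are all invented here and may not match how the function joins the API data internally.

```python
# Made-up fuel receipt records; None marks a missing fuel cost.
receipts = [
    {"state": "CO", "report_date": "2020-01-01",
     "fuel_type_code_pudl": "coal", "fuel_cost_per_mmbtu": 1.9},
    {"state": "CO", "report_date": "2020-02-01",
     "fuel_type_code_pudl": "coal", "fuel_cost_per_mmbtu": None},
    {"state": "TX", "report_date": "2020-02-01",
     "fuel_type_code_pudl": "gas", "fuel_cost_per_mmbtu": None},
]

# Made-up monthly state-level averages, keyed by (state, month, fuel type).
state_averages = {("CO", "2020-02-01", "coal"): 2.1}

filled = []
for rec in receipts:
    cost = rec["fuel_cost_per_mmbtu"]
    if cost is None:
        key = (rec["state"], rec["report_date"], rec["fuel_type_code_pudl"])
        # Stays None when no state average is available for this key.
        cost = state_averages.get(key)
    filled.append({**rec, "fuel_cost_per_mmbtu": cost})
```

Reported costs are left untouched; only missing values are replaced, and a record with no matching average remains missing.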

Parameters:
  • pudl_engine – SQLAlchemy connection engine for the PUDL DB.

  • freq – a pandas timeseries offset alias (“MS” or “AS”) or None. The original data is reported monthly.

  • start_date – date-like object, including a string of the form ‘YYYY-MM-DD’ which will be used to specify the date range of records to be pulled. Dates are inclusive.

  • end_date – date-like object, including a string of the form ‘YYYY-MM-DD’ which will be used to specify the date range of records to be pulled. Dates are inclusive.

  • fill – if set to True, fill in missing coal, gas and oil fuel cost per mmbtu from EIA’s API. This fills with monthly state-level averages.

  • roll – if set to True, apply a rolling average to a subset of the output table’s columns (currently only ‘fuel_cost_per_mmbtu’ for the frc table).

Returns:

A DataFrame containing records from the EIA 923 Fuel Receipts and Costs table.
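
The difference between the summed and weighted-average fields listed for fuel_receipts_costs_eia923 can be shown with a little arithmetic. The sketch below assumes that heat content is units times fuel_mmbtu_per_unit and that fuel_cost_per_mmbtu is weighted by delivered heat content; the description above does not spell out the weights, so treat that choice, and the made-up deliveries, as assumptions.

```python
# Two made-up fuel deliveries to the same plant in the same period.
deliveries = [
    {"fuel_received_units": 100.0, "fuel_mmbtu_per_unit": 20.0, "fuel_cost_per_mmbtu": 2.0},
    {"fuel_received_units": 300.0, "fuel_mmbtu_per_unit": 22.0, "fuel_cost_per_mmbtu": 3.0},
]

# Summed totals.
total_units = sum(d["fuel_received_units"] for d in deliveries)
total_mmbtu = sum(
    d["fuel_received_units"] * d["fuel_mmbtu_per_unit"] for d in deliveries
)
total_fuel_cost = sum(
    d["fuel_received_units"] * d["fuel_mmbtu_per_unit"] * d["fuel_cost_per_mmbtu"]
    for d in deliveries
)

# Weighted averages re-calculated from the totals.
avg_mmbtu_per_unit = total_mmbtu / total_units
avg_cost_per_mmbtu = total_fuel_cost / total_mmbtu

# An unweighted mean ignores how much fuel each delivery contained.
simple_mean_cost = sum(d["fuel_cost_per_mmbtu"] for d in deliveries) / len(deliveries)
```

Here the larger, more expensive delivery pulls the weighted cost above the unweighted mean of 2.5, which is why the per-unit fields are re-calculated rather than averaged directly.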

pudl.output.eia923.boiler_fuel_eia923(pudl_engine, freq=None, start_date=None, end_date=None)[source]#

Pull records from the boiler_fuel_eia923 table in a given date range.

Optionally, aggregate the records over some timescale – monthly, yearly, quarterly, etc. as well as by fuel type within a plant.

If the records are not being aggregated, all of the database fields are available. If they’re being aggregated, then we preserve the following fields. Per-unit values are re-calculated based on the aggregated totals. Totals are summed across whatever time range is being used, within a given plant and fuel type.

  • fuel_consumed_units (sum)

  • fuel_mmbtu_per_unit (weighted average)

  • fuel_consumed_mmbtu (sum)

  • sulfur_content_pct (weighted average)

  • ash_content_pct (weighted average)

In addition, plant and utility names and IDs are pulled in from the EIA 860 tables.

Parameters:
  • pudl_engine (sqlalchemy.engine.Engine) – SQLAlchemy connection engine for the PUDL DB.

  • freq (str) – a pandas timeseries offset alias. The original data is reported monthly, so the best time frequencies to use here are probably month start (freq=’MS’) and year start (freq=’YS’).

  • start_date (date-like) – date-like object, including a string of the form ‘YYYY-MM-DD’ which will be used to specify the date range of records to be pulled. Dates are inclusive.

  • end_date (date-like) – date-like object, including a string of the form ‘YYYY-MM-DD’ which will be used to specify the date range of records to be pulled. Dates are inclusive.

Returns:

A DataFrame containing all records from the EIA 923 Boiler Fuel table.

Return type:

pandas.DataFrame
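
To show what a year-start frequency does to monthly report dates, here is a small pandas sketch with made-up boiler fuel records. It only demonstrates how the 'YS' offset alias buckets dates; it does not call boiler_fuel_eia923 and is not a reproduction of its internals.

```python
import pandas as pd

# Made-up monthly boiler fuel records.
bf = pd.DataFrame(
    {
        "report_date": pd.to_datetime(["2020-01-01", "2020-02-01", "2021-01-01"]),
        "fuel_consumed_units": [10.0, 30.0, 5.0],
    }
)

# Group the monthly records into year-start bins and sum the totals.
yearly_units = bf.groupby(pd.Grouper(key="report_date", freq="YS"))[
    "fuel_consumed_units"
].sum()
```

Each bin is labeled with the first day of its year, so the two 2020 records are summed under 2020-01-01 and the 2021 record appears under 2021-01-01.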

pudl.output.eia923.generation_eia923(pudl_engine, freq=None, start_date=None, end_date=None)[source]#

Pull records from the generation_eia923 table in a given date range.

Parameters:
  • pudl_engine (sqlalchemy.engine.Engine) – SQLAlchemy connection engine for the PUDL DB.

  • freq (str) – a pandas timeseries offset alias. The original data is reported monthly, so the best time frequencies to use here are probably month start (freq=’MS’) and year start (freq=’YS’).

  • start_date (date-like) – date-like object, including a string of the form ‘YYYY-MM-DD’ which will be used to specify the date range of records to be pulled. Dates are inclusive.

  • end_date (date-like) – date-like object, including a string of the form ‘YYYY-MM-DD’ which will be used to specify the date range of records to be pulled. Dates are inclusive.

Returns:

A DataFrame containing all records from the EIA 923 Generation table.

Return type:

pandas.DataFrame

pudl.output.eia923.denorm_generation_eia923(g_df, pudl_engine, start_date, end_date)[source]#

Denormalize generation_eia923 table.

Parameters:
  • g_df (pandas.DataFrame) – generation_eia923 table. Should have columns: [“plant_id_eia”, “generator_id”, “report_date”, “net_generation_mwh”]

  • pudl_engine (sqlalchemy.engine.Engine) – SQLAlchemy connection engine for the PUDL DB.

  • start_date (date-like) – date-like object, including a string of the form ‘YYYY-MM-DD’ which will be used to specify the date range of records to be pulled. Dates are inclusive.

  • end_date (date-like) – date-like object, including a string of the form ‘YYYY-MM-DD’ which will be used to specify the date range of records to be pulled. Dates are inclusive.

pudl.output.eia923.make_url_cat_eiaapi(category_id)[source]#

Generate a url for a category from EIA’s API.

Requires an environment variable named API_KEY_EIA be set, containing a valid EIA API key, which you can obtain from:

https://www.eia.gov/opendata/register.php
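
A minimal sketch of how such a URL might be assembled is shown below. The helper name sketch_category_url, the category/ path, and the api_key and category_id query parameter names are assumptions made for illustration; only BASE_URL_EIA and the API_KEY_EIA environment variable come from this module's documentation.

```python
import os
from urllib.parse import urlencode

BASE_URL_EIA = "https://api.eia.gov/"


def sketch_category_url(category_id, api_key=None):
    """Hypothetical helper: build a category URL from an id and an API key."""
    if api_key is None:
        # Fall back to the environment variable named in the documentation.
        api_key = os.environ.get("API_KEY_EIA")
    if not api_key:
        raise RuntimeError(
            "Set the API_KEY_EIA environment variable to a valid EIA API key."
        )
    # The path and parameter names below are assumed, not confirmed.
    query = urlencode({"api_key": api_key, "category_id": category_id})
    return f"{BASE_URL_EIA}category/?{query}"


# A placeholder key is passed explicitly so the example runs without a real one.
example_url = sketch_category_url(41696, api_key="not-a-real-key")
```

Checking for the key before building the URL gives a clear error up front instead of a failed request later.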

pudl.output.eia923.make_url_series_eiaapi(series_id)[source]#

Generate a url for a series from EIA’s API.

Requires an environment variable named API_KEY_EIA be set, containing a valid EIA API key, which you can obtain from:

https://www.eia.gov/opendata/register.php

pudl.output.eia923._check_eia_api_key()[source]#
pudl.output.eia923.get_response(url)[source]#

Get a response from the API’s url.

pudl.output.eia923.grab_fuel_state_monthly(cat_id)[source]#

Grab an API response for monthly fuel costs for one fuel category.

The data we want from EIA is in monthly, state-level series for each fuel type. For each fuel category, there are at least 51 embedded child series. This function compiles one fuel type’s child categories into one request. The resulting API response should contain a list of series responses from each state, which we can convert into a pandas.DataFrame using convert_cost_json_to_df.

Parameters:

cat_id (int) – category id for one fuel type. Known to be

pudl.output.eia923.convert_cost_json_to_df(response_fuel_state_annual)[source]#

Convert a fuel-type/state response into a clean dataframe.

Parameters:

response_fuel_state_annual (api response) – an EIA API response which contains state-level series including monthly fuel cost data.

Returns:

a dataframe containing state-level monthly fuel cost. The table contains the following columns, some of which are reference columns: ‘report_date’, ‘fuel_cost_per_unit’, ‘state’, ‘fuel_type_code_pudl’, ‘units’ (ref), ‘series_id’ (ref), ‘name’ (ref).

Return type:

pandas.DataFrame
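
The kind of flattening this conversion performs can be sketched in plain Python. The payload below is entirely made up: its field names (series, data, geography), the 'YYYYMM' period format, and the way the state is read from geography are assumptions for illustration and may not match the real EIA response or the real function.

```python
from datetime import date

# Made-up payload shaped like a list of state-level monthly series.
response = {
    "series": [
        {
            "series_id": "EXAMPLE.COST.COAL-CO.M",  # hypothetical id
            "name": "Example coal cost, Colorado, monthly",
            "units": "example units",
            "geography": "USA-CO",  # assumed field
            "data": [["202001", 1.9], ["202002", 2.1]],
        }
    ]
}

# Flatten every (period, value) pair into one record per state and month.
rows = []
for series in response["series"]:
    for period, value in series["data"]:
        rows.append(
            {
                "report_date": date(int(period[:4]), int(period[4:6]), 1),
                "fuel_cost_per_unit": value,
                "state": series["geography"].split("-")[-1],
                "units": series["units"],
                "series_id": series["series_id"],
                "name": series["name"],
            }
        )
```

A list of records like rows can be passed directly to the pandas.DataFrame constructor to obtain a table with one column per key.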

pudl.output.eia923.get_fuel_cost_avg_eiaapi(fuel_cost_cat_ids)[source]#

Get a dataframe of state-level average fuel costs from EIA’s API.

Parameters:

fuel_cost_cat_ids (list) – list of category ids. Known, tested working ids are stored in FUEL_COST_CATEGORIES_EIAAPI.

Returns:

a dataframe containing state-level monthly fuel cost. The table contains the following columns, some of which are reference columns: ‘report_date’, ‘fuel_cost_per_unit’, ‘state’, ‘fuel_type_code_pudl’, ‘units’ (ref), ‘series_id’ (ref), ‘name’ (ref).

Return type:

pandas.DataFrame