pudl.output.eia923
Functions for pulling EIA 923 data out of the PUDL DB.
Module Contents
Functions
Pull records from the generation_fuel_eia923 table in a given date range.
Combine nuclear and non-nuclear generation fuel tables into a single output.
Pull records from the fuel_receipts_costs_eia923 table in a given date range.
Pull records from the boiler_fuel_eia923 table in a given date range.
Pull records from the generation_eia923 table in a given date range.
Denormalize the generation_eia923 table.
Generate a url for a category from EIA's API.
Generate a url for a series from EIA's API.
Get a response from the API's url.
Grab an API response for monthly fuel costs for one fuel category.
Convert a fuel-type/state response into a clean dataframe.
Get a dataframe of state-level average fuel costs from EIA's API.
Attributes
The category ids for fuel costs by fuel for electricity for coal, gas and oil.
- pudl.output.eia923.FUEL_COST_CATEGORIES_EIAAPI = [41696, 41762, 41740]
The category ids for fuel costs by fuel for electricity for coal, gas and oil.
Each category id is a piece of a query to EIA’s API. Each query here contains a set of state-level child series which contain fuel cost data.
- See EIA’s query browse here:
- pudl.output.eia923.generation_fuel_eia923(pudl_engine, freq: Literal["AS", "MS", None] = None, start_date: Union[str, datetime.date, datetime.datetime, pandas.Timestamp] = None, end_date: Union[str, datetime.date, datetime.datetime, pandas.Timestamp] = None, nuclear: bool = False)
Pull records from the generation_fuel_eia923 table in a given date range.
Optionally, aggregate the records over some timescale – monthly, yearly, quarterly, etc. as well as by fuel type within a plant.
If the records are not being aggregated, all of the database fields are available. If they’re being aggregated, then we preserve the following fields. Per-unit values are re-calculated based on the aggregated totals. Totals are summed across whatever time range is being used, within a given plant and fuel type.
plant_id_eia
report_date
fuel_type_code_pudl
fuel_consumed_units
fuel_consumed_for_electricity_units
fuel_mmbtu_per_unit
fuel_consumed_mmbtu
fuel_consumed_for_electricity_mmbtu
net_generation_mwh
In addition, plant and utility names and IDs are pulled in from the EIA 860 tables.
- Parameters
pudl_engine – SQLAlchemy connection engine for the PUDL DB.
freq – a pandas timeseries offset alias (either “MS” or “AS”) or None.
start_date – date-like object, including a string of the form ‘YYYY-MM-DD’ which will be used to specify the date range of records to be pulled. Dates are inclusive.
end_date – date-like object, including a string of the form ‘YYYY-MM-DD’ which will be used to specify the date range of records to be pulled. Dates are inclusive.
nuclear – If True, return records from the generation_fuel_nuclear_eia923 table.
- Returns
A DataFrame containing records from the EIA 923 Generation Fuel table.
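The aggregation described above (sum the totals, then re-calculate per-unit values from those sums) can be illustrated with plain pandas. This is a minimal sketch on toy data, not the function's actual implementation; the helper name aggregate_by_year and the sample values are hypothetical, and "YS" is used as the year-start alias.

```python
import pandas as pd

# Toy monthly records using column names from the field list above.
gf = pd.DataFrame(
    {
        "plant_id_eia": [1, 1, 1, 1],
        "report_date": pd.to_datetime(
            ["2020-01-01", "2020-02-01", "2020-01-01", "2020-02-01"]
        ),
        "fuel_type_code_pudl": ["coal", "coal", "gas", "gas"],
        "fuel_consumed_units": [100.0, 300.0, 50.0, 150.0],
        "fuel_consumed_mmbtu": [2000.0, 6600.0, 50.0, 156.0],
        "net_generation_mwh": [200.0, 650.0, 6.0, 18.0],
    }
)


def aggregate_by_year(df):
    """Sum totals per plant, fuel type and year, then recompute per-unit values."""
    keys = [
        "plant_id_eia",
        "fuel_type_code_pudl",
        pd.Grouper(key="report_date", freq="YS"),  # year-start bins
    ]
    totals = ["fuel_consumed_units", "fuel_consumed_mmbtu", "net_generation_mwh"]
    out = df.groupby(keys)[totals].sum().reset_index()
    # Heat content per unit is derived from the summed totals,
    # not from an average of the monthly ratios.
    out["fuel_mmbtu_per_unit"] = out["fuel_consumed_mmbtu"] / out["fuel_consumed_units"]
    return out


annual = aggregate_by_year(gf)
```

Deriving fuel_mmbtu_per_unit from the summed totals weights each month by the amount of fuel it consumed, which a simple mean of monthly values would not do.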
- pudl.output.eia923.generation_fuel_all_eia923(gf: pandas.DataFrame, gfn: pandas.DataFrame) -> pandas.DataFrame
Combine nuclear and non-nuclear generation fuel tables into a single output.
The nuclear and non-nuclear generation fuel data are reported at different granularities. For non-nuclear generation, each row is a unique combination of date, plant ID, prime mover, and fuel type. Nuclear generation is additionally split out by nuclear_unit_id (which happens to be the same as generator_id).
This function aggregates the nuclear data across all nuclear units within a plant so that it is structurally the same as the non-nuclear data and can be treated identically in subsequent analyses. Then the nuclear and non-nuclear data are concatenated into a single dataframe and returned.
- Parameters
gf – non-nuclear generation fuel dataframe.
gfn – nuclear generation fuel dataframe.
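The two steps described above (aggregate nuclear records across units, then concatenate) can be sketched as follows. This is an illustration under assumptions, not the library's code: the helper name combine_generation_fuel and the column names prime_mover_code and fuel_type are hypothetical stand-ins for the real key columns, and only one value column is carried.

```python
import pandas as pd

# Assumed key and value columns; the real tables carry more fields.
KEYS = ["plant_id_eia", "report_date", "prime_mover_code", "fuel_type"]
VALUES = ["net_generation_mwh"]


def combine_generation_fuel(gf, gfn):
    """Sum nuclear rows over nuclear_unit_id, then stack with non-nuclear rows."""
    gfn_by_plant = gfn.groupby(KEYS, as_index=False)[VALUES].sum()
    return pd.concat([gf[KEYS + VALUES], gfn_by_plant], ignore_index=True)


gf = pd.DataFrame(
    {
        "plant_id_eia": [10],
        "report_date": pd.to_datetime(["2020-01-01"]),
        "prime_mover_code": ["ST"],
        "fuel_type": ["BIT"],
        "net_generation_mwh": [100.0],
    }
)
gfn = pd.DataFrame(
    {
        "plant_id_eia": [20, 20],
        "nuclear_unit_id": ["1", "2"],
        "report_date": pd.to_datetime(["2020-01-01", "2020-01-01"]),
        "prime_mover_code": ["ST", "ST"],
        "fuel_type": ["NUC", "NUC"],
        "net_generation_mwh": [500.0, 700.0],
    }
)

combined = combine_generation_fuel(gf, gfn)
```

After the groupby, the nuclear rows no longer carry nuclear_unit_id, so both inputs share the same columns and can be concatenated directly.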
- pudl.output.eia923.fuel_receipts_costs_eia923(pudl_engine, freq: Literal["AS", "MS", None] = None, start_date: Union[str, datetime.date, datetime.datetime, pandas.Timestamp] = None, end_date: Union[str, datetime.date, datetime.datetime, pandas.Timestamp] = None, fill: bool = False, roll: bool = False) -> pandas.DataFrame
Pull records from the fuel_receipts_costs_eia923 table in a given date range.
Optionally, aggregate the records at a monthly or longer timescale, as well as by fuel type within a plant, by setting freq to something other than the default None value.
If the records are not being aggregated, then all of the fields found in the PUDL database are available. If they are being aggregated, then the following fields are preserved, and appropriately summed or re-calculated based on the specified aggregation. In both cases, new total values are calculated, for total fuel heat content and total fuel cost.
plant_id_eia
report_date
fuel_type_code_pudl (formerly energy_source_simple)
fuel_received_units (sum)
fuel_cost_per_mmbtu (weighted average)
total_fuel_cost (sum)
fuel_consumed_mmbtu (sum)
fuel_mmbtu_per_unit (weighted average)
sulfur_content_pct (weighted average)
ash_content_pct (weighted average)
moisture_content_pct (weighted average)
mercury_content_ppm (weighted average)
chlorine_content_ppm (weighted average)
In addition, plant and utility names and IDs are pulled in from the EIA 860 tables.
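To make the "sum" and "weighted average" treatment concrete, here is a small pandas sketch on toy data. It assumes that total heat content is units received times heat content per unit, that total cost is heat content times cost per mmbtu, and that the weighted averages are recovered as ratios of the summed totals. Those assumptions and the sample values are illustrative, not a statement of the function's internals.

```python
import pandas as pd

frc = pd.DataFrame(
    {
        "plant_id_eia": [1, 1],
        "report_date": pd.to_datetime(["2020-01-01", "2020-02-01"]),
        "fuel_type_code_pudl": ["coal", "coal"],
        "fuel_received_units": [100.0, 300.0],
        "fuel_mmbtu_per_unit": [20.0, 22.0],
        "fuel_cost_per_mmbtu": [2.0, 3.0],
    }
)

# Derived totals (assumed definitions, see the note above).
frc["fuel_consumed_mmbtu"] = frc["fuel_received_units"] * frc["fuel_mmbtu_per_unit"]
frc["total_fuel_cost"] = frc["fuel_consumed_mmbtu"] * frc["fuel_cost_per_mmbtu"]

# Sum the totals within each plant and fuel type.
agg = frc.groupby(["plant_id_eia", "fuel_type_code_pudl"], as_index=False)[
    ["fuel_received_units", "fuel_consumed_mmbtu", "total_fuel_cost"]
].sum()

# Ratios of summed totals are the weighted averages.
agg["fuel_cost_per_mmbtu"] = agg["total_fuel_cost"] / agg["fuel_consumed_mmbtu"]
agg["fuel_mmbtu_per_unit"] = agg["fuel_consumed_mmbtu"] / agg["fuel_received_units"]
```

Here the aggregated cost per mmbtu is 23800 / 8600, which is higher than the unweighted mean of 2.5 because the larger delivery was also the more expensive one.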
Optionally fill in missing fuel costs based on monthly state averages which are pulled from the EIA’s open data API, and/or use a rolling average to fill in gaps in the fuel costs. These behaviors are controlled by the fill and roll parameters. If you set fill=True you need to ensure that you have stored your API key in an environment variable named API_KEY_EIA. You can register for a free EIA API key here: https://www.eia.gov/opendata/register.php
- Parameters
pudl_engine – SQLAlchemy connection engine for the PUDL DB.
freq – a pandas timeseries offset alias (“MS” or “AS”) or None. The original data is reported monthly.
start_date – date-like object, including a string of the form ‘YYYY-MM-DD’ which will be used to specify the date range of records to be pulled. Dates are inclusive.
end_date – date-like object, including a string of the form ‘YYYY-MM-DD’ which will be used to specify the date range of records to be pulled. Dates are inclusive.
fill – if set to True, fill in missing coal, gas and oil fuel cost per mmbtu from EIA’s API. This fills with monthly state-level averages.
roll – if set to True, apply a rolling average to a subset of the output table’s columns (currently only ‘fuel_cost_per_mmbtu’ for the frc table).
- Returns
A DataFrame containing records from the EIA 923 Fuel Receipts and Costs table.
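The fill behavior can be pictured as a left merge against a table of state-level monthly averages, followed by filling only the missing values. The sketch below is a guess at the shape of that step, not the actual implementation. It assumes the state averages are expressed per physical unit (a fuel_cost_per_unit column) and are converted with each record's fuel_mmbtu_per_unit. The helper name fill_fuel_cost, the state column on the receipts table, and the sample values are hypothetical.

```python
import numpy as np
import pandas as pd

frc = pd.DataFrame(
    {
        "state": ["CO", "CO"],
        "report_date": pd.to_datetime(["2020-01-01", "2020-02-01"]),
        "fuel_type_code_pudl": ["gas", "gas"],
        "fuel_mmbtu_per_unit": [1.0, 1.25],
        "fuel_cost_per_mmbtu": [3.0, np.nan],  # second record has no reported cost
    }
)
state_avg = pd.DataFrame(
    {
        "state": ["CO", "CO"],
        "report_date": pd.to_datetime(["2020-01-01", "2020-02-01"]),
        "fuel_type_code_pudl": ["gas", "gas"],
        "fuel_cost_per_unit": [4.0, 5.0],
    }
)


def fill_fuel_cost(frc, state_avg):
    """Fill missing fuel_cost_per_mmbtu from state averages; keep reported values."""
    keys = ["state", "report_date", "fuel_type_code_pudl"]
    merged = frc.merge(state_avg[keys + ["fuel_cost_per_unit"]], on=keys, how="left")
    # Convert the per-unit state average to a per-mmbtu estimate.
    estimate = merged["fuel_cost_per_unit"] / merged["fuel_mmbtu_per_unit"]
    merged["fuel_cost_per_mmbtu"] = merged["fuel_cost_per_mmbtu"].fillna(estimate)
    return merged.drop(columns=["fuel_cost_per_unit"])


filled = fill_fuel_cost(frc, state_avg)
```

Reported costs are left untouched; only the record with a missing value receives the state-level estimate.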
- pudl.output.eia923.boiler_fuel_eia923(pudl_engine, freq=None, start_date=None, end_date=None)
Pull records from the boiler_fuel_eia923 table in a given date range.
Optionally, aggregate the records over some timescale – monthly, yearly, quarterly, etc. as well as by fuel type within a plant.
If the records are not being aggregated, all of the database fields are available. If they’re being aggregated, then we preserve the following fields. Per-unit values are re-calculated based on the aggregated totals. Totals are summed across whatever time range is being used, within a given plant and fuel type.
fuel_consumed_units (sum)
fuel_mmbtu_per_unit (weighted average)
fuel_consumed_mmbtu (sum)
sulfur_content_pct (weighted average)
ash_content_pct (weighted average)
In addition, plant and utility names and IDs are pulled in from the EIA 860 tables.
- Parameters
pudl_engine (sqlalchemy.engine.Engine) – SQLAlchemy connection engine for the PUDL DB.
freq (str) – a pandas timeseries offset alias. The original data is reported monthly, so the best time frequencies to use here are probably month start (freq=’MS’) and year start (freq=’YS’).
start_date (date-like) – date-like object, including a string of the form ‘YYYY-MM-DD’ which will be used to specify the date range of records to be pulled. Dates are inclusive.
end_date (date-like) – date-like object, including a string of the form ‘YYYY-MM-DD’ which will be used to specify the date range of records to be pulled. Dates are inclusive.
- Returns
A DataFrame containing all records from the EIA 923 Boiler Fuel table.
- Return type
pandas.DataFrame
- pudl.output.eia923.generation_eia923(pudl_engine, freq=None, start_date=None, end_date=None)
Pull records from the generation_eia923 table in a given date range.
- Parameters
pudl_engine (sqlalchemy.engine.Engine) – SQLAlchemy connection engine for the PUDL DB.
freq (str) – a pandas timeseries offset alias. The original data is reported monthly, so the best time frequencies to use here are probably month start (freq=’MS’) and year start (freq=’YS’).
start_date (date-like) – date-like object, including a string of the form ‘YYYY-MM-DD’ which will be used to specify the date range of records to be pulled. Dates are inclusive.
end_date (date-like) – date-like object, including a string of the form ‘YYYY-MM-DD’ which will be used to specify the date range of records to be pulled. Dates are inclusive.
- Returns
A DataFrame containing all records from the EIA 923 Generation table.
- Return type
pandas.DataFrame
- pudl.output.eia923.denorm_generation_eia923(g_df, pudl_engine, start_date, end_date)
Denormalize the generation_eia923 table.
- Parameters
g_df (pandas.DataFrame) – generation_eia923 table. Should have columns: [“plant_id_eia”, “generator_id”, “report_date”, “net_generation_mwh”]
pudl_engine (sqlalchemy.engine.Engine) – SQLAlchemy connection engine for the PUDL DB.
start_date (date-like) – date-like object, including a string of the form ‘YYYY-MM-DD’ which will be used to specify the date range of records to be pulled. Dates are inclusive.
end_date (date-like) – date-like object, including a string of the form ‘YYYY-MM-DD’ which will be used to specify the date range of records to be pulled. Dates are inclusive.
- pudl.output.eia923.make_url_cat_eiaapi(category_id)
Generate a url for a category from EIA’s API.
Requires that an environment variable named API_KEY_EIA be set, containing a valid EIA API key, which you can obtain from: https://www.eia.gov/opendata/register.php
- pudl.output.eia923.make_url_series_eiaapi(series_id)
Generate a url for a series from EIA’s API.
Requires that an environment variable named API_KEY_EIA be set, containing a valid EIA API key, which you can obtain from: https://www.eia.gov/opendata/register.php
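As a rough illustration of how a url builder can read the key from the environment, consider the sketch below. The base URL, the query parameter names, and the helper names build_category_url and build_series_url are assumptions made for the example; they are not taken from this module.

```python
import os
from urllib.parse import urlencode

# Assumption: base address of EIA's API. Adjust to the endpoint you use.
BASE_URL = "https://api.eia.gov/"


def _api_key():
    """Read the key from API_KEY_EIA, failing early with a clear message."""
    key = os.environ.get("API_KEY_EIA")
    if not key:
        raise RuntimeError("Set the API_KEY_EIA environment variable first.")
    return key


def build_category_url(category_id):
    """Build a category query url (hypothetical layout)."""
    query = urlencode({"api_key": _api_key(), "category_id": category_id})
    return f"{BASE_URL}category/?{query}"


def build_series_url(series_id):
    """Build a series query url (hypothetical layout)."""
    query = urlencode({"api_key": _api_key(), "series_id": series_id})
    return f"{BASE_URL}series/?{query}"
```

Reading the key at call time rather than at import time means the module can be imported without a key, and the error only appears when a url is actually requested.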
- pudl.output.eia923.grab_fuel_state_monthly(cat_id)
Grab an API response for monthly fuel costs for one fuel category.
The data we want from EIA is in monthly, state-level series for each fuel type. For each fuel category, there are at least 51 embedded child series. This function compiles one fuel type’s child categories into one request. The resulting API response should contain a list of series responses from each state, which we can convert into a pandas.DataFrame using convert_cost_json_to_df.
- Parameters
cat_id (int) – category id for one fuel type. Known to be
- pudl.output.eia923.convert_cost_json_to_df(response_fuel_state_annual)
Convert a fuel-type/state response into a clean dataframe.
- Parameters
response_fuel_state_annual (api response) – an EIA API response which contains state-level series including monthly fuel cost data.
- Returns
A dataframe containing state-level monthly fuel costs. The table contains the following columns, some of which are reference columns: ‘report_date’, ‘fuel_cost_per_unit’, ‘state’, ‘fuel_type_code_pudl’, ‘units’ (ref), ‘series_id’ (ref), ‘name’ (ref).
- Return type
pandas.DataFrame
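The flattening step can be sketched on an already-decoded JSON payload. The payload layout below (a "series" list whose items hold "series_id", "name", "units" and a "data" list of [YYYYMM, value] pairs) is an assumption about the response format, and the series ids and names are invented for the example. Deriving the state and fuel_type_code_pudl columns is left out because it depends on how those are encoded in the real series.

```python
import pandas as pd


def series_payload_to_df(payload):
    """Flatten a list of monthly series into one long dataframe."""
    frames = []
    for series in payload["series"]:
        df = pd.DataFrame(series["data"], columns=["report_date", "fuel_cost_per_unit"])
        # Periods are assumed to be YYYYMM strings; parse to month-start dates.
        df["report_date"] = pd.to_datetime(df["report_date"], format="%Y%m")
        # Reference columns are repeated on every row of the series.
        df["units"] = series["units"]
        df["series_id"] = series["series_id"]
        df["name"] = series["name"]
        frames.append(df)
    return pd.concat(frames, ignore_index=True)


# Hypothetical payload with two state-level series.
payload = {
    "series": [
        {
            "series_id": "EXAMPLE.COAL.CO.M",
            "name": "Example coal cost, Colorado, monthly",
            "units": "dollars per ton",
            "data": [["202002", 41.0], ["202001", 40.0]],
        },
        {
            "series_id": "EXAMPLE.COAL.WY.M",
            "name": "Example coal cost, Wyoming, monthly",
            "units": "dollars per ton",
            "data": [["202001", 15.5]],
        },
    ]
}

costs = series_payload_to_df(payload)
```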
- pudl.output.eia923.get_fuel_cost_avg_eiaapi(fuel_cost_cat_ids)
Get a dataframe of state-level average fuel costs from EIA’s API.
- Parameters
fuel_cost_cat_ids (list) – list of category ids. Known working (tested) ids are stored in FUEL_COST_CATEGORIES_EIAAPI.
- Returns
A dataframe containing state-level monthly fuel costs. The table contains the following columns, some of which are reference columns: ‘report_date’, ‘fuel_cost_per_unit’, ‘state’, ‘fuel_type_code_pudl’, ‘units’ (ref), ‘series_id’ (ref), ‘name’ (ref).
- Return type
pandas.DataFrame