pudl.analysis.state_demand

Predict state-level electricity demand.

Using the hourly electricity demand reported at the balancing authority and utility level in the FERC 714, together with utility and balancing authority service territories inferred from the counties served by each utility and the utilities that make up each balancing authority in the EIA 861, estimate the total hourly electricity demand for each US state.

This analysis uses the total electricity sales by state reported in the EIA 861 as a scaling factor, ensuring that the magnitude of the estimated demand is roughly correct, and obtains the shape of the demand curve from the hourly planning area demand reported in the FERC 714.

The compilation of historical service territories based on the EIA 861 data is somewhat manual and could certainly be improved, but overall the results seem reasonable. Additional predictive spatial variables will be required to obtain more granular electricity demand estimates (e.g. at the county level).

Currently the script takes no arguments: it runs a predefined analysis across all states and all years for which both EIA 861 and FERC 714 data are available, and writes the results as a CSV to PUDL_DIR/local/state-demand/demand.csv.
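
The steps of that analysis are exposed as the functions documented below. As a rough sketch of how they fit together, assuming a pudl.output.pudltabl.PudlTabl instance named pudl_out and a PUDL settings dict named pudl_settings are already available (this mirrors the flow of main() but is illustrative rather than a verbatim excerpt):

>>> from pudl.analysis import state_demand as sd
>>> matrix, utc_offset = sd.load_ferc714_hourly_demand_matrix(pudl_out)
>>> matrix = sd.clean_ferc714_hourly_demand_matrix(matrix)
>>> matrix = sd.filter_ferc714_hourly_demand_matrix(matrix)
>>> matrix = sd.impute_ferc714_hourly_demand_matrix(matrix)
>>> demand = sd.melt_ferc714_hourly_demand_matrix(matrix, utc_offset)
>>> counties = sd.load_counties(pudl_out, pudl_settings)
>>> assignments = sd.load_ferc714_county_assignments(pudl_out)
>>> state_totals = sd.load_eia861_state_total_sales(pudl_out)
>>> prediction = sd.predict_state_hourly_demand(
...     demand, counties, assignments, state_totals=state_totals)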

Module Contents

Functions

lookup_state(state: Union[str, int]) → dict

Lookup US state by state identifier.

local_to_utc(local: pandas.Series, tz: Iterable, **kwargs: Any) → pandas.Series

Convert local times to UTC.

utc_to_local(utc: pandas.Series, tz: Iterable) → pandas.Series

Convert UTC times to local.

load_ventyx_hourly_state_demand(path: str) → pandas.DataFrame

Read and format Ventyx hourly state-level demand.

load_ferc714_hourly_demand_matrix(pudl_out: pudl.output.pudltabl.PudlTabl) → Tuple[pandas.DataFrame, pandas.DataFrame]

Read and format FERC 714 hourly demand into matrix form.

clean_ferc714_hourly_demand_matrix(df: pandas.DataFrame) → pandas.DataFrame

Detect and null anomalous values in FERC 714 hourly demand matrix.

filter_ferc714_hourly_demand_matrix(df: pandas.DataFrame, min_data: int = 100, min_data_fraction: float = 0.9) → pandas.DataFrame

Filter incomplete years from FERC 714 hourly demand matrix.

impute_ferc714_hourly_demand_matrix(df: pandas.DataFrame) → pandas.DataFrame

Impute null values in FERC 714 hourly demand matrix.

melt_ferc714_hourly_demand_matrix(df: pandas.DataFrame, tz: pandas.DataFrame) → pandas.DataFrame

Melt FERC 714 hourly demand matrix to long format.

load_ferc714_county_assignments(pudl_out: pudl.output.pudltabl.PudlTabl) → pandas.DataFrame

Load FERC 714 county assignments.

load_counties(pudl_out: pudl.output.pudltabl.PudlTabl, pudl_settings: dict) → pandas.DataFrame

Load county attributes.

load_eia861_state_total_sales(pudl_out: pudl.output.pudltabl.PudlTabl) → pandas.DataFrame

Read and format EIA 861 sales by state and year.

predict_state_hourly_demand(demand: pandas.DataFrame, counties: pandas.DataFrame, assignments: pandas.DataFrame, state_totals: pandas.DataFrame = None, mean_overlaps: bool = False) → pandas.DataFrame

Predict state hourly demand.

plot_demand_timeseries(a: pandas.DataFrame, b: pandas.DataFrame = None, window: int = 168, title: str = None, path: str = None) → None

Make a timeseries plot of predicted and reference demand.

plot_demand_scatter(a: pandas.DataFrame, b: pandas.DataFrame, title: str = None, path: str = None) → None

Make a scatter plot comparing predicted and reference demand.

compare_state_demand(a: pandas.DataFrame, b: pandas.DataFrame, scaled: bool = True) → pandas.DataFrame

Compute statistics comparing predicted and reference demand.

parse_command_line(argv)

Skeletal command line argument parser to provide a help message.

main()

Predict state demand.

Attributes

logger

STATES

Attributes of US states and territories.

STANDARD_UTC_OFFSETS

Hour offset from Coordinated Universal Time (UTC) by time zone.

UTC_OFFSETS

Hour offset from Coordinated Universal Time (UTC) by time zone.

pudl.analysis.state_demand.logger[source]
pudl.analysis.state_demand.STATES: List[Dict[str, Union[str, int]]][source]

Attributes of US states and territories.

  • name (str): Full name.

  • code (str): US Postal Service (USPS) two-letter alphabetic code.

  • fips (str): Federal Information Processing Standard (FIPS) code.

pudl.analysis.state_demand.STANDARD_UTC_OFFSETS: Dict[str, str][source]

Hour offset from Coordinated Universal Time (UTC) by time zone.

Time zones are canonical names (e.g. ‘America/Denver’) from tzdata (https://www.iana.org/time-zones) mapped to their standard-time UTC offset.

pudl.analysis.state_demand.UTC_OFFSETS: Dict[str, int][source]

Hour offset from Coordinated Universal Time (UTC) by time zone.

Time zones are either standard or daylight-savings time zone abbreviations (e.g. ‘MST’).
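
For example, assuming the mapping contains the 'MST' abbreviation mentioned above (Mountain Standard Time is seven hours behind UTC), a lookup would resemble:

>>> UTC_OFFSETS['MST']
-7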

pudl.analysis.state_demand.lookup_state(state: Union[str, int]) → dict[source]

Lookup US state by state identifier.

Parameters

state – State name, two-letter abbreviation, or FIPS code. String matching is case-insensitive.

Returns

State identifiers.

Examples

>>> lookup_state('alabama')
{'name': 'Alabama', 'code': 'AL', 'fips': '01'}
>>> lookup_state('AL')
{'name': 'Alabama', 'code': 'AL', 'fips': '01'}
>>> lookup_state(1)
{'name': 'Alabama', 'code': 'AL', 'fips': '01'}
pudl.analysis.state_demand.local_to_utc(local: pandas.Series, tz: Iterable, **kwargs: Any) → pandas.Series[source]

Convert local times to UTC.

Parameters
  • local – Local times (tz-naive datetime64[ns]).

  • tz – For each time, a timezone (see DatetimeIndex.tz_localize()) or UTC offset in hours (int or float).

  • kwargs – Optional arguments to DatetimeIndex.tz_localize().

Returns

UTC times (tz-naive datetime64[ns]).

Examples

>>> s = pd.Series([pd.Timestamp(2020, 1, 1), pd.Timestamp(2020, 1, 1)])
>>> local_to_utc(s, [-7, -6])
0   2020-01-01 07:00:00
1   2020-01-01 06:00:00
dtype: datetime64[ns]
>>> local_to_utc(s, ['America/Denver', 'America/Chicago'])
0   2020-01-01 07:00:00
1   2020-01-01 06:00:00
dtype: datetime64[ns]
pudl.analysis.state_demand.utc_to_local(utc: pandas.Series, tz: Iterable) → pandas.Series[source]

Convert UTC times to local.

Parameters
  • utc – UTC times (tz-naive datetime64[ns] or datetime64[ns, UTC]).

  • tz – For each time, a timezone (see DatetimeIndex.tz_localize()) or UTC offset in hours (int or float).

Returns

Local times (tz-naive datetime64[ns]).

Examples

>>> s = pd.Series([pd.Timestamp(2020, 1, 1), pd.Timestamp(2020, 1, 1)])
>>> utc_to_local(s, [-7, -6])
0   2019-12-31 17:00:00
1   2019-12-31 18:00:00
dtype: datetime64[ns]
>>> utc_to_local(s, ['America/Denver', 'America/Chicago'])
0   2019-12-31 17:00:00
1   2019-12-31 18:00:00
dtype: datetime64[ns]
pudl.analysis.state_demand.load_ventyx_hourly_state_demand(path: str) → pandas.DataFrame[source]

Read and format Ventyx hourly state-level demand.

After manual corrections of the listed time zone, ambiguous time zone issues remain. Below is a list of transmission zones (by Transmission Zone ID) with one or more missing timestamps at transitions to or from daylight-savings:

  • 615253 (Indiana)

  • 615261 (Michigan)

  • 615352 (Wisconsin)

  • 615357 (Missouri)

  • 615377 (Saskatchewan)

  • 615401 (Minnesota, Wisconsin)

  • 615516 (Missouri)

  • 615529 (Oklahoma)

  • 615603 (Idaho, Washington)

  • 1836089 (California)

Parameters

path – Path to the data file (published as ‘state_level_load_2007_2018.csv’).

Returns

Dataframe with hourly state-level demand.

  • state_id_fips: FIPS code of US state.

  • utc_datetime: UTC time of the start of each hour.

  • demand_mwh: Hourly demand in MWh.

pudl.analysis.state_demand.load_ferc714_hourly_demand_matrix(pudl_out: pudl.output.pudltabl.PudlTabl) → Tuple[pandas.DataFrame, pandas.DataFrame][source]

Read and format FERC 714 hourly demand into matrix form.

Parameters

pudl_out – Used to access pudl.output.pudltabl.PudlTabl.demand_hourly_pa_ferc714().

Returns

Hourly demand as a matrix with a datetime row index (e.g. ‘2006-01-01 00:00:00’, …, ‘2019-12-31 23:00:00’) in local time ignoring daylight-savings, and a respondent_id_ferc714 column index (e.g. 101, …, 329). A second Dataframe lists the UTC offset in hours of each respondent_id_ferc714 and reporting year (int).
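
For illustration, the two return values could be unpacked and indexed as below, where pudl_out is an existing PudlTabl and respondent 101 and the timestamp are the example values given above:

>>> matrix, utc_offset = load_ferc714_hourly_demand_matrix(pudl_out)
>>> demand_101 = matrix[101]                # one respondent's hourly series
>>> demand_101.loc['2019-12-31 23:00:00']   # rows are indexed by local datetime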

pudl.analysis.state_demand.clean_ferc714_hourly_demand_matrix(df: pandas.DataFrame) → pandas.DataFrame[source]

Detect and null anomalous values in FERC 714 hourly demand matrix.

Note

Takes about 10 minutes.

Parameters

df – FERC 714 hourly demand matrix, as described in load_ferc714_hourly_demand_matrix().

Returns

Copy of df with nulled anomalous values.

pudl.analysis.state_demand.filter_ferc714_hourly_demand_matrix(df: pandas.DataFrame, min_data: int = 100, min_data_fraction: float = 0.9) → pandas.DataFrame[source]

Filter incomplete years from FERC 714 hourly demand matrix.

Nulls respondent-years with too little data and drops respondents with no data across all years.

Parameters
  • df – FERC 714 hourly demand matrix, as described in load_ferc714_hourly_demand_matrix().

  • min_data – Minimum number of non-null hours in a year.

  • min_data_fraction – Minimum fraction of non-null hours between the first and last non-null hour in a year.

Returns

Hourly demand matrix df modified in-place.

pudl.analysis.state_demand.impute_ferc714_hourly_demand_matrix(df: pandas.DataFrame) → pandas.DataFrame[source]

Impute null values in FERC 714 hourly demand matrix.

Imputation is performed separately for each year, with only the respondents reporting data in that year.

Note

Takes about 15 minutes.

Parameters

df – FERC 714 hourly demand matrix, as described in load_ferc714_hourly_demand_matrix().

Returns

Copy of df with imputed values.

pudl.analysis.state_demand.melt_ferc714_hourly_demand_matrix(df: pandas.DataFrame, tz: pandas.DataFrame) → pandas.DataFrame[source]

Melt FERC 714 hourly demand matrix to long format.

Parameters
  • df – FERC 714 hourly demand matrix, as described in load_ferc714_hourly_demand_matrix().

  • tz – FERC 714 respondent UTC offsets by report year, as returned alongside the demand matrix by load_ferc714_hourly_demand_matrix().

Returns

Long-format hourly demand with columns respondent_id_ferc714, report year (int), utc_datetime, and demand_mwh.

pudl.analysis.state_demand.load_ferc714_county_assignments(pudl_out: pudl.output.pudltabl.PudlTabl) → pandas.DataFrame[source]

Load FERC 714 county assignments.

Parameters

pudl_out – PUDL database extractor.

Returns

Dataframe with columns respondent_id_ferc714, report year (int), and county_id_fips.

pudl.analysis.state_demand.load_counties(pudl_out: pudl.output.pudltabl.PudlTabl, pudl_settings: dict) → pandas.DataFrame[source]

Load county attributes.

Parameters
  • pudl_out – PUDL database extractor.

  • pudl_settings – PUDL settings.

Returns

Dataframe with columns county_id_fips and population.

pudl.analysis.state_demand.load_eia861_state_total_sales(pudl_out: pudl.output.pudltabl.PudlTabl) → pandas.DataFrame[source]

Read and format EIA 861 sales by state and year.

Parameters

pudl_out – Used to access pudl.output.pudltabl.PudlTabl.sales_eia861().

Returns

Dataframe with columns state_id_fips, year, and demand_mwh.

pudl.analysis.state_demand.predict_state_hourly_demand(demand: pandas.DataFrame, counties: pandas.DataFrame, assignments: pandas.DataFrame, state_totals: pandas.DataFrame = None, mean_overlaps: bool = False) → pandas.DataFrame[source]

Predict state hourly demand.

Parameters
  • demand – Hourly demand timeseries, with columns respondent_id_ferc714, report year, utc_datetime, and demand_mwh.

  • counties – Counties, with columns county_id_fips and population.

  • assignments – County assignments for demand respondents, with columns respondent_id_ferc714, year, and county_id_fips.

  • state_totals – Total annual demand by state, with columns state_id_fips, year, and demand_mwh. If provided, the predicted hourly demand is scaled to match these totals.

  • mean_overlaps – Whether to average the demands predicted for a county when that county is assigned to multiple respondents. By default, demands are summed.

Returns

Dataframe with columns state_id_fips, utc_datetime, demand_mwh, and (if state_totals was provided) scaled_demand_mwh.
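
As a sketch, a single state's series can be pulled out of this result with lookup_state(), assuming prediction was built with state_totals supplied so that the scaled_demand_mwh column exists ('Alabama' is just the example state used elsewhere on this page):

>>> fips = lookup_state('Alabama')['fips']
>>> al = prediction[prediction['state_id_fips'] == fips]
>>> al_hourly = al.set_index('utc_datetime')['scaled_demand_mwh']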

pudl.analysis.state_demand.plot_demand_timeseries(a: pandas.DataFrame, b: pandas.DataFrame = None, window: int = 168, title: str = None, path: str = None) → None[source]

Make a timeseries plot of predicted and reference demand.

Parameters
  • a – Predicted demand with columns utc_datetime and any of demand_mwh (in grey) and scaled_demand_mwh (in orange).

  • b – Reference demand with columns utc_datetime and demand_mwh (in red).

  • window – Width of window (in rows) to use to compute rolling means, or None to plot raw values.

  • title – Plot title.

  • path – Plot path. If provided, the figure is saved to file and closed.

pudl.analysis.state_demand.plot_demand_scatter(a: pandas.DataFrame, b: pandas.DataFrame, title: str = None, path: str = None) → None[source]

Make a scatter plot comparing predicted and reference demand.

Parameters
  • a – Predicted demand with columns utc_datetime and any of demand_mwh (in grey) and scaled_demand_mwh (in orange).

  • b – Reference demand with columns utc_datetime and demand_mwh. Every element in utc_datetime must match the one in a.

  • title – Plot title.

  • path – Plot path. If provided, the figure is saved to file and closed.

Raises

ValueError – Datetime columns do not match.

pudl.analysis.state_demand.compare_state_demand(a: pandas.DataFrame, b: pandas.DataFrame, scaled: bool = True) → pandas.DataFrame[source]

Compute statistics comparing predicted and reference demand.

Statistics are computed for each year.

Parameters
  • a – Predicted demand with columns utc_datetime and either demand_mwh (if scaled=False) or scaled_demand_mwh (if scaled=True).

  • b – Reference demand with columns utc_datetime and demand_mwh. Every element in utc_datetime must match the one in a.

Returns

Dataframe with columns year, rmse (root mean square error), and mae (mean absolute error).

Raises

ValueError – Datetime columns do not match.
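
A rough sketch of evaluating the prediction for one state against the Ventyx reference data, using the loaders and plotting helpers documented above. It assumes prediction was produced as in the pipeline sketch near the top of this page, and that the two frames cover the same hours so their utc_datetime columns align:

>>> reference = load_ventyx_hourly_state_demand('state_level_load_2007_2018.csv')
>>> fips = lookup_state('AL')['fips']
>>> a = prediction[prediction['state_id_fips'] == fips]
>>> b = reference[reference['state_id_fips'] == fips]
>>> stats = compare_state_demand(a, b, scaled=True)
>>> plot_demand_timeseries(a, b, window=168, path='al-demand.png')  # hypothetical output path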

pudl.analysis.state_demand.parse_command_line(argv)[source]

Skeletal command line argument parser to provide a help message.

pudl.analysis.state_demand.main()[source]

Predict state demand.