pudl.analysis.state_demand module

Predict state-level electricity demand.

pudl.analysis.state_demand.STANDARD_UTC_OFFSETS: Dict[str, int] = {'America/Anchorage': -9, 'America/Chicago': -6, 'America/Denver': -7, 'America/Halifax': -4, 'America/Los_Angeles': -8, 'America/New_York': -5, 'Pacific/Honolulu': -10}

Hour offset from Coordinated Universal Time (UTC) by time zone.

Time zones are canonical names (e.g. ‘America/Denver’) from tzdata (https://www.iana.org/time-zones) mapped to their standard-time UTC offset.

pudl.analysis.state_demand.STATES: List[Dict[str, Union[int, str]]] = [{'name': 'Alabama', 'code': 'AL', 'fips': '01'}, {'name': 'Alaska', 'code': 'AK', 'fips': '02'}, {'name': 'Arizona', 'code': 'AZ', 'fips': '04'}, {'name': 'Arkansas', 'code': 'AR', 'fips': '05'}, {'name': 'California', 'code': 'CA', 'fips': '06'}, {'name': 'Colorado', 'code': 'CO', 'fips': '08'}, {'name': 'Connecticut', 'code': 'CT', 'fips': '09'}, {'name': 'Delaware', 'code': 'DE', 'fips': '10'}, {'name': 'District of Columbia', 'code': 'DC', 'fips': '11'}, {'name': 'Florida', 'code': 'FL', 'fips': '12'}, {'name': 'Georgia', 'code': 'GA', 'fips': '13'}, {'name': 'Hawaii', 'code': 'HI', 'fips': '15'}, {'name': 'Idaho', 'code': 'ID', 'fips': '16'}, {'name': 'Illinois', 'code': 'IL', 'fips': '17'}, {'name': 'Indiana', 'code': 'IN', 'fips': '18'}, {'name': 'Iowa', 'code': 'IA', 'fips': '19'}, {'name': 'Kansas', 'code': 'KS', 'fips': '20'}, {'name': 'Kentucky', 'code': 'KY', 'fips': '21'}, {'name': 'Louisiana', 'code': 'LA', 'fips': '22'}, {'name': 'Maine', 'code': 'ME', 'fips': '23'}, {'name': 'Maryland', 'code': 'MD', 'fips': '24'}, {'name': 'Massachusetts', 'code': 'MA', 'fips': '25'}, {'name': 'Michigan', 'code': 'MI', 'fips': '26'}, {'name': 'Minnesota', 'code': 'MN', 'fips': '27'}, {'name': 'Mississippi', 'code': 'MS', 'fips': '28'}, {'name': 'Missouri', 'code': 'MO', 'fips': '29'}, {'name': 'Montana', 'code': 'MT', 'fips': '30'}, {'name': 'Nebraska', 'code': 'NE', 'fips': '31'}, {'name': 'Nevada', 'code': 'NV', 'fips': '32'}, {'name': 'New Hampshire', 'code': 'NH', 'fips': '33'}, {'name': 'New Jersey', 'code': 'NJ', 'fips': '34'}, {'name': 'New Mexico', 'code': 'NM', 'fips': '35'}, {'name': 'New York', 'code': 'NY', 'fips': '36'}, {'name': 'North Carolina', 'code': 'NC', 'fips': '37'}, {'name': 'North Dakota', 'code': 'ND', 'fips': '38'}, {'name': 'Ohio', 'code': 'OH', 'fips': '39'}, {'name': 'Oklahoma', 'code': 'OK', 'fips': '40'}, {'name': 'Oregon', 'code': 'OR', 'fips': '41'}, {'name': 'Pennsylvania', 'code': 'PA', 'fips': '42'}, {'name': 'Rhode Island', 'code': 'RI', 'fips': '44'}, {'name': 'South Carolina', 'code': 'SC', 'fips': '45'}, {'name': 'South Dakota', 'code': 'SD', 'fips': '46'}, {'name': 'Tennessee', 'code': 'TN', 'fips': '47'}, {'name': 'Texas', 'code': 'TX', 'fips': '48'}, {'name': 'Utah', 'code': 'UT', 'fips': '49'}, {'name': 'Vermont', 'code': 'VT', 'fips': '50'}, {'name': 'Virginia', 'code': 'VA', 'fips': '51'}, {'name': 'Washington', 'code': 'WA', 'fips': '53'}, {'name': 'West Virginia', 'code': 'WV', 'fips': '54'}, {'name': 'Wisconsin', 'code': 'WI', 'fips': '55'}, {'name': 'Wyoming', 'code': 'WY', 'fips': '56'}, {'name': 'American Samoa', 'code': 'AS', 'fips': '60'}, {'name': 'Guam', 'code': 'GU', 'fips': '66'}, {'name': 'Northern Mariana Islands', 'code': 'MP', 'fips': '69'}, {'name': 'Puerto Rico', 'code': 'PR', 'fips': '72'}, {'name': 'Virgin Islands', 'code': 'VI', 'fips': '78'}]

Attributes of US states and territories.

  • name (str): Full name.

  • code (str): US Postal Service (USPS) two-letter alphabetic code.

  • fips (str): Federal Information Processing Standard (FIPS) code, as a zero-padded two-digit string.
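
Because STATES is a plain list of dictionaries, a lookup table keyed by any attribute can be built with a dict comprehension. The sketch below copies three entries from the listing above; by_code and by_fips are illustrative names, not part of the module.

```python
# Three entries copied from the STATES listing above.
STATES = [
    {"name": "Alabama", "code": "AL", "fips": "01"},
    {"name": "Alaska", "code": "AK", "fips": "02"},
    {"name": "Arizona", "code": "AZ", "fips": "04"},
]

# Hypothetical lookup tables (not part of the module).
by_code = {state["code"]: state for state in STATES}
by_fips = {state["fips"]: state for state in STATES}

print(by_code["AK"]["name"])  # Alaska

# FIPS codes are stored as zero-padded strings, so an integer
# has to be formatted before it can be used as a key.
print(by_fips[f"{4:02d}"]["name"])  # Arizona
```

Note that the values in the listing are strings, so a bare integer such as 4 is not a valid key in by_fips.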

pudl.analysis.state_demand.UTC_OFFSETS: Dict[str, int] = {'ADT': -3, 'AKDT': -8, 'AKST': -9, 'AST': -4, 'CDT': -5, 'CST': -6, 'EDT': -4, 'EST': -5, 'HST': -10, 'MDT': -6, 'MST': -7, 'PDT': -7, 'PST': -8}

Hour offset from Coordinated Universal Time (UTC) by time zone.

Time zones are either standard or daylight-savings time zone abbreviations (e.g. ‘MST’).
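
As an illustration of how such an offset table can be applied, the sketch below shifts a tz-naive local time to UTC by subtracting the offset. It copies four entries from the listing above; abbreviation_to_utc is a hypothetical helper, not a function of this module.

```python
from datetime import datetime, timedelta

# Four entries copied from the UTC_OFFSETS listing above.
UTC_OFFSETS = {"MST": -7, "MDT": -6, "CST": -6, "CDT": -5}


def abbreviation_to_utc(local: datetime, tz_abbreviation: str) -> datetime:
    """Hypothetical helper: convert a tz-naive local time to tz-naive UTC."""
    # local = utc + offset, therefore utc = local - offset.
    return local - timedelta(hours=UTC_OFFSETS[tz_abbreviation])


summer_noon = datetime(2020, 7, 1, 12)
print(abbreviation_to_utc(summer_noon, "MDT"))  # 2020-07-01 18:00:00

winter_midnight = datetime(2020, 1, 1, 0)
print(abbreviation_to_utc(winter_midnight, "MST"))  # 2020-01-01 07:00:00
```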

pudl.analysis.state_demand.clean_ferc714_hourly_demand_matrix(df: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame

Detect and null anomalous values in FERC 714 hourly demand matrix.

Note

Takes about 10 minutes.

Parameters

df – FERC 714 hourly demand matrix, as described in load_ferc714_hourly_demand_matrix().

Returns

Copy of df with nulled anomalous values.

pudl.analysis.state_demand.compare_state_demand(a: pandas.core.frame.DataFrame, b: pandas.core.frame.DataFrame, scaled: bool = True) → pandas.core.frame.DataFrame

Compute statistics comparing predicted and reference demand.

Statistics are computed for each year.

Parameters
  • a – Predicted demand with columns utc_datetime and either demand_mwh (if scaled=False) or scaled_demand_mwh (if scaled=True).

  • b – Reference demand with columns utc_datetime and demand_mwh. Every element in utc_datetime must match the one in a.

Returns

Dataframe with columns year, rmse (root mean square error), and mae (mean absolute error).

Raises

ValueError – Datetime columns do not match.
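
The two statistics can be illustrated with a few lines of pandas. The sketch below is not the module's implementation; yearly_error_stats is a hypothetical stand-in that assumes errors are taken as predicted minus reference and grouped by the calendar year of utc_datetime.

```python
import numpy as np
import pandas as pd


def yearly_error_stats(a: pd.DataFrame, b: pd.DataFrame, scaled: bool = True) -> pd.DataFrame:
    """Hypothetical sketch: RMSE and MAE of predicted vs. reference demand by year."""
    if not np.array_equal(a["utc_datetime"].to_numpy(), b["utc_datetime"].to_numpy()):
        raise ValueError("Datetime columns do not match.")
    column = "scaled_demand_mwh" if scaled else "demand_mwh"
    error = a[column].to_numpy() - b["demand_mwh"].to_numpy()
    errors = pd.DataFrame({
        "year": a["utc_datetime"].dt.year.to_numpy(),
        "squared": error ** 2,
        "absolute": np.abs(error),
    })
    # Mean squared and mean absolute error for each year.
    means = errors.groupby("year", as_index=False).mean()
    return pd.DataFrame({
        "year": means["year"],
        "rmse": np.sqrt(means["squared"]),
        "mae": means["absolute"],
    })


# Illustrative data spanning a year boundary.
times = pd.to_datetime(
    ["2019-12-31 22:00", "2019-12-31 23:00", "2020-01-01 00:00", "2020-01-01 01:00"]
)
predicted = pd.DataFrame({"utc_datetime": times, "scaled_demand_mwh": [10.0, 14.0, 20.0, 20.0]})
reference = pd.DataFrame({"utc_datetime": times, "demand_mwh": [13.0, 10.0, 20.0, 22.0]})
stats = yearly_error_stats(predicted, reference)
print(stats)
```

With these numbers the 2019 errors are -3 and 4 (MAE 3.5, RMSE √12.5) and the 2020 errors are 0 and -2 (MAE 1.0, RMSE √2).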

pudl.analysis.state_demand.filter_ferc714_hourly_demand_matrix(df: pandas.core.frame.DataFrame, min_data: int = 100, min_data_fraction: float = 0.9) → pandas.core.frame.DataFrame

Filter incomplete years from FERC 714 hourly demand matrix.

Nulls respondent-years with too little data and drops respondents with no data across all years.

Parameters
  • df – FERC 714 hourly demand matrix, as described in load_ferc714_hourly_demand_matrix().

  • min_data – Minimum number of non-null hours in a year.

  • min_data_fraction – Minimum fraction of non-null hours between the first and last non-null hour in a year.

Returns

Hourly demand matrix df modified in-place.
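
To make the two thresholds concrete, here is a minimal sketch of one way to apply them. It is an assumption about the logic, not the module's code: filter_incomplete_years is a hypothetical name, and unlike the documented function it returns a modified copy rather than working in place.

```python
import numpy as np
import pandas as pd


def filter_incomplete_years(df, min_data=100, min_data_fraction=0.9):
    """Hypothetical sketch: null sparse respondent-years, drop empty respondents."""
    out = df.copy()
    for _, block in df.groupby(df.index.year):
        for respondent in block.columns:
            values = block[respondent]
            valid = values.dropna()
            if valid.empty:
                continue
            # Hours between the first and last non-null hour of the year (inclusive).
            span = values.loc[valid.index[0]:valid.index[-1]]
            too_few = len(valid) < min_data
            too_sparse = len(valid) / len(span) < min_data_fraction
            if too_few or too_sparse:
                out.loc[block.index, respondent] = np.nan
    # Respondents with no data left in any year are dropped.
    return out.dropna(axis="columns", how="all")


# Illustrative matrix: three hours at the end of 2019, three at the start of 2020.
index = pd.date_range("2019-12-31 21:00", periods=6, freq=pd.Timedelta(hours=1))
matrix = pd.DataFrame(
    {
        101: [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
        102: [1.0, np.nan, 3.0, np.nan, np.nan, np.nan],
        103: [1.0, np.nan, np.nan, 4.0, 5.0, 6.0],
    },
    index=index,
)
filtered = filter_incomplete_years(matrix, min_data=2, min_data_fraction=0.9)
print(filtered)
```

In this toy input, respondent 102 has a gap inside its 2019 data (2 of 3 hours) and is removed entirely, while respondent 103 loses its single 2019 value but keeps 2020.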

pudl.analysis.state_demand.impute_ferc714_hourly_demand_matrix(df: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame

Impute null values in FERC 714 hourly demand matrix.

Imputation is performed separately for each year, with only the respondents reporting data in that year.

Note

Takes about 15 minutes.

Parameters

df – FERC 714 hourly demand matrix, as described in load_ferc714_hourly_demand_matrix().

Returns

Copy of df with imputed values.

pudl.analysis.state_demand.load_counties(pudl_out: pudl.output.pudltabl.PudlTabl, pudl_settings: dict) → pandas.core.frame.DataFrame

Load county attributes.

Parameters
  • pudl_out – PUDL database extractor.

  • pudl_settings – PUDL settings.

Returns

Dataframe with columns county_id_fips and population.

pudl.analysis.state_demand.load_eia861_state_total_sales(pudl_out: pudl.output.pudltabl.PudlTabl) → pandas.core.frame.DataFrame

Read and format EIA 861 sales by state and year.

Parameters

pudl_out – Used to access pudl.output.pudltabl.PudlTabl.sales_eia861().

Returns

Dataframe with columns state_id_fips, year, demand_mwh.

pudl.analysis.state_demand.load_ferc714_county_assignments(pudl_out: pudl.output.pudltabl.PudlTabl) → pandas.core.frame.DataFrame

Load FERC 714 county assignments.

Parameters

pudl_out – PUDL database extractor.

Returns

Dataframe with columns respondent_id_ferc714, report year (int), and county_id_fips.

pudl.analysis.state_demand.load_ferc714_hourly_demand_matrix(pudl_out: pudl.output.pudltabl.PudlTabl) → Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]

Read and format FERC 714 hourly demand into matrix form.

Parameters

pudl_out – Used to access pudl.output.pudltabl.PudlTabl.demand_hourly_pa_ferc714().

Returns

Hourly demand as a matrix with a datetime row index (e.g. ‘2006-01-01 00:00:00’, …, ‘2019-12-31 23:00:00’) in local time ignoring daylight-savings, and a respondent_id_ferc714 column index (e.g. 101, …, 329). A second Dataframe lists the UTC offset in hours of each respondent_id_ferc714 and reporting year (int).
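
The matrix layout described above (one row per hour, one column per respondent) can be produced from long-format records with a pivot. This is only a sketch of the shape of the result on made-up data; the long-format column name datetime is an assumption.

```python
import pandas as pd

# Illustrative long-format records (the "datetime" column name is assumed).
records = pd.DataFrame({
    "respondent_id_ferc714": [101, 101, 329, 329],
    "datetime": pd.to_datetime(["2006-01-01 00:00", "2006-01-01 01:00"] * 2),
    "demand_mwh": [10.0, 11.0, 20.0, 21.0],
})

# Rows become hours, columns become respondents.
matrix = records.pivot(
    index="datetime", columns="respondent_id_ferc714", values="demand_mwh"
)
print(matrix)
```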

pudl.analysis.state_demand.load_ventyx_hourly_state_demand(path: str) → pandas.core.frame.DataFrame

Read and format Ventyx hourly state-level demand.

After manual corrections of the listed time zone, ambiguous time zone issues remain. Below is a list of transmission zones (by Transmission Zone ID) with one or more missing timestamps at transitions to or from daylight-savings:

  • 615253 (Indiana)

  • 615261 (Michigan)

  • 615352 (Wisconsin)

  • 615357 (Missouri)

  • 615377 (Saskatchewan)

  • 615401 (Minnesota, Wisconsin)

  • 615516 (Missouri)

  • 615529 (Oklahoma)

  • 615603 (Idaho, Washington)

  • 1836089 (California)

Parameters

path – Path to the data file (published as ‘state_level_load_2007_2018.csv’).

Returns

Dataframe with hourly state-level demand.

  • state_id_fips: FIPS code of US state.

  • utc_datetime: UTC time of the start of each hour.

  • demand_mwh: Hourly demand in MWh.

pudl.analysis.state_demand.local_to_utc(local: pandas.core.series.Series, tz: Iterable, **kwargs: Any) → pandas.core.series.Series

Convert local times to UTC.

Parameters
  • local – Local times (tz-naive datetime64[ns]).

  • tz – For each time, a timezone (see DatetimeIndex.tz_localize()) or UTC offset in hours (int or float).

  • kwargs – Optional arguments to DatetimeIndex.tz_localize().

Returns

UTC times (tz-naive datetime64[ns]).

Examples

>>> import pandas as pd
>>> from pudl.analysis.state_demand import local_to_utc
>>> s = pd.Series([pd.Timestamp(2020, 1, 1), pd.Timestamp(2020, 1, 1)])
>>> local_to_utc(s, [-7, -6])
0   2020-01-01 07:00:00
1   2020-01-01 06:00:00
dtype: datetime64[ns]
>>> local_to_utc(s, ['America/Denver', 'America/Chicago'])
0   2020-01-01 07:00:00
1   2020-01-01 06:00:00
dtype: datetime64[ns]

pudl.analysis.state_demand.lookup_state(state: Union[str, int]) → dict

Lookup US state by state identifier.

Parameters

state – State name, two-letter abbreviation, or FIPS code. String matching is case-insensitive.

Returns

State identifiers (name, code, and fips).

Examples

>>> from pudl.analysis.state_demand import lookup_state
>>> lookup_state('alabama')
{'name': 'Alabama', 'code': 'AL', 'fips': '01'}
>>> lookup_state('AL')
{'name': 'Alabama', 'code': 'AL', 'fips': '01'}
>>> lookup_state(1)
{'name': 'Alabama', 'code': 'AL', 'fips': '01'}

pudl.analysis.state_demand.main()

Predict state demand.

pudl.analysis.state_demand.melt_ferc714_hourly_demand_matrix(df: pandas.core.frame.DataFrame, tz: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame

Melt FERC 714 hourly demand matrix to long format.

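Parameters
  • df – FERC 714 hourly demand matrix, as described in load_ferc714_hourly_demand_matrix().

  • tz – UTC offset in hours of each respondent_id_ferc714 and reporting year, as returned by load_ferc714_hourly_demand_matrix().

Returns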

Long-format hourly demand with columns respondent_id_ferc714, report year (int), utc_datetime, and demand_mwh.
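
A rough sketch of the reshaping step, under stated assumptions: the matrix index is named datetime, and the offsets table has a column named utc_offset. Both names are illustrative and are not confirmed by the documentation above.

```python
import pandas as pd

# Illustrative inputs: a 2-hour matrix for two respondents and their UTC offsets.
index = pd.DatetimeIndex(["2020-01-01 00:00", "2020-01-01 01:00"], name="datetime")
matrix = pd.DataFrame({101: [10.0, 11.0], 102: [20.0, 21.0]}, index=index)
tz = pd.DataFrame({
    "respondent_id_ferc714": [101, 102],
    "year": [2020, 2020],
    "utc_offset": [-7, -6],  # assumed column name
})

# Wide to long: one row per respondent and hour.
long = matrix.melt(
    var_name="respondent_id_ferc714", value_name="demand_mwh", ignore_index=False
).reset_index()

# Attach each respondent-year's offset, then shift local time to UTC
# (utc = local - offset).
long["year"] = long["datetime"].dt.year
long = long.merge(tz, on=["respondent_id_ferc714", "year"])
long["utc_datetime"] = long["datetime"] - pd.to_timedelta(long["utc_offset"], unit="h")
long = long[["respondent_id_ferc714", "year", "utc_datetime", "demand_mwh"]]
print(long)
```

Midnight local time for respondent 101 (offset -7) lands at 07:00 UTC, consistent with the local_to_utc examples in this module.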

pudl.analysis.state_demand.plot_demand_scatter(a: pandas.core.frame.DataFrame, b: pandas.core.frame.DataFrame, title: Optional[str] = None, path: Optional[str] = None) → None

Make a scatter plot comparing predicted and reference demand.

Parameters
  • a – Predicted demand with columns utc_datetime and any of demand_mwh (in grey) and scaled_demand_mwh (in orange).

  • b – Reference demand with columns utc_datetime and demand_mwh. Every element in utc_datetime must match the one in a.

  • title – Plot title.

  • path – Plot path. If provided, the figure is saved to file and closed.

Raises

ValueError – Datetime columns do not match.

pudl.analysis.state_demand.plot_demand_timeseries(a: pandas.core.frame.DataFrame, b: Optional[pandas.core.frame.DataFrame] = None, window: int = 168, title: Optional[str] = None, path: Optional[str] = None) → None

Make a timeseries plot of predicted and reference demand.

Parameters
  • a – Predicted demand with columns utc_datetime and any of demand_mwh (in grey) and scaled_demand_mwh (in orange).

  • b – Reference demand with columns utc_datetime and demand_mwh (in red).

  • window – Width of window (in rows) to use to compute rolling means, or None to plot raw values.

  • title – Plot title.

  • path – Plot path. If provided, the figure is saved to file and closed.
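
The default window of 168 rows corresponds to one week of hourly data. The sketch below shows what such a rolling mean does to a synthetic series with a daily cycle; it assumes a trailing window, which may differ from what the plotting function uses internally.

```python
import pandas as pd

# Two weeks of synthetic hourly demand with a repeating daily cycle (illustrative).
hours = pd.date_range("2020-01-01", periods=24 * 14, freq=pd.Timedelta(hours=1))
demand = pd.Series([100.0 + (i % 24) for i in range(len(hours))], index=hours)

window = 168  # 168 hourly rows = 7 days
smoothed = demand.rolling(window).mean()

# The first window - 1 values are NaN because the window is not yet full.
print(smoothed.isna().sum())  # 167

# Averaging over whole days flattens the daily cycle to its mean level.
print(round(smoothed.iloc[-1], 6))  # 111.5
```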

pudl.analysis.state_demand.predict_state_hourly_demand(demand: pandas.core.frame.DataFrame, counties: pandas.core.frame.DataFrame, assignments: pandas.core.frame.DataFrame, state_totals: Optional[pandas.core.frame.DataFrame] = None, mean_overlaps: bool = False) → pandas.core.frame.DataFrame

Predict state hourly demand.

Parameters
  • demand – Hourly demand timeseries, with columns respondent_id_ferc714, report year, utc_datetime, and demand_mwh.

  • counties – Counties, with columns county_id_fips and population.

  • assignments – County assignments for demand respondents, with columns respondent_id_ferc714, year, and county_id_fips.

  • state_totals – Total annual demand by state, with columns state_id_fips, year, and demand_mwh. If provided, the predicted hourly demand is scaled to match these totals.

  • mean_overlaps – Whether to average the demands predicted for a county in cases when a county is assigned to multiple respondents. By default, demands are summed.

Returns

Dataframe with columns state_id_fips, utc_datetime, demand_mwh, and (if state_totals was provided) scaled_demand_mwh.
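
The inputs suggest a population-weighted allocation: respondent demand is split among assigned counties, summed to states, and optionally rescaled to annual totals. The sketch below follows that reading on made-up numbers. It is an assumption about the approach, not the module's implementation, and it does not cover overlapping assignments (mean_overlaps).

```python
import pandas as pd

# Illustrative inputs with the documented column names.
demand = pd.DataFrame({
    "respondent_id_ferc714": [101, 101],
    "year": [2020, 2020],
    "utc_datetime": pd.to_datetime(["2020-01-01 00:00", "2020-01-01 01:00"]),
    "demand_mwh": [100.0, 200.0],
})
counties = pd.DataFrame({
    "county_id_fips": ["08001", "08005", "56001"],
    "population": [300, 100, 100],  # made-up populations
})
assignments = pd.DataFrame({
    "respondent_id_ferc714": [101, 101, 101],
    "year": [2020, 2020, 2020],
    "county_id_fips": ["08001", "08005", "56001"],
})
state_totals = pd.DataFrame({
    "state_id_fips": ["08", "56"],
    "year": [2020, 2020],
    "demand_mwh": [480.0, 60.0],
})

# 1. Population share of each county within its respondent-year.
shares = assignments.merge(counties, on="county_id_fips")
group_population = shares.groupby(["respondent_id_ferc714", "year"])["population"].transform("sum")
shares["share"] = shares["population"] / group_population

# 2. Split each respondent's hourly demand among its counties.
county_demand = demand.merge(shares, on=["respondent_id_ferc714", "year"])
county_demand["demand_mwh"] = county_demand["demand_mwh"] * county_demand["share"]

# 3. Sum counties to states (the first two digits of a county FIPS are the state FIPS).
county_demand["state_id_fips"] = county_demand["county_id_fips"].str[:2]
state_demand = county_demand.groupby(
    ["state_id_fips", "year", "utc_datetime"], as_index=False
)["demand_mwh"].sum()

# 4. Rescale so that each state-year sums to the reported annual total.
state_demand = state_demand.merge(
    state_totals.rename(columns={"demand_mwh": "total_mwh"}),
    on=["state_id_fips", "year"],
)
predicted_total = state_demand.groupby(["state_id_fips", "year"])["demand_mwh"].transform("sum")
state_demand["scaled_demand_mwh"] = (
    state_demand["demand_mwh"] * state_demand["total_mwh"] / predicted_total
)
print(state_demand[["state_id_fips", "utc_datetime", "demand_mwh", "scaled_demand_mwh"]])
```

Here state 08 receives 80% of the respondent's demand (80 and 160 MWh), which is then doubled to match its 480 MWh annual total; state 56 already matches its total and is left unchanged.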

pudl.analysis.state_demand.utc_to_local(utc: pandas.core.series.Series, tz: Iterable) → pandas.core.series.Series

Convert UTC times to local.

Parameters
  • utc – UTC times (tz-naive datetime64[ns] or datetime64[ns, UTC]).

  • tz – For each time, a timezone (see DatetimeIndex.tz_localize()) or UTC offset in hours (int or float).

Returns

Local times (tz-naive datetime64[ns]).

Examples

>>> import pandas as pd
>>> from pudl.analysis.state_demand import utc_to_local
>>> s = pd.Series([pd.Timestamp(2020, 1, 1), pd.Timestamp(2020, 1, 1)])
>>> utc_to_local(s, [-7, -6])
0   2019-12-31 17:00:00
1   2019-12-31 18:00:00
dtype: datetime64[ns]
>>> utc_to_local(s, ['America/Denver', 'America/Chicago'])
0   2019-12-31 17:00:00
1   2019-12-31 18:00:00
dtype: datetime64[ns]