pudl.analysis.state_demand module
Predict state-level electricity demand.
-
pudl.analysis.state_demand.STANDARD_UTC_OFFSETS: Dict[str, int] = {'America/Anchorage': -9, 'America/Chicago': -6, 'America/Denver': -7, 'America/Halifax': -4, 'America/Los_Angeles': -8, 'America/New_York': -5, 'Pacific/Honolulu': -10}
Hour offset from Coordinated Universal Time (UTC) by time zone.
Time zones are canonical names (e.g. ‘America/Denver’) from tzdata (https://www.iana.org/time-zones) mapped to their standard-time UTC offset.
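As a rough cross-check of this table, a zone's standard-time offset can be derived from tzdata by sampling a mid-winter date, when none of the listed zones observes daylight-savings time. The sketch below uses Python's zoneinfo module; the helper name standard_utc_offset is hypothetical and not part of this module.

```python
from datetime import datetime
from zoneinfo import ZoneInfo


def standard_utc_offset(tz_name: str, year: int = 2020) -> float:
    """Return the UTC offset (in hours) in effect at noon on January 15.

    Assumes a northern-hemisphere zone, so mid-January is standard time.
    """
    winter = datetime(year, 1, 15, 12, tzinfo=ZoneInfo(tz_name))
    return winter.utcoffset().total_seconds() / 3600


for name in ("America/Denver", "America/Halifax", "Pacific/Honolulu"):
    print(name, standard_utc_offset(name))
```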
-
pudl.analysis.state_demand.STATES: List[Dict[str, Union[int, str]]] = [{'name': 'Alabama', 'code': 'AL', 'fips': '01'}, {'name': 'Alaska', 'code': 'AK', 'fips': '02'}, {'name': 'Arizona', 'code': 'AZ', 'fips': '04'}, {'name': 'Arkansas', 'code': 'AR', 'fips': '05'}, {'name': 'California', 'code': 'CA', 'fips': '06'}, {'name': 'Colorado', 'code': 'CO', 'fips': '08'}, {'name': 'Connecticut', 'code': 'CT', 'fips': '09'}, {'name': 'Delaware', 'code': 'DE', 'fips': '10'}, {'name': 'District of Columbia', 'code': 'DC', 'fips': '11'}, {'name': 'Florida', 'code': 'FL', 'fips': '12'}, {'name': 'Georgia', 'code': 'GA', 'fips': '13'}, {'name': 'Hawaii', 'code': 'HI', 'fips': '15'}, {'name': 'Idaho', 'code': 'ID', 'fips': '16'}, {'name': 'Illinois', 'code': 'IL', 'fips': '17'}, {'name': 'Indiana', 'code': 'IN', 'fips': '18'}, {'name': 'Iowa', 'code': 'IA', 'fips': '19'}, {'name': 'Kansas', 'code': 'KS', 'fips': '20'}, {'name': 'Kentucky', 'code': 'KY', 'fips': '21'}, {'name': 'Louisiana', 'code': 'LA', 'fips': '22'}, {'name': 'Maine', 'code': 'ME', 'fips': '23'}, {'name': 'Maryland', 'code': 'MD', 'fips': '24'}, {'name': 'Massachusetts', 'code': 'MA', 'fips': '25'}, {'name': 'Michigan', 'code': 'MI', 'fips': '26'}, {'name': 'Minnesota', 'code': 'MN', 'fips': '27'}, {'name': 'Mississippi', 'code': 'MS', 'fips': '28'}, {'name': 'Missouri', 'code': 'MO', 'fips': '29'}, {'name': 'Montana', 'code': 'MT', 'fips': '30'}, {'name': 'Nebraska', 'code': 'NE', 'fips': '31'}, {'name': 'Nevada', 'code': 'NV', 'fips': '32'}, {'name': 'New Hampshire', 'code': 'NH', 'fips': '33'}, {'name': 'New Jersey', 'code': 'NJ', 'fips': '34'}, {'name': 'New Mexico', 'code': 'NM', 'fips': '35'}, {'name': 'New York', 'code': 'NY', 'fips': '36'}, {'name': 'North Carolina', 'code': 'NC', 'fips': '37'}, {'name': 'North Dakota', 'code': 'ND', 'fips': '38'}, {'name': 'Ohio', 'code': 'OH', 'fips': '39'}, {'name': 'Oklahoma', 'code': 'OK', 'fips': '40'}, {'name': 'Oregon', 'code': 'OR', 'fips': '41'}, {'name': 'Pennsylvania', 'code': 'PA', 'fips': '42'}, {'name': 'Rhode Island', 'code': 'RI', 'fips': '44'}, {'name': 'South Carolina', 'code': 'SC', 'fips': '45'}, {'name': 'South Dakota', 'code': 'SD', 'fips': '46'}, {'name': 'Tennessee', 'code': 'TN', 'fips': '47'}, {'name': 'Texas', 'code': 'TX', 'fips': '48'}, {'name': 'Utah', 'code': 'UT', 'fips': '49'}, {'name': 'Vermont', 'code': 'VT', 'fips': '50'}, {'name': 'Virginia', 'code': 'VA', 'fips': '51'}, {'name': 'Washington', 'code': 'WA', 'fips': '53'}, {'name': 'West Virginia', 'code': 'WV', 'fips': '54'}, {'name': 'Wisconsin', 'code': 'WI', 'fips': '55'}, {'name': 'Wyoming', 'code': 'WY', 'fips': '56'}, {'name': 'American Samoa', 'code': 'AS', 'fips': '60'}, {'name': 'Guam', 'code': 'GU', 'fips': '66'}, {'name': 'Northern Mariana Islands', 'code': 'MP', 'fips': '69'}, {'name': 'Puerto Rico', 'code': 'PR', 'fips': '72'}, {'name': 'Virgin Islands', 'code': 'VI', 'fips': '78'}]
Attributes of US states and territories.
name (str): Full name.
code (str): US Postal Service (USPS) two-letter alphabetic code.
fips (str): Federal Information Processing Standard (FIPS) code, zero-padded to two digits (e.g. ‘01’).
-
pudl.analysis.state_demand.UTC_OFFSETS: Dict[str, int] = {'ADT': -3, 'AKDT': -8, 'AKST': -9, 'AST': -4, 'CDT': -5, 'CST': -6, 'EDT': -4, 'EST': -5, 'HST': -10, 'MDT': -6, 'MST': -7, 'PDT': -7, 'PST': -8}
Hour offset from Coordinated Universal Time (UTC) by time zone.
Time zones are either standard or daylight-savings time zone abbreviations (e.g. ‘MST’).
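To illustrate how an abbreviation-keyed table like this can be used, the sketch below shifts a naive local timestamp to UTC by subtracting the fixed hour offset. Only a subset of the table is copied here, and the helper abbreviation_to_utc is a hypothetical name, not a function of this module.

```python
from datetime import datetime, timedelta

# Subset of the UTC_OFFSETS table documented above.
UTC_OFFSETS = {"MDT": -6, "MST": -7, "EST": -5}


def abbreviation_to_utc(local: datetime, abbreviation: str) -> datetime:
    """Shift a naive local time to naive UTC using a fixed hour offset."""
    return local - timedelta(hours=UTC_OFFSETS[abbreviation])


summer = abbreviation_to_utc(datetime(2019, 7, 1, 12), "MDT")
winter = abbreviation_to_utc(datetime(2019, 1, 1, 12), "MST")
print(summer, winter)
```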
-
pudl.analysis.state_demand.clean_ferc714_hourly_demand_matrix(df: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame
Detect and null anomalous values in FERC 714 hourly demand matrix.
Note
Takes about 10 minutes.
- Parameters
df – FERC 714 hourly demand matrix, as described in load_ferc714_hourly_demand_matrix().
- Returns
Copy of df with nulled anomalous values.
-
pudl.analysis.state_demand.compare_state_demand(a: pandas.core.frame.DataFrame, b: pandas.core.frame.DataFrame, scaled: bool = True) → pandas.core.frame.DataFrame
Compute statistics comparing predicted and reference demand.
Statistics are computed for each year.
- Parameters
a – Predicted demand with columns utc_datetime and either demand_mwh (if scaled=False) or scaled_demand_mwh (if scaled=True).
b – Reference demand with columns utc_datetime and demand_mwh. Every element in utc_datetime must match the one in a.
- Returns
Dataframe with columns year, rmse (root mean square error), and mae (mean absolute error).
- Raises
ValueError – Datetime columns do not match.
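The per-year statistics described above can be sketched with pandas as follows. This is an illustrative reimplementation under the documented column names, not the module's actual code; the function name yearly_errors_sketch and the sample values are made up for the example.

```python
import numpy as np
import pandas as pd


def yearly_errors_sketch(a, b, scaled=True):
    """Return RMSE and MAE of predicted (a) vs. reference (b) demand by year."""
    column = "scaled_demand_mwh" if scaled else "demand_mwh"
    if not np.array_equal(a["utc_datetime"].to_numpy(), b["utc_datetime"].to_numpy()):
        raise ValueError("Datetime columns do not match.")
    error = a[column].to_numpy() - b["demand_mwh"].to_numpy()
    frame = pd.DataFrame({
        "year": a["utc_datetime"].dt.year.to_numpy(),
        "squared": error ** 2,
        "absolute": np.abs(error),
    })
    # Mean squared and mean absolute error within each year.
    means = frame.groupby("year", as_index=False).mean()
    return pd.DataFrame({
        "year": means["year"],
        "rmse": np.sqrt(means["squared"]),
        "mae": means["absolute"],
    })


times = pd.to_datetime(
    ["2019-01-01 00:00", "2019-01-01 01:00", "2020-01-01 00:00", "2020-01-01 01:00"]
)
predicted = pd.DataFrame({"utc_datetime": times, "scaled_demand_mwh": [10.0, 12.0, 20.0, 24.0]})
reference = pd.DataFrame({"utc_datetime": times, "demand_mwh": [11.0, 11.0, 20.0, 20.0]})
stats = yearly_errors_sketch(predicted, reference)
print(stats)
```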
-
pudl.analysis.state_demand.filter_ferc714_hourly_demand_matrix(df: pandas.core.frame.DataFrame, min_data: int = 100, min_data_fraction: float = 0.9) → pandas.core.frame.DataFrame
Filter incomplete years from FERC 714 hourly demand matrix.
Nulls respondent-years with too little data and drops respondents with no data across all years.
- Parameters
df – FERC 714 hourly demand matrix, as described in load_ferc714_hourly_demand_matrix().
min_data – Minimum number of non-null hours in a year.
min_data_fraction – Minimum fraction of non-null hours between the first and last non-null hour in a year.
- Returns
Hourly demand matrix df modified in-place.
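One plausible reading of the two thresholds is sketched below on a toy matrix. It is an assumption-laden illustration, not the module's implementation: the name filter_years_sketch is hypothetical, the span is counted inclusively in hours, and the sketch returns a copy whereas the documented function modifies df in place.

```python
import numpy as np
import pandas as pd


def filter_years_sketch(df, min_data, min_data_fraction):
    """Null sparse respondent-years, then drop respondents with no data."""
    out = df.copy()  # the documented function modifies df in place instead
    for _, block in df.groupby(df.index.year):
        for respondent in block.columns:
            values = block[respondent].dropna()
            if values.empty:
                continue
            # Hours spanned from the first to the last non-null hour, inclusive.
            span = (values.index[-1] - values.index[0]) / pd.Timedelta(hours=1) + 1
            if len(values) < min_data or len(values) / span < min_data_fraction:
                out.loc[block.index, respondent] = np.nan
    return out.dropna(axis="columns", how="all")


hour = pd.Timedelta(hours=1)
index = pd.date_range("2019-01-01", periods=6, freq=hour).append(
    pd.date_range("2020-01-01", periods=6, freq=hour)
)
matrix = pd.DataFrame(
    {
        101: [1.0] * 6 + [1.0, np.nan, np.nan, np.nan, np.nan, 1.0],
        102: [1.0] + [np.nan] * 11,
    },
    index=index,
)
filtered = filter_years_sketch(matrix, min_data=2, min_data_fraction=0.9)
print(filtered)
```

In this toy case respondent 101 keeps 2019 but loses 2020 (only 2 of 6 spanned hours are non-null), and respondent 102 is dropped entirely because its single value falls below min_data.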
-
pudl.analysis.state_demand.impute_ferc714_hourly_demand_matrix(df: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame
Impute null values in FERC 714 hourly demand matrix.
Imputation is performed separately for each year, with only the respondents reporting data in that year.
Note
Takes about 15 minutes.
- Parameters
df – FERC 714 hourly demand matrix, as described in load_ferc714_hourly_demand_matrix().
- Returns
Copy of df with imputed values.
-
pudl.analysis.state_demand.load_counties(pudl_out: pudl.output.pudltabl.PudlTabl, pudl_settings: dict) → pandas.core.frame.DataFrame
Load county attributes.
- Parameters
pudl_out – PUDL database extractor.
pudl_settings – PUDL settings.
- Returns
Dataframe with columns county_id_fips and population.
-
pudl.analysis.state_demand.load_eia861_state_total_sales(pudl_out: pudl.output.pudltabl.PudlTabl) → pandas.core.frame.DataFrame
Read and format EIA 861 sales by state and year.
- Parameters
pudl_out – Used to access pudl.output.pudltabl.PudlTabl.sales_eia861().
- Returns
Dataframe with columns state_id_fips, year, demand_mwh.
-
pudl.analysis.state_demand.load_ferc714_county_assignments(pudl_out: pudl.output.pudltabl.PudlTabl) → pandas.core.frame.DataFrame
Load FERC 714 county assignments.
- Parameters
pudl_out – PUDL database extractor.
- Returns
Dataframe with columns respondent_id_ferc714, report year (int), and county_id_fips.
-
pudl.analysis.state_demand.load_ferc714_hourly_demand_matrix(pudl_out: pudl.output.pudltabl.PudlTabl) → Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]
Read and format FERC 714 hourly demand into matrix form.
- Parameters
pudl_out – Used to access pudl.output.pudltabl.PudlTabl.demand_hourly_pa_ferc714().
- Returns
Hourly demand as a matrix with a datetime row index (e.g. ‘2006-01-01 00:00:00’, …, ‘2019-12-31 23:00:00’) in local time ignoring daylight-savings, and a respondent_id_ferc714 column index (e.g. 101, …, 329). A second Dataframe lists the UTC offset in hours of each respondent_id_ferc714 and reporting year (int).
-
pudl.analysis.state_demand.load_ventyx_hourly_state_demand(path: str) → pandas.core.frame.DataFrame
Read and format Ventyx hourly state-level demand.
After manual corrections of the listed time zone, ambiguous time zone issues remain. Below is a list of transmission zones (by Transmission Zone ID) with one or more missing timestamps at transitions to or from daylight-savings:
615253 (Indiana)
615261 (Michigan)
615352 (Wisconsin)
615357 (Missouri)
615377 (Saskatchewan)
615401 (Minnesota, Wisconsin)
615516 (Missouri)
615529 (Oklahoma)
615603 (Idaho, Washington)
1836089 (California)
- Parameters
path – Path to the data file (published as ‘state_level_load_2007_2018.csv’).
- Returns
Dataframe with hourly state-level demand.
* state_id_fips: FIPS code of US state.
* utc_datetime: UTC time of the start of each hour.
* demand_mwh: Hourly demand in MWh.
-
pudl.analysis.state_demand.local_to_utc(local: pandas.core.series.Series, tz: Iterable, **kwargs: Any) → pandas.core.series.Series
Convert local times to UTC.
- Parameters
local – Local times (tz-naive datetime64[ns]).
tz – For each time, a timezone (see DatetimeIndex.tz_localize()) or UTC offset in hours (int or float).
kwargs – Optional arguments to DatetimeIndex.tz_localize().
- Returns
UTC times (tz-naive datetime64[ns]).
Examples
>>> s = pd.Series([pd.Timestamp(2020, 1, 1), pd.Timestamp(2020, 1, 1)])
>>> local_to_utc(s, [-7, -6])
0   2020-01-01 07:00:00
1   2020-01-01 06:00:00
dtype: datetime64[ns]
>>> local_to_utc(s, ['America/Denver', 'America/Chicago'])
0   2020-01-01 07:00:00
1   2020-01-01 06:00:00
dtype: datetime64[ns]
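For the numeric-offset case, the conversion amounts to subtracting each row's UTC offset from its local time. A minimal sketch of that arithmetic follows; local_to_utc_by_offset is a hypothetical helper and may differ from how the module handles the general case.

```python
import pandas as pd


def local_to_utc_by_offset(local: pd.Series, offsets) -> pd.Series:
    """Subtract each row's UTC offset (in hours) from its tz-naive local time."""
    hours = pd.Series(list(offsets), index=local.index)
    return local - pd.to_timedelta(hours, unit="h")


s = pd.Series([pd.Timestamp(2020, 1, 1), pd.Timestamp(2020, 1, 1)])
utc = local_to_utc_by_offset(s, [-7, -6])
print(utc)
```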
-
pudl.analysis.state_demand.lookup_state(state: Union[str, int]) → dict
Look up a US state by state identifier.
- Parameters
state – State name, two-letter abbreviation, or FIPS code. String matching is case-insensitive.
- Returns
State identifiers.
Examples
>>> lookup_state('alabama')
{'name': 'Alabama', 'code': 'AL', 'fips': '01'}
>>> lookup_state('AL')
{'name': 'Alabama', 'code': 'AL', 'fips': '01'}
>>> lookup_state(1)
{'name': 'Alabama', 'code': 'AL', 'fips': '01'}
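A lookup with the behavior shown in these examples could be sketched as below, using a two-entry subset of STATES. The name lookup_state_sketch, the dispatch on input type and length, and the KeyError on a failed match are all assumptions made for illustration.

```python
from typing import Dict, List, Union

# Two-entry subset of the STATES table documented above.
STATES: List[Dict[str, str]] = [
    {"name": "Alabama", "code": "AL", "fips": "01"},
    {"name": "Alaska", "code": "AK", "fips": "02"},
]


def lookup_state_sketch(state: Union[str, int]) -> Dict[str, str]:
    """Find a state by name, two-letter code, or FIPS code."""
    text = str(state).strip()
    if isinstance(state, int) or text.isdigit():
        field, key = "fips", text.zfill(2)  # 1 -> '01'
    elif len(text) == 2:
        field, key = "code", text
    else:
        field, key = "name", text
    for entry in STATES:
        if entry[field].lower() == key.lower():  # case-insensitive match
            return entry
    raise KeyError(f"No state matches {state!r}")  # assumed failure mode


print(lookup_state_sketch("alabama"), lookup_state_sketch(2))
```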
-
pudl.analysis.state_demand.melt_ferc714_hourly_demand_matrix(df: pandas.core.frame.DataFrame, tz: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame
Melt FERC 714 hourly demand matrix to long format.
- Parameters
df – FERC 714 hourly demand matrix, as described in load_ferc714_hourly_demand_matrix().
tz – FERC 714 respondent time zones, as described in load_ferc714_hourly_demand_matrix().
- Returns
Long-format hourly demand with columns respondent_id_ferc714, report year (int), utc_datetime, and demand_mwh.
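The reshaping can be pictured with the toy example below: melt the matrix to one row per respondent-hour, join the per-year UTC offsets, and shift local times to UTC. The column names datetime and utc_offset and all sample values are assumptions for illustration; the module's actual intermediate names are not documented here.

```python
import pandas as pd

# Toy matrix: local-time row index, respondent IDs as columns.
matrix = pd.DataFrame(
    {101: [1.0, 2.0], 102: [3.0, 4.0]},
    index=pd.to_datetime(["2019-01-01 00:00", "2019-01-01 01:00"]),
)
# Toy offsets table; the column name utc_offset is assumed.
tz = pd.DataFrame(
    {"respondent_id_ferc714": [101, 102], "year": [2019, 2019], "utc_offset": [-5, -7]}
)

long = matrix.rename_axis(index="datetime").reset_index().melt(
    id_vars="datetime", var_name="respondent_id_ferc714", value_name="demand_mwh"
)
long["respondent_id_ferc714"] = long["respondent_id_ferc714"].astype(int)
long["year"] = long["datetime"].dt.year
long = long.merge(tz, on=["respondent_id_ferc714", "year"])
# Local time minus the UTC offset gives UTC.
long["utc_datetime"] = long["datetime"] - pd.to_timedelta(long["utc_offset"], unit="h")
long = long[["respondent_id_ferc714", "year", "utc_datetime", "demand_mwh"]]
print(long)
```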
-
pudl.analysis.state_demand.plot_demand_scatter(a: pandas.core.frame.DataFrame, b: pandas.core.frame.DataFrame, title: Optional[str] = None, path: Optional[str] = None) → None
Make a scatter plot comparing predicted and reference demand.
- Parameters
a – Predicted demand with columns utc_datetime and any of demand_mwh (in grey) and scaled_demand_mwh (in orange).
b – Reference demand with columns utc_datetime and demand_mwh. Every element in utc_datetime must match the one in a.
title – Plot title.
path – Plot path. If provided, the figure is saved to file and closed.
- Raises
ValueError – Datetime columns do not match.
-
pudl.analysis.state_demand.plot_demand_timeseries(a: pandas.core.frame.DataFrame, b: Optional[pandas.core.frame.DataFrame] = None, window: int = 168, title: Optional[str] = None, path: Optional[str] = None) → None
Make a timeseries plot of predicted and reference demand.
- Parameters
a – Predicted demand with columns utc_datetime and any of demand_mwh (in grey) and scaled_demand_mwh (in orange).
b – Reference demand with columns utc_datetime and demand_mwh (in red).
window – Width of window (in rows) to use to compute rolling means, or None to plot raw values.
title – Plot title.
path – Plot path. If provided, the figure is saved to file and closed.
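The effect of window can be illustrated without plotting: with hourly rows, the default of 168 corresponds to a one-week rolling mean. The sketch below uses a window of 3 rows on toy data; whether the function uses a trailing or centered window is not stated here, so the trailing default of pandas is an assumption.

```python
import pandas as pd

hours = pd.date_range("2019-01-01", periods=6, freq=pd.Timedelta(hours=1))
demand = pd.DataFrame(
    {"utc_datetime": hours, "demand_mwh": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]}
)

window = 3  # rows; the documented default of 168 spans one week of hourly data
smoothed = demand.set_index("utc_datetime")["demand_mwh"].rolling(window).mean()
print(smoothed)
```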
-
pudl.analysis.state_demand.predict_state_hourly_demand(demand: pandas.core.frame.DataFrame, counties: pandas.core.frame.DataFrame, assignments: pandas.core.frame.DataFrame, state_totals: Optional[pandas.core.frame.DataFrame] = None, mean_overlaps: bool = False) → pandas.core.frame.DataFrame
Predict state hourly demand.
- Parameters
demand – Hourly demand timeseries, with columns respondent_id_ferc714, report year, utc_datetime, and demand_mwh.
counties – Counties, with columns county_id_fips and population.
assignments – County assignments for demand respondents, with columns respondent_id_ferc714, year, and county_id_fips.
state_totals – Total annual demand by state, with columns state_id_fips, year, and demand_mwh. If provided, the predicted hourly demand is scaled to match these totals.
mean_overlaps – Whether to average the demands predicted for a county when that county is assigned to multiple respondents. By default, demands are summed.
- Returns
Dataframe with columns state_id_fips, utc_datetime, demand_mwh, and (if state_totals was provided) scaled_demand_mwh.
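The scaling step implied by state_totals can be sketched as follows: each hourly value is multiplied by the ratio of the state's reported annual total to the sum of its predicted hourly demand for that year. This is a simplified illustration with made-up numbers; it assumes years are taken from utc_datetime and does not reproduce the county-level allocation.

```python
import pandas as pd

predicted = pd.DataFrame({
    "state_id_fips": ["08", "08", "56", "56"],
    "utc_datetime": pd.to_datetime(["2019-01-01 00:00", "2019-01-01 01:00"] * 2),
    "demand_mwh": [30.0, 10.0, 4.0, 6.0],
})
state_totals = pd.DataFrame(
    {"state_id_fips": ["08", "56"], "year": [2019, 2019], "demand_mwh": [80.0, 5.0]}
)

merged = predicted.assign(year=predicted["utc_datetime"].dt.year).merge(
    state_totals.rename(columns={"demand_mwh": "total_mwh"}),
    on=["state_id_fips", "year"],
    how="left",
)
# Sum of predicted hourly demand for each state-year.
annual = merged.groupby(["state_id_fips", "year"])["demand_mwh"].transform("sum")
merged["scaled_demand_mwh"] = merged["demand_mwh"] * merged["total_mwh"] / annual
result = merged[["state_id_fips", "utc_datetime", "demand_mwh", "scaled_demand_mwh"]]
print(result)
```

After scaling, the hourly values for each state sum to the corresponding annual total.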
-
pudl.analysis.state_demand.utc_to_local(utc: pandas.core.series.Series, tz: Iterable) → pandas.core.series.Series
Convert UTC times to local.
- Parameters
utc – UTC times (tz-naive datetime64[ns] or datetime64[ns, UTC]).
tz – For each time, a timezone (see DatetimeIndex.tz_localize()) or UTC offset in hours (int or float).
- Returns
Local times (tz-naive datetime64[ns]).
Examples
>>> s = pd.Series([pd.Timestamp(2020, 1, 1), pd.Timestamp(2020, 1, 1)])
>>> utc_to_local(s, [-7, -6])
0   2019-12-31 17:00:00
1   2019-12-31 18:00:00
dtype: datetime64[ns]
>>> utc_to_local(s, ['America/Denver', 'America/Chicago'])
0   2019-12-31 17:00:00
1   2019-12-31 18:00:00
dtype: datetime64[ns]
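For named time zones, a row-by-row version of the conversion can be sketched with pandas Timestamp methods: mark each value as UTC, convert to the target zone, then drop the zone to recover the local wall-clock time. The helper utc_to_local_by_zone is hypothetical, handles only tz-naive input, and favors clarity over speed.

```python
import pandas as pd


def utc_to_local_by_zone(utc: pd.Series, zones) -> pd.Series:
    """Convert tz-naive UTC times to tz-naive local times, one zone per row."""
    local = [
        t.tz_localize("UTC").tz_convert(zone).tz_localize(None)
        for t, zone in zip(utc, zones)
    ]
    return pd.Series(local, index=utc.index)


s = pd.Series([pd.Timestamp(2020, 1, 1), pd.Timestamp(2020, 1, 1)])
local = utc_to_local_by_zone(s, ["America/Denver", "America/Chicago"])
print(local)
```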