pudl.transform.nrelatb

Transform NREL ATB data into well-normalized, cleaned tables.

Module Contents

Classes

TableNormalizer

Info needed to convert a selection of the raw NREL table into a normalized table.

Normalizer

Class that defines how to normalize all of the NREL tables that get the transform_normalize() treatment.

TableUnstacker

Info needed to unstack a portion of the NREL ATB table.

Unstacker

Class that defines how to unstack the raw ATB table into all of the tidy core tables.

Functions

transform_normalize(nrelatb, normalizer)

Normalize a subset of the NREL ATB data into a small table.

transform_unstack(→ pandas.DataFrame)

Generic unstacking function to convert ATB data from a skinny to wider format.

_core_nrelatb__transform_start(raw_nrelatb__data)

Transform raw NREL ATB data into semi-clean but still very skinny table.

core_nrelatb__yearly_projected_financial_cases(...)

Transform the data defining the assumptions for the ATB financial cases.

core_nrelatb__yearly_projected_financial_cases_by_scenario(...)

Transform the data defining the assumptions for the ATB financial cases which vary by scenario.

broadcast_fixed_charge_rate_across_tech_detail(...)

For older years, broadcast the fixed_charge_rate parameter across the technical detail columns.

broadcast_asterisk_cost_recovery_period_years(...)

Broadcast the asterisk (wildcard) cost_recovery_period_years.

_broadcast_core_metric_parameters(→ pandas.DataFrame)

Broadcast a section of a table and fillna with the broadcasted values.

core_nrelatb__yearly_projected_cost_performance(...)

Transform the yearly NREL ATB cost and performance projections.

_core_nrelatb__yearly_units(→ pandas.DataFrame)

Transform a table of units by core_metric_parameter.

core_nrelatb__yearly_technology_status(→ pandas.DataFrame)

Transform a small table of statuses of different technology types.

Attributes

IDX_ALL

Expected primary key columns for the raw nrelatb data.

pudl.transform.nrelatb.IDX_ALL = ['report_year', 'model_case_nrelatb', 'projection_year', 'cost_recovery_period_years',...[source]

Expected primary key columns for the raw nrelatb data.

The normalized core tables we are trying to build will have primary keys which are a subset of these columns.

class pudl.transform.nrelatb.TableNormalizer(/, **data: Any)[source]

Bases: pydantic.BaseModel

Info needed to convert a selection of the raw NREL table into a normalized table.

idx: list[str][source]

Primary key columns of normalized subset table.

columns: list[str][source]

Columns reported in raw NREL table that are unique to the idx.

pudl.transform.nrelatb.transform_normalize(nrelatb: pandas.DataFrame, normalizer: TableNormalizer)[source]

Normalize a subset of the NREL ATB data into a small table.

Given a TableNormalizer with a set of primary keys (idx) and a list of columns (columns), build a table with just those primary key and data columns from the larger ATB semi-processed table (output of _core_nrelatb__transform_start()). Ensure that the output table is unique based on the primary keys.
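The normalization described above can be sketched as follows. This is a minimal illustration under stated assumptions, not PUDL's implementation: normalize_subset, the toy table, and the column values are all hypothetical.

```python
import pandas as pd


def normalize_subset(
    df: pd.DataFrame, idx: list[str], columns: list[str]
) -> pd.DataFrame:
    """Keep only idx + columns, drop repeated rows, and require unique keys."""
    out = df[idx + columns].drop_duplicates().reset_index(drop=True)
    # If a key still appears more than once, the data columns are not
    # determined by idx, so fail loudly rather than silently pick a row.
    if out.duplicated(subset=idx).any():
        raise ValueError(f"Output is not unique on primary key columns {idx}")
    return out


# Toy stand-in for the semi-processed skinny table (hypothetical values).
skinny = pd.DataFrame(
    {
        "report_year": [2023, 2023, 2023],
        "core_metric_parameter": ["capex", "capex", "fixed_charge_rate"],
        "projection_year": [2030, 2040, 2030],
        "units": ["USD_per_kW", "USD_per_kW", "fraction"],
    }
)
units = normalize_subset(
    skinny, idx=["report_year", "core_metric_parameter"], columns=["units"]
)
```

Here the three skinny rows collapse to two rows, one per (report_year, core_metric_parameter) pair.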

class pudl.transform.nrelatb.Normalizer(/, **data: Any)[source]

Bases: pydantic.BaseModel

Class that defines how to normalize all of the NREL tables that get the transform_normalize() treatment.

There are several columns in the raw ATB data that are not a part of the primary keys for the tables that get the transform_unstack() treatment. This class helps us build these smaller tables which have a smaller subset of primary key columns.

units: TableNormalizer[source]

technology_status: TableNormalizer[source]

class pudl.transform.nrelatb.TableUnstacker(/, **data: Any)[source]

Bases: pydantic.BaseModel

Info needed to unstack a portion of the NREL ATB table.

This class defines a portion of the raw ATB table to get the transform_unstack() treatment. The set of tables which get this treatment are defined in Unstacker.

property idx_unstacked: list[str][source]

Primary key columns after the table is unstacked.

All of the columns in idx except core_metric_parameter.

idx: list[str][source]

core_metric_parameters: list[str][source]

Values from the core_metric_parameter column to be included in this unstack.

classmethod idx_are_same_or_subset_of_idx_all(idx: list[str])[source]

Are the idx columns either the same as or a subset of IDX_ALL?
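The subset check and the idx_unstacked property can be illustrated with a plain dataclass. This is only a sketch: the real class is a pydantic model, UnstackerSketch is a hypothetical name, and IDX_ALL_EXAMPLE is an assumed stand-in built from the four column names visible in the truncated IDX_ALL listing plus core_metric_parameter.

```python
from dataclasses import dataclass

# Hypothetical stand-in for IDX_ALL. The real list is truncated in this
# page and contains more entries than are shown here.
IDX_ALL_EXAMPLE = [
    "report_year",
    "model_case_nrelatb",
    "projection_year",
    "cost_recovery_period_years",
    "core_metric_parameter",
]


@dataclass
class UnstackerSketch:
    idx: list[str]
    core_metric_parameters: list[str]

    def __post_init__(self) -> None:
        # Every idx column must appear in the full list of key columns.
        extra = set(self.idx) - set(IDX_ALL_EXAMPLE)
        if extra:
            raise ValueError(f"idx columns not in IDX_ALL_EXAMPLE: {sorted(extra)}")

    @property
    def idx_unstacked(self) -> list[str]:
        # The parameter column becomes column names, so it leaves the key.
        return [col for col in self.idx if col != "core_metric_parameter"]


# "inflation_rate" is an illustrative parameter name, not taken from the data.
table = UnstackerSketch(
    idx=["report_year", "projection_year", "core_metric_parameter"],
    core_metric_parameters=["inflation_rate"],
)
```

With this setup, table.idx_unstacked is the idx list without core_metric_parameter, and constructing the class with an unknown column raises a ValueError.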

pudl.transform.nrelatb.transform_unstack(nrelatb: pandas.DataFrame, table_unstacker: TableUnstacker) → pandas.DataFrame[source]

Generic unstacking function to convert ATB data from a skinny to wider format.

This function applies the pandas unstack() method to the records for a subset of core_metric_parameter values (via TableUnstacker.core_metric_parameters), using the primary keys for that subset (via TableUnstacker.idx). If the given core_metric_parameters result in non-unique values for the primary keys, unstack() will raise an error.
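A minimal sketch of this skinny-to-wide step, assuming a toy table: unstack_sketch is a hypothetical helper, and the parameter names capex and opex are illustrative only.

```python
import pandas as pd


def unstack_sketch(
    df: pd.DataFrame, idx: list[str], core_metric_parameters: list[str]
) -> pd.DataFrame:
    """Turn selected core_metric_parameter values into columns."""
    subset = df[df["core_metric_parameter"].isin(core_metric_parameters)]
    wide = (
        subset.set_index(idx)["value"]
        # Raises ValueError if idx does not uniquely identify each value.
        .unstack(level="core_metric_parameter")
        .reset_index()
    )
    wide.columns.name = None  # drop the leftover "core_metric_parameter" label
    return wide


skinny = pd.DataFrame(
    {
        "report_year": [2023, 2023, 2023, 2023],
        "projection_year": [2030, 2030, 2040, 2040],
        "core_metric_parameter": ["capex", "opex", "capex", "opex"],
        "value": [1000.0, 20.0, 900.0, 18.0],
    }
)
idx = ["report_year", "projection_year", "core_metric_parameter"]
params = ["capex", "opex"]
wide = unstack_sketch(skinny, idx, params)
```

The result has one row per (report_year, projection_year) and one column per parameter. Passing a table in which the same key appears twice makes unstack() raise a ValueError, which is the failure mode described above.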

class pudl.transform.nrelatb.Unstacker(/, **data: Any)[source]

Bases: pydantic.BaseModel

Class that defines how to unstack the raw ATB table into all of the tidy core tables.

The ATB data is reported in a very skinny format that enables the raw data to have the same schema over time. The core_metric_parameter column contains a string which indicates what type of data is being reported in the value column.

We want the strings in core_metric_parameter to end up as column names in the tables, so that each column represents a unique type of data. In the end, there will be one column containing values from the value column for each unique core_metric_parameter. A quirk of ATB is that different core_metric_parameter values have different sets of primary keys. Each subset of core_metric_parameter values is unique across the data given a specific set of primary keys.

The convention for ATB data is to use an asterisk in the key columns as a wildcard. Generally, when an asterisk is in one of the IDX_ALL columns, the corresponding core_metric_parameter should be associated with a table that does not have that column in its idx, which in effect drops these asterisks from the data. Once these tables are in their core tidy format, they can be merged back together using the primary keys.

This class defines all of the tables in the ATB data that get the transform_unstack() treatment.

property core_metric_parameters_all: list[str][source]

Compilation of all of the parameter values from each of the tables.

Also checks that there are no duplicate core_metric_parameters. We expect all of the parameters across the TableUnstacker instances to be unique.
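One way to compile the parameters and check for duplicates is sketched below. compile_parameters and the parameter names are hypothetical; only the three table names come from this page.

```python
from collections import Counter


def compile_parameters(tables: dict[str, list[str]]) -> list[str]:
    """Flatten the per-table parameter lists and reject any repeats."""
    all_params = [param for params in tables.values() for param in params]
    dupes = sorted(p for p, count in Counter(all_params).items() if count > 1)
    if dupes:
        raise ValueError(f"Parameters assigned to more than one table: {dupes}")
    return all_params


# Illustrative parameter names keyed by the table names listed in this class.
all_parameters = compile_parameters(
    {
        "rate_table": ["inflation_rate"],
        "scenario_table": ["fixed_charge_rate"],
        "tech_detail_table": ["capex", "opex"],
    }
)
```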

rate_table: TableUnstacker[source]

scenario_table: TableUnstacker[source]

tech_detail_table: TableUnstacker[source]

pudl.transform.nrelatb._core_nrelatb__transform_start(raw_nrelatb__data)[source]

Transform raw NREL ATB data into semi-clean but still very skinny table.

pudl.transform.nrelatb.core_nrelatb__yearly_projected_financial_cases(_core_nrelatb__transform_start) → pandas.DataFrame[source]

Transform the data defining the assumptions for the ATB financial cases.

Right now, this just unstacks the table.

pudl.transform.nrelatb.core_nrelatb__yearly_projected_financial_cases_by_scenario(_core_nrelatb__transform_start) → pandas.DataFrame[source]

Transform the data defining the assumptions for the ATB financial cases which vary by scenario.

Right now, this unstacks the table and applies broadcast_fixed_charge_rate_across_tech_detail().

pudl.transform.nrelatb.broadcast_fixed_charge_rate_across_tech_detail(nrelatb_unstacked: pandas.DataFrame, idx_broadcast: list[str]) → pandas.DataFrame[source]

For older years, broadcast the fixed_charge_rate parameter across the technical detail columns.

We want the table schema to be consistent for all years of ATB data. Most parameters have the same primary keys across all of the years; the fixed_charge_rate parameter is the only exception. For the older years (pre-2023), the fixed_charge_rate (FCR) parameter does not vary by tech detail, so we broadcast the pre-2023 fixed_charge_rate values across the tech details that exist in the data.

pudl.transform.nrelatb.broadcast_asterisk_cost_recovery_period_years(nrelatb_unstacked, unstack_scenario: TableUnstacker)[source]

Broadcast the asterisk (wildcard) cost_recovery_period_years.

Most of the records in this unstacked table have values in the cost_recovery_period_years column, but before broadcasting, about 15% of the table has an * in this column. This column is part of the table's primary key, and because we know an * in a primary key effectively means wildcard, we want to broadcast the records with an asterisk across the rest of the data. Unfortunately, there are still ~5% of records with an * that do not have associated records in the rest of the data (they are left_only records in _broadcast_core_metric_parameters()), so they end up with nulls in the cost_recovery_period_years column.

We could probably treat cost_recovery_period_years as a categorical column and/or figure out ways to fill in these nulls with the right set of merge keys.
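The wildcard behavior, including the unmatched left_only records, can be reproduced on a toy table. This is a sketch under assumptions, not the function's actual code: param_a, param_b, and the choice of report_year as the only merge key are hypothetical.

```python
import pandas as pd

unstacked = pd.DataFrame(
    {
        "report_year": [2023, 2023, 2023, 2024],
        "cost_recovery_period_years": ["20", "30", "*", "*"],
        "param_a": [1.0, 2.0, None, None],
        "param_b": [None, None, 9.0, 8.0],
    }
)
keys = ["report_year"]  # assumed merge keys for this illustration

is_wild = unstacked["cost_recovery_period_years"] == "*"
wild = unstacked.loc[is_wild, keys + ["param_b"]]
rest = unstacked.loc[~is_wild, keys + ["cost_recovery_period_years", "param_a"]]

# A left merge repeats each wildcard record once per matching period.
# Wildcard records without a match are kept and flagged as left_only.
broadcast = wild.merge(rest, on=keys, how="left", indicator=True)
```

The 2023 wildcard record is repeated for both the 20 and 30 year periods, while the 2024 record has no match, is flagged left_only, and keeps a null cost_recovery_period_years.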

pudl.transform.nrelatb._broadcast_core_metric_parameters(nrelatb_unstacked: pandas.DataFrame, mask_broadcast: pandas.Series, core_metric_parameters: list[str], idx_broadcast: list[str]) → pandas.DataFrame[source]

Broadcast a section of a table and fillna with the broadcasted values.

Parameters:
  • nrelatb_unstacked – the unstacked ATB table which has values to broadcast.

  • mask_broadcast – a boolean series with the same index as nrelatb_unstacked, where True marks the records that you want to broadcast.

  • core_metric_parameters – the list of core_metric_parameter columns in nrelatb_unstacked whose values you want to take from the broadcast records.

  • idx_broadcast – the columns to merge on.
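How these four arguments might fit together is sketched below, using the pre-2023 fixed_charge_rate case as the example. The helper broadcast_and_fill, the technology_detail column name, and the values are assumptions for illustration, and the real function may differ in its details.

```python
import pandas as pd


def broadcast_and_fill(
    df: pd.DataFrame,
    mask_broadcast: pd.Series,
    core_metric_parameters: list[str],
    idx_broadcast: list[str],
) -> pd.DataFrame:
    """Merge the masked records onto the rest and fill nulls from them."""
    to_broadcast = df.loc[mask_broadcast, idx_broadcast + core_metric_parameters]
    rest = df.loc[~mask_broadcast]
    merged = rest.merge(
        to_broadcast, on=idx_broadcast, how="left", suffixes=("", "_broadcast")
    )
    for param in core_metric_parameters:
        # Only fill where the original record had no value of its own.
        merged[param] = merged[param].fillna(merged[f"{param}_broadcast"])
    return merged.drop(columns=[f"{p}_broadcast" for p in core_metric_parameters])


# One 2022 record carries the rate but no tech detail (hypothetical values).
unstacked = pd.DataFrame(
    {
        "report_year": [2022, 2022, 2022],
        "technology_detail": [None, "Class1", "Class2"],
        "fixed_charge_rate": [0.07, None, None],
        "capex": [None, 1000.0, 1200.0],
    }
)
filled = broadcast_and_fill(
    unstacked,
    mask_broadcast=unstacked["technology_detail"].isna(),
    core_metric_parameters=["fixed_charge_rate"],
    idx_broadcast=["report_year"],
)
```

In this sketch the masked record is consumed by the merge, so the output contains only the two tech-detail records, each now carrying the broadcast rate.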

pudl.transform.nrelatb.core_nrelatb__yearly_projected_cost_performance(_core_nrelatb__transform_start) → pandas.DataFrame[source]

Transform the yearly NREL ATB cost and performance projections.

Right now, this just unstacks the table.

pudl.transform.nrelatb._core_nrelatb__yearly_units(_core_nrelatb__transform_start: pandas.DataFrame) → pandas.DataFrame[source]

Transform a table of units by core_metric_parameter.

This asset is created mostly to ensure that the input units do not vary within one core_metric_parameter. If they do vary, we will need to standardize the units of that parameter.
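The unit-consistency check described here could look roughly like the following. find_inconsistent_units and the sample rows are hypothetical, and the sketch assumes the units live in a column named units.

```python
import pandas as pd


def find_inconsistent_units(df: pd.DataFrame) -> list[str]:
    """Return the parameters that are reported with more than one unit."""
    n_units = df.groupby("core_metric_parameter")["units"].nunique()
    return sorted(n_units[n_units > 1].index)


consistent = pd.DataFrame(
    {
        "core_metric_parameter": ["capex", "capex", "opex"],
        "units": ["USD_per_kW", "USD_per_kW", "USD_per_kW_year"],
    }
)
inconsistent = pd.DataFrame(
    {
        "core_metric_parameter": ["capex", "capex", "opex"],
        "units": ["USD_per_kW", "USD_per_MW", "USD_per_kW_year"],
    }
)
ok = find_inconsistent_units(consistent)
bad = find_inconsistent_units(inconsistent)
```

An empty result means every parameter has a single unit. Any names returned are the parameters whose units would need to be standardized.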

pudl.transform.nrelatb.core_nrelatb__yearly_technology_status(_core_nrelatb__transform_start: pandas.DataFrame) → pandas.DataFrame[source]

Transform a small table of statuses of different technology types.