pudl.transform.gridpathratoolkit

Transformations of the GridPath RA Toolkit renewable generation profiles.

Wind and solar profiles are extracted separately but concatenated into a single table in this module, since they share exactly the same structure. The generator aggregation group association tables for the various technology types are likewise concatenated.

Module Contents

Functions

_transform_capacity_factors(→ pandas.DataFrame)

Basic transformations that can be applied to many profiles.

out_gridpathratoolkit__hourly_available_capacity_factor(...)

Transform raw GridPath RA Toolkit renewable generation profiles.

_transform_aggs(→ pandas.DataFrame)

Transform raw GridPath RA Toolkit generator aggregations.

core_gridpathratoolkit__assn_generator_aggregation_group(...)

Transform and combine raw GridPath RA Toolkit generator aggregations.

check_valid_aggregation_groups(→ dagster.AssetCheckResult)

Check that every capacity factor aggregation key appears in the aggregations.

pudl.transform.gridpathratoolkit._transform_capacity_factors(capacity_factors: pandas.DataFrame, utc_offset: pandas.Timedelta) → pandas.DataFrame

Basic transformations that can be applied to many profiles.

  • Construct a datetime column and adjust it to be in UTC.

  • Reshape the table from wide to tidy format.

  • Name columns appropriately.
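The three steps above can be sketched roughly as follows. This is a minimal illustration, not the module's actual implementation: the raw time columns (`year`, `month`, `day`, `hour`), the output column names, the function name, and the sign convention for `utc_offset` (a negative offset for zones west of UTC) are all assumptions made for the example.

```python
import pandas as pd


def sketch_transform_capacity_factors(
    capacity_factors: pd.DataFrame, utc_offset: pd.Timedelta
) -> pd.DataFrame:
    """Hypothetical sketch: turn a wide hourly profile table into tidy form."""
    df = capacity_factors.copy()
    # Assumed raw layout: integer year/month/day/hour columns in local standard
    # time, plus one column of capacity factors per aggregation group.
    time_cols = ["year", "month", "day", "hour"]
    local_time = pd.to_datetime(df[time_cols])
    # Assumed convention: utc_offset is local time minus UTC, so subtracting it
    # shifts the timestamps to UTC.
    df["datetime_utc"] = local_time - utc_offset
    df = df.drop(columns=time_cols)
    # Wide to tidy: one row per (timestamp, aggregation group) pair.
    return df.melt(
        id_vars=["datetime_utc"],
        var_name="aggregation_group",
        value_name="capacity_factor",
    )


# Tiny made-up input with two hours and two hypothetical aggregation groups.
wide = pd.DataFrame(
    {
        "year": [2020, 2020],
        "month": [1, 1],
        "day": [1, 1],
        "hour": [0, 1],
        "group_a": [0.1, 0.2],
        "group_b": [0.3, 0.4],
    }
)
tidy = sketch_transform_capacity_factors(wide, pd.Timedelta(hours=-8))
```

With this layout, each input row yields one output row per aggregation group column, so the tidy table has as many rows as hours times groups.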

pudl.transform.gridpathratoolkit.out_gridpathratoolkit__hourly_available_capacity_factor(raw_gridpathratoolkit__aggregated_extended_solar_capacity: pandas.DataFrame, raw_gridpathratoolkit__aggregated_extended_wind_capacity: pandas.DataFrame) → pandas.DataFrame

Transform raw GridPath RA Toolkit renewable generation profiles.

Concatenates the solar and wind capacity factors into a single table and turns the aggregation key into a categorical column to save space.

Note that this transform is unusual in that it produces a highly processed output table. That’s because we’re working backwards from an archived finished product in order to provide a minimum viable product. In the future, we intend to integrate or reimplement the steps required to produce this output table from less processed original inputs.
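The concatenation and categorical conversion described for this asset might look something like the sketch below. The two small frames stand in for already-tidied solar and wind profiles; their column names and group labels are assumptions for illustration only.

```python
import pandas as pd

# Made-up tidy profiles with identical structure (hypothetical group names).
solar = pd.DataFrame(
    {
        "datetime_utc": pd.to_datetime(["2020-01-01 08:00", "2020-01-01 09:00"]),
        "aggregation_group": ["example_solar", "example_solar"],
        "capacity_factor": [0.0, 0.1],
    }
)
wind = pd.DataFrame(
    {
        "datetime_utc": pd.to_datetime(["2020-01-01 08:00", "2020-01-01 09:00"]),
        "aggregation_group": ["example_wind", "example_wind"],
        "capacity_factor": [0.4, 0.5],
    }
)

# Stack the two tables, then store the repeated group labels as a categorical
# so each distinct label is kept once and rows hold small integer codes.
combined = pd.concat([solar, wind], ignore_index=True)
combined["aggregation_group"] = combined["aggregation_group"].astype("category")
```

The space saving from the categorical column grows with the number of rows per group, which is large for hourly time series covering many years.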

pudl.transform.gridpathratoolkit._transform_aggs(raw_agg: pandas.DataFrame) → pandas.DataFrame

Transform raw GridPath RA Toolkit generator aggregations.

  • Split EIA_UniqueID into plant and generator IDs.

  • Rename columns to follow PUDL conventions.

  • Verify that the split-out plant IDs always match the reported plant IDs.

  • Set column dtypes.
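A rough sketch of these four steps is shown below. Only `EIA_UniqueID` comes from the description above; the other raw column names (`EIA_PlantCode`, `GP_Group`), the underscore-separated ID format, the output column names, and the function name are hypothetical choices made for the example.

```python
import pandas as pd


def sketch_transform_aggs(raw_agg: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical sketch of the aggregation table cleanup."""
    agg = raw_agg.copy()
    # Assumption: EIA_UniqueID looks like "<plant id>_<generator id>". Splitting
    # only on the first underscore keeps any underscores in the generator ID.
    split_ids = agg["EIA_UniqueID"].str.split("_", n=1, expand=True)
    agg["plant_id_from_uid"] = split_ids[0].astype("int64")
    agg["generator_id"] = split_ids[1]
    # Hypothetical raw column names mapped to PUDL-style names.
    agg = agg.rename(
        columns={"EIA_PlantCode": "plant_id_eia", "GP_Group": "aggregation_group"}
    )
    # The plant ID embedded in the unique ID must match the reported plant ID.
    mismatched = agg[agg["plant_id_from_uid"] != agg["plant_id_eia"]]
    if not mismatched.empty:
        raise ValueError(f"{len(mismatched)} rows have inconsistent plant IDs")
    agg = agg.drop(columns=["EIA_UniqueID", "plant_id_from_uid"])
    return agg.astype(
        {
            "plant_id_eia": "int64",
            "generator_id": "string",
            "aggregation_group": "string",
        }
    )


# Made-up input rows for illustration.
raw = pd.DataFrame(
    {
        "EIA_UniqueID": ["123_GEN1", "456_A_2"],
        "EIA_PlantCode": [123, 456],
        "GP_Group": ["example_wind", "example_wind"],
    }
)
clean = sketch_transform_aggs(raw)
```

Raising on a mismatch, rather than silently dropping rows, makes an unexpected change in the upstream ID format visible immediately.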

pudl.transform.gridpathratoolkit.core_gridpathratoolkit__assn_generator_aggregation_group(raw_gridpathratoolkit__wind_capacity_aggregations: pandas.DataFrame, raw_gridpathratoolkit__solar_capacity_aggregations: pandas.DataFrame) → pandas.DataFrame

Transform and combine raw GridPath RA Toolkit generator aggregations.

pudl.transform.gridpathratoolkit.check_valid_aggregation_groups(out_gridpathratoolkit__hourly_available_capacity_factor, aggs: pandas.DataFrame) → dagster.AssetCheckResult

Check that every capacity factor aggregation key appears in the aggregations.

This isn’t a normal foreign-key relationship: the aggregation group isn’t the primary key in the aggregation tables, and it is not unique in either table. However, if an aggregation group appears in the capacity factor time series but never appears in the aggregation table, then something is wrong.
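The core of such a check is a set difference between the groups seen in each table. The sketch below shows only that comparison in plain pandas; the helper name and the `aggregation_group` column name are assumptions, and the documented function additionally wraps its result in a `dagster.AssetCheckResult`, which is omitted here to keep the example dependency-free.

```python
import pandas as pd


def find_missing_aggregation_groups(
    capacity_factors: pd.DataFrame, aggs: pd.DataFrame
) -> set:
    """Hypothetical helper: groups used in the time series but never defined."""
    # Neither column is unique, so compare the sets of distinct values rather
    # than attempting a key-based join.
    cf_groups = set(capacity_factors["aggregation_group"].unique())
    agg_groups = set(aggs["aggregation_group"].unique())
    return cf_groups - agg_groups


# Made-up data: group "c" has a profile but no generators assigned to it.
capacity_factors = pd.DataFrame(
    {"aggregation_group": ["a", "a", "b", "c"], "capacity_factor": [0.1, 0.2, 0.3, 0.4]}
)
aggs = pd.DataFrame(
    {"aggregation_group": ["a", "a", "b"], "generator_id": ["1", "2", "1"]}
)
missing = find_missing_aggregation_groups(capacity_factors, aggs)
check_passed = len(missing) == 0
```

An empty result means every group in the capacity factor table is defined in the aggregation table; the reverse direction (groups defined but never used) is deliberately not tested.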