pudl.transform.vcerare
Transformations of the Vibrant Clean Energy Resource Adequacy Renewable Energy (RARE) Power Dataset.
Wind and solar profiles are extracted separately, but concatenated into a single table in this module, as they have exactly the same structure.
Attributes
Functions
| _prep_lat_long_fips_df | Prep the lat_long_fips table to merge into the capacity factor tables. |
| _add_time_cols | Add datetime and hour_of_year columns. |
| _drop_city_cols | Drop city columns from the capacity factor tables before stacking. |
| _stack_cap_fac_df | Function to transform each capacity factor table individually to save memory. |
| _make_cap_fac_frac | Make the capacity factor column a fraction instead of a percentage. |
| _check_for_valid_counties | Make sure the state_county values show up in the FIPS table. |
| _combine_all_cap_fac_dfs | Combine capacity factor tables. |
| _combine_cap_fac_with_fips_df | Combine the combined capacity factor df with the FIPS df. |
| out_vcerare__hourly_available_capacity_factor | Transform raw Vibrant Clean Energy renewable generation profiles. |
| check_hourly_available_cap_fac_table | Check that the final output table is as expected. |
Module Contents
- pudl.transform.vcerare._prep_lat_long_fips_df(raw_vcerare__lat_lon_fips: pandas.DataFrame) → pandas.DataFrame
Prep the lat_long_fips table to merge into the capacity factor tables.
Prep entails making sure the formatting and column names match those in the capacity factor tables, zero-padding 4-digit FIPS codes to 5 digits, and making separate county/subregion and state columns. Instead of pulling the state from the county_state column, we use the first two digits of the county FIPS ID to look up state code values in the census data stored in POLITICAL_SUBDIVISIONS.
The county portion of the county_state column does not map directly to FIPS ID. Some of the county names are actually subregions like cities or lakes. For this reason we’ve named the column county_or_lake_name and it should be considered part of the primary key. There are several instances of multiple subregions that map to a single county_id_fips value.
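As a rough illustration of the zero-padding and state lookup described above, here is a minimal pandas sketch. The sample rows, the `fips` input column name, and the `state_lookup` dictionary are assumptions made for the example; the real function reads state codes from POLITICAL_SUBDIVISIONS and may differ in detail.

```python
import pandas as pd

# Hypothetical sample rows; the real table has more columns and rows.
fips = pd.DataFrame(
    {
        "county_state": ["autauga_alabama", "cuyahoga_ohio"],
        "fips": [1001, 39035],  # assumed raw column name
    }
)

# Pad 4-digit codes with a leading zero so every code has 5 digits.
fips["county_id_fips"] = fips["fips"].astype(str).str.zfill(5)

# The first two digits of a county FIPS code identify the state.
fips["state_id_fips"] = fips["county_id_fips"].str[:2]

# Stand-in for the census lookup held in POLITICAL_SUBDIVISIONS.
state_lookup = {"01": "AL", "39": "OH"}
fips["state"] = fips["state_id_fips"].map(state_lookup)
```

Deriving the state from the FIPS prefix avoids parsing free-text names, which matters here because some entries are subregions rather than counties.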
- pudl.transform.vcerare._add_time_cols(df: pandas.DataFrame, df_name: str) → pandas.DataFrame
Add datetime and hour_of_year columns.
This function adds a datetime column and an hour_of_year column. The datetime column is important for merging the data with other data, and the hour_of_year column (1-8760) is important for modeling purposes. The report_year column is also helpful for filtering, so we keep all three!
For leap years (2020), December 31st is excluded so that the year still has 8760 hours.
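The leap-year behavior can be sketched as follows. This is an illustrative stand-in, not the PUDL implementation: the function name `add_time_cols`, its `report_year` parameter, and the `datetime` column name are assumptions, and the input is assumed to hold exactly one row per hour.

```python
import pandas as pd


def add_time_cols(df: pd.DataFrame, report_year: int) -> pd.DataFrame:
    """Hypothetical sketch: attach datetime, hour_of_year and report_year."""
    if len(df) != 8760:
        raise ValueError("expected one row per hour (8760 rows)")
    # 8760 hourly timestamps starting on January 1st. In a leap year the
    # range ends on December 30th, so December 31st is left out.
    datetimes = pd.date_range(
        start=f"{report_year}-01-01", periods=8760, freq=pd.Timedelta(hours=1)
    )
    out = df.copy()
    out["datetime"] = datetimes
    out["hour_of_year"] = range(1, 8761)
    out["report_year"] = report_year
    return out


# Usage with a dummy one-column table.
raw = pd.DataFrame({"county_a": [0.0] * 8760})
leap = add_time_cols(raw, 2020)
regular = add_time_cols(raw, 2019)
```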
- pudl.transform.vcerare._drop_city_cols(df: pandas.DataFrame, df_name: str) → pandas.DataFrame
Drop city columns from the capacity factor tables before stacking.
We do this early since the columns can be dropped by name here, and we don’t have to search through all of the stacked rows to find matching records.
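To see why dropping early is cheaper, compare the two approaches on a toy table. The column names and the `city_cols` list below are placeholders invented for the example; the actual city columns are not listed in this documentation.

```python
import pandas as pd

# Hypothetical wide table: one column per county or subregion.
wide = pd.DataFrame(
    {
        "hour_of_year": [1, 2],
        "county_a": [0.10, 0.20],
        "city_b": [0.30, 0.40],  # assumed to be a city column to discard
    }
)
city_cols = ["city_b"]  # placeholder list, not the real column names

# Wide format: a single drop by column label.
trimmed = wide.drop(columns=city_cols)

# Long format: the same cleanup has to test every stacked row.
stacked = wide.melt(
    id_vars="hour_of_year", var_name="place", value_name="capacity_factor"
)
filtered = stacked[~stacked["place"].isin(city_cols)]
```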
- pudl.transform.vcerare._stack_cap_fac_df(df: pandas.DataFrame, df_name: str) → pandas.DataFrame
Function to transform each capacity factor table individually to save memory.
The main transforms are turning county/subregion columns into county/subregion rows and renaming columns to be more human-readable and compatible with the FIPS df that will get merged in.
This function is intended to save memory by being applied to each individual capacity factor table rather than the giant combined one.
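The column-to-row reshaping can be approximated with `DataFrame.melt`. This is a sketch under assumptions: the column names `county_state_names` and `capacity_factor_solar_pv` are illustrative, and the real function may use a different pandas method to stack.

```python
import pandas as pd

# Hypothetical wide capacity factor table: one column per county/subregion.
wide = pd.DataFrame(
    {
        "report_year": [2020, 2020],
        "hour_of_year": [1, 2],
        "county_a": [0.10, 0.20],
        "county_b": [0.30, 0.40],
    }
)
id_cols = ["report_year", "hour_of_year"]

# Turn every county/subregion column into rows keyed by its name.
stacked = wide.melt(
    id_vars=id_cols,
    var_name="county_state_names",  # assumed name shared with the FIPS table
    value_name="capacity_factor_solar_pv",  # assumed per-technology name
)
```

Stacking multiplies the row count by the number of county columns, which is why each table is reshaped on its own before any combining takes place.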
- pudl.transform.vcerare._make_cap_fac_frac(df: pandas.DataFrame, df_name: str) → pandas.DataFrame
Make the capacity factor column a fraction instead of a percentage.
This step happens before the table gets stacked to save memory.
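A minimal sketch of the percentage-to-fraction conversion on a wide (unstacked) table. The column names are made up, and treating every non-time column as a capacity factor column is an assumption of this example.

```python
import pandas as pd

# Hypothetical wide table with capacity factors expressed as percentages.
wide = pd.DataFrame(
    {
        "hour_of_year": [1, 2],
        "county_a": [12.5, 50.0],
        "county_b": [0.0, 100.0],
    }
)

# Assume every column other than the time column holds capacity factors.
county_cols = [col for col in wide.columns if col != "hour_of_year"]

# Convert percentages (0-100) to fractions (0-1) in one vectorized step.
wide[county_cols] = wide[county_cols] / 100
```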
- pudl.transform.vcerare._check_for_valid_counties(df: pandas.DataFrame, clean_fips_df: pandas.DataFrame, df_name: str) → pandas.DataFrame
Make sure the state_county values show up in the FIPS table.
This step happens before the table gets stacked to save memory.
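Because the check runs before stacking, the county names are still column labels, so a set comparison is enough. The sketch below is one plausible way to do this; the function name, the `id_cols` parameter, the `county_state_names` column, and the choice to raise `ValueError` are all assumptions rather than the documented behavior.

```python
import pandas as pd


def check_for_valid_counties(
    wide: pd.DataFrame, clean_fips: pd.DataFrame, id_cols: list[str]
) -> pd.DataFrame:
    """Hypothetical sketch: verify every county column appears in the FIPS table."""
    county_cols = set(wide.columns) - set(id_cols)
    known = set(clean_fips["county_state_names"])  # assumed column name
    missing = sorted(county_cols - known)
    if missing:
        raise ValueError(f"Counties missing from the FIPS table: {missing}")
    return wide


wide = pd.DataFrame(
    {"hour_of_year": [1, 2], "county_a": [0.1, 0.2], "county_b": [0.3, 0.4]}
)
clean_fips = pd.DataFrame({"county_state_names": ["county_a", "county_b"]})
checked = check_for_valid_counties(wide, clean_fips, id_cols=["hour_of_year"])
```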
- pudl.transform.vcerare._combine_all_cap_fac_dfs(cap_fac_dict: dict[str, pandas.DataFrame]) → pandas.DataFrame
Combine capacity factor tables.
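One way such a combination could work, assuming each stacked table shares the same key columns and contributes a single capacity factor column, is a column-wise `pd.concat` on a common index. The key and value column names here are illustrative, and the real function may combine the tables differently.

```python
import pandas as pd

keys = ["hour_of_year", "county_state_names"]  # assumed shared key columns

# Hypothetical stacked tables, one per generation technology.
solar = pd.DataFrame(
    {
        "hour_of_year": [1, 2],
        "county_state_names": ["county_a", "county_a"],
        "capacity_factor_solar_pv": [0.0, 0.3],
    }
)
onshore = pd.DataFrame(
    {
        "hour_of_year": [1, 2],
        "county_state_names": ["county_a", "county_a"],
        "capacity_factor_onshore_wind": [0.5, 0.6],
    }
)
cap_fac_dict = {"solar_pv": solar, "onshore_wind": onshore}

# Align the tables on the shared keys and place them side by side.
combined = pd.concat(
    [df.set_index(keys) for df in cap_fac_dict.values()], axis="columns"
).reset_index()
```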
- pudl.transform.vcerare._combine_cap_fac_with_fips_df(cap_fac_df: pandas.DataFrame, fips_df: pandas.DataFrame) → pandas.DataFrame
Combine the combined capacity factor df with the FIPS df.
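A minimal sketch of this join, assuming both tables share a `county_state_names` key (an illustrative name). The sample rows show two subregions mapping to the same county FIPS code, and `validate="m:1"` guards against duplicate keys on the FIPS side.

```python
import pandas as pd

# Hypothetical combined capacity factor table.
cap_fac = pd.DataFrame(
    {
        "county_state_names": ["county_a", "county_a", "lake_b"],
        "hour_of_year": [1, 2, 1],
        "capacity_factor_solar_pv": [0.0, 0.3, 0.1],
    }
)

# Hypothetical cleaned FIPS table: two subregions share one county FIPS code.
fips = pd.DataFrame(
    {
        "county_state_names": ["county_a", "lake_b"],
        "county_id_fips": ["01001", "01001"],
        "state": ["AL", "AL"],
    }
)

# Many capacity factor rows map to one FIPS row.
merged = cap_fac.merge(
    fips, on="county_state_names", how="left", validate="m:1"
)
```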
- pudl.transform.vcerare.out_vcerare__hourly_available_capacity_factor(raw_vcerare__lat_lon_fips: pandas.DataFrame, raw_vcerare__fixed_solar_pv_lat_upv: pandas.DataFrame, raw_vcerare__offshore_wind_power_140m: pandas.DataFrame, raw_vcerare__onshore_wind_power_100m: pandas.DataFrame) → pandas.DataFrame
Transform raw Vibrant Clean Energy renewable generation profiles.
Concatenates the solar and wind capacity factors into a single table and turns the columns for each county or subregion into a single county_or_lake_name column.
- pudl.transform.vcerare.check_hourly_available_cap_fac_table(asset_df: pandas.DataFrame)
Check that the final output table is as expected.
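The specific checks are not listed here, so the following is only a guess at the kind of validation such a function might perform: unique primary keys and an hour_of_year range of 1-8760. The function name `check_table`, the `pk_cols` parameter, and the chosen primary key columns are all assumptions.

```python
import pandas as pd


def check_table(asset_df: pd.DataFrame, pk_cols: list[str]) -> None:
    """Hypothetical sketch of output checks; not the PUDL implementation."""
    errors = []
    # Primary key values should identify each row uniquely.
    if asset_df.duplicated(subset=pk_cols).any():
        errors.append("duplicate primary key values")
    # hour_of_year is expected to run from 1 to 8760.
    if not asset_df["hour_of_year"].between(1, 8760).all():
        errors.append("hour_of_year outside 1-8760")
    if errors:
        raise ValueError("; ".join(errors))


good = pd.DataFrame(
    {
        "state": ["AL", "AL"],
        "county_or_lake_name": ["county_a", "county_a"],
        "hour_of_year": [1, 2],
    }
)
pk = ["state", "county_or_lake_name", "hour_of_year"]  # assumed key columns
check_table(good, pk)
```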