pudl.transform.vcerare
======================

.. py:module:: pudl.transform.vcerare

.. autoapi-nested-parse::

   Transformations of the Vibrant Clean Energy Resource Adequacy Renewable Energy (RARE) Power Dataset.

   Wind and solar profiles are extracted separately but concatenated into a single table in
   this module, as they have exactly the same structure.


Attributes
----------

.. autoapisummary::

   pudl.transform.vcerare.logger


Functions
---------

.. autoapisummary::

   pudl.transform.vcerare._prep_lat_long_fips_df
   pudl.transform.vcerare._add_time_cols
   pudl.transform.vcerare._drop_city_cols
   pudl.transform.vcerare._stack_cap_fac_df
   pudl.transform.vcerare._make_cap_fac_frac
   pudl.transform.vcerare._check_for_valid_counties
   pudl.transform.vcerare._combine_all_cap_fac_dfs
   pudl.transform.vcerare._combine_cap_fac_with_fips_df
   pudl.transform.vcerare._get_parquet_path
   pudl.transform.vcerare.one_year_hourly_available_capacity_factor
   pudl.transform.vcerare.out_vcerare__hourly_available_capacity_factor
   pudl.transform.vcerare._load_duckdb_table
   pudl.transform.vcerare.check_rows
   pudl.transform.vcerare.check_nulls
   pudl.transform.vcerare.check_pv_capacity_factor_upper_bound
   pudl.transform.vcerare.check_wind_capacity_factor_upper_bound
   pudl.transform.vcerare.check_capacity_factor_lower_bound
   pudl.transform.vcerare.check_max_hour_of_year
   pudl.transform.vcerare.check_unexpected_dates
   pudl.transform.vcerare.check_hour_from_date
   pudl.transform.vcerare.check_unexpected_counties
   pudl.transform.vcerare.check_duplicate_county_id_fips


Module Contents
---------------

.. py:data:: logger

.. py:function:: _prep_lat_long_fips_df(raw_vcerare__lat_lon_fips: pandas.DataFrame) -> pandas.DataFrame

   Prep the lat_long_fips table to merge into the capacity factor tables.

   Prep entails making sure the formatting and column names match those in the capacity
   factor tables, adding a leading zero to FIPS codes that have only 4 digits, and making
   separate county/subregion and state columns.
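
A minimal pandas sketch of the zero-padding and state-column steps, assuming the raw FIPS codes arrive as integers in a hypothetical fips column. The column names here are illustrative only and are not the actual schema of raw_vcerare__lat_lon_fips:

```python
import pandas as pd

# Hypothetical raw rows; "fips" is an illustrative column name, not the
# actual schema of raw_vcerare__lat_lon_fips.
raw = pd.DataFrame({"fips": [1001, 17031]})

# County FIPS IDs have 5 digits. Codes stored as integers lose their leading
# zero (1001 instead of 01001), so left-pad the string form with zeros.
raw["county_id_fips"] = raw["fips"].astype(str).str.zfill(5)

# The first two digits of a county FIPS ID are the state FIPS code.
raw["state_id_fips"] = raw["county_id_fips"].str[:2]
```

Padding the string form, rather than keeping integers, preserves the fixed 5-character width that FIPS IDs need for joins.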

   Instead of pulling state from the county_state column, we use the first two digits of the
   county FIPS ID to pull in state code values from the census data stored in
   POLITICAL_SUBDIVISIONS.

   The county portion of the county_state column does not map directly to a FIPS ID. Some of
   the county names are actually subregions such as cities or lakes. For this reason, we've
   named the column county_or_lake_name, and it should be considered part of the primary key.
   There are several instances of multiple subregions that map to a single county_id_fips
   value.

.. py:function:: _add_time_cols(df: pandas.DataFrame, df_name: str) -> pandas.DataFrame

   Add datetime and hour_of_year columns.

   This function adds a datetime column and an hour_of_year column. The datetime column is
   important for merging the data with other data, and the hour_of_year column (1-8760) is
   important for modeling purposes. The report_year column is also helpful for filtering, so
   we keep all three!

   For leap years such as 2020, December 31st is excluded so that every year has exactly
   8760 hours.

.. py:function:: _drop_city_cols(df: pandas.DataFrame, df_name: str) -> pandas.DataFrame

   Drop city columns from the capacity factor tables before stacking.

   We do this early because the columns can be dropped by name here, rather than searching
   through all of the stacked rows to find matching records.

.. py:function:: _stack_cap_fac_df(df: pandas.DataFrame, df_name: str) -> pandas.DataFrame

   Transform each capacity factor table individually to save memory.

   The main transforms are turning county/subregion columns into county/subregion rows and
   renaming columns to be more human-readable and compatible with the FIPS df that will get
   merged in. This function is intended to save memory by being applied to each individual
   capacity factor table rather than to the giant combined one.

.. py:function:: _make_cap_fac_frac(df: pandas.DataFrame, df_name: str) -> pandas.DataFrame

   Make the capacity factor column a fraction instead of a percentage.
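
A minimal sketch of the percentage-to-fraction conversion on a hypothetical wide table (one column per county/subregion, before stacking). The hour and county column names are assumptions for illustration, not the raw VCE column names:

```python
import pandas as pd

# Hypothetical wide capacity factor table: one value column per
# county/subregion, with capacity factors expressed as percentages (0-100).
wide = pd.DataFrame(
    {
        "hour": [1, 2],  # hypothetical time column, left untouched
        "autauga_alabama": [0.0, 42.5],
        "cook_illinois": [100.0, 12.0],
    }
)

# Divide only the county/subregion value columns by 100 to get fractions (0-1).
value_cols = wide.columns.drop("hour")
wide[value_cols] = wide[value_cols] / 100
```

Operating on the wide table means one vectorized division per column instead of one per stacked row.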

   This step happens before the table gets stacked to save memory.

.. py:function:: _check_for_valid_counties(df: pandas.DataFrame, clean_fips_df: pandas.DataFrame, df_name: str) -> pandas.DataFrame

   Make sure the state_county values show up in the FIPS table.

   This step happens before the table gets stacked to save memory.

.. py:function:: _combine_all_cap_fac_dfs(cap_fac_dict: dict[str, pandas.DataFrame]) -> pandas.DataFrame

   Combine capacity factor tables.

.. py:function:: _combine_cap_fac_with_fips_df(cap_fac_df: pandas.DataFrame, fips_df: pandas.DataFrame) -> pandas.DataFrame

   Combine the combined capacity factor df with the FIPS df.

.. py:function:: _get_parquet_path()

.. py:function:: one_year_hourly_available_capacity_factor(year: int, raw_vcerare__lat_lon_fips: pandas.DataFrame, raw_vcerare__fixed_solar_pv_lat_upv: pandas.DataFrame, raw_vcerare__offshore_wind_power_140m: pandas.DataFrame, raw_vcerare__onshore_wind_power_100m: pandas.DataFrame) -> pandas.DataFrame

   Transform raw Vibrant Clean Energy renewable generation profiles.

   Concatenates the solar and wind capacity factors into a single table and turns the columns
   for each county or subregion into a single county_or_lake_name column.

.. py:function:: out_vcerare__hourly_available_capacity_factor(raw_vcerare__lat_lon_fips: pandas.DataFrame, raw_vcerare__fixed_solar_pv_lat_upv: pandas.DataFrame, raw_vcerare__offshore_wind_power_140m: pandas.DataFrame, raw_vcerare__onshore_wind_power_100m: pandas.DataFrame)

   Transform raw Vibrant Clean Energy renewable generation profiles.

   Concatenates the solar and wind capacity factors into a single table and turns the columns
   for each county or subregion into a single county_or_lake_name column. The asset processes
   one year of data at a time to limit peak memory usage.

.. py:function:: _load_duckdb_table()

   Load the VCE RARE output table into DuckDB for running asset checks.

.. py:function:: check_rows() -> dagster.AssetCheckResult

   Check rows.

.. py:function:: check_nulls() -> dagster.AssetCheckResult

   Check for null values.

.. py:function:: check_pv_capacity_factor_upper_bound() -> dagster.AssetCheckResult

   Check the PV capacity factor upper bound.

.. py:function:: check_wind_capacity_factor_upper_bound() -> dagster.AssetCheckResult

   Check the wind capacity factor upper bound.

.. py:function:: check_capacity_factor_lower_bound() -> dagster.AssetCheckResult

   Check the capacity factor lower bound.

.. py:function:: check_max_hour_of_year() -> dagster.AssetCheckResult

   Check the maximum hour of year.

.. py:function:: check_unexpected_dates() -> dagster.AssetCheckResult

   Check for unexpected dates.

.. py:function:: check_hour_from_date() -> dagster.AssetCheckResult

   Check hour from date.

.. py:function:: check_unexpected_counties() -> dagster.AssetCheckResult

   Check for unexpected counties.

.. py:function:: check_duplicate_county_id_fips() -> dagster.AssetCheckResult

   Check duplicate county ID.
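
The leap-year rule in _add_time_cols and the check_max_hour_of_year asset check fit together. The sketch below builds hourly timestamps for 2020, drops December 31st, and numbers the remaining hours. It assumes hour_of_year simply counts those hours in chronological order, and the datetime_utc column name is hypothetical; it illustrates the rule rather than reproducing the PUDL implementation:

```python
import pandas as pd

# Hourly timestamps for the leap year 2020 (366 days * 24 hours = 8784 rows).
# "datetime_utc" is a hypothetical column name.
hours = pd.DataFrame(
    {"datetime_utc": pd.date_range("2020-01-01 00:00", "2020-12-31 23:00", freq="h")}
)

# Drop December 31st so the leap year has the same 8760 hours as other years.
is_dec_31 = (hours["datetime_utc"].dt.month == 12) & (hours["datetime_utc"].dt.day == 31)
hours = hours[~is_dec_31].reset_index(drop=True)

# Number the remaining hours 1-8760 in chronological order (assumed scheme).
hours["hour_of_year"] = hours.index + 1

# A check in the spirit of check_max_hour_of_year: no hour may exceed 8760.
assert hours["hour_of_year"].max() == 8760
```

Without the December 31st filter, 2020 would yield 8784 hours and the final assertion would fail.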