pudl.settings

Module for validating PUDL ETL settings.

Module Contents

Classes

XbrlFormNumber

Contains full list of supported FERC XBRL forms.

BaseModel

BaseModel with global configuration.

GenericDatasetSettings

An abstract pydantic model for generic datasets.

Ferc1Settings

An immutable pydantic model to validate FERC Form 1 settings.

Ferc714Settings

An immutable pydantic model to validate FERC Form 714 settings.

EpaCemsSettings

An immutable pydantic model to validate EPA CEMS settings.

Eia923Settings

An immutable pydantic model to validate EIA 923 settings.

Eia861Settings

An immutable pydantic model to validate EIA 861 settings.

Eia860Settings

An immutable pydantic model to validate EIA 860 settings.

GlueSettings

An immutable pydantic model to validate Glue settings.

EiaSettings

An immutable pydantic model to validate EIA datasets settings.

DatasetsSettings

An immutable pydantic model to validate PUDL Dataset settings.

Ferc1DbfToSqliteSettings

An immutable pydantic model to validate FERC Form 1 DBF to SQLite settings.

FercGenericXbrlToSqliteSettings

An immutable pydantic model to validate generic FERC XBRL to SQLite settings.

Ferc1XbrlToSqliteSettings

An immutable pydantic model to validate FERC Form 1 XBRL to SQLite settings.

Ferc2XbrlToSqliteSettings

An immutable pydantic model to validate FERC Form 2 XBRL to SQLite settings.

Ferc6XbrlToSqliteSettings

An immutable pydantic model to validate FERC Form 6 XBRL to SQLite settings.

Ferc60XbrlToSqliteSettings

An immutable pydantic model to validate FERC Form 60 XBRL to SQLite settings.

Ferc714XbrlToSqliteSettings

An immutable pydantic model to validate FERC Form 714 XBRL to SQLite settings.

FercToSqliteSettings

An immutable pydantic model to validate FERC DBF and XBRL to SQLite settings.

EtlSettings

Main settings validation class.

Functions

_make_doi_clickable(link)

Make a clickable DOI.

class pudl.settings.XbrlFormNumber

Bases: enum.Enum

Contains full list of supported FERC XBRL forms.

FORM1 = 1
FORM2 = 2
FORM6 = 6
FORM60 = 60
FORM714 = 714
class pudl.settings.BaseModel

Bases: pydantic.BaseModel

BaseModel with global configuration.

class Config

Pydantic config.

allow_mutation = False
extra = forbid
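The two Config options make every settings object read-only and reject fields the model does not declare. The sketch below does not use pydantic; it is only a standard-library analogue built on a frozen dataclass, and `ExampleSettings`, `build_settings` and the table name are hypothetical.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class ExampleSettings:
    """Hypothetical stand-in for a settings model; not pudl's BaseModel."""

    years: tuple[int, ...] = ()
    tables: tuple[str, ...] = ()


def build_settings(**kwargs):
    """Return ExampleSettings, or the TypeError raised for unknown fields."""
    try:
        return ExampleSettings(**kwargs)
    except TypeError as err:
        # Unknown keyword arguments are rejected, much like extra = forbid.
        return err


settings = build_settings(years=(2020, 2021), tables=("example_table",))
try:
    # frozen=True blocks assignment, much like allow_mutation = False.
    settings.years = (1999,)
except FrozenInstanceError:
    print("settings are immutable")

print(type(build_settings(year=2020)).__name__)  # TypeError
```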
class pudl.settings.GenericDatasetSettings

Bases: BaseModel

An abstract pydantic model for generic datasets.

Each dataset must specify working tables and partitions. A dataset can have an arbitrary number of partitions.

property partitions: list[None | dict[str, str]]

Return list of dictionaries representing individual partitions.

Convert a list of partitions into a list of dictionaries of partitions. This is intended to be used to store partitions in a format that is easy to use with pd.json_normalize.
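A minimal sketch of the shape this property produces, assuming a dataset has `years` and optionally `states`. The helper `make_partitions` and the keys `year` and `state` are hypothetical, not pudl's actual implementation; the point is that each partition becomes one flat dictionary, so the resulting list can be handed to `pd.json_normalize`.

```python
from itertools import product


def make_partitions(years, states=None):
    """Hypothetical helper: expand partition values into one dict per partition."""
    if states is None:
        # Year-only datasets: one partition per year.
        return [{"year": year} for year in years]
    # Year-state datasets: one partition per (year, state) combination.
    return [{"year": year, "state": state} for year, state in product(years, states)]


print(make_partitions([2020, 2021]))
# [{'year': 2020}, {'year': 2021}]
print(make_partitions([2020], ["CO", "ID"]))
# [{'year': 2020, 'state': 'CO'}, {'year': 2020, 'state': 'ID'}]
```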

tables: list[str]
validate_partitions(partitions)

Validate the requested data partitions.

Check that all the partitions defined in the working_partitions of the associated data_source (e.g. years or states) have been assigned in the definition of the class, and that the requested values are a subset of the allowable values defined by the data_source.
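The subset check can be sketched with plain dictionaries. This assumes partitions are passed as a mapping from partition name to values; `validate_partitions` here is a hypothetical standalone function rather than the pydantic validator itself, and the working years are made up.

```python
def validate_partitions(requested, working_partitions):
    """Hypothetical sketch of the partition check.

    requested: partition name -> requested values, e.g. {"years": [2020]}.
    working_partitions: partition name -> allowable values from the data source.
    """
    for name, allowed in working_partitions.items():
        if name not in requested:
            # Every working partition must be assigned in the settings.
            raise ValueError(f"{name} must be specified.")
        not_allowed = set(requested[name]) - set(allowed)
        if not_allowed:
            # Requested values must be a subset of the allowable values.
            raise ValueError(f"{sorted(not_allowed)} are not valid {name}.")
    return requested


working = {"years": [2019, 2020, 2021]}  # made-up working partitions
validate_partitions({"years": [2020, 2021]}, working)  # passes
```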

validate_tables(tables)

Validate that the requested tables are available.

class pudl.settings.Ferc1Settings

Bases: GenericDatasetSettings

An immutable pydantic model to validate FERC Form 1 settings.

Parameters:
  • data_source – DataSource metadata object

  • years – list of years to validate.

  • tables – list of tables to validate.

property dbf_years

Return validated years for which DBF data is available.

property xbrl_years

Return validated years for which XBRL data is available.
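One way the two properties could split the requested years, assuming FERC Form 1 data is available as DBF through 2020 and as XBRL from 2021 onward. The `LAST_DBF_YEAR` constant and the `split_ferc1_years` helper are hypothetical.

```python
# Assumed boundary between the two formats; check pudl's DataSource metadata.
LAST_DBF_YEAR = 2020


def split_ferc1_years(years):
    """Hypothetical sketch: split validated years into DBF and XBRL years."""
    dbf_years = [year for year in years if year <= LAST_DBF_YEAR]
    xbrl_years = [year for year in years if year > LAST_DBF_YEAR]
    return dbf_years, xbrl_years


print(split_ferc1_years([2019, 2020, 2021]))  # ([2019, 2020], [2021])
```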

data_source: ClassVar[pudl.metadata.classes.DataSource]
years: list[int]
tables: list[str]
validate_tables(tables)

Validate that the requested tables are available.

class pudl.settings.Ferc714Settings

Bases: GenericDatasetSettings

An immutable pydantic model to validate FERC Form 714 settings.

Parameters:
  • data_source – DataSource metadata object

  • tables – list of tables to validate.

  • years – list of years to validate.

data_source: ClassVar[pudl.metadata.classes.DataSource]
tables: list[str]
years: list[int]
class pudl.settings.EpaCemsSettings

Bases: GenericDatasetSettings

An immutable pydantic model to validate EPA CEMS settings.

Parameters:
  • data_source – DataSource metadata object

  • years – list of years to validate.

  • states – list of states to validate.

  • tables – list of tables to validate.

  • partition – Whether to output year-state partitioned Parquet files. If True, all available threads / CPUs will be used in parallel.

data_source: ClassVar[pudl.metadata.classes.DataSource]
years: list[int]
states: list[str]
tables: list[str]
partition: bool = False
allow_all_keyword(states)

Allow users to specify ['all'] to get all states.
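A sketch of the ['all'] expansion. `WORKING_STATES` is a hypothetical, truncated list standing in for the states in the EPA CEMS data source's working partitions.

```python
# Hypothetical, truncated stand-in for the data source's working states.
WORKING_STATES = ["CO", "ID", "ME"]


def allow_all_keyword(states):
    """Sketch: replace ['all'] with every working state, else pass through."""
    if states == ["all"]:
        return list(WORKING_STATES)
    return states


print(allow_all_keyword(["all"]))  # ['CO', 'ID', 'ME']
print(allow_all_keyword(["ID"]))  # ['ID']
```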

class pudl.settings.Eia923Settings

Bases: GenericDatasetSettings

An immutable pydantic model to validate EIA 923 settings.

Parameters:
  • data_source – DataSource metadata object

  • years – list of years to validate.

  • tables – list of tables to validate.

data_source: ClassVar[pudl.metadata.classes.DataSource]
years: list[int]
tables: list[str]
class pudl.settings.Eia861Settings

Bases: GenericDatasetSettings

An immutable pydantic model to validate EIA 861 settings.

Parameters:
  • data_source – DataSource metadata object

  • years – list of years to validate.

  • tables – list of tables to validate.

  • transform_functions – list of transform functions to be applied to EIA 861.

data_source: ClassVar[pudl.metadata.classes.DataSource]
years: list[int]
tables: list[str]
transform_functions: list[str]
generate_transform_functions(values)

Map tables to transform functions.

Parameters:

values – EIA 861 settings.

Returns:

EIA 861 settings.

class pudl.settings.Eia860Settings

Bases: GenericDatasetSettings

An immutable pydantic model to validate EIA 860 settings.

This model also checks the EIA 860m settings.

Parameters:
  • data_source – DataSource metadata object

  • years – list of years to validate.

  • tables – list of tables to validate.

  • eia860m_date (ClassVar[str]) – The 860m year to date.

  • eia860m – True if 860m is requested.

data_source: ClassVar[pudl.metadata.classes.DataSource]
eia860m_data_source: ClassVar[pudl.metadata.classes.DataSource]
eia860m_date: ClassVar[str]
years: list[int]
tables: list[str]
eia860m: bool = True
check_eia860m_date(eia860m: bool) → bool

Check that the 860m date year is exactly one year after the most recent working 860 year.

Parameters:

eia860m – True if 860m is requested.

Returns:

True if 860m is requested.

Return type:

bool

Raises:

ValueError – if the 860m date falls within the 860 working years.
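A sketch of the date check, assuming `eia860m_date` is a string that starts with a four-digit year (such as "2022-06"). The extra arguments are hypothetical replacements for the class variables and data source lookups that the real validator would use.

```python
def check_eia860m_date(eia860m, eia860m_date, eia860_working_years):
    """Sketch: require the 860m year to be one year after the last 860 year."""
    if eia860m:
        # Assumption: the date string begins with a four-digit year.
        eia860m_year = int(eia860m_date[:4])
        expected_year = max(eia860_working_years) + 1
        if eia860m_year != expected_year:
            raise ValueError(
                f"EIA 860m year {eia860m_year} is not one year after the most "
                f"recent working EIA 860 year ({expected_year - 1})."
            )
    return eia860m


check_eia860m_date(True, "2022-06", [2020, 2021])  # passes, returns True
```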

class pudl.settings.GlueSettings

Bases: BaseModel

An immutable pydantic model to validate Glue settings.

Parameters:
  • eia – Include eia in glue settings.

  • ferc1 – Include ferc1 in glue settings.

eia: bool = True
ferc1: bool = True
class pudl.settings.EiaSettings

Bases: BaseModel

An immutable pydantic model to validate EIA datasets settings.

Parameters:
  • eia860 – Immutable pydantic model to validate eia860 settings.

  • eia923 – Immutable pydantic model to validate eia923 settings.

eia860: Eia860Settings
eia923: Eia923Settings
default_load_all(values)

If no datasets are specified, default to all.

Parameters:

values (Dict[str, BaseModel]) – dataset settings.

Returns:

dataset settings.

Return type:

Dict[str, BaseModel]
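A sketch of the default-to-all behavior using plain dictionaries. `all_defaults` is a hypothetical stand-in for the default settings that each dataset model would supply.

```python
def default_load_all(values, all_defaults):
    """Sketch: when no datasets are requested, use defaults for all of them.

    all_defaults is a hypothetical mapping of dataset name -> default settings.
    """
    if not any(values.values()):
        # Nothing requested (empty dict or all None): load every dataset.
        return dict(all_defaults)
    return values


defaults = {"eia860": {"years": [2020]}, "eia923": {"years": [2020]}}
print(default_load_all({}, defaults))
```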

check_eia_dependencies(values)

Make sure the dependencies between the EIA datasets are satisfied.

Dependencies:
  • eia860 requires eia923.boiler_fuel_eia923 and eia923.generation_eia923.

  • eia923 requires eia860 for harvesting purposes.

Parameters:

values (Dict[str, BaseModel]) – dataset settings.

Returns:

dataset settings.

Return type:

Dict[str, BaseModel]
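The two dependency rules can be sketched with plain dictionaries in place of `Eia860Settings` and `Eia923Settings`. Copying the years from the requested dataset to the added one is an assumption of this sketch.

```python
def check_eia_dependencies(values):
    """Sketch of the EIA dependency rules using dicts instead of settings models."""
    eia860 = values.get("eia860")
    eia923 = values.get("eia923")
    if eia860 and not eia923:
        # eia860 needs these two EIA 923 tables (same years: an assumption).
        values["eia923"] = {
            "tables": ["boiler_fuel_eia923", "generation_eia923"],
            "years": eia860["years"],
        }
    if eia923 and not eia860:
        # eia923 needs eia860 for harvesting (same years: an assumption).
        values["eia860"] = {"years": eia923["years"]}
    return values


print(check_eia_dependencies({"eia860": {"years": [2020]}}))
```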

class pudl.settings.DatasetsSettings

Bases: BaseModel

An immutable pydantic model to validate PUDL Dataset settings.

Parameters:
  • ferc1 – Immutable pydantic model to validate ferc1 settings.

  • eia – Immutable pydantic model to validate eia(860, 923) settings.

  • glue – Immutable pydantic model to validate glue settings.

  • epacems – Immutable pydantic model to validate epacems settings.

ferc1: Ferc1Settings
eia: EiaSettings
glue: GlueSettings
epacems: EpaCemsSettings
default_load_all(values)

If no datasets are specified, default to all.

Parameters:

values (Dict[str, BaseModel]) – dataset settings.

Returns:

dataset settings.

Return type:

Dict[str, BaseModel]

add_glue_settings(values)

Add glue settings if ferc1 and eia data are both requested.

Parameters:

values (Dict[str, BaseModel]) – dataset settings.

Returns:

dataset settings.

Return type:

Dict[str, BaseModel]
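A dictionary-based sketch of the glue rule. The real validator builds a `GlueSettings` object, which is represented here by a plain dict.

```python
def add_glue_settings(values):
    """Sketch: add glue settings only when both ferc1 and eia are requested."""
    if values.get("ferc1") and values.get("eia"):
        # Plain-dict stand-in for GlueSettings(eia=True, ferc1=True).
        values["glue"] = {"eia": True, "ferc1": True}
    return values


print(add_glue_settings({"ferc1": {"years": [2020]}, "eia": {"eia860": {}}}))
```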

get_datasets()

Get a dictionary of dataset settings.

make_datasources_table(ds: pudl.workspace.datastore.Datastore) → pandas.DataFrame

Compile a table of dataset information.

There are three places we can look for information about a dataset:
  • the datastore (for DOIs, working partitions, etc.)

  • the ETL settings (for partitions that are used in the ETL)

  • the DataSource info (which is stored within the ETL settings)

The ETL settings and the datastore use different levels of nesting, and therefore different names for datasets. The nesting mostly affects the EIA data. There are currently three EIA datasets: eia923, eia860 and eia860m. eia860m is a monthly update of a few tables in the larger eia860 dataset.

Parameters:

ds – An initialized PUDL Datastore from which the DOIs for each raw input dataset can be obtained.

Returns:

a dataframe describing the partitions and DOIs of each of the datasets in this settings object.
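To illustrate the naming mismatch described above, here is a hypothetical `flatten_dataset_settings` helper that promotes the nested EIA settings to top-level names and lists eia860m separately. The dictionary keys, including `eia860m_date`, mirror field names from this page, but the function itself is not part of pudl.

```python
def flatten_dataset_settings(datasets):
    """Hypothetical sketch: map nested ETL settings onto datastore-style names."""
    flat = {}
    for name, settings in datasets.items():
        if not settings:
            continue  # dataset not requested
        if name == "eia":
            # ETL settings nest eia860 and eia923 under "eia"; promote them.
            flat.update({key: value for key, value in settings.items() if value})
        else:
            flat[name] = settings
    eia860 = flat.get("eia860")
    if eia860 and eia860.get("eia860m"):
        # eia860m lives inside the eia860 settings but is its own datastore dataset.
        flat["eia860m"] = {"date": eia860.get("eia860m_date")}
    return flat


example = {
    "ferc1": {"years": [2020]},
    "eia": {
        "eia860": {"years": [2020], "eia860m": True, "eia860m_date": "2021-06"},
        "eia923": {"years": [2020]},
    },
    "epacems": None,
}
print(sorted(flatten_dataset_settings(example)))
# ['eia860', 'eia860m', 'eia923', 'ferc1']
```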

class pudl.settings.Ferc1DbfToSqliteSettings

Bases: GenericDatasetSettings

An immutable pydantic model to validate FERC Form 1 DBF to SQLite settings.

Parameters:
  • tables – List of tables to validate.

  • years – List of years to validate.

data_source: ClassVar[pudl.metadata.classes.DataSource]
years: list[int]
tables: list[str]
refyear: ClassVar[int]
validate_tables(tables)

Validate tables.

class pudl.settings.FercGenericXbrlToSqliteSettings

Bases: pydantic.BaseSettings

An immutable pydantic model to validate generic FERC XBRL to SQLite settings.

Parameters:
  • taxonomy – URL of the XBRL taxonomy used to create the structure of the SQLite DB.

  • tables – list of tables to validate.

  • years – list of years to validate.

taxonomy: pydantic.AnyHttpUrl
tables: list[int] | None
years: list[int]
class pudl.settings.Ferc1XbrlToSqliteSettings

Bases: FercGenericXbrlToSqliteSettings

An immutable pydantic model to validate FERC Form 1 XBRL to SQLite settings.

Parameters:
  • taxonomy – URL of the XBRL taxonomy used to create the structure of the SQLite DB.

  • years – list of years to validate.

data_source: ClassVar[pudl.metadata.classes.DataSource]
years: list[int]
taxonomy: pydantic.AnyHttpUrl = https://eCollection.ferc.gov/taxonomy/form1/2022-01-01/form/form1/form-1_2022-01-01.xsd
tables: list[str]
validate_tables(tables)

Validate tables.

class pudl.settings.Ferc2XbrlToSqliteSettings

Bases: FercGenericXbrlToSqliteSettings

An immutable pydantic model to validate FERC Form 2 XBRL to SQLite settings.

Parameters:

years – List of years to validate.

data_source: ClassVar[pudl.metadata.classes.DataSource]
years: list[int]
taxonomy: pydantic.AnyHttpUrl = https://eCollection.ferc.gov/taxonomy/form2/2022-01-01/form/form2/form-2_2022-01-01.xsd
class pudl.settings.Ferc6XbrlToSqliteSettings

Bases: FercGenericXbrlToSqliteSettings

An immutable pydantic model to validate FERC Form 6 XBRL to SQLite settings.

Parameters:

years – List of years to validate.

data_source: ClassVar[pudl.metadata.classes.DataSource]
years: list[int]
taxonomy: pydantic.AnyHttpUrl = https://eCollection.ferc.gov/taxonomy/form6/2022-01-01/form/form6/form-6_2022-01-01.xsd
class pudl.settings.Ferc60XbrlToSqliteSettings

Bases: FercGenericXbrlToSqliteSettings

An immutable pydantic model to validate FERC Form 60 XBRL to SQLite settings.

Parameters:

years – List of years to validate.

data_source: ClassVar[pudl.metadata.classes.DataSource]
years: list[int]
taxonomy: pydantic.AnyHttpUrl = https://eCollection.ferc.gov/taxonomy/form60/2022-01-01/form/form60/form-60_2022-01-01.xsd
class pudl.settings.Ferc714XbrlToSqliteSettings

Bases: FercGenericXbrlToSqliteSettings

An immutable pydantic model to validate FERC Form 714 XBRL to SQLite settings.

Parameters:

years – List of years to validate.

data_source: ClassVar[pudl.metadata.classes.DataSource]
years: list[int] = [2021]
taxonomy: pydantic.AnyHttpUrl = https://eCollection.ferc.gov/taxonomy/form714/2022-01-01/form/form714/form-714_2022-01-01.xsd
class pudl.settings.FercToSqliteSettings

Bases: pydantic.BaseSettings

An immutable pydantic model to validate FERC DBF and XBRL to SQLite settings.

Parameters:
  • ferc1_dbf_to_sqlite_settings – Settings for converting FERC 1 DBF data to SQLite.

  • ferc1_xbrl_to_sqlite_settings – Settings for converting FERC 1 XBRL data to SQLite.

  • other_xbrl_forms – List of non-FERC1 forms to convert from XBRL to SQLite.

ferc1_dbf_to_sqlite_settings: Ferc1DbfToSqliteSettings
ferc1_xbrl_to_sqlite_settings: Ferc1XbrlToSqliteSettings
ferc2_xbrl_to_sqlite_settings: Ferc2XbrlToSqliteSettings
ferc6_xbrl_to_sqlite_settings: Ferc6XbrlToSqliteSettings
ferc60_xbrl_to_sqlite_settings: Ferc60XbrlToSqliteSettings
ferc714_xbrl_to_sqlite_settings: Ferc714XbrlToSqliteSettings
get_xbrl_dataset_settings(form_number: XbrlFormNumber) → FercGenericXbrlToSqliteSettings

Return the FERC XBRL to SQLite settings for the requested form.

Parameters:

form_number – Get settings by FERC form number.
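Because each per-form field follows the pattern `ferc{N}_xbrl_to_sqlite_settings`, the lookup by form number can be sketched with `getattr`. The enum mirrors `XbrlFormNumber`; `FercToSqliteSettingsSketch` is a hypothetical stand-in that stores plain dicts instead of settings models.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class XbrlFormNumber(Enum):
    """Mirror of the supported FERC XBRL forms listed above."""

    FORM1 = 1
    FORM2 = 2
    FORM6 = 6
    FORM60 = 60
    FORM714 = 714


@dataclass(frozen=True)
class FercToSqliteSettingsSketch:
    """Hypothetical stand-in holding per-form settings as plain dicts."""

    ferc1_xbrl_to_sqlite_settings: Optional[dict] = None
    ferc2_xbrl_to_sqlite_settings: Optional[dict] = None
    ferc6_xbrl_to_sqlite_settings: Optional[dict] = None
    ferc60_xbrl_to_sqlite_settings: Optional[dict] = None
    ferc714_xbrl_to_sqlite_settings: Optional[dict] = None

    def get_xbrl_dataset_settings(self, form_number: XbrlFormNumber):
        # Build the field name from the form number, e.g. ferc714_xbrl_to_sqlite_settings.
        return getattr(self, f"ferc{form_number.value}_xbrl_to_sqlite_settings")


settings = FercToSqliteSettingsSketch(ferc714_xbrl_to_sqlite_settings={"years": [2021]})
print(settings.get_xbrl_dataset_settings(XbrlFormNumber.FORM714))  # {'years': [2021]}
```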

class pudl.settings.EtlSettings

Bases: pydantic.BaseSettings

Main settings validation class.

ferc_to_sqlite_settings: FercToSqliteSettings
datasets: DatasetsSettings
name: str
title: str
description: str
version: str
pudl_in: str
pudl_out: str
classmethod from_yaml(path: str) → EtlSettings

Create an EtlSettings instance from a YAML file path.

Parameters:

path – path to a YAML file.

Returns:

An ETL settings object.

pudl.settings._make_doi_clickable(link)

Make a clickable DOI.
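A DOI resolves through the https://doi.org/ resolver, so a sketch of this helper only needs to prefix it. Whether pudl returns a bare URL or some markup is not stated here, the DOI in the example is made up, and stripping a `doi:` prefix is an assumption of this sketch.

```python
def make_doi_clickable(doi: str) -> str:
    """Sketch: turn a bare DOI into a resolver URL."""
    # Assumption: inputs may carry a "doi:" prefix that should be dropped.
    if doi.lower().startswith("doi:"):
        doi = doi[4:]
    return f"https://doi.org/{doi}"


print(make_doi_clickable("10.5281/zenodo.123"))  # https://doi.org/10.5281/zenodo.123
```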