pudl.etl#
Dagster definitions for the PUDL ETL and Output tables.
Submodules#
Package Contents#
Classes#
EtlSettings | Main settings validation class. |
Functions#
epacems_io_manager | IO Manager that writes EPA CEMS partitions to individual parquet files. |
ferc1_dbf_sqlite_io_manager | Create a SQLiteManager dagster resource for the ferc1 dbf database. |
ferc1_xbrl_sqlite_io_manager | Create a SQLiteManager dagster resource for the ferc1 xbrl database. |
pudl_mixed_format_io_manager | Create a SQLiteManager dagster resource for the pudl database. |
dataset_settings | Dagster resource for parameterizing PUDL ETL assets. |
datastore | Dagster resource to interact with Zenodo archives. |
ferc_to_sqlite_settings | Dagster resource for parameterizing the ferc_to_sqlite graph. |
asset_check_from_schema | Create a dagster asset check based on the resource schema, if defined. |
_get_keys_from_assets | Get a list of asset keys. |
create_non_cems_selection | Create a selection of assets excluding CEMS and all downstream assets. |
load_dataset_settings_from_file | Load dataset settings from a settings file in pudl.package_data.settings. |
Attributes#
PUDL_PACKAGE | Define a global PUDL package object for use across the entire codebase. |
defs | A collection of dagster assets, resources, IO managers, and jobs for the PUDL ETL. |
- pudl.etl.epacems_io_manager(init_context: dagster.InitResourceContext) EpaCemsIOManager [source]#
IO Manager that writes EPA CEMS partitions to individual parquet files.
- pudl.etl.ferc1_dbf_sqlite_io_manager(init_context) FercDBFSQLiteIOManager [source]#
Create a SQLiteManager dagster resource for the ferc1 dbf database.
- pudl.etl.ferc1_xbrl_sqlite_io_manager(init_context) FercXBRLSQLiteIOManager [source]#
Create a SQLiteManager dagster resource for the ferc1 xbrl database.
- pudl.etl.pudl_mixed_format_io_manager(init_context) dagster.IOManager [source]#
Create a SQLiteManager dagster resource for the pudl database.
- pudl.etl.PUDL_PACKAGE[source]#
Define a global PUDL package object for use across the entire codebase.
This needs to happen after the definition of the Package class above, and it is used in some of the class definitions below. Because defining it in the middle of this module is somewhat obscure, it is imported in the __init__.py for this subpackage and then imported in other modules from that more prominent location.
- pudl.etl.dataset_settings(init_context) pudl.settings.DatasetsSettings [source]#
Dagster resource for parameterizing PUDL ETL assets.
This resource allows us to specify the years we want to process for each datasource in the Dagit UI.
- pudl.etl.datastore(init_context) pudl.workspace.datastore.Datastore [source]#
Dagster resource to interact with Zenodo archives.
- pudl.etl.ferc_to_sqlite_settings(init_context) pudl.settings.FercToSqliteSettings [source]#
Dagster resource for parameterizing the ferc_to_sqlite graph.
This resource allows us to specify the years we want to process for each datasource in the Dagit UI.
- class pudl.etl.EtlSettings(_case_sensitive: bool | None = None, _env_prefix: str | None = None, _env_file: pydantic_settings.sources.DotenvType | None = ENV_FILE_SENTINEL, _env_file_encoding: str | None = None, _env_ignore_empty: bool | None = None, _env_nested_delimiter: str | None = None, _env_parse_none_str: str | None = None, _secrets_dir: str | pathlib.Path | None = None, **values: Any)[source]#
Bases: pydantic_settings.BaseSettings
Main settings validation class.
- ferc_to_sqlite_settings: FercToSqliteSettings | None#
- datasets: DatasetsSettings | None#
- classmethod from_yaml(path: str) EtlSettings [source]#
Create an EtlSettings instance from a yaml_file path.
- Parameters:
path – path to a yaml file; this could be remote.
- Returns:
An ETL settings object.
- pudl.etl.asset_check_from_schema(asset_key: dagster.AssetKey, package: pudl.metadata.classes.Package) dagster.AssetChecksDefinition | None [source]#
Create a dagster asset check based on the resource schema, if defined.
- pudl.etl._get_keys_from_assets(asset_def: dagster.AssetsDefinition | dagster.SourceAsset | dagster._core.definitions.cacheable_assets.CacheableAssetsDefinition) list[dagster.AssetKey] [source]#
Get a list of asset keys.
Most assets have one key, which can be retrieved as a list from asset.keys.
Multi-assets have multiple keys, which can also be retrieved as a list from asset.keys.
SourceAssets always only have one key, and don’t have asset.keys. So we look for asset.key and wrap it in a list.
We don’t handle CacheableAssetsDefinitions yet.
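The key-lookup rule described above can be illustrated without importing dagster. The following is a minimal sketch, not the PUDL implementation: FakeAssetsDefinition, FakeSourceAsset, and get_keys are hypothetical stand-ins, keys are plain strings rather than dagster.AssetKey objects, and the attribute test uses hasattr as an assumption about how the two cases could be told apart.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeAssetsDefinition:
    """Hypothetical stand-in for an asset or multi-asset exposing ``keys``."""

    keys: tuple[str, ...]


@dataclass(frozen=True)
class FakeSourceAsset:
    """Hypothetical stand-in for a source asset exposing only ``key``."""

    key: str


def get_keys(asset_def) -> list[str]:
    """Return the asset's keys as a list, whichever attribute it exposes."""
    if hasattr(asset_def, "keys"):
        # Single assets and multi-assets both expose a collection of keys.
        return list(asset_def.keys)
    # Source-asset-like objects carry exactly one key, so wrap it in a list.
    return [asset_def.key]


single = FakeAssetsDefinition(keys=("plants",))
multi = FakeAssetsDefinition(keys=("plants", "generators"))
source = FakeSourceAsset(key="raw_plants")

for asset in (single, multi, source):
    print(get_keys(asset))
```

Returning a list in every case lets callers flatten the results from a mixed collection of assets without checking each asset's type first.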
- pudl.etl.create_non_cems_selection(all_assets: list[dagster.AssetsDefinition]) dagster.AssetSelection [source]#
Create a selection of assets excluding CEMS and all downstream assets.
- Parameters:
all_assets – A list of asset definitions to remove CEMS assets from.
- Returns:
An asset selection containing the assets in all_assets, excluding the CEMS assets and all assets downstream of them.
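To make the phrase "excluding CEMS and all downstream assets" concrete, here is a minimal sketch of the idea using a plain dictionary instead of dagster's AssetSelection. The asset names, the GRAPH mapping, and both helper functions are hypothetical and chosen only for illustration; the real function operates on dagster asset definitions.

```python
from collections import deque

# Hypothetical dependency graph: each asset maps to the assets that consume it.
GRAPH = {
    "epacems_hourly": ["epacems_summary"],
    "epacems_summary": ["emissions_report"],
    "eia860_plants": ["plant_summary", "emissions_report"],
    "plant_summary": [],
    "emissions_report": [],
}


def downstream_closure(root: str, graph: dict[str, list[str]]) -> set[str]:
    """Return ``root`` plus every asset reachable from it."""
    seen = {root}
    queue = deque([root])
    while queue:
        for child in graph.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def non_cems_selection(
    all_assets: list[str], cems_root: str, graph: dict[str, list[str]]
) -> list[str]:
    """Keep every asset that is neither the CEMS root nor downstream of it."""
    excluded = downstream_closure(cems_root, graph)
    return [asset for asset in all_assets if asset not in excluded]


selected = non_cems_selection(list(GRAPH), "epacems_hourly", GRAPH)
print(selected)
```

In this toy graph, emissions_report is dropped even though it also depends on a non-CEMS asset, because anything reachable from the CEMS root counts as downstream.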
- pudl.etl.load_dataset_settings_from_file(setting_filename: str) dict [source]#
Load dataset settings from a settings file in pudl.package_data.settings.
- Parameters:
setting_filename – name of settings file.
- Returns:
Dictionary of dataset settings.
- pudl.etl.defs: dagster.Definitions[source]#
A collection of dagster assets, resources, IO managers, and jobs for the PUDL ETL.