pudl.etl
Dagster definitions for the PUDL ETL and Output tables.
Submodules
Package Contents
Classes
EtlSettings | Main settings validation class.
Functions
epacems_io_manager | IO Manager that writes EPA CEMS partitions to individual parquet files.
ferc1_dbf_sqlite_io_manager | Create a SQLiteManager dagster resource for the ferc1 dbf database.
ferc1_xbrl_sqlite_io_manager | Create a SQLiteManager dagster resource for the ferc1 xbrl database.
pudl_mixed_format_io_manager | Create a SQLiteManager dagster resource for the pudl database.
dataset_settings | Dagster resource for parameterizing PUDL ETL assets.
datastore | Dagster resource to interact with Zenodo archives.
ferc_to_sqlite_settings | Dagster resource for parameterizing the ferc_to_sqlite graph.
asset_check_from_schema | Create a dagster asset check based on the resource schema, if defined.
_get_keys_from_assets | Get a list of asset keys.
create_non_cems_selection | Create a selection of assets excluding CEMS and all downstream assets.
load_dataset_settings_from_file | Load dataset settings from a settings file in pudl.package_data.settings.
Attributes
- pudl.etl.epacems_io_manager(init_context: dagster.InitResourceContext) -> EpaCemsIOManager
IO Manager that writes EPA CEMS partitions to individual parquet files.
- pudl.etl.ferc1_dbf_sqlite_io_manager(init_context) -> FercDBFSQLiteIOManager
Create a SQLiteManager dagster resource for the ferc1 dbf database.
- pudl.etl.ferc1_xbrl_sqlite_io_manager(init_context) -> FercXBRLSQLiteIOManager
Create a SQLiteManager dagster resource for the ferc1 xbrl database.
- pudl.etl.pudl_mixed_format_io_manager(init_context) -> dagster.IOManager
Create a SQLiteManager dagster resource for the pudl database.
- pudl.etl.dataset_settings(init_context) -> pudl.settings.DatasetsSettings
Dagster resource for parameterizing PUDL ETL assets.
This resource allows us to specify the years we want to process for each datasource in the Dagit UI.
- pudl.etl.datastore(init_context) -> pudl.workspace.datastore.Datastore
Dagster resource to interact with Zenodo archives.
- pudl.etl.ferc_to_sqlite_settings(init_context) -> pudl.settings.FercToSqliteSettings
Dagster resource for parameterizing the ferc_to_sqlite graph.
This resource allows us to specify the years we want to process for each datasource in the Dagit UI.
- class pudl.etl.EtlSettings(_case_sensitive: bool | None = None, _env_prefix: str | None = None, _env_file: pydantic_settings.sources.DotenvType | None = ENV_FILE_SENTINEL, _env_file_encoding: str | None = None, _env_ignore_empty: bool | None = None, _env_nested_delimiter: str | None = None, _env_parse_none_str: str | None = None, _secrets_dir: str | pathlib.Path | None = None, **values: Any)
Bases: pydantic_settings.BaseSettings
Main settings validation class.
- ferc_to_sqlite_settings: FercToSqliteSettings | None
- datasets: DatasetsSettings | None
- classmethod from_yaml(path: str) -> EtlSettings
Create an EtlSettings instance from a yaml_file path.
- Parameters:
path – path to a yaml file; this could be remote.
- Returns:
An ETL settings object.
- pudl.etl.asset_check_from_schema(asset_key: dagster.AssetKey, package: pudl.metadata.classes.Package) -> dagster.AssetChecksDefinition | None
Create a dagster asset check based on the resource schema, if defined.
- pudl.etl._get_keys_from_assets(asset_def: dagster.AssetsDefinition | dagster.SourceAsset | dagster._core.definitions.cacheable_assets.CacheableAssetsDefinition) -> list[dagster.AssetKey]
Get a list of asset keys.
Most assets have one key, which can be retrieved as a list from asset.keys.
Multi-assets have multiple keys, which can also be retrieved as a list from asset.keys.
SourceAssets always have exactly one key and don't have asset.keys, so we look for asset.key and wrap it in a list.
We don't handle CacheableAssetsDefinitions yet.
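The branching described above can be illustrated with a minimal, dependency-free sketch. It does not use dagster: FakeAssetsDefinition and FakeSourceAsset are hypothetical stand-ins that only mimic the keys and key attributes mentioned in the docstring, and get_keys is an illustrative name, not the PUDL implementation. Sorting the keys is an assumption made here to get a deterministic list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeAssetsDefinition:
    """Hypothetical stand-in for an asset or multi-asset: exposes `keys`."""

    keys: frozenset


@dataclass(frozen=True)
class FakeSourceAsset:
    """Hypothetical stand-in for a source asset: exposes a single `key`."""

    key: str


def get_keys(asset_def) -> list:
    """Return the asset's keys as a list, whichever attribute it provides."""
    if isinstance(asset_def, FakeAssetsDefinition):
        # Regular and multi-assets: the keys collection becomes a list.
        return sorted(asset_def.keys)
    if isinstance(asset_def, FakeSourceAsset):
        # Source assets: wrap the single key in a list.
        return [asset_def.key]
    # Anything else (e.g. a cacheable definition) is not handled.
    raise TypeError(f"Unsupported asset type: {type(asset_def).__name__}")
```

Callers can then treat every supported asset type uniformly, for example by concatenating the returned lists across a collection of assets.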
- pudl.etl.create_non_cems_selection(all_assets: list[dagster.AssetsDefinition]) -> dagster.AssetSelection
Create a selection of assets excluding CEMS and all downstream assets.
- Parameters:
all_assets – A list of asset definitions to remove CEMS assets from.
- Returns:
An asset selection containing the assets in all_assets, excluding the CEMS assets and everything downstream of them.
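To make "excluding CEMS and all downstream assets" concrete, here is a rough sketch of the idea on a plain dependency dictionary. It is not how PUDL or dagster implement it: exclude_with_downstream, the deps mapping, and the asset names are all hypothetical, and the real function returns a dagster.AssetSelection rather than a set of strings.

```python
from collections import deque


def exclude_with_downstream(deps: dict, roots: set) -> set:
    """Return all assets except `roots` and anything downstream of them.

    `deps` maps each asset name to the set of assets it depends on.
    """
    # Invert the dependency map so we can walk from upstream to downstream.
    children = {name: set() for name in deps}
    for name, upstream in deps.items():
        for parent in upstream:
            children.setdefault(parent, set()).add(name)

    # Breadth-first walk starting from the excluded roots.
    excluded = set(roots)
    queue = deque(roots)
    while queue:
        current = queue.popleft()
        for child in children.get(current, ()):
            if child not in excluded:
                excluded.add(child)
                queue.append(child)

    return set(children) - excluded


# Hypothetical asset graph; the names are illustrative only.
deps = {
    "raw_cems": set(),
    "core_cems": {"raw_cems"},
    "out_cems_summary": {"core_cems", "core_plants"},
    "raw_plants": set(),
    "core_plants": {"raw_plants"},
}
selection = exclude_with_downstream(deps, {"raw_cems"})
```

In this toy graph, out_cems_summary is dropped even though it also depends on a non-CEMS asset, because any asset with a CEMS ancestor counts as downstream.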
- pudl.etl.load_dataset_settings_from_file(setting_filename: str) -> dict
Load dataset settings from a settings file in pudl.package_data.settings.
- Parameters:
setting_filename – name of settings file.
- Returns:
Dictionary of dataset settings.
- pudl.etl.defs: dagster.Definitions