pudl.etl

Dagster definitions for the PUDL ETL and output tables.

Submodules

Package Contents

Classes

EtlSettings

Main settings validation class.

Functions

epacems_io_manager(→ EpaCemsIOManager)

IO Manager that writes EPA CEMS partitions to individual parquet files.

ferc1_dbf_sqlite_io_manager(→ FercDBFSQLiteIOManager)

Create a SQLiteManager dagster resource for the ferc1 dbf database.

ferc1_xbrl_sqlite_io_manager(→ FercXBRLSQLiteIOManager)

Create a SQLiteManager dagster resource for the ferc1 XBRL database.

pudl_mixed_format_io_manager(→ dagster.IOManager)

Create a mixed-format dagster IO manager for the PUDL database.

dataset_settings(→ pudl.settings.DatasetsSettings)

Dagster resource for parameterizing PUDL ETL assets.

datastore(→ pudl.workspace.datastore.Datastore)

Dagster resource to interact with Zenodo archives.

ferc_to_sqlite_settings(...)

Dagster resource for parameterizing the ferc_to_sqlite graph.

asset_check_from_schema(...)

Create a dagster asset check based on the resource schema, if defined.

_get_keys_from_assets(→ list[dagster.AssetKey])

Get a list of asset keys.

create_non_cems_selection(→ dagster.AssetSelection)

Create a selection of assets excluding CEMS and all downstream assets.

load_dataset_settings_from_file(→ dict)

Load dataset settings from a settings file in pudl.package_data.settings.

Attributes

pudl.etl.epacems_io_manager(init_context: dagster.InitResourceContext) EpaCemsIOManager

IO Manager that writes EPA CEMS partitions to individual parquet files.

pudl.etl.ferc1_dbf_sqlite_io_manager(init_context) FercDBFSQLiteIOManager

Create a SQLiteManager dagster resource for the ferc1 dbf database.

pudl.etl.ferc1_xbrl_sqlite_io_manager(init_context) FercXBRLSQLiteIOManager

Create a SQLiteManager dagster resource for the ferc1 XBRL database.

pudl.etl.pudl_mixed_format_io_manager(init_context) dagster.IOManager

Create a mixed-format dagster IO manager for the PUDL database.

pudl.etl.dataset_settings(init_context) pudl.settings.DatasetsSettings

Dagster resource for parameterizing PUDL ETL assets.

This resource allows us to specify the years to process for each data source in the Dagit UI.

pudl.etl.datastore(init_context) pudl.workspace.datastore.Datastore

Dagster resource to interact with Zenodo archives.

pudl.etl.ferc_to_sqlite_settings(init_context) pudl.settings.FercToSqliteSettings

Dagster resource for parameterizing the ferc_to_sqlite graph.

This resource allows us to specify the years to process for each data source in the Dagit UI.

class pudl.etl.EtlSettings(_case_sensitive: bool | None = None, _env_prefix: str | None = None, _env_file: pydantic_settings.sources.DotenvType | None = ENV_FILE_SENTINEL, _env_file_encoding: str | None = None, _env_ignore_empty: bool | None = None, _env_nested_delimiter: str | None = None, _env_parse_none_str: str | None = None, _secrets_dir: str | pathlib.Path | None = None, **values: Any)

Bases: pydantic_settings.BaseSettings

Main settings validation class.

ferc_to_sqlite_settings: FercToSqliteSettings | None
datasets: DatasetsSettings | None
name: str | None
title: str | None
description: str | None
version: str | None
publish_destinations: list[str] = []
classmethod from_yaml(path: str) EtlSettings

Create an EtlSettings instance from the path to a YAML file.

Parameters:

path – path to a YAML file; the file may be remote.

Returns:

An ETL settings object.

pudl.etl.logger
pudl.etl.default_assets = ()
pudl.etl.asset_check_modules
pudl.etl.default_asset_checks
pudl.etl.asset_check_from_schema(asset_key: dagster.AssetKey, package: pudl.metadata.classes.Package) dagster.AssetChecksDefinition | None

Create a dagster asset check based on the resource schema, if defined.
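The return type `dagster.AssetChecksDefinition | None` means a check is produced only when the resource has a schema. The sketch below illustrates that "if defined" contract without dagster or PUDL installed. It is not PUDL's implementation: `check_from_schema`, `CheckResult`, and the plain dict mapping column names to Python types are hypothetical stand-ins for the asset key, the `Package` metadata, and the dagster check objects.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical stand-in for a resource schema: column name -> expected Python type.
Schema = dict[str, type]


@dataclass
class CheckResult:
    """Hypothetical stand-in for the outcome of an asset check."""
    passed: bool
    errors: list[str] = field(default_factory=list)


def check_from_schema(
    table_name: str, schemas: dict[str, Schema]
) -> Optional[Callable[[list[dict]], CheckResult]]:
    """Return a row-validating check for table_name, or None if it has no schema."""
    schema = schemas.get(table_name)
    if schema is None:
        # No schema defined for this resource, so no check is created.
        return None

    def check(rows: list[dict]) -> CheckResult:
        errors = []
        for i, row in enumerate(rows):
            # Report columns the schema requires but the row lacks.
            missing = set(schema) - set(row)
            if missing:
                errors.append(f"row {i}: missing columns {sorted(missing)}")
            # Report values whose type does not match the schema.
            for col, expected in schema.items():
                if col in row and not isinstance(row[col], expected):
                    errors.append(f"row {i}: {col} is not {expected.__name__}")
        return CheckResult(passed=not errors, errors=errors)

    return check
```

Returning `None` rather than an always-passing check lets the caller skip registering checks for resources without a schema.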

pudl.etl._get_keys_from_assets(asset_def: dagster.AssetsDefinition | dagster.SourceAsset | dagster._core.definitions.cacheable_assets.CacheableAssetsDefinition) list[dagster.AssetKey]

Get a list of asset keys.

Most assets have one key, which can be retrieved as a list from asset.keys.

Multi-assets have multiple keys, which can also be retrieved as a list from asset.keys.

SourceAssets always have exactly one key and don’t have asset.keys, so we look for asset.key and wrap it in a list.

We don’t handle CacheableAssetsDefinitions yet.
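The lookup described above can be sketched with duck typing: prefer `keys`, fall back to wrapping `key` in a list, and reject anything else. The `Fake*` classes are hypothetical stand-ins for the dagster types so the example runs without dagster, and raising `NotImplementedError` for the unhandled case is a choice made for this sketch, not documented PUDL behavior.

```python
from dataclasses import dataclass


@dataclass
class FakeAssetsDefinition:
    """Hypothetical stand-in for a single asset or multi-asset: exposes ``keys``."""
    keys: tuple


@dataclass
class FakeSourceAsset:
    """Hypothetical stand-in for a SourceAsset: exposes one ``key`` and no ``keys``."""
    key: str


class FakeCacheableAssets:
    """Hypothetical stand-in for a CacheableAssetsDefinition (not handled)."""


def get_keys_from_assets(asset_def) -> list:
    """Return the asset keys of asset_def as a list."""
    if hasattr(asset_def, "keys"):
        # Single assets and multi-assets both expose a collection of keys.
        return list(asset_def.keys)
    if hasattr(asset_def, "key"):
        # Source assets have exactly one key; wrap it in a list.
        return [asset_def.key]
    raise NotImplementedError(f"Unsupported asset type: {type(asset_def).__name__}")
```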

pudl.etl._package
pudl.etl._asset_keys
pudl.etl.default_resources
pudl.etl.default_tag_concurrency_limits
pudl.etl.default_config
pudl.etl.create_non_cems_selection(all_assets: list[dagster.AssetsDefinition]) dagster.AssetSelection

Create a selection of assets excluding CEMS and all downstream assets.

Parameters:

all_assets – A list of asset definitions to remove CEMS assets from.

Returns:

An asset selection containing every asset in all_assets except the CEMS assets and the assets downstream of them.
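In dagster, a selection like this can be composed from `AssetSelection` objects using set difference and `.downstream()`. The stand-alone sketch below instead shows the underlying graph logic on a plain dict, so it runs without dagster: it walks from every CEMS asset to all of its dependents and removes the whole set. The function name `non_cems_assets`, the dict-of-upstream-names graph, and the `is_cems` predicate (for example, a name containing `"epacems"`) are hypothetical; this page does not say how PUDL identifies CEMS assets.

```python
from collections import deque
from typing import Callable


def non_cems_assets(
    upstream: dict[str, list[str]], is_cems: Callable[[str], bool]
) -> set[str]:
    """Return all assets except CEMS assets and everything downstream of them.

    ``upstream`` maps each asset name to the names of the assets it depends on.
    """
    # Invert the graph so we can walk from an asset to its dependents.
    downstream: dict[str, list[str]] = {name: [] for name in upstream}
    for name, parents in upstream.items():
        for parent in parents:
            downstream.setdefault(parent, []).append(name)

    # Breadth-first search starting from every CEMS asset.
    excluded = {name for name in downstream if is_cems(name)}
    queue = deque(excluded)
    while queue:
        for child in downstream.get(queue.popleft(), []):
            if child not in excluded:
                excluded.add(child)
                queue.append(child)

    return set(downstream) - excluded
```

Excluding the downstream closure, not just the CEMS assets themselves, keeps the resulting selection runnable on its own: no selected asset depends on an asset that was left out.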

pudl.etl.load_dataset_settings_from_file(setting_filename: str) dict

Load dataset settings from a settings file in pudl.package_data.settings.

Parameters:

setting_filename – name of settings file.

Returns:

Dictionary of dataset settings.

pudl.etl.defs: dagster.Definitions

A collection of dagster assets, resources, IO managers, and jobs for the PUDL ETL.