pudl.cli.etl

A command line interface (CLI) to the main PUDL ETL functionality.

This script coordinates the PUDL ETL process, based on parameters provided via a YAML settings file.

If the settings for a dataset have empty parameters (meaning no years or tables are included), no outputs will be generated. See Running the ETL Pipeline for details.

The output SQLite and Parquet files will be stored in PUDL_OUTPUT. To set up your default PUDL_INPUT and PUDL_OUTPUT directories, see pudl_setup --help.

Module Contents

Functions

parse_command_line(argv)

Parse script command line arguments. See the -h option.

pudl_etl_job_factory(→ collections.abc.Callable[[], ...)

Factory for parameterizing a reconstructable pudl_etl job.

main()

Parse command line and initialize PUDL DB.

Attributes

pudl.cli.etl.logger

pudl.cli.etl.parse_command_line(argv)

Parse script command line arguments. See the -h option.

Parameters:

argv (list) – Command line arguments, including the caller file name.

Returns:

A dictionary mapping command line arguments to their values.

Return type:

dict
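
The contract described above (an argv list that includes the caller file name goes in, a dictionary of argument values comes out) can be illustrated with a minimal argparse sketch. This is not PUDL's actual implementation: the function name parse_command_line_sketch and the arguments settings_file, --loglevel, and --logfile are assumptions chosen for illustration only; the real options are listed by the -h option.

```python
import argparse


def parse_command_line_sketch(argv: list[str]) -> dict:
    """Parse argv (including the caller file name) into a plain dict.

    Hypothetical sketch: the argument names below are assumptions,
    not the options actually defined by pudl.cli.etl.
    """
    parser = argparse.ArgumentParser(description="Illustrative ETL CLI sketch.")
    parser.add_argument("settings_file", help="Path to a YAML settings file.")
    parser.add_argument("--loglevel", default="INFO", help="Logging level.")
    parser.add_argument("--logfile", default=None, help="Optional log file path.")
    # argv[0] is the caller file name, so only argv[1:] is parsed.
    args = parser.parse_args(argv[1:])
    # vars() turns the argparse.Namespace into a dictionary.
    return vars(args)


parsed = parse_command_line_sketch(
    ["pudl_etl", "etl_settings.yml", "--loglevel", "DEBUG"]
)
print(parsed)
```

Passing argv explicitly, rather than reading sys.argv inside the function, keeps the parser easy to exercise from tests.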

pudl.cli.etl.pudl_etl_job_factory(logfile: str | None = None, loglevel: str = 'INFO', process_epacems: bool = True) → collections.abc.Callable[[], dagster.JobDefinition]

Factory for parameterizing a reconstructable pudl_etl job.

Parameters:
  • loglevel – The log level for the job’s execution.

  • logfile – Path to a log file for the job’s execution.

  • process_epacems – Include EPA CEMS assets in the job execution.

Returns:

The job definition to be executed.
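
The signature above returns a zero-argument callable rather than the job itself, which is the usual shape of a factory whose product must be rebuilt later from its parameters alone. The sketch below shows that pattern in plain Python, assuming nothing about PUDL's internals: JobDefinitionStandIn is a hypothetical stand-in for dagster.JobDefinition, and job_factory_sketch and the job name etl_job_sketch are likewise invented for illustration.

```python
from __future__ import annotations

from collections.abc import Callable
from dataclasses import dataclass


@dataclass(frozen=True)
class JobDefinitionStandIn:
    """Hypothetical stand-in for dagster.JobDefinition (not the real class)."""

    name: str
    loglevel: str
    logfile: str | None
    process_epacems: bool


def job_factory_sketch(
    logfile: str | None = None,
    loglevel: str = "INFO",
    process_epacems: bool = True,
) -> Callable[[], JobDefinitionStandIn]:
    """Return a zero-argument callable that builds the configured job."""

    def get_job() -> JobDefinitionStandIn:
        # The parameters are captured by the closure, so the job can be
        # rebuilt on demand without passing them again.
        return JobDefinitionStandIn(
            name="etl_job_sketch",
            loglevel=loglevel,
            logfile=logfile,
            process_epacems=process_epacems,
        )

    return get_job


build_job = job_factory_sketch(loglevel="DEBUG", process_epacems=False)
job = build_job()
print(job)
```

Deferring construction to the returned callable means the caller decides when the job is actually built, while the factory call fixes how it is configured.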

pudl.cli.etl.main()

Parse command line and initialize PUDL DB.