pudl.workspace.setup

Tools for setting up and managing PUDL workspaces.

Module Contents

Functions

set_defaults(pudl_in, pudl_out, clobber=False)

Set default user input and output locations in $HOME/.pudl.yml.

get_defaults()

Read paths to default PUDL input/output dirs from user's $HOME/.pudl.yml.

derive_paths(pudl_in, pudl_out)

Derive PUDL paths based on given input and output paths.

init(pudl_in, pudl_out, clobber=False)

Set up a new PUDL working environment based on the user settings.

deploy(pkg_path, deploy_dir, ignore_files, clobber=False)

Deploy all files from a package_data directory into a workspace.

Attributes

logger

pudl.workspace.setup.logger

pudl.workspace.setup.set_defaults(pudl_in, pudl_out, clobber=False)

Set default user input and output locations in $HOME/.pudl.yml.

Create a user settings file that defines the default PUDL input and output directories for future reference. If this file already exists, the behavior depends on the clobber parameter, which defaults to False: if True, the existing file is replaced; if False, it is left unchanged.

Parameters
  • pudl_in (os.PathLike) – Path to be used as the default input directory for PUDL. This is where pudl.workspace.datastore will look for the data directory, which holds data from public agencies.

  • pudl_out (os.PathLike) – Path to the default output directory for PUDL, where results of data processing will be organized.

  • clobber (bool) – If True and a user settings file exists, overwrite it. If False, do not alter the existing file. Defaults to False.

Returns

None
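
The clobber behavior described above can be sketched as follows. This is a minimal illustration, not PUDL's implementation: write_settings is a hypothetical helper, it writes the two keys as plain "key: value" lines rather than going through a YAML library, and it returns a bool (the documented function returns None) so the outcome is easy to inspect.

```python
import os
from pathlib import Path


def write_settings(settings_file, pudl_in, pudl_out, clobber=False):
    """Hypothetical stand-in: write pudl_in/pudl_out to a settings file.

    Returns True if the file was written and False if an existing file
    was left untouched because clobber is False.
    """
    settings_file = Path(settings_file)
    if settings_file.exists() and not clobber:
        # An existing settings file is not changed unless clobber is True.
        return False
    lines = [
        f"pudl_in: {os.fspath(pudl_in)}",
        f"pudl_out: {os.fspath(pudl_out)}",
    ]
    settings_file.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return True
```

Calling the helper twice on the same path without clobber=True leaves the first file in place; passing clobber=True replaces it.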

pudl.workspace.setup.get_defaults()

Read paths to default PUDL input/output dirs from user’s $HOME/.pudl.yml.

Parameters

None

Returns

The contents of the user’s PUDL settings file, with keys pudl_in and pudl_out defining their default PUDL workspace. If the $HOME/.pudl.yml file does not exist, both values are None.

Return type

dict
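
The return contract described above (a dict with pudl_in and pudl_out, both None when the file is missing) might look like the sketch below. read_settings is a hypothetical helper, not part of PUDL, and it only understands flat "key: value" lines so that the example needs no third-party YAML parser; the real function presumably parses the file as YAML.

```python
from pathlib import Path


def read_settings(settings_file):
    """Hypothetical stand-in: read pudl_in/pudl_out from a settings file."""
    settings = {"pudl_in": None, "pudl_out": None}
    settings_file = Path(settings_file)
    if not settings_file.exists():
        # No settings file: both paths stay None.
        return settings
    for line in settings_file.read_text(encoding="utf-8").splitlines():
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if sep and key.strip() in settings:
            settings[key.strip()] = value.strip() or None
    return settings
```

Callers can then test for None to decide whether a default workspace has been configured yet.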

pudl.workspace.setup.derive_paths(pudl_in, pudl_out)

Derive PUDL paths based on given input and output paths.

If no configuration file path is provided, attempt to read the user configuration from a file called .pudl.yml in the user’s HOME directory. Presently the only values we expect are pudl_in and pudl_out, directories that store files that PUDL either depends on or that rely on PUDL.

Parameters
  • pudl_in (os.PathLike) – Path to the directory containing the PUDL input files, most notably the data directory which houses the raw data downloaded from public agencies by the pudl.workspace.datastore tools. pudl_in may be the same directory as pudl_out.

  • pudl_out (os.PathLike) – Path to the directory where PUDL should write the outputs it generates. These will be organized into directories according to the output format (sqlite, parquet, etc.).

Returns

A dictionary containing common PUDL settings, derived from those read out of the YAML file. Mostly paths for inputs & outputs.

Return type

dict
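
To illustrate the idea of deriving a settings dictionary from the two base paths, here is a minimal sketch. The function name sketch_derive_paths and every key in the returned dict (data_dir, sqlite_dir, parquet_dir) are assumptions chosen for illustration; the text above only says that inputs live in a data directory and that outputs are organized by format (sqlite, parquet, etc.), so the actual keys returned by derive_paths may differ.

```python
from pathlib import Path


def sketch_derive_paths(pudl_in, pudl_out):
    """Hypothetical stand-in: build a dict of paths from two base dirs."""
    pudl_in = Path(pudl_in)
    pudl_out = Path(pudl_out)
    return {
        "pudl_in": pudl_in,
        # Raw data downloaded from public agencies (assumed subdirectory).
        "data_dir": pudl_in / "data",
        "pudl_out": pudl_out,
        # One output directory per format (assumed key names).
        "sqlite_dir": pudl_out / "sqlite",
        "parquet_dir": pudl_out / "parquet",
    }
```

Because pudl_in may be the same directory as pudl_out, passing one path for both arguments simply places the data, sqlite, and parquet directories side by side.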

pudl.workspace.setup.init(pudl_in, pudl_out, clobber=False)

Set up a new PUDL working environment based on the user settings.

Parameters
  • pudl_in (os.PathLike) – Path to the directory containing the PUDL input files, most notably the data directory which houses the raw data downloaded from public agencies by the pudl.workspace.datastore tools. pudl_in may be the same directory as pudl_out.

  • pudl_out (os.PathLike) – Path to the directory where PUDL should write the outputs it generates. These will be organized into directories according to the output format (sqlite, parquet, etc.).

  • clobber (bool) – If True, replace existing files. If False (the default), do not replace existing files.

Returns

None

pudl.workspace.setup.deploy(pkg_path, deploy_dir, ignore_files, clobber=False)

Deploy all files from a package_data directory into a workspace.

Parameters
  • pkg_path (str) – Dotted module path to the subpackage inside of package_data containing the resources to be deployed.

  • deploy_dir (os.PathLike) – Directory on the filesystem to which the files within pkg_path should be deployed.

  • ignore_files (iterable) – List of filenames (strings) that may be present in the pkg_path subpackage, but that should be ignored.

  • clobber (bool) – If True, replace existing copies of the files that are being deployed from pkg_path to deploy_dir. If False, do not replace existing files.

Returns

None
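
The interaction of ignore_files and clobber can be sketched with plain directories. This is an illustration under stated assumptions, not the deploy implementation: copy_files is a hypothetical helper that takes a source directory on disk instead of a dotted package path, and it returns the list of copied filenames (the documented function returns None) so the effect is visible.

```python
import shutil
from pathlib import Path


def copy_files(src_dir, deploy_dir, ignore_files=(), clobber=False):
    """Hypothetical stand-in: copy files from src_dir into deploy_dir."""
    src_dir = Path(src_dir)
    deploy_dir = Path(deploy_dir)
    deploy_dir.mkdir(parents=True, exist_ok=True)
    ignored = set(ignore_files)
    copied = []
    for src in sorted(src_dir.iterdir()):
        # Skip subdirectories and any filename the caller asked to ignore.
        if not src.is_file() or src.name in ignored:
            continue
        dest = deploy_dir / src.name
        # Existing copies are kept unless clobber is True.
        if dest.exists() and not clobber:
            continue
        shutil.copy(src, dest)
        copied.append(src.name)
    return copied
```

Running the helper a second time with clobber=False copies nothing, since every destination file already exists; with clobber=True all non-ignored files are copied again.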