Set up a well-organized PUDL data management workspace.
This script creates a well-defined directory structure for use by the PUDL package, and copies several example settings files and Jupyter notebooks into it to get you started. If the command is run without any arguments, it will create this workspace in your current directory.
The script will also create a file named .pudl.yml, describing the location of your PUDL workspace. The PUDL package will refer to this location in the future to know where it should look for raw data, where to put its outputs, etc. This file can be edited to change the default input and output directories if you wish. If you do, make sure the directories it points to were themselves set up using this script.
It’s also possible to specify different input and output directories, which is useful if you want to use a single PUDL data store (which may contain many GB of data) to support several different workspaces. See the --pudl_in and --pudl_out options.
By default the script will not overwrite existing files. If you want it to replace existing files (including your .pudl.yml file, which defines your default PUDL workspace), use the --clobber option.
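The no-overwrite default described above can be sketched roughly as follows. This is a minimal illustration, not the script's actual implementation; the helper name copy_unless_exists and its signature are hypothetical.

```python
import shutil
from pathlib import Path


def copy_unless_exists(src: Path, dst: Path, clobber: bool = False) -> bool:
    """Copy src to dst, returning True only if dst was written.

    Hypothetical helper: an existing dst is left untouched unless
    clobber is True, mirroring the behavior described for --clobber.
    """
    if dst.exists() and not clobber:
        return False
    # Create the destination directory if it is not there yet.
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(src, dst)
    return True
```

With this shape, a second run of the setup leaves edited settings files alone, and only an explicit clobber=True replaces them.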
The directory structure set up for PUDL looks like this:
PUDL_IN
└── data
    ├── censusdp1tract
    ├── eia860
    ├── eia860m
    ├── eia861
    ├── eia923
    ├── epacems
    ├── ferc1
    ├── ferc714
    └── tmp

PUDL_OUT
├── parquet
├── settings
└── sqlite
Initially, the directories in the data store will be empty. The pudl_datastore or pudl_etl commands will download data from public sources and organize it for you there by source. The PUDL_OUT directories are organized by the type of file they contain.
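As a rough sketch of how such a layout could be created, the following uses only the standard library. The function name make_workspace and the two module-level lists are assumptions for illustration; the directory names are the ones listed above, and the real script may build them differently.

```python
from pathlib import Path
from typing import List

# Directory names taken from the layout described above.
DATA_SOURCES = [
    "censusdp1tract", "eia860", "eia860m", "eia861", "eia923",
    "epacems", "ferc1", "ferc714", "tmp",
]
OUTPUT_TYPES = ["parquet", "settings", "sqlite"]


def make_workspace(pudl_in: Path, pudl_out: Path) -> List[Path]:
    """Create the (empty) input and output directories; hypothetical helper."""
    dirs = [pudl_in / "data" / name for name in DATA_SOURCES]
    dirs += [pudl_out / name for name in OUTPUT_TYPES]
    for path in dirs:
        # exist_ok=True makes repeated runs harmless.
        path.mkdir(parents=True, exist_ok=True)
    return dirs
```

Passing the same path for both arguments gives a single self-contained workspace, while passing different paths lets several output workspaces share one data store.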
Parse command line arguments for the pudl_setup script.
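A parser for the three options named in this documentation might look something like the sketch below. Only the option names --pudl_in, --pudl_out and --clobber come from the text above; the function name, the help strings, and the choice of the current directory as default are assumptions, and the real script may accept further arguments.

```python
import argparse
from pathlib import Path


def parse_command_line(argv):
    """Parse pudl_setup-style options from a list of strings (illustrative)."""
    parser = argparse.ArgumentParser(
        description="Set up a PUDL workspace (illustrative sketch)."
    )
    parser.add_argument(
        "--pudl_in", type=Path, default=Path.cwd(),
        help="Directory holding the data store (assumed default: cwd).",
    )
    parser.add_argument(
        "--pudl_out", type=Path, default=Path.cwd(),
        help="Directory for outputs (assumed default: cwd).",
    )
    parser.add_argument(
        "--clobber", action="store_true",
        help="Replace existing files instead of leaving them in place.",
    )
    return parser.parse_args(argv)
```

For example, parse_command_line(["--pudl_in", "/data/pudl", "--clobber"]) would leave pudl_out at the current directory while pointing the data store elsewhere.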
Set up a new default PUDL workspace.