We use Tox to coordinate our software testing
and to manage other build and sanity checking tools. Under the hood, it invokes
a variety of other command-line tools in predefined combinations
that are described in
tox.ini. These include software tests defined using
pytest, code linters like flake8, documentation
generators like Sphinx, and sanity checks defined as git pre-commit hooks. Each
of these tools, or sometimes collections of related tools, can be selected at
the command line. They can also be run independently without using Tox, but for
the sake of simplicity and standardization, we mostly run them
using the predefined settings we have configured in Tox.
The simplest way to test PUDL, which is also how the code is tested automatically by our continuous integration setup, is to run Tox with no arguments. This typically takes about 25 minutes.
If you aren’t familiar with pytest and Tox already, you may want to go peruse their introductory documentation.
The pytest-based software tests are all stored under the test/
directory in the main repository. They are organized into three broad categories,
each with its own subdirectory:
Software Unit Tests (test/unit/) can be run in seconds and don’t require any external data. They test the basic functionality of various functions and classes, often using minimal inline data structures that are specified in the test modules themselves.
Software Integration Tests (test/integration/) test larger collections of functionality including the interactions between different parts of the overall software system and in some cases interactions with external systems requiring network connectivity. The main thing our integration tests do is run the full PUDL data processing pipeline for the most recent year of data. This takes around 15 minutes.
Data Validations (test/validate/) sanity check the PUDL outputs generated by the data processing pipeline. This helps us catch issues with the input data as well as more subtle bugs that don’t prevent the code from executing but do have unintended or unexpected impacts on the output data. The data validation requires a fully populated PUDL database and is quite different from the other tests.
Running tests with Tox
Tox installs the PUDL package in a fresh Python environment, ensuring that the
tests only have access to packages which would be installed on a new user’s
computer. Tox’s overall behavior is configured with the
tox.ini file in the
main repository directory. There are several different “test environments”
defined to test different aspects of the software or to perform other
actions like building the documentation. We’ll go through some of the most
common ones below.
Continuous Integration Tests
Our default Tox test environment is
ci, which includes all of the tests
that will be run in continuous integration using a GitHub Action. You should run these tests before
pushing code to the repository or making a pull request. Because it’s the
default test environment, it will be run if you call Tox without any
arguments. This is equivalent to:
$ tox -e ci
If the PUDL package’s dependencies have been changed (in
setup.py) or you
recently ran the tests while on another branch of the repository with other
dependencies, you may need to tell Tox to recreate the software environment
it uses with the
-r flag. This behavior is turned on by default for the
validate tests since they take a long time to run
and the extra time required to recreate the software environment is short by
comparison.
You will need to register for an EIA API key to run the
tests which are included as part of the
ci tests. We use data from the
EIA API to fill in missing monthly fuel costs in the marginal cost of
electricity calculations. Once you have the API key, you’ll need to store it
in an environment variable named
API_KEY_EIA within the shell where you
are running the tests. You may want to add it to your
.zshrc so that it’s automatically available to PUDL in the future. There
are many tutorials on how to manage environment variables online. Here’s one
tutorial from Digital Ocean.
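As a small illustration of the environment variable requirement above, the following Python sketch checks that API_KEY_EIA is set before a test run starts. The helper name require_eia_api_key is hypothetical and is not part of PUDL; only the variable name API_KEY_EIA comes from the text above.

```python
import os


def require_eia_api_key(environ=None):
    """Return the EIA API key from the environment, or raise if it is missing.

    Hypothetical helper for illustration; not part of the PUDL codebase.
    """
    # Default to the real process environment, but allow a plain dict so the
    # check can be exercised without touching the shell.
    env = os.environ if environ is None else environ
    key = env.get("API_KEY_EIA", "").strip()
    if not key:
        raise RuntimeError(
            "API_KEY_EIA is not set. Export it in the shell where you run the tests."
        )
    return key
```

Calling require_eia_api_key() with no arguments reads the real environment, so a missing key fails immediately with a clear message rather than partway through a long test run.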
In addition to running the
integration tests, the CI test
environment lints the code and documentation input files and uses Sphinx to
build the documentation. It also generates a test coverage report. Running
the full set of CI tests takes 20-25 minutes and requires a fair amount of
data. If you don’t already have that data downloaded, it will be downloaded
automatically and put in your local datastore.
Locally the tests will run using whatever version of Python is part of your
pudl-dev conda environment, but we have our CI set up to test on both
Python 3.8 and 3.9 in parallel.
Software Unit and Integration Tests
To run the unit or
integration tests on their own, use the -e
flag to choose those test environments explicitly:
$ tox -e unit
$ tox -e integration
Full ETL Tests
As mentioned above, the CI tests process a single year of data. If you would
like to more exhaustively test the ETL process without affecting your
existing FERC 1 and PUDL databases, you can use the full
environment, which may take close to an hour to run:
$ tox -e full
This will process all years of data for the EIA and FERC datasets and all
years of EPA CEMS data for a single state (Idaho). The ETL parameters for
this test are defined in
Running Other Commands with Tox
You can run any of the individual test environments that
tox -av lists on their own:
$ tox -av
default environments:
ci               -> Run all continuous integration (CI) checks & generate test coverage.

additional environments:
flake8           -> Run the full suite of flake8 linters on the PUDL codebase.
pre_commit       -> Run git pre-commit hooks not covered by the other linters.
bandit           -> Check the PUDL codebase for common insecure code patterns.
linters          -> Run the pre-commit, flake8, and bandit linters.
doc8             -> Check the documentation input files for syntactical correctness.
docs             -> Remove old docs output and rebuild HTML from scratch with Sphinx
unit             -> Run all the software unit tests.
ferc1_solo       -> Test whether FERC 1 can be loaded into the PUDL database alone.
integration      -> Run all software integration tests and process a full year of data.
validate         -> Run all data validation tests. This requires a complete PUDL DB.
ferc1_schema     -> Verify FERC Form 1 DB schema are compatible for all years.
full_integration -> Run ETL and integration tests for all years and data sources.
full             -> Run all CI checks, but for all years of data.
build            -> Prepare Python source and binary packages for release.
testrelease      -> Do a dry run of Python package release using the PyPI test server.
release          -> Release the PUDL package to the production PyPI server.
Note that not all of them literally run tests. For instance, to lint and build the documentation you can run:
$ tox -e docs
To run all of the code and documentation linters, but not run any of the other tests:
$ tox -e linters
Each of the test environments defined in
tox.ini is just a collection of
dependencies and commands. To see what they consist of, you can open the file
in your text editor. Each section starts with
[testenv:xxxxxx] and its
commands setting is a list of the shell commands that the test
environment will run.
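To make that section structure concrete, here is a minimal Python sketch that reads a tox.ini-style file with the standard library's configparser and lists the commands in each [testenv:...] section. The sample file contents and the function name list_testenv_commands are made up for illustration; the real PUDL tox.ini defines more environments and different commands.

```python
import configparser

# A made-up, minimal tox.ini fragment for illustration only.
SAMPLE_TOX_INI = """
[tox]
envlist = ci

[testenv:unit]
commands =
    pytest test/unit

[testenv:docs]
commands =
    doc8 docs/ README.rst
    sphinx-build -b html docs docs/_build/html
"""


def list_testenv_commands(ini_text):
    """Map each [testenv:NAME] section to its list of shell commands."""
    # Disable interpolation so that percent signs in commands are not
    # treated as configparser substitutions.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    envs = {}
    for section in parser.sections():
        if section.startswith("testenv:"):
            name = section.split(":", 1)[1]
            raw = parser.get(section, "commands", fallback="")
            # Each non-empty line of the multi-line value is one command.
            envs[name] = [line.strip() for line in raw.splitlines() if line.strip()]
    return envs


for name, commands in list_testenv_commands(SAMPLE_TOX_INI).items():
    print(name, "->", commands)
```

This only inspects the file. Tox itself applies further substitutions and inheritance between sections that a plain configparser read does not reproduce.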
Selecting Input Data for Integration Tests
The software integration tests need a year’s worth of input data to process. By default they will look in your local PUDL datastore to find it. If the data they need isn’t available locally, they will download it from Zenodo and put it in the local datastore.
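The look-locally-then-download behavior described above can be sketched in a few lines of Python. This is not PUDL's actual datastore code: the function name get_input_file and the download callable are assumptions made for illustration, with download standing in for a fetch from Zenodo.

```python
from pathlib import Path


def get_input_file(name, datastore_dir, download):
    """Return the local path of an input file, downloading it only if missing.

    Hypothetical sketch: `download` is any callable that takes a file name and
    returns its contents as bytes.
    """
    path = Path(datastore_dir) / name
    if not path.exists():
        # Not available locally: fetch it and store it for future runs.
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(download(name))
    return path
```

Because the downloaded file is written into the datastore directory, a second call for the same name returns the cached copy without downloading again.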
However, if you’re editing code that affects how the datastore works, you
probably don’t want to risk contaminating your working datastore. You can
use a disposable temporary datastore instead by having Tox pass the
--tmp-data flag in to
pytest like this:
$ tox -e integration -- --tmp-data
The -- isn’t a typo; it tells Tox that you’re done giving it
command line arguments, and that any additional arguments it gets should be
passed through to
pytest. We’ve configured
pytest (through the
test/conftest.py configuration file) to be on the lookout for the
--tmp-data flag and act accordingly.
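For readers unfamiliar with how pytest learns about a custom flag, the sketch below shows the general mechanism: a pytest_addoption hook in conftest.py registers the flag, and test code reads it back from the pytest config object. This is a minimal illustration and not a copy of PUDL's test/conftest.py; the helper use_tmp_data is hypothetical.

```python
def pytest_addoption(parser):
    """Register a custom --tmp-data flag with pytest (a standard pytest hook)."""
    parser.addoption(
        "--tmp-data",
        action="store_true",
        default=False,
        help="Download fresh input data for use with this test run only.",
    )


def use_tmp_data(config):
    """Return True if the test run was started with --tmp-data.

    Hypothetical helper; `config` is the pytest config object, which is
    available inside a fixture as `request.config`.
    """
    return bool(config.getoption("--tmp-data"))
```

A fixture that sets up the datastore could call use_tmp_data(request.config) and choose a temporary directory when it returns True.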
Given the processed outputs of the PUDL ETL pipeline, we have a collection of tests that can be run to verify that the outputs look correct. We run all available data validations before each data release is archived on Zenodo. It is useful to run the data validation tests prior to making a pull request that makes changes to the ETL process or output functions to ensure that the outputs have not been unintentionally affected.
These data validation tests are organized into datasource-specific modules
under test/validate. Running the full data validation can take as much as
an hour, depending on your computer. These tests require a fully populated
PUDL database which contains all available FERC and EIA data, as specified by
the src/pudl/package_data/settings/etl_full.yml input file. They are run
against the “live” SQLite database in your PUDL workspace at
sqlite/pudl.sqlite. To run the full data validation against an existing
database:
$ tox -e validate
The data validation cases that pertain to the contents of the data tables are
currently stored as part of the
The expected number of records in each output table is stored in the validation
test modules under
test/validate as pytest parameterizations.
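The idea of checking expected record counts can be illustrated with a small, self-contained Python sketch. The table names, counts, and the function count_mismatches are all hypothetical, and an in-memory SQLite database stands in for the real PUDL database. In a pytest module, pairs like these could be passed to pytest.mark.parametrize so that each table becomes its own test case.

```python
import sqlite3

# Hypothetical (table name, expected row count) pairs for illustration.
EXPECTED_ROWS = [
    ("plants", 2),
    ("generators", 3),
]


def count_mismatches(conn, expected):
    """Return (table, expected, actual) for each table with the wrong row count."""
    mismatches = []
    for table, n_expected in expected:
        # Table names come from our own trusted list above, so building the
        # query with an f-string is acceptable in this sketch.
        (n_actual,) = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()
        if n_actual != n_expected:
            mismatches.append((table, n_expected, n_actual))
    return mismatches


# Build a tiny in-memory database to stand in for the real PUDL database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE plants (id INTEGER)")
conn.execute("CREATE TABLE generators (id INTEGER)")
conn.executemany("INSERT INTO plants VALUES (?)", [(1,), (2,)])
conn.executemany("INSERT INTO generators VALUES (?)", [(1,), (2,)])

print(count_mismatches(conn, EXPECTED_ROWS))  # prints [('generators', 3, 2)]
```

Reporting both the expected and actual counts makes it easier to tell whether a table gained or lost records after a change to the ETL.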
Data Validation Notebooks
We have a collection of Jupyter Notebooks that run the same functions as the
data validation. The notebooks also produce some visualizations of the data
to make it easier to understand what’s wrong when validation fails. These
notebooks are stored in
Like the data validations, the notebooks will only run successfully when there’s a full PUDL SQLite database available in your PUDL workspace.
Running pytest Directly
Running tests directly with
pytest gives you the ability to run only
tests from a particular test module or even a single individual test case.
It’s also faster because there’s no testing environment to set up. Instead,
it just uses your Python environment, which should be the pudl-dev conda
environment discussed in Development Setup. This is convenient if you’re
debugging something specific or developing new test cases, but it’s not as
robust as using Tox.
Running specific tests
To run the software unit tests with
pytest directly (the same set of tests
that would be run by
tox -e unit):
$ pytest test/unit
To run only the unit tests for the Excel spreadsheet extraction module:
$ pytest test/unit/extract/excel_test.py
To run only the unit tests defined by a single test class within that module:
$ pytest test/unit/extract/excel_test.py::TestGenericExtractor
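The path::name syntax used above is pytest's node ID format: a module path followed by class and test names, joined with double colons. The following Python sketch builds such IDs; the helper pytest_node_id is hypothetical and exists only to show the pattern.

```python
def pytest_node_id(module_path, *names):
    """Join a test module path with class and/or test names using '::'."""
    return "::".join([module_path, *names])


print(pytest_node_id("test/unit/extract/excel_test.py", "TestGenericExtractor"))
# prints test/unit/extract/excel_test.py::TestGenericExtractor
```

Adding one more name after the class selects a single test method within it.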
Custom PUDL pytest flags
We have defined several custom flags to control pytest’s behavior when running the PUDL tests. They are mostly intended for use internally to specify the behavior we want in the high level Tox test environments.
You can always check to see what custom flags exist by running
pytest --help and looking at the
custom options section:
custom options:
  --live-dbs            Use existing PUDL/FERC1 DBs instead of creating temporary ones.
  --tmp-data            Download fresh input data for use with this test run only.
  --etl-settings=ETL_SETTINGS
                        Path to a non-standard ETL settings file to use.
  --gcs-cache-path=GCS_CACHE_PATH
                        If set, use this GCS path as a datastore cache layer.
  --sandbox             Use raw inputs from the Zenodo sandbox server.
The main flexibility that these custom options provide is in selecting where the raw input data comes from and what data the tests should be run against. Being able to specify the tests to run and the data to run them against independently simplifies the test suite and keeps the data and tests very clearly separated.
The --live-dbs option lets you use your existing FERC 1 and PUDL databases
instead of building new ones. This can be useful if you want to
test code that only operates on an existing database, and has nothing to do
with the construction of that database. For example, the output routines:
$ pytest --live-dbs test/integration/output_test.py
We also use this option to run the data validations.
Assuming you do want to run the ETL and build new databases as part of the test
you’re running, the contents of that database are determined by an ETL settings
file. By default, the settings file that’s used is
test/settings/integration-test.yml. But it’s also possible to use a
different input file, generating a different database, and then run some
tests against that database.
For example, we test that FERC 1 data can be loaded into a PUDL database all
by itself by running the ETL tests with a settings file that includes only a
couple of FERC 1 tables for a single year. This is what the ferc1_solo Tox test environment checks:
$ pytest --etl-settings=test/settings/ferc1-solo-test.yml test/integration/etl_test.py
Similarly, we use the
test/settings/full-integration-test.yml settings file
to specify an exhaustive collection of input data, and then we run a test that
checks that the database schemas extracted from all historical FERC 1 databases
are compatible with each other. This is what the ferc1_schema Tox test environment checks:
$ pytest --etl-settings test/settings/full-integration-test.yml test/integration/etl_test.py::test_ferc1_schema
The raw input data that all the tests use ultimately comes from our archives on Zenodo. However, you can optionally tell the tests to look in different places for more rapidly accessible caches of that data, or to force the download of a fresh copy (especially useful when you are testing the datastore functionality specifically). By default, the tests will use the datastore that’s part of your local PUDL workspace.
For example, to run the ETL portion of the integration tests and download fresh input data to a temporary datastore that’s later deleted automatically:
$ pytest --tmp-data test/integration/etl_test.py