dbt_helper
==========

.. py:module:: dbt_helper

.. autoapi-nested-parse::

   A basic CLI to autogenerate dbt data test configurations.


Attributes
----------

.. autoapisummary::

   dbt_helper.logger
   dbt_helper.ALL_TABLES


Classes
-------

.. autoapisummary::

   dbt_helper.DbtColumn
   dbt_helper.DbtTable
   dbt_helper.DbtSource
   dbt_helper.DbtSchema
   dbt_helper.UpdateResult
   dbt_helper.TableUpdateArgs


Functions
---------

.. autoapisummary::

   dbt_helper.schema_has_removals_or_modifications
   dbt_helper._log_schema_diff
   dbt_helper._schema_diff_summary
   dbt_helper.get_data_source
   dbt_helper._get_local_table_path
   dbt_helper._get_model_path
   dbt_helper._get_row_count_csv_path
   dbt_helper._get_existing_row_counts
   dbt_helper._calculate_row_counts
   dbt_helper._combine_row_counts
   dbt_helper._write_row_counts
   dbt_helper.update_row_counts
   dbt_helper.update_table_schema
   dbt_helper._log_update_result
   dbt_helper._infer_partition_column
   dbt_helper.update_tables
   dbt_helper.dbt_helper


Module Contents
---------------

.. py:data:: logger

.. py:data:: ALL_TABLES

.. py:class:: DbtColumn(/, **data: Any)

   Bases: :py:obj:`pydantic.BaseModel`


   Define yaml structure of a dbt column.


   .. py:attribute:: name
      :type:  str


   .. py:attribute:: description
      :type:  str | None
      :value: None


   .. py:attribute:: data_tests
      :type:  list | None
      :value: None


   .. py:attribute:: meta
      :type:  dict | None
      :value: None


   .. py:attribute:: tags
      :type:  list[str] | None
      :value: None


   .. py:method:: add_column_tests(column_tests: list) -> DbtColumn

      Add data tests to columns in dbt config.


.. py:class:: DbtTable(/, **data: Any)

   Bases: :py:obj:`pydantic.BaseModel`


   Define yaml structure of a dbt table.


   .. py:attribute:: name
      :type:  str


   .. py:attribute:: description
      :type:  str | None
      :value: None


   .. py:attribute:: data_tests
      :type:  list | None
      :value: None


   .. py:attribute:: columns
      :type:  list[DbtColumn]


   .. py:attribute:: meta
      :type:  dict | None
      :value: None


   .. py:attribute:: tags
      :type:  list[str] | None
      :value: None


   .. py:attribute:: config
      :type:  dict | None
      :value: None


   .. py:method:: add_source_tests(source_tests: list) -> DbtSource

      Add data tests to source in dbt config.


   .. py:method:: add_column_tests(column_tests: dict[str, list]) -> DbtSource

      Add data tests to columns in dbt config.


   .. py:method:: get_row_count_test_dict(table_name: str, partition_column: str)
      :staticmethod:


      Return a dictionary with a dbt row count data test encoded in a dict.


   .. py:method:: from_table_name(table_name: str, partition_column: str) -> DbtSchema
      :classmethod:


      Construct configuration defining table from PUDL metadata.


.. py:class:: DbtSource(/, **data: Any)

   Bases: :py:obj:`pydantic.BaseModel`


   Define basic dbt yml structure to add a pudl table as a dbt source.


   .. py:attribute:: name
      :type:  str
      :value: 'pudl'


   .. py:attribute:: tables
      :type:  list[DbtTable]


   .. py:attribute:: data_tests
      :type:  list | None
      :value: None


   .. py:attribute:: description
      :type:  str | None
      :value: None


   .. py:attribute:: meta
      :type:  dict | None
      :value: None


   .. py:method:: add_source_tests(source_tests: list) -> DbtSource

      Add data tests to source in dbt config.


   .. py:method:: add_column_tests(column_tests: dict[list]) -> DbtSource

      Add data tests to columns in dbt config.


.. py:class:: DbtSchema(/, **data: Any)

   Bases: :py:obj:`pydantic.BaseModel`


   Define basic structure of a dbt models yaml file.


   .. py:attribute:: version
      :type:  int
      :value: 2


   .. py:attribute:: sources
      :type:  list[DbtSource]


   .. py:attribute:: models
      :type:  list[DbtTable] | None
      :value: None


   .. py:method:: add_source_tests(source_tests: list, model_name: str | None = None) -> DbtSchema

      Add data tests to source in dbt config.


   .. py:method:: add_column_tests(column_tests: dict[list], model_name: str | None = None) -> DbtSchema

      Add data tests to columns in dbt config.


   .. py:method:: from_table_name(table_name: str, partition_column: str) -> DbtSchema
      :classmethod:


      Construct configuration defining table from PUDL metadata.


   .. py:method:: from_yaml(schema_path: pathlib.Path) -> DbtSchema
      :classmethod:


      Load a DbtSchema object from a YAML file.


   .. py:method:: to_yaml(schema_path: pathlib.Path)
      :classmethod:


      Write DbtSchema object to YAML file.


.. py:function:: schema_has_removals_or_modifications(diff: deepdiff.DeepDiff) -> bool

   Check if the DeepDiff includes any removals or modifications.


.. py:function:: _log_schema_diff(diff: deepdiff.DeepDiff, old_schema: DbtSchema, new_schema: DbtSchema)

   Print old and new YAML, and summary of schema changes.


.. py:function:: _schema_diff_summary(diff: deepdiff.DeepDiff) -> str

   Return all changes in a DeepDiff between two schemas as a string.


.. py:function:: get_data_source(table_name: str) -> str

   Return data source for a table or 'output' if there's more than one source.


.. py:class:: UpdateResult

   Bases: :py:obj:`tuple`


   .. py:attribute:: success


   .. py:attribute:: message


.. py:function:: _get_local_table_path(table_name)

.. py:function:: _get_model_path(table_name: str, data_source: str) -> pathlib.Path

.. py:function:: _get_row_count_csv_path(target: str = 'etl-full') -> pathlib.Path

.. py:function:: _get_existing_row_counts(target: str = 'etl-full') -> pandas.DataFrame

.. py:function:: _calculate_row_counts(table_name: str, partition_column: str = 'report_year') -> pandas.DataFrame

.. py:function:: _combine_row_counts(existing: pandas.DataFrame, new: pandas.DataFrame) -> pandas.DataFrame

.. py:function:: _write_row_counts(row_counts: pandas.DataFrame, target: str = 'etl-full')

.. py:function:: update_row_counts(table_name: str, partition_column: str = 'report_year', target: str = 'etl-full', clobber: bool = False, update: bool = False) -> UpdateResult

   Generate updated row counts per partition and write to csv file within dbt project.


.. py:function:: update_table_schema(table_name: str, data_source: str, partition_column: str = 'report_year', clobber: bool = False, update: bool = False) -> UpdateResult

   Generate and write out a schema.yaml file defining a new or updated table.


.. py:function:: _log_update_result(result: UpdateResult)

.. py:function:: _infer_partition_column(table_name: str) -> str

.. py:class:: TableUpdateArgs

   Define a single class to collect the args for all table update commands.


   .. py:attribute:: tables
      :type:  list[str]


   .. py:attribute:: target
      :type:  Literal['etl-full', 'etl-fast']
      :value: 'etl-full'


   .. py:attribute:: schema
      :type:  bool
      :value: False


   .. py:attribute:: row_counts
      :type:  bool
      :value: False


   .. py:attribute:: clobber
      :type:  bool
      :value: False


   .. py:attribute:: update
      :type:  bool
      :value: False


.. py:function:: update_tables(tables: list[str], target: str, clobber: bool, update: bool, schema: bool, row_counts: bool)

   Add or update dbt schema configs and row count expectations for PUDL tables.

   The ``tables`` argument can be a single table name, a list of table names, or
   'all'. If it is 'all', the script will update configurations for all PUDL tables.

   If ``--clobber`` is set, existing configurations for tables will be overwritten.
   If ``--update`` is set, existing configurations for tables will be updated only
   if this does not result in deletions.


.. py:function:: dbt_helper()

   Script for auto-generating dbt configuration and migrating existing tests.

   This CLI currently provides one sub-command, ``update-tables``, which can update
   or create a dbt table (model) ``schema.yml`` file under the ``dbt/models``
   directory. These configuration files tell dbt about the structure of the table
   and what data tests are specified for it. It also adds a (required) row count
   test by default.

   The script can also generate or update the expected row counts for existing
   tables, assuming they have been materialized to parquet files and are sitting in
   your ``$PUDL_OUT`` directory.

   Run ``dbt_helper {command} --help`` for detailed usage on each command.
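
The ``--clobber`` and ``--update`` behaviour described for ``update_tables`` can be illustrated with a minimal sketch. This is not the module's implementation: it compares plain dictionaries instead of using ``deepdiff.DeepDiff``, and the names ``SketchResult``, ``has_removals_or_modifications`` and ``decide_write`` are hypothetical. Two behaviours are assumptions rather than documented facts: that ``clobber`` takes precedence when both flags are set, and that an existing configuration is left untouched when neither flag is set.

```python
from __future__ import annotations

from typing import NamedTuple


class SketchResult(NamedTuple):
    """Hypothetical stand-in with the same (success, message) shape as UpdateResult."""

    success: bool
    message: str


def has_removals_or_modifications(old: dict, new: dict) -> bool:
    """Return True if any key in ``old`` is missing from ``new`` or has a changed value.

    Nested dictionaries are compared recursively; keys that exist only in
    ``new`` are pure additions and are ignored.
    """
    for key, old_value in old.items():
        if key not in new:
            return True  # removal
        new_value = new[key]
        if isinstance(old_value, dict) and isinstance(new_value, dict):
            if has_removals_or_modifications(old_value, new_value):
                return True
        elif old_value != new_value:
            return True  # modification
    return False


def decide_write(
    existing: dict | None, proposed: dict, clobber: bool, update: bool
) -> SketchResult:
    """Decide whether ``proposed`` may replace ``existing`` (assumed rules)."""
    if existing is None:
        return SketchResult(True, "created new configuration")
    if clobber:
        # Assumption: clobber wins if both flags are set.
        return SketchResult(True, "overwrote existing configuration")
    if update:
        if has_removals_or_modifications(existing, proposed):
            return SketchResult(False, "update would remove or modify existing entries")
        return SketchResult(True, "updated configuration with additions only")
    # Assumption: with neither flag, an existing configuration is not touched.
    return SketchResult(False, "configuration exists; set clobber or update")


old = {"columns": {"report_year": {"data_tests": ["not_null"]}}}
added = {"columns": {"report_year": {"data_tests": ["not_null"]}, "plant_id": {}}}
removed = {"columns": {}}

print(decide_write(old, added, clobber=False, update=True))
print(decide_write(old, removed, clobber=False, update=True))
print(decide_write(old, removed, clobber=True, update=False))
```

In this sketch an update that only adds a column succeeds, an update that drops one is refused, and the same change goes through once ``clobber`` is set.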