pudl.output.export module

Routines for exporting data from PUDL for use elsewhere.

Function names should indicate the format being exported (e.g. CSV, Excel spreadsheets, Parquet files, HDF5).

pudl.output.export.annotated_xlsx(df, notes_dict, tags_dict, first_cols, sheet_name, xlsx_writer)

Outputs an annotated spreadsheet workbook based on compiled dataframes.

Creates an annotation tab and header rows for EIA 860, EIA 923, and FERC 1 fields in a dataframe. This is done using an ExcelWriter object, which must be created and saved outside the function so that multiple sheets and their annotations can be compiled into the same Excel file.

Parameters
  • df (pandas.DataFrame) – The dataframe for which annotations are being created

  • notes_dict (dict) – dictionary with column names as keys and long annotations as values

  • tags_dict (dict) – dictionary of dictionaries. The outer keys are tag categories; each value is a dictionary whose keys are column names and whose values are that column's tag within the category

  • first_cols (list) – ordered list of columns that should come first in the output file

  • sheet_name (string) – name of data sheet in output spreadsheet

  • xlsx_writer (pandas.ExcelWriter) – ExcelWriter object used to accumulate multiple tabs. It must be created outside the function, before the first call, e.g. xlsx_writer = pd.ExcelWriter('outfile.xlsx')
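The shapes of notes_dict, tags_dict, and first_cols described above may be easier to see in a small example. The following is a minimal sketch using only the standard library. The column names, annotation text, tag categories, and the unknown_columns helper are all hypothetical and chosen for illustration; none of them are part of PUDL.

```python
# Hypothetical column names standing in for the columns of df.
columns = ["plant_id_eia", "report_year", "net_generation_mwh"]

# notes_dict: column name -> long annotation (illustrative text only).
notes_dict = {
    "plant_id_eia": "Plant identifier assigned by EIA.",
    "report_year": "Year the data was reported for.",
    "net_generation_mwh": "Net electricity generation, in megawatt-hours.",
}

# tags_dict: tag category -> {column name -> tag within that category}.
# The categories "source" and "data_type" are assumptions for this sketch.
tags_dict = {
    "source": {
        "plant_id_eia": "EIA 860",
        "report_year": "EIA 923",
        "net_generation_mwh": "EIA 923",
    },
    "data_type": {
        "plant_id_eia": "id",
        "report_year": "date",
        "net_generation_mwh": "measurement",
    },
}

# first_cols: columns that should lead the sheet, in this order.
first_cols = ["report_year", "plant_id_eia"]


def unknown_columns(columns, notes_dict, tags_dict, first_cols):
    """Return sorted names that the annotation arguments reference
    but that do not appear in columns (hypothetical sanity check)."""
    referenced = set(notes_dict) | set(first_cols)
    for col_to_tag in tags_dict.values():
        referenced |= set(col_to_tag)
    return sorted(referenced - set(columns))


# An empty list means every referenced name is a known column.
missing = unknown_columns(columns, notes_dict, tags_dict, first_cols)
```

With PUDL and pandas installed, structures like these would be passed along with the dataframe, a sheet name, and an existing ExcelWriter, following the signature annotated_xlsx(df, notes_dict, tags_dict, first_cols, sheet_name, xlsx_writer).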

Returns

The ExcelWriter object that was passed in, with the new sheets added. It must be saved outside the function, after the final call, to write the workbook out to Excel: xlsx_writer.save()

Return type

xlsx_writer (pandas.ExcelWriter)