pudl.extract.csv

Extractor for CSV data.

Module Contents

Classes

CsvExtractor

Generalized class for extracting dataframes from CSV files.

Functions

open_csv_resource(dataset, base_filename) → csv.DictReader

Open the given resource file as csv.DictReader.

get_table_file_map(dataset) → dict[str, str]

Return a dictionary of table names and filenames for the dataset.

Attributes

pudl.extract.csv.logger

pudl.extract.csv.open_csv_resource(dataset: str, base_filename: str) → csv.DictReader

Open the given resource file as csv.DictReader.

Parameters:
  • dataset – used to load metadata from the package_data/{dataset} subdirectory.

  • base_filename – the name of the file in the subdirectory to open.
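A minimal sketch of the behaviour described above, assuming package_data is an ordinary directory on disk. The helper open_csv_resource_sketch, its explicit package_data argument, and the example_dataset/columns.csv names are hypothetical; the real function locates the packaged file itself and may do so differently.

```python
import csv
import io
import tempfile
from pathlib import Path


def open_csv_resource_sketch(package_data: Path, dataset: str, base_filename: str) -> csv.DictReader:
    """Return a DictReader over package_data/<dataset>/<base_filename>.

    Hypothetical helper: package_data is passed in explicitly so the sketch
    runs without PUDL installed.
    """
    csv_path = package_data / dataset / base_filename
    # Read the whole (small) metadata file up front so no file handle stays open.
    return csv.DictReader(io.StringIO(csv_path.read_text(encoding="utf-8")))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Hypothetical dataset directory and metadata file.
    (root / "example_dataset").mkdir()
    (root / "example_dataset" / "columns.csv").write_text(
        "name,dtype\nplant_id,int\nplant_name,string\n", encoding="utf-8"
    )
    reader = open_csv_resource_sketch(root, "example_dataset", "columns.csv")
    rows = list(reader)

print(rows)  # [{'name': 'plant_id', 'dtype': 'int'}, {'name': 'plant_name', 'dtype': 'string'}]
```

Each row comes back as a dict keyed by the header line, which is what makes csv.DictReader convenient for small metadata files.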

pudl.extract.csv.get_table_file_map(dataset: str) → dict[str, str]

Return a dictionary of table names and filenames for the dataset.

Parameters:

dataset – used to load metadata from the package_data/{dataset} subdirectory.
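The table-to-filename dictionary can be pictured as a one-line comprehension over the rows of a metadata CSV. This is a sketch under assumptions: the column names "table" and "filename", the helper table_file_map_from_reader, and the sample rows are all hypothetical, and an in-memory string stands in for the file under package_data/{dataset}.

```python
import csv
import io


def table_file_map_from_reader(reader: csv.DictReader) -> dict[str, str]:
    """Build {table_name: filename} from metadata rows.

    Hypothetical: assumes the CSV has "table" and "filename" columns.
    """
    return {row["table"]: row["filename"] for row in reader}


# Hypothetical metadata contents standing in for a packaged CSV file.
metadata = io.StringIO(
    "table,filename\n"
    "plants,plants.csv\n"
    "generators,gens.csv\n"
)
table_file_map = table_file_map_from_reader(csv.DictReader(metadata))
print(table_file_map)  # {'plants': 'plants.csv', 'generators': 'gens.csv'}
```

With a dict comprehension, a table name that appears twice keeps the filename from its last row.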

class pudl.extract.csv.CsvExtractor(zipfile: zipfile.ZipFile, table_file_map: dict[str, str])

Generalized class for extracting dataframes from CSV files.

Extraction is invoked by calling the extract_one() or extract_all() method of this class.
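The constructor takes an open zipfile.ZipFile and a table-to-filename map. Judging by the signature, each value in table_file_map is presumably the name of a CSV member inside the archive. The sketch below builds both inputs in memory with the standard library and checks that every mapped filename is actually present; the member names, table names, and the check itself are hypothetical illustrations, not part of the documented API.

```python
import io
import zipfile

# Build an in-memory archive with hypothetical CSV members.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("plants.csv", "plant_id,name\n1,Alpha\n2,Beta\n")
    zf.writestr("gens.csv", "plant_id,generator_id\n1,G1\n")

# Reopen the same buffer for reading (ZipFile leaves a passed-in buffer open).
archive = zipfile.ZipFile(buffer)

# Hypothetical map: table name -> member name inside the archive.
table_file_map = {"plants": "plants.csv", "generators": "gens.csv"}

# Sanity check: report any table whose file is missing from the archive.
members = set(archive.namelist())
missing = {table: name for table, name in table_file_map.items() if name not in members}
print(missing)  # {}
```

These two objects, archive and table_file_map, have the shapes the CsvExtractor signature asks for.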

get_table_names() → list[str]

Return the list of tables that this extractor provides access to.

extract_one(table_name: str) → pandas.DataFrame

Read the data for table_name from its CSV source file and return it as a dataframe.

extract_all() → dict[str, pandas.DataFrame]

Extract every table from its CSV source file and return a dictionary mapping table names to dataframes.
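To make the interface concrete, here is a minimal re-implementation sketch of the three documented methods, assuming that table names are the keys of table_file_map, that extract_one parses the mapped archive member with pandas.read_csv, and that extract_all simply calls extract_one for every table. The class name CsvExtractorSketch and the sample archive contents are hypothetical; consult the module source for the actual behaviour.

```python
import io
import zipfile

import pandas as pd


class CsvExtractorSketch:
    """Hypothetical stand-in illustrating the documented CsvExtractor interface."""

    def __init__(self, zf: zipfile.ZipFile, table_file_map: dict[str, str]) -> None:
        self._zipfile = zf
        self._table_file_map = table_file_map

    def get_table_names(self) -> list[str]:
        # Table names are the keys of the table -> filename map.
        return list(self._table_file_map)

    def extract_one(self, table_name: str) -> pd.DataFrame:
        # Look up the archive member for this table and parse it with pandas.
        filename = self._table_file_map[table_name]
        with self._zipfile.open(filename) as f:
            return pd.read_csv(f)

    def extract_all(self) -> dict[str, pd.DataFrame]:
        # One DataFrame per table, keyed by table name.
        return {name: self.extract_one(name) for name in self.get_table_names()}


# Hypothetical archive contents for the usage example.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("plants.csv", "plant_id,name\n1,Alpha\n2,Beta\n")
    zf.writestr("gens.csv", "plant_id,generator_id\n1,G1\n")

extractor = CsvExtractorSketch(
    zipfile.ZipFile(buffer), {"plants": "plants.csv", "generators": "gens.csv"}
)
tables = extractor.extract_all()
print({name: df.shape for name, df in tables.items()})  # {'plants': (2, 2), 'generators': (1, 2)}
```

Routing extract_all through extract_one keeps the per-table parsing logic in one place, so any change to how a single CSV is read applies to both methods.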