pudl.workspace.resource_cache

Implementations of datastore resource caches.

Attributes

Classes

PudlResourceKey

Uniquely identifies a specific resource.

AbstractCache

Defines the interface for the generic resource caching layer.

LocalFileCache

Simple key-value store mapping PudlResourceKeys to bytes contents.

GoogleCloudStorageCache

Implements file cache backed by Google Cloud Storage bucket.

LayeredCache

Implements multi-layered system of caches.

Functions

extend_gcp_retry_predicate(predicate, *exception_types)

Extend a GCS predicate function with additional exception_types.

Module Contents

pudl.workspace.resource_cache.logger[source]
pudl.workspace.resource_cache.extend_gcp_retry_predicate(predicate, *exception_types)[source]

Extend a GCS predicate function with additional exception_types.
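The idea of extending a retry predicate can be illustrated without the Google client libraries. The following is a minimal sketch, not the module's implementation: `extend_predicate`, `is_timeout`, and `retry_on` are hypothetical names, and the sketch assumes a predicate is any callable that takes an exception and returns a bool.

```python
def extend_predicate(predicate, *exception_types):
    """Return a predicate that also accepts the given exception types.

    Hypothetical stand-in that illustrates the composition only.
    """

    def extended(exc: BaseException) -> bool:
        # Retry if the original predicate says so, or if the exception
        # is an instance of one of the additional types.
        return predicate(exc) or isinstance(exc, exception_types)

    return extended


def is_timeout(exc: BaseException) -> bool:
    """Example base predicate: retry only on timeouts."""
    return isinstance(exc, TimeoutError)


# Also retry on connection problems, in addition to timeouts.
retry_on = extend_predicate(is_timeout, ConnectionError)
```

Wrapping the original predicate in a closure leaves it untouched, so the same base predicate can be extended in different ways for different call sites.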

pudl.workspace.resource_cache.gcs_retry[source]
class pudl.workspace.resource_cache.PudlResourceKey[source]

Bases: NamedTuple

Uniquely identifies a specific resource.

dataset: str[source]
doi: str[source]
name: str[source]
__repr__() str[source]

Returns string representation of PudlResourceKey.

get_local_path() pathlib.Path[source]

Returns (relative) path that should be used when caching this resource.
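To make the key-to-path mapping concrete, here is a minimal sketch of a NamedTuple key with the same three fields. `ResourceKey` is a hypothetical stand-in, the directory layout (dataset, then DOI with slashes replaced by dashes, then file name) is an assumption, and the DOI and file name are made-up example values.

```python
from pathlib import Path
from typing import NamedTuple


class ResourceKey(NamedTuple):
    """Hypothetical stand-in with the same fields as PudlResourceKey."""

    dataset: str
    doi: str
    name: str

    def get_local_path(self) -> Path:
        # Assumption: slashes in the DOI are replaced so that the DOI
        # becomes a single directory name instead of nested directories.
        return Path(self.dataset) / self.doi.replace("/", "-") / self.name


# Made-up values for illustration only.
key = ResourceKey(dataset="example", doi="10.0000/example.1", name="data.zip")
relative_path = key.get_local_path()
```

Because the key is a NamedTuple, it is hashable and compares by value, which makes it usable directly as a dictionary key in an in-memory cache.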

class pudl.workspace.resource_cache.AbstractCache(read_only: bool = False)[source]

Bases: abc.ABC

Defines the interface for the generic resource caching layer.

_read_only[source]
is_read_only() bool[source]

Returns True if the cache is read-only and should not be modified.

abstract get(resource: PudlResourceKey) bytes[source]

Retrieves the content of the given resource or raises KeyError.

abstract add(resource: PudlResourceKey, content: bytes) None[source]

Adds resource to the cache and sets the content.

abstract delete(resource: PudlResourceKey) None[source]

Removes the resource from cache.

abstract contains(resource: PudlResourceKey) bool[source]

Returns True if the resource is present in the cache.
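The four abstract methods are enough to build a complete cache. The sketch below shows one way a concrete subclass might look, using a dictionary as storage. `Key`, `Cache`, and `InMemoryCache` are hypothetical stand-ins defined here so the example runs on its own, and silently ignoring writes in read-only mode is an assumption rather than documented behavior.

```python
from abc import ABC, abstractmethod
from typing import NamedTuple


class Key(NamedTuple):
    """Hypothetical stand-in for PudlResourceKey."""

    dataset: str
    doi: str
    name: str


class Cache(ABC):
    """Hypothetical stand-in mirroring the AbstractCache interface."""

    def __init__(self, read_only: bool = False):
        self._read_only = read_only

    def is_read_only(self) -> bool:
        return self._read_only

    @abstractmethod
    def get(self, resource: Key) -> bytes: ...

    @abstractmethod
    def add(self, resource: Key, content: bytes) -> None: ...

    @abstractmethod
    def delete(self, resource: Key) -> None: ...

    @abstractmethod
    def contains(self, resource: Key) -> bool: ...


class InMemoryCache(Cache):
    """Dictionary-backed cache, useful for tests."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self._store: dict[Key, bytes] = {}

    def get(self, resource: Key) -> bytes:
        # A missing resource raises KeyError, as the interface describes.
        return self._store[resource]

    def add(self, resource: Key, content: bytes) -> None:
        # Assumption: writes to a read-only cache are ignored.
        if not self.is_read_only():
            self._store[resource] = content

    def delete(self, resource: Key) -> None:
        if not self.is_read_only():
            self._store.pop(resource, None)

    def contains(self, resource: Key) -> bool:
        return resource in self._store
```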

class pudl.workspace.resource_cache.LocalFileCache(cache_root_dir: pathlib.Path, **kwargs: Any)[source]

Bases: AbstractCache

Simple key-value store mapping PudlResourceKeys to bytes contents.

cache_root_dir[source]
_resource_path(resource: PudlResourceKey) pathlib.Path[source]
get(resource: PudlResourceKey) bytes[source]

Retrieves value associated with a given resource.

add(resource: PudlResourceKey, content: bytes)[source]

Adds (or updates) resource to the cache with given value.

delete(resource: PudlResourceKey)[source]

Deletes resource from the cache.

contains(resource: PudlResourceKey) bool[source]

Returns True if resource is present in the cache.
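A file-backed cache of this kind can be sketched with pathlib alone. The example below is an illustration under stated assumptions, not the class's actual code: `FileCache` is a hypothetical name, it is keyed directly by a relative path (such as the one a resource key's get_local_path() would return), and translating a missing file into KeyError is a choice made here to match the abstract interface.

```python
import tempfile
from pathlib import Path


class FileCache:
    """Hypothetical file-backed cache keyed by relative path."""

    def __init__(self, root: Path):
        self.root = Path(root)

    def _path(self, rel: Path) -> Path:
        # Every resource lives under the cache root directory.
        return self.root / rel

    def get(self, rel: Path) -> bytes:
        try:
            return self._path(rel).read_bytes()
        except FileNotFoundError as err:
            # Assumption: surface a missing file as KeyError.
            raise KeyError(str(rel)) from err

    def add(self, rel: Path, content: bytes) -> None:
        path = self._path(rel)
        # Create intermediate directories on first write.
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(content)

    def delete(self, rel: Path) -> None:
        self._path(rel).unlink(missing_ok=True)

    def contains(self, rel: Path) -> bool:
        return self._path(rel).exists()


# Round trip in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    cache = FileCache(Path(tmp))
    rel = Path("dataset") / "doi" / "file.csv"
    cache.add(rel, b"a,b\n1,2\n")
    round_trip = cache.get(rel)
    present_before = cache.contains(rel)
    cache.delete(rel)
    present_after = cache.contains(rel)
```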

class pudl.workspace.resource_cache.GoogleCloudStorageCache(gcs_path: str, **kwargs: Any)[source]

Bases: AbstractCache

Implements file cache backed by Google Cloud Storage bucket.

_path_prefix[source]
_bucket[source]
_blob(resource: PudlResourceKey) google.cloud.storage.blob.Blob[source]

Retrieve Blob object associated with given resource.

get(resource: PudlResourceKey) bytes[source]

Retrieves value associated with given resource.

add(resource: PudlResourceKey, value: bytes)[source]

Adds (or updates) resource to the cache with given value.

delete(resource: PudlResourceKey)[source]

Deletes resource from the cache.

contains(resource: PudlResourceKey) bool[source]

Returns True if resource is present in the cache.
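The constructor takes a single gcs_path string and the class keeps separate bucket and path-prefix attributes, so the path has to be split somewhere. The sketch below shows one plausible way to do that with the standard library. It assumes the path has the form gs://bucket/prefix, which this page does not state, and `split_gcs_path` is a hypothetical helper that makes no network calls.

```python
from urllib.parse import urlparse


def split_gcs_path(gcs_path: str) -> tuple[str, str]:
    """Split a gs:// URL into (bucket name, path prefix).

    Hypothetical helper; the gs://bucket/prefix form is an assumption.
    """
    parsed = urlparse(gcs_path)
    if parsed.scheme != "gs":
        raise ValueError(f"expected a gs:// URL, got {gcs_path!r}")
    # The network location is the bucket; the rest is the object prefix.
    return parsed.netloc, parsed.path.lstrip("/")


bucket_name, path_prefix = split_gcs_path("gs://example-bucket/cache/v1")
```

Validating the scheme up front gives an early, readable error for a mistyped path instead of a failure on the first cache access.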

class pudl.workspace.resource_cache.LayeredCache(*caches: list[AbstractCache], **kwargs: Any)[source]

Bases: AbstractCache

Implements multi-layered system of caches.

This allows building a multi-layered system of caches. The idea is that faster local caches are consulted first, falling back to more remote or more expensive caches when the content is missing.

Only the closest layer is written to (add, delete), while all remaining layers are only read from (get).

_caches: list[AbstractCache][source]
add_cache_layer(cache: AbstractCache)[source]

Adds caching layer.

The new layer has lower priority than all existing layers.

num_layers()[source]

Returns number of caching layers that are in this LayeredCache.

get(resource: PudlResourceKey) bytes[source]

Returns content of a given resource.

add(resource: PudlResourceKey, value)[source]

Adds (or replaces) resource into the cache with given value.

delete(resource: PudlResourceKey)[source]

Removes resource from the cache if the cache is not in the read_only mode.

contains(resource: PudlResourceKey) bool[source]

Returns True if resource is present in the cache.

is_optimally_cached(resource: PudlResourceKey) bool[source]

Return True if resource is contained in the closest write-enabled layer.
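The layered lookup described for this class can be sketched with plain dictionaries. This is an illustration under stated assumptions, not the class's actual code: `DictLayer` and `Layers` are hypothetical names, reads try each layer in priority order, and writes go to the first layer that is not read-only.

```python
class DictLayer:
    """Hypothetical cache layer backed by a dictionary."""

    def __init__(self, read_only: bool = False):
        self.read_only = read_only
        self.store: dict[str, bytes] = {}


class Layers:
    """Hypothetical layered cache; earlier layers have higher priority."""

    def __init__(self, *layers: DictLayer):
        self.layers = list(layers)

    def get(self, key: str) -> bytes:
        # Fall back through the layers until one holds the key.
        for layer in self.layers:
            if key in layer.store:
                return layer.store[key]
        raise KeyError(key)

    def add(self, key: str, value: bytes) -> None:
        # Assumption: only the closest writable layer is written to.
        for layer in self.layers:
            if not layer.read_only:
                layer.store[key] = value
                return

    def is_optimally_cached(self, key: str) -> bool:
        # True only if the closest writable layer already holds the key.
        for layer in self.layers:
            if not layer.read_only:
                return key in layer.store
        return False


local = DictLayer()
remote = DictLayer(read_only=True)
remote.store["resource"] = b"remote bytes"

layered = Layers(local, remote)
content = layered.get("resource")  # served by the remote layer
was_optimal = layered.is_optimally_cached("resource")
layered.add("resource", content)  # promote into the local layer
is_optimal = layered.is_optimally_cached("resource")
```

A caller can use a check like is_optimally_cached to decide whether to copy a resource that was served by a slower layer into the closest one, so later reads avoid the fallback.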