podcast_llm.utils.checkpointer

Utilities for checkpointing and resuming long-running processes.

This module provides functionality for saving and loading intermediate computation results to disk, enabling efficient resumption of processing from the last successful checkpoint. This is particularly useful for long-running podcast generation tasks that may need to be interrupted and resumed.

Key components:

  • Checkpointer: A class that manages saving/loading of checkpoint data with configurable paths and serialization
  • to_snake_case: Helper function for converting checkpoint names to valid filenames

The checkpointing system helps with:

  • Saving intermediate results during multi-step processing
  • Resuming interrupted processes without recomputing completed steps
  • Debugging by examining saved checkpoint states
  • Reducing wasted computation on process restarts

The module uses pickle for serialization by default but is designed to be extensible to other serialization formats as needed.
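As a rough illustration of the pickle-based approach described above, the following sketch writes a value to a checkpoint file and reads it back. The helper names (save_checkpoint, load_checkpoint), the .pkl extension, and the file layout are assumptions made for this example and are not taken from the module.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Any


def save_checkpoint(path: Path, value: Any) -> None:
    """Serialize value to path with pickle (hypothetical helper)."""
    # Create the checkpoint directory if it does not exist yet.
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(value, f)


def load_checkpoint(path: Path) -> Any:
    """Read a previously pickled value back from path (hypothetical helper)."""
    with path.open("rb") as f:
        return pickle.load(f)


# Round trip inside a temporary directory so nothing is left on disk.
with tempfile.TemporaryDirectory() as tmp:
    # Assumed file name; the real naming scheme may differ.
    path = Path(tmp) / ".checkpoints" / "my_process_stage1.pkl"
    save_checkpoint(path, {"outline": ["intro", "main", "outro"]})
    restored = load_checkpoint(path)
```

Because unpickling can execute arbitrary code, checkpoint files should only be loaded from locations you trust.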

class podcast_llm.utils.checkpointer.Checkpointer(checkpoint_key: str, checkpoint_dir: str = '.checkpoints', enabled: bool = True)

Bases: object

A class for managing checkpointing of intermediate results during processing.

The Checkpointer allows saving and loading of intermediate computation results to disk, enabling resumption of long-running processes from the last successful checkpoint.

Key features:

  • Configurable checkpoint directory and key prefix for files
  • Can be enabled/disabled via constructor
  • Automatically creates checkpoint directory if needed
  • Saves results as pickle files with stage-specific names
  • Loads from existing checkpoints when available

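Example usage:

    from podcast_llm.utils.checkpointer import Checkpointer

    checkpointer = Checkpointer(
        checkpoint_key='my_process_',
        enabled=True
    )

    # Pass the function itself plus its argument list, matching the
    # checkpoint(fn, args, stage_name) signature. On the first run this
    # calls expensive_computation, saves the result to disk, and returns it.
    result = checkpointer.checkpoint(
        expensive_computation,
        [],
        stage_name='stage1'
    )

    # On subsequent runs, the result is loaded from disk instead of being recomputed.
    result = checkpointer.checkpoint(
        expensive_computation,
        [],
        stage_name='stage1'
    )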

checkpoint(fn: Callable, args: list, stage_name: str = 'result') → Any
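The behaviour implied by this signature and the class description can be sketched as follows. This is a minimal stand-in written for illustration, not the library's implementation: the class name SketchCheckpointer, the file naming scheme (key prefix plus stage name plus .pkl), and the decision to call fn directly when disabled are all assumptions.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Any, Callable


class SketchCheckpointer:
    """Illustrative stand-in; not the podcast_llm implementation."""

    def __init__(self, checkpoint_key: str,
                 checkpoint_dir: str = ".checkpoints",
                 enabled: bool = True) -> None:
        self.checkpoint_key = checkpoint_key
        self.checkpoint_dir = Path(checkpoint_dir)
        self.enabled = enabled
        if self.enabled:
            # Create the checkpoint directory if needed.
            self.checkpoint_dir.mkdir(parents=True, exist_ok=True)

    def _path(self, stage_name: str) -> Path:
        # Assumed naming scheme: <checkpoint_key><stage_name>.pkl
        return self.checkpoint_dir / f"{self.checkpoint_key}{stage_name}.pkl"

    def checkpoint(self, fn: Callable, args: list,
                   stage_name: str = "result") -> Any:
        if not self.enabled:
            # Checkpointing disabled: just run the function.
            return fn(*args)
        path = self._path(stage_name)
        if path.exists():
            # A checkpoint exists: load it instead of recomputing.
            with path.open("rb") as f:
                return pickle.load(f)
        # No checkpoint yet: compute, save, and return the result.
        result = fn(*args)
        with path.open("wb") as f:
            pickle.dump(result, f)
        return result


calls = []  # records every real invocation of the function


def expensive_computation(x: int) -> int:
    calls.append(x)
    return x * 2


with tempfile.TemporaryDirectory() as tmp:
    cp = SketchCheckpointer("my_process_", checkpoint_dir=tmp)
    first = cp.checkpoint(expensive_computation, [21], stage_name="stage1")
    second = cp.checkpoint(expensive_computation, [21], stage_name="stage1")
```

Passing the callable and its arguments separately, rather than an already computed value, is what allows the second call to skip the computation entirely: in this sketch, calls contains a single entry after both calls return.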
podcast_llm.utils.checkpointer.to_snake_case(text: str) → str

Convert a string to snake_case format.

Takes any string input and converts it to snake_case by:

  1. Replacing spaces and hyphens with underscores
  2. Converting to lowercase
  3. Removing any non-alphanumeric characters except underscores

Parameters:

text (str) – Input string to convert

Returns:

Snake case formatted string

Return type:

str
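The three conversion steps listed for to_snake_case can be sketched as below. The function name to_snake_case_sketch is hypothetical, and two details are assumptions rather than documented behaviour: each space or hyphen becomes its own underscore (runs are not collapsed), and "alphanumeric" is taken to mean ASCII letters and digits.

```python
import re


def to_snake_case_sketch(text: str) -> str:
    """Illustrative version of the three documented steps."""
    # 1. Replace each space or hyphen with an underscore.
    text = re.sub(r"[ -]", "_", text)
    # 2. Convert to lowercase.
    text = text.lower()
    # 3. Drop everything except ASCII letters, digits, and underscores
    #    (ASCII-only is an assumption of this sketch).
    return re.sub(r"[^a-z0-9_]", "", text)


name = to_snake_case_sketch("My Podcast-Episode #1!")
# name == "my_podcast_episode_1"
```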