podcast_llm.utils.llm
Utilities for working with Large Language Models (LLMs) in a standardized way.
This module provides utilities for interacting with various LLM providers through a unified interface. It handles provider-specific implementation details while exposing a consistent API for the rest of the application.
Key components:
- LLMWrapper: A class that wraps different LLM providers (OpenAI, Google, Anthropic) with standardized interfaces for structured output parsing and rate limiting
- Helper functions for configuring and instantiating LLM instances with appropriate settings for podcast generation tasks
The module abstracts away provider differences around:
- Message formatting and chunking
- Rate limiting implementation
- Structured output parsing
- Temperature and token limit settings
This allows the rest of the application to work with LLMs in a provider-agnostic way while still leveraging provider-specific capabilities when beneficial.
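For orientation, a minimal usage sketch is shown below; the PodcastConfig import path and the from_yaml loader are assumptions for illustration, not part of this reference.

    # Provider-agnostic usage sketch; PodcastConfig.from_yaml is a hypothetical loader.
    from podcast_llm.config import PodcastConfig  # import path assumed
    from podcast_llm.utils.llm import get_fast_llm

    config = PodcastConfig.from_yaml('config.yaml')  # hypothetical loader
    llm = get_fast_llm(config)

    # The wrapper behaves as a LangChain Runnable, so invoke() works the same
    # regardless of which provider the config selects.
    response = llm.invoke('Summarize the key ideas of retrieval-augmented generation.')
    print(response.content)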
- class podcast_llm.utils.llm.LLMWrapper(provider: str, model: str, temperature: float = 1.0, max_tokens: int = 8192, rate_limiter: BaseRateLimiter | None = None)[source]
Bases:
Runnable
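The wrapper can also be constructed directly from the signature above. A minimal sketch, assuming 'openai' is an accepted provider identifier (the exact provider strings are assumptions):

    from podcast_llm.utils.llm import LLMWrapper

    # Direct construction; the provider and model names here are illustrative.
    llm = LLMWrapper(
        provider='openai',   # assumed provider identifier
        model='gpt-4o-mini',
        temperature=0.7,
        max_tokens=4096,
    )
    reply = llm.invoke('Suggest one podcast episode title about deep-sea ecology.')
    print(reply.content)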
- coerce_to_schema(llm_output: str)[source]
Coerce raw LLM output into a structured schema object.
Takes unstructured text output from the LLM and attempts to parse it into a structured Pydantic object based on the defined schema. Currently supports Question and Answer schema types.
- Parameters:
llm_output (str) – Raw text output from the LLM to be coerced
- Returns:
Pydantic object matching the defined schema type
- Return type:
BaseModel
- Raises:
ValueError – If no schema is defined
OutputParserException – If output cannot be coerced to the schema
The coercion maps the raw text to the appropriate schema field:
- Question schema -> ‘question’ field
- Answer schema -> ‘answer’ field
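A sketch of the coercion path, using a stand-in Question model with a single question field; the project's actual Question/Answer schemas and their import location are not shown in this reference, so the class below is illustrative only.

    from pydantic import BaseModel
    from podcast_llm.utils.llm import LLMWrapper

    class Question(BaseModel):
        """Stand-in for the project's Question schema (illustrative only)."""
        question: str

    llm = LLMWrapper(provider='openai', model='gpt-4o-mini')  # names assumed
    structured = llm.with_structured_output(Question)

    # If the provider returns plain text rather than a parsed object, the wrapper
    # maps that text onto the schema's field ('question' here) and returns a
    # Question instance; with no schema configured it raises ValueError instead.
    parsed = structured.coerce_to_schema('What first drew you to marine biology?')
    print(parsed.question)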
- invoke(input: PromptValue | str | Sequence[BaseMessage | list[str] | tuple[str, str] | str | dict[str, Any]], config: RunnableConfig | None = None, **kwargs: Any) → BaseMessage [source]
Invoke the LLM with the given input and configuration.
This method handles provider-specific invocation details while maintaining a consistent interface. It manages structured output formatting and parsing based on the provider’s capabilities.
- Parameters:
input (LanguageModelInput) – The input to send to the LLM, typically messages or prompts
config (Optional[RunnableConfig]) – Optional configuration for the invocation
**kwargs (Any) – Additional keyword arguments passed to the underlying LLM
- Returns:
The LLM’s response message
- Return type:
BaseMessage
The implementation varies by provider:
- OpenAI/Anthropic: Direct invocation with native structured output support
- Google: Custom handling for structured output via parser and format instructions
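A sketch of a typical invocation with LangChain message objects; the provider and model strings are illustrative.

    from langchain_core.messages import HumanMessage, SystemMessage
    from podcast_llm.utils.llm import LLMWrapper

    llm = LLMWrapper(provider='anthropic', model='claude-3-5-sonnet-latest')  # names assumed

    messages = [
        SystemMessage(content='You are an enthusiastic podcast co-host.'),
        HumanMessage(content='React briefly to this fact: octopuses have three hearts.'),
    ]

    # invoke() accepts a plain string, a PromptValue, or a sequence of messages
    # and returns a BaseMessage; provider differences are handled inside the wrapper.
    response = llm.invoke(messages)
    print(response.content)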
- with_structured_output(schema: BaseModel)[source]
Configure the LLM wrapper to output structured data using a Pydantic schema.
This method adapts the underlying LLM to output responses conforming to the provided Pydantic model schema. The implementation varies by provider:
- OpenAI/Anthropic: Uses native structured output support
- Google: Implements structured output via output parser and format instructions
- Parameters:
schema (pydantic.BaseModel) – The Pydantic model class defining the expected response structure
- Returns:
The wrapper instance configured for structured output
- Return type:
LLMWrapper
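Because the method returns the wrapper itself, it can be chained directly into invoke(). A sketch using a stand-in Answer model (illustrative, not the project's actual schema); per the coercion logic described above, the result is expected to be a parsed Answer instance.

    from pydantic import BaseModel
    from podcast_llm.utils.llm import LLMWrapper

    class Answer(BaseModel):
        """Stand-in for the project's Answer schema (illustrative only)."""
        answer: str

    llm = LLMWrapper(provider='google', model='gemini-1.5-flash')  # names assumed

    # For Google, format instructions are appended and the reply is parsed; for
    # OpenAI/Anthropic, the provider's native structured output is used instead.
    result = llm.with_structured_output(Answer).invoke(
        'In one sentence, why do leaves change color in autumn?'
    )
    print(result)  # expected to be an Answer instance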
- podcast_llm.utils.llm.get_fast_llm(config: PodcastConfig, rate_limiter: BaseRateLimiter | None = None)[source]
Get a fast LLM model optimized for quick responses.
Creates and returns an LLM wrapper configured with a fast model variant from the specified provider. Fast models trade some quality for improved response speed.
- Parameters:
config (PodcastConfig) – Configuration object containing provider settings
rate_limiter (BaseRateLimiter | None, optional) – Rate limiter to control API request frequency. Defaults to None.
- Returns:
Wrapper instance configured with a fast model variant
- Return type:
LLMWrapper
- Raises:
ValueError – If the configured fast_llm_provider is not supported
The function maps providers to their respective fast model variants:
- OpenAI: gpt-4o-mini
- Google: gemini-1.5-flash
- Anthropic: claude-3-5-sonnet
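A usage sketch; config.fast_llm_provider comes from the configuration described above, while the PodcastConfig import path and loader are assumptions.

    from podcast_llm.config import PodcastConfig  # import path assumed
    from podcast_llm.utils.llm import get_fast_llm

    config = PodcastConfig.from_yaml('config.yaml')  # hypothetical loader

    # Returns an LLMWrapper backed by the fast variant for the configured provider,
    # e.g. gpt-4o-mini when config.fast_llm_provider == 'openai'. A BaseRateLimiter
    # instance can be passed as rate_limiter= to throttle request frequency.
    fast_llm = get_fast_llm(config)
    titles = fast_llm.invoke('Suggest three snappy titles for an episode on urban beekeeping.')
    print(titles.content)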
- podcast_llm.utils.llm.get_long_context_llm(config: PodcastConfig, rate_limiter: BaseRateLimiter | None = None)[source]
Get a long context LLM model optimized for handling larger prompts.
Creates and returns an LLM wrapper configured with a model variant that can handle longer context windows from the specified provider. These models are optimized for processing larger amounts of text at once.
- Parameters:
config (PodcastConfig) – Configuration object containing provider settings
rate_limiter (BaseRateLimiter | None, optional) – Rate limiter to control API request frequency. Defaults to None.
- Returns:
Wrapper instance configured with a long context model variant
- Return type:
LLMWrapper
- Raises:
ValueError – If the configured long_context_llm_provider is not supported
The function maps providers to their respective long context model variants:
- OpenAI: gpt-4o
- Google: gemini-1.5-pro-latest
- Anthropic: claude-3-5-sonnet
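A parallel sketch for the long-context variant, which suits prompts that embed entire articles or transcripts; the config loader and the file name are illustrative.

    from podcast_llm.config import PodcastConfig  # import path assumed
    from podcast_llm.utils.llm import get_long_context_llm

    config = PodcastConfig.from_yaml('config.yaml')  # hypothetical loader
    long_llm = get_long_context_llm(config)  # e.g. gemini-1.5-pro-latest for 'google'

    with open('source_article.txt') as fh:  # illustrative source document
        article = fh.read()

    outline = long_llm.invoke(f'Draft a podcast episode outline grounded in this article:\n\n{article}')
    print(outline.content)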