podcast_llm.extractors.audio
Audio file extraction module.
This module provides functionality for extracting text content from audio files by transcribing them with OpenAI’s Whisper model. It handles common audio formats (mp3, wav, m4a, ogg) and splits long audio files into chunks that stay within API limits.
The module includes:

- AudioSourceDocument class for handling audio file extraction
- Audio file splitting based on silence detection
- Transcription using the OpenAI Whisper API
- Temporary file management for processing
Example
>>> from podcast_llm.extractors.audio import AudioSourceDocument
>>> extractor = AudioSourceDocument('podcast.mp3')
>>> extractor.extract()
>>> print(extractor.content)
'Transcribed text from audio file...'
The extraction process:

1. Loads the audio file using pydub
2. Splits it into ~10 minute segments based on silence detection
3. Saves the segments to temporary files
4. Transcribes each segment using Whisper
5. Combines the transcriptions into the final content
The module handles errors gracefully and cleans up temporary files after processing.
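The splitting step above can be sketched in isolation. This is a minimal illustration of the boundary-selection logic only, assuming the real module obtains silence positions via pydub; here they are supplied as plain millisecond offsets so the sketch runs standalone, and `choose_boundaries` is a hypothetical helper, not the module's actual API.

```python
MAX_SEGMENT_MS = 10 * 60 * 1000  # ~10-minute cap to stay within API limits

def choose_boundaries(duration_ms, silence_points):
    """Pick split points at silences, never letting a segment exceed the cap."""
    boundaries = []
    start = 0
    candidates = sorted(silence_points)
    while duration_ms - start > MAX_SEGMENT_MS:
        # Prefer the last silence inside the current window; fall back to a
        # hard cut at the cap if no silence is available.
        within = [p for p in candidates if start < p <= start + MAX_SEGMENT_MS]
        cut = within[-1] if within else start + MAX_SEGMENT_MS
        boundaries.append(cut)
        start = cut
    return boundaries
```

Cutting at the last silence inside each window keeps segments close to the cap while avoiding mid-word splits; a file shorter than the cap yields no boundaries at all.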
- class podcast_llm.extractors.audio.AudioSourceDocument(source: str)[source]
Bases:
BaseSourceDocument
A document extractor for audio files.
This class handles extracting text content from audio files (mp3, wav, m4a, ogg) by splitting them into manageable segments and transcribing them using OpenAI’s Whisper model. The audio is first split into 10-minute chunks to stay within API limits.
- src
Path to the source audio file
- Type:
str
- src_type
Type of source document (‘Audio File’)
- Type:
str
- title
Title combining source type and filename
- Type:
str
- content
Extracted text content after transcription
- Type:
Optional[str]
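The attributes above can be mirrored in a standalone sketch (an assumption for illustration: BaseSourceDocument's real interface is not shown here, so this toy class only reproduces the documented attributes, including a title built from the source type and filename).

```python
import os
from typing import Optional

class AudioSourceDocumentSketch:
    """Standalone mirror of the documented attributes (not the real class)."""

    def __init__(self, source: str) -> None:
        self.src = source                      # path to the source audio file
        self.src_type = 'Audio File'           # fixed source-type label
        # Title combines source type and filename, per the attribute docs
        self.title = f"{self.src_type}: {os.path.basename(source)}"
        self.content: Optional[str] = None     # filled in later by extract()
```

Keeping `content` as `Optional[str]` matches the documented type: it is `None` until extraction has run.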
Example
>>> extractor = AudioSourceDocument('podcast.mp3')
>>> extractor.extract()
>>> print(extractor.content)
'Transcribed text from audio file...'
- extract() → str[source]
Extract text content from an audio file using OpenAI’s Whisper API.
This method takes an audio file, splits it into 10-minute segments to comply with API limits, and transcribes each segment using OpenAI’s Whisper speech-to-text model. The transcribed segments are then combined into a single text document.
- Returns:
The complete transcribed text from the audio file
- Return type:
str
- Raises:
openai.OpenAIError – If there is an error calling the Whisper API
IOError – If there is an error reading the audio file
Example
>>> extractor = AudioSourceDocument('podcast.mp3')
>>> text = extractor.extract()
>>> print(text[:100])
"Welcome to today's episode where we'll be discussing..."