podcast_llm.extractors.audio
Audio file extraction module.
This module provides functionality for extracting text content from audio files by transcribing them with OpenAI’s Whisper model. It handles common audio formats (mp3, wav, m4a, ogg) and splits long audio files into chunks that stay within API limits.
The module includes:

- AudioSourceDocument class for handling audio file extraction
- Audio file splitting based on silence detection
- Transcription using the OpenAI Whisper API
- Temporary file management for processing
Example
>>> from podcast_llm.extractors.audio import AudioSourceDocument
>>> extractor = AudioSourceDocument('podcast.mp3')
>>> extractor.extract()
>>> print(extractor.content)
'Transcribed text from audio file...'
The extraction process:

1. Loads the audio file using pydub
2. Splits it into ~10 minute segments based on silence detection
3. Saves the segments to temporary files
4. Transcribes each segment using Whisper
5. Combines the transcriptions into the final content
The module handles errors gracefully and cleans up temporary files after processing.
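The splitting step above can be sketched in isolation. This is a minimal illustration of the boundary-selection logic only, assuming the real module obtains silence positions via pydub; here they are supplied as plain millisecond offsets so the sketch runs standalone, and `choose_boundaries` is a hypothetical helper, not the module's actual API.

```python
MAX_SEGMENT_MS = 10 * 60 * 1000  # ~10-minute cap to stay within API limits

def choose_boundaries(duration_ms, silence_points):
    """Pick split points at silences, never letting a segment exceed the cap."""
    boundaries = []
    start = 0
    candidates = sorted(silence_points)
    while duration_ms - start > MAX_SEGMENT_MS:
        # Prefer the last silence inside the current window; fall back to a
        # hard cut at the cap if no silence is available.
        within = [p for p in candidates if start < p <= start + MAX_SEGMENT_MS]
        cut = within[-1] if within else start + MAX_SEGMENT_MS
        boundaries.append(cut)
        start = cut
    return boundaries
```

Cutting at the last silence inside each window keeps segments close to the cap while avoiding mid-word splits; a file shorter than the cap yields no boundaries at all.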
- class podcast_llm.extractors.audio.AudioSourceDocument(source: str)[source]
Bases:
BaseSourceDocument
A document extractor for audio files.
This class handles extracting text content from audio files (mp3, wav, m4a, ogg) by splitting them into manageable segments and transcribing them using OpenAI’s Whisper model. The audio is first split into 10-minute chunks to stay within API limits.
- src
Path to the source audio file
- Type:
str
- src_type
Type of source document (‘Audio File’)
- Type:
str
- title
Title combining source type and filename
- Type:
str
- content
Extracted text content after transcription
- Type:
Optional[str]
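The attributes above can be mirrored in a standalone sketch (an assumption for illustration: BaseSourceDocument's real interface is not shown here, so this toy class only reproduces the documented attributes, including a title built from the source type and filename).

```python
import os
from typing import Optional

class AudioSourceDocumentSketch:
    """Standalone mirror of the documented attributes (not the real class)."""

    def __init__(self, source: str) -> None:
        self.src = source                      # path to the source audio file
        self.src_type = 'Audio File'           # fixed source-type label
        # Title combines source type and filename, per the attribute docs
        self.title = f"{self.src_type}: {os.path.basename(source)}"
        self.content: Optional[str] = None     # filled in later by extract()
```

Keeping `content` as `Optional[str]` matches the documented type: it is `None` until extraction has run.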
Example
>>> extractor = AudioSourceDocument('podcast.mp3')
>>> extractor.extract()
>>> print(extractor.content)
'Transcribed text from audio file...'
- extract() → str[source]
Extract text content from an audio file using OpenAI’s Whisper API.
This method takes an audio file, splits it into 10-minute segments to comply with API limits, and transcribes each segment using OpenAI’s Whisper speech-to-text model. The transcribed segments are then combined into a single text document.
- Returns:
The complete transcribed text from the audio file
- Return type:
str
- Raises:
openai.OpenAIError – If there is an error calling the Whisper API
IOError – If there is an error reading the audio file
Example
>>> extractor = AudioSourceDocument('podcast.mp3')
>>> text = extractor.extract()
>>> print(text[:100])
"Welcome to today's episode where we'll be discussing..."