podcast_llm.text_to_speech
Text-to-speech conversion module for podcast generation.
This module handles the conversion of text scripts into natural-sounding speech using multiple TTS providers (Google Cloud TTS and ElevenLabs). It includes functionality for:
- Rate limiting API requests to stay within provider quotas 
- Exponential backoff retry logic for API resilience 
- Processing individual conversation lines with appropriate voices 
- Merging multiple audio segments into a complete podcast 
- Managing temporary audio file storage and cleanup 
The module supports different voices for interviewer/interviewee to create natural conversational flow and allows configuration of voice settings and audio effects through the PodcastConfig system.
- Typical usage:
    config = PodcastConfig()
    convert_to_speech(
        config, conversation_script, 'output.mp3', '.temp_audio/', 'mp3'
    )
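The exponential backoff retry logic mentioned in the overview can be illustrated with a minimal sketch. This is not the module's actual implementation: the decorator name `retry_with_exponential_backoff` and its parameters (`max_retries`, `base_delay`, `max_delay`, `sleep`) are hypothetical and chosen only for illustration.

```python
import random
import time
from functools import wraps


def retry_with_exponential_backoff(max_retries=5, base_delay=1.0,
                                   max_delay=60.0, sleep=time.sleep):
    """Hypothetical decorator: retry a call, doubling the wait after each failure."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == max_retries:
                        raise  # out of retries: surface the last error
                    # Wait base_delay * 1, 2, 4, ... capped at max_delay,
                    # plus up to 10% random jitter to spread out retries.
                    delay = min(base_delay * (2 ** attempt), max_delay)
                    sleep(delay + random.uniform(0, 0.1 * delay))
        return wrapper
    return decorator
```

A TTS request function wrapped with this decorator would be retried on any exception. A real implementation would more likely catch only the provider's rate-limit or transient error types.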
- podcast_llm.text_to_speech.clean_text_for_tts(lines: List) -> List
- Clean text lines for text-to-speech processing by removing special characters.
- Takes a list of dictionaries containing speaker and text information and removes characters that may interfere with text-to-speech synthesis, such as asterisks, underscores, and em dashes.
- Parameters:
- lines (List[dict]) – List of dictionaries with structure:
    {
        'speaker': str,  # Speaker identifier
        'text': str      # Text to be cleaned
    }
- Returns:
- List of dictionaries with cleaned text and same structure as input 
- Return type:
- List[dict] 
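As a rough illustration of the cleaning step described above, the sketch below deletes asterisks, underscores, and em dashes from each line's text. The function name `clean_lines_sketch` and the exact character set are assumptions based only on the examples named in the description. The real function may handle more characters or replace them differently.

```python
from typing import Dict, List

# Assumed character set: asterisk, underscore, em dash (U+2014).
CHARS_TO_REMOVE = "*_\u2014"


def clean_lines_sketch(lines: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return new line dicts with the assumed special characters deleted."""
    table = str.maketrans("", "", CHARS_TO_REMOVE)
    return [
        {"speaker": line["speaker"], "text": line["text"].translate(table)}
        for line in lines
    ]
```

The sketch builds new dictionaries rather than editing the input in place, so the original script remains available for other uses.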
 
- podcast_llm.text_to_speech.combine_consecutive_speaker_chunks(chunks: List[dict]) -> List[dict]
- Combine consecutive chunks from the same speaker into single chunks.
- Parameters:
- chunks (List[dict]) – List of dictionaries containing conversation chunks with structure:
    {
        'speaker': str,  # Speaker identifier
        'text': str      # Text content
    }
- Returns:
- List of combined chunks where consecutive entries from the same speaker are merged into single chunks
 
- Return type:
- List[dict] 
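The merging behaviour described above could look roughly like the following sketch. The name `combine_chunks_sketch` is hypothetical, and joining merged texts with a single space is an assumption. The documentation does not state which separator the real function uses.

```python
from typing import Dict, List


def combine_chunks_sketch(chunks: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Merge runs of consecutive chunks that share the same speaker."""
    combined: List[Dict[str, str]] = []
    for chunk in chunks:
        if combined and combined[-1]["speaker"] == chunk["speaker"]:
            # Same speaker as the previous chunk: append the text
            # (single-space separator is an assumption).
            combined[-1]["text"] += " " + chunk["text"]
        else:
            # New speaker: start a new chunk, copying so the input is untouched.
            combined.append(dict(chunk))
    return combined
```

Only adjacent chunks are merged. If a speaker talks again after the other speaker has had a turn, that later chunk stays separate, which preserves the order of the conversation.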
 
- podcast_llm.text_to_speech.convert_to_speech(config: PodcastConfig, conversation: str, output_file: str, temp_audio_dir: str, audio_format: str) -> None
- Convert a conversation script to speech audio using the Google Text-to-Speech API.
- Takes a conversation script consisting of speaker/text pairs and generates audio files for each line using Google’s TTS service. The individual audio files are then merged into a single output file. Uses different voices for different speakers to create a natural conversational feel.
- Parameters:
- config (PodcastConfig) – Configuration object containing API keys and settings
- conversation (str) – Conversation lines. The signature annotates this as str, but the documented structure is a list of dictionaries:
    {
        'speaker': str,  # Speaker identifier ('Interviewer' or 'Interviewee')
        'text': str      # Line content to convert to speech
    }
- output_file (str) – Path where the final merged audio file should be saved 
- temp_audio_dir (str) – Directory path for temporary audio file storage 
- audio_format (str) – Format of the audio files (e.g. ‘mp3’) 
 
- Raises:
- Exception – If any errors occur during TTS conversion or file operations 
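The per-line stage of this conversion (one temporary audio file per conversation line, kept in script order) can be sketched as follows. This is an illustration under stated assumptions, not the module's code: the function name `write_line_audio_sketch`, the `synthesize` callable standing in for a real TTS request, and the zero-padded file naming are all hypothetical.

```python
import os
from typing import Callable, Dict, List


def write_line_audio_sketch(
    conversation: List[Dict[str, str]],
    temp_audio_dir: str,
    audio_format: str,
    synthesize: Callable[[str, str], bytes],
) -> List[str]:
    """Write one audio file per line and return the paths in script order."""
    os.makedirs(temp_audio_dir, exist_ok=True)
    paths: List[str] = []
    for index, line in enumerate(conversation):
        # synthesize(text, speaker) is a stand-in for a provider call
        # that returns raw audio bytes.
        audio = synthesize(line["text"], line["speaker"])
        # Zero-padded names (assumed convention) keep files sorted by line order.
        path = os.path.join(temp_audio_dir, f"{index:03d}.{audio_format}")
        with open(path, "wb") as handle:
            handle.write(audio)
        paths.append(path)
    return paths
```

The returned list of paths is what a merge step would consume to build the final output file.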
 
- podcast_llm.text_to_speech.generate_audio(config: PodcastConfig, final_script: list, output_file: str) -> str
- Generate audio from a podcast script using text-to-speech.
- Takes a final script consisting of speaker/text pairs and generates a single audio file using Google’s Text-to-Speech service. The script is first cleaned and processed to be TTS-friendly, then converted to speech with different voices for different speakers.
- Parameters:
- config (PodcastConfig) – Configuration object containing API keys and settings
- final_script (list) – List of dictionaries containing script lines with structure:
    {
        'speaker': str,  # Speaker identifier ('Interviewer' or 'Interviewee')
        'text': str      # Line content to convert to speech
    }
- output_file (str) – Path where the final audio file should be saved 
 
- Returns:
- Path to the generated audio file 
- Return type:
- str 
- Raises:
- Exception – If any errors occur during TTS conversion or file operations 
 
- podcast_llm.text_to_speech.merge_audio_files(audio_files: List, output_file: str, audio_format: str) -> None
- Merge multiple audio files into a single output file.
- Takes a list of audio files and combines them in the provided order into a single output file. Handles any audio format supported by pydub.
- Parameters:
- audio_files (list) – List of paths to audio files to merge 
- output_file (str) – Path where merged audio file should be saved 
- audio_format (str) – Format of input/output audio files (e.g. ‘mp3’, ‘wav’) 
 
- Returns:
- None 
- Raises:
- Exception – If there are any errors during the merging process 
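Since the description names pydub, an in-order merge might be sketched as below. The function name `merge_audio_sketch` is hypothetical, and rejecting an empty input list with `ValueError` is an assumption of this sketch rather than documented behaviour. Running the merge requires pydub to be installed, along with ffmpeg for compressed formats such as mp3.

```python
from typing import List


def merge_audio_sketch(audio_files: List[str], output_file: str,
                       audio_format: str) -> None:
    """Concatenate audio files in the given order and export the result."""
    if not audio_files:
        # Assumed behaviour: fail early instead of exporting an empty file.
        raise ValueError("audio_files must contain at least one path")

    # Imported lazily so the dependency is only needed when merging.
    from pydub import AudioSegment

    combined = AudioSegment.empty()
    for path in audio_files:
        # Appending with += places each segment after the previous one.
        combined += AudioSegment.from_file(path, format=audio_format)
    combined.export(output_file, format=audio_format)
```

A call such as `merge_audio_sketch(paths, 'output.mp3', 'mp3')` would combine the per-line files into a single episode file.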
 
- podcast_llm.text_to_speech.process_line_elevenlabs(config: PodcastConfig, text: str, speaker: str)
- Process a line of text into speech using ElevenLabs TTS service.
- Takes a line of text and speaker identifier and generates synthesized speech using ElevenLabs’ TTS service. Uses different voices based on the speaker to create natural conversation flow.
- Parameters:
- config (PodcastConfig) – Configuration object containing API keys and settings 
- text (str) – The text content to convert to speech 
- speaker (str) – Speaker identifier to determine voice selection 
 
- Returns:
- Raw audio data in bytes format containing the synthesized speech 
- Return type:
- bytes 
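The speaker-based voice selection described above can be pictured as a simple lookup. The sketch below makes no provider API calls. The mapping `VOICE_BY_SPEAKER`, its placeholder voice IDs, and the function name `select_voice_sketch` are all hypothetical. In the real module the voice settings come from the PodcastConfig object.

```python
from typing import Dict

# Hypothetical placeholder IDs; real values would be read from PodcastConfig.
VOICE_BY_SPEAKER: Dict[str, str] = {
    "Interviewer": "voice-id-interviewer",
    "Interviewee": "voice-id-interviewee",
}


def select_voice_sketch(speaker: str,
                        voices: Dict[str, str] = VOICE_BY_SPEAKER) -> str:
    """Return the voice configured for a speaker, or fail with a clear error."""
    try:
        return voices[speaker]
    except KeyError:
        raise ValueError(f"No voice configured for speaker {speaker!r}") from None
```

Failing loudly on an unknown speaker is a design choice of this sketch. Falling back to a default voice would be an equally reasonable alternative.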
 
- podcast_llm.text_to_speech.process_line_google(config: PodcastConfig, text: str, speaker: str)
- Process a single line of text using Google Text-to-Speech API.
- Takes a line of text and speaker identifier and generates synthesized speech using Google’s TTS service. Uses different voices based on the speaker to create natural conversation flow.
- Parameters:
- config (PodcastConfig) – Configuration object containing API keys and settings
- text (str) – The text content to convert to speech 
- speaker (str) – Speaker identifier to determine voice selection 
 
- Returns:
- Raw audio data in bytes format containing the synthesized speech 
- Return type:
- bytes 
 
- podcast_llm.text_to_speech.process_lines_google_multispeaker(config: PodcastConfig, chunks: List)
- Process multiple lines of text into speech using Google’s multi-speaker TTS service.
- Takes a chunk of conversation lines and generates synthesized speech using Google’s multi-speaker TTS service. Handles up to 6 turns of conversation at once for more natural conversational flow.
- Parameters:
- config (PodcastConfig) – Configuration object containing API keys and settings
- chunks (List) – List of dictionaries containing conversation lines with structure:
    {
        'speaker': str,  # Speaker identifier
        'text': str      # Line content to convert to speech
    }
 
- Returns:
- Raw audio data in bytes format containing the synthesized speech 
- Return type:
- bytes
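Because the multi-speaker service is described as handling up to 6 turns at once, a caller would need to split a longer script into groups of at most 6 lines before calling this function. A minimal sketch of that batching follows. The helper name `batch_turns_sketch` and its `max_turns` parameter are hypothetical, and how the module itself groups turns is not documented here.

```python
from typing import Dict, Iterator, List


def batch_turns_sketch(chunks: List[Dict[str, str]],
                       max_turns: int = 6) -> Iterator[List[Dict[str, str]]]:
    """Yield consecutive groups of at most max_turns conversation lines."""
    if max_turns < 1:
        raise ValueError("max_turns must be at least 1")
    for start in range(0, len(chunks), max_turns):
        yield chunks[start:start + max_turns]
```

Each yielded group could then be passed as the `chunks` argument, with the resulting audio segments merged in order afterwards.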