podcast_llm.text_to_speech
Text-to-speech conversion module for podcast generation.
This module handles the conversion of text scripts into natural-sounding speech using multiple TTS providers (Google Cloud TTS and ElevenLabs). It includes functionality for:
- Rate limiting API requests to stay within provider quotas
- Exponential backoff retry logic for API resilience
- Processing individual conversation lines with appropriate voices
- Merging multiple audio segments into a complete podcast
- Managing temporary audio file storage and cleanup
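The retry behavior listed above is not specified in detail here. As a hedged illustration of exponential backoff in general, the following is a minimal sketch. `retry_with_backoff` and its parameters are hypothetical names, not part of `podcast_llm`, and the retry count, delays, and 10% jitter are assumed values.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on any exception with exponentially growing delays.

    The delay before retry n (0-based) is min(max_delay, base_delay * 2**n),
    plus up to 10% random jitter so parallel callers do not retry in lockstep.
    """
    for attempt in range(max_retries + 1):
        try:
            return func()
        except Exception:
            if attempt == max_retries:
                raise  # Out of retries: surface the last error to the caller.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
    # Not reached: the loop either returns a result or re-raises.
    raise AssertionError("unreachable")
```

A per-line TTS call could then be wrapped as `retry_with_backoff(lambda: process_line_google(config, text, speaker))`. The injectable `sleep` argument exists only so the delays can be observed without actually waiting.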
The module uses different voices for the interviewer and the interviewee to create a natural conversational flow, and allows voice settings and audio effects to be configured through the PodcastConfig system.
- Typical usage:
    config = PodcastConfig()
    convert_to_speech(
        config, conversation_script, 'output.mp3', '.temp_audio/', 'mp3'
    )
- podcast_llm.text_to_speech.clean_text_for_tts(lines: List) -> List
Clean text lines for text-to-speech processing by removing special characters.
Takes a list of dictionaries containing speaker and text information and removes characters that may interfere with text-to-speech synthesis, such as asterisks, underscores, and em dashes.
- Parameters:
lines (List[dict]) – List of dictionaries with structure:
    {
        'speaker': str,  # Speaker identifier
        'text': str      # Text to be cleaned
    }
- Returns:
List of dictionaries with cleaned text and same structure as input
- Return type:
List[dict]
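A minimal sketch of this kind of cleaning, assuming that asterisks and underscores are simply deleted and that em dashes are replaced with a space. The exact character set the real function strips is not listed above, and `clean_text_for_tts_sketch` is a hypothetical name.

```python
import re
from typing import Dict, List

# Assumed character set; the documented function may strip more or fewer characters.
_STRIP_CHARS = re.compile(r"[*_]")
_EM_DASH = "\u2014"


def clean_text_for_tts_sketch(lines: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return new dicts with markdown-style markers removed and em dashes spaced out."""
    cleaned = []
    for line in lines:
        text = _STRIP_CHARS.sub("", line["text"])
        text = text.replace(_EM_DASH, " ")
        text = re.sub(r"\s+", " ", text).strip()  # Collapse any doubled spaces left behind.
        # Copy the dict so the caller's input is not modified.
        cleaned.append({**line, "text": text})
    return cleaned
```

Deleting underscores outright also joins identifiers such as `snake_case` into one word; replacing them with a space would be an equally plausible choice.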
- podcast_llm.text_to_speech.combine_consecutive_speaker_chunks(chunks: List[dict]) -> List[dict]
Combine consecutive chunks from the same speaker into single chunks.
- Parameters:
chunks (List[dict]) – List of dictionaries containing conversation chunks with structure:
    {
        'speaker': str,  # Speaker identifier
        'text': str      # Text content
    }
- Returns:
List of combined chunks in which consecutive entries from the same speaker are merged into a single chunk
- Return type:
List[dict]
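The merging rule can be sketched in a few lines. This is a hedged illustration, not the module's code: `combine_consecutive_speaker_chunks_sketch` is a hypothetical name, and joining merged text with a single space is an assumption.

```python
from typing import Dict, List


def combine_consecutive_speaker_chunks_sketch(
    chunks: List[Dict[str, str]],
) -> List[Dict[str, str]]:
    """Merge runs of same-speaker chunks, joining their text with a single space."""
    combined: List[Dict[str, str]] = []
    for chunk in chunks:
        if combined and combined[-1]["speaker"] == chunk["speaker"]:
            # Same speaker as the previous entry: extend its text instead of adding a chunk.
            combined[-1]["text"] += " " + chunk["text"]
        else:
            # Copy so the += above never mutates the caller's dicts.
            combined.append(dict(chunk))
    return combined
```

Only adjacent entries are merged, so a speaker who returns later in the conversation still gets a separate chunk.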
- podcast_llm.text_to_speech.convert_to_speech(config: PodcastConfig, conversation: str, output_file: str, temp_audio_dir: str, audio_format: str) -> None
Convert a conversation script to speech audio using the Google Text-to-Speech API.
Takes a conversation script consisting of speaker/text pairs and generates audio files for each line using Google’s TTS service. The individual audio files are then merged into a single output file. Uses different voices for different speakers to create a natural conversational feel.
- Parameters:
config (PodcastConfig) – Configuration object containing API keys and settings
conversation (str) – List of dictionaries containing conversation lines with structure:
    {
        'speaker': str,  # Speaker identifier ('Interviewer' or 'Interviewee')
        'text': str      # Line content to convert to speech
    }
output_file (str) – Path where the final merged audio file should be saved
temp_audio_dir (str) – Directory path for temporary audio file storage
audio_format (str) – Format of the audio files (e.g. ‘mp3’)
- Raises:
Exception – If any errors occur during TTS conversion or file operations
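The per-line step described above can be sketched as follows. This is a hedged illustration, not the module's code: `write_line_audio` and the `synthesize(text, speaker) -> bytes` callable are hypothetical, and the zero-padded file naming is an assumption. The returned paths are what would then be handed to `merge_audio_files(paths, output_file, audio_format)`.

```python
import os
from typing import Callable, Dict, List

# Hypothetical per-line synthesizer: takes (text, speaker), returns raw audio bytes.
Synthesizer = Callable[[str, str], bytes]


def write_line_audio(
    conversation: List[Dict[str, str]],
    synthesize: Synthesizer,
    temp_audio_dir: str,
    audio_format: str,
) -> List[str]:
    """Synthesize each line and save it as an ordered temporary file."""
    os.makedirs(temp_audio_dir, exist_ok=True)
    paths = []
    for index, line in enumerate(conversation):
        audio = synthesize(line["text"], line["speaker"])
        # Zero-padding keeps lexicographic file order equal to conversation order.
        path = os.path.join(temp_audio_dir, f"{index:04d}.{audio_format}")
        with open(path, "wb") as f:
            f.write(audio)
        paths.append(path)
    return paths
```

Passing the synthesizer in as a callable keeps the loop provider-agnostic: a wrapper such as `lambda text, speaker: process_line_google(config, text, speaker)` or its ElevenLabs counterpart could be supplied without changing the loop.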
- podcast_llm.text_to_speech.generate_audio(config: PodcastConfig, final_script: list, output_file: str) -> str
Generate audio from a podcast script using text-to-speech.
Takes a final script consisting of speaker/text pairs and generates a single audio file using Google’s Text-to-Speech service. The script is first cleaned and processed to be TTS-friendly, then converted to speech with different voices for different speakers.
- Parameters:
config (PodcastConfig) – Configuration object containing API keys and settings
final_script (list) – List of dictionaries containing script lines with structure:
    {
        'speaker': str,  # Speaker identifier ('Interviewer' or 'Interviewee')
        'text': str      # Line content to convert to speech
    }
output_file (str) – Path where the final audio file should be saved
- Returns:
Path to the generated audio file
- Return type:
str
- Raises:
Exception – If any errors occur during TTS conversion or file operations
- podcast_llm.text_to_speech.merge_audio_files(audio_files: List, output_file: str, audio_format: str) -> None
Merge multiple audio files into a single output file.
Takes a list of audio files and combines them in the provided order into a single output file. Handles any audio format supported by pydub.
- Parameters:
audio_files (list) – List of paths to audio files to merge
output_file (str) – Path where merged audio file should be saved
audio_format (str) – Format of input/output audio files (e.g. ‘mp3’, ‘wav’)
- Returns:
None
- Raises:
Exception – If there are any errors during the merging process
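`merge_audio_files` relies on pydub. As a dependency-free illustration of the same in-order concatenation, the sketch below swaps in Python's standard-library `wave` module instead. That module only handles uncompressed WAV files, and all inputs must share channel count, sample width, and frame rate. `merge_wav_files` is a hypothetical name.

```python
import wave
from typing import List


def merge_wav_files(audio_files: List[str], output_file: str) -> None:
    """Concatenate WAV files in the given order into output_file."""
    if not audio_files:
        raise ValueError("audio_files must not be empty")
    with wave.open(audio_files[0], "rb") as first:
        params = first.getparams()
    with wave.open(output_file, "wb") as out:
        # Frame count from the first file is a placeholder; wave patches the header on write.
        out.setparams(params)
        for path in audio_files:
            with wave.open(path, "rb") as src:
                # Raw frames can only be appended if channels, width, and rate match.
                if src.getparams()[:3] != params[:3]:
                    raise ValueError(f"{path}: channels/sample width/frame rate differ")
                out.writeframes(src.readframes(src.getnframes()))
```

pydub decodes compressed formats such as MP3 through ffmpeg, which is why the documented function can accept any format pydub supports while this sketch cannot.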
- podcast_llm.text_to_speech.process_line_elevenlabs(config: PodcastConfig, text: str, speaker: str)
Process a line of text into speech using ElevenLabs TTS service.
Takes a line of text and a speaker identifier and generates synthesized speech using the ElevenLabs TTS service. Uses a different voice for each speaker to create a natural conversational flow.
- Parameters:
config (PodcastConfig) – Configuration object containing API keys and settings
text (str) – The text content to convert to speech
speaker (str) – Speaker identifier to determine voice selection
- Returns:
Raw audio data in bytes format containing the synthesized speech
- Return type:
bytes
- podcast_llm.text_to_speech.process_line_google(config: PodcastConfig, text: str, speaker: str)
Process a single line of text using the Google Text-to-Speech API.
Takes a line of text and a speaker identifier and generates synthesized speech using Google’s TTS service. Uses a different voice for each speaker to create a natural conversational flow.
- Parameters:
config (PodcastConfig) – Configuration object containing API keys and settings
text (str) – The text content to convert to speech
speaker (str) – Speaker identifier to determine voice selection
- Returns:
Raw audio data in bytes format containing the synthesized speech
- Return type:
bytes
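Both per-line functions choose a voice from the speaker identifier, but these docs do not say how that mapping is stored. A minimal sketch, assuming a plain dictionary. `select_voice`, `DEFAULT_VOICES`, and the voice strings are placeholders, not real Google or ElevenLabs voice IDs or `PodcastConfig` fields.

```python
from typing import Dict

# Hypothetical voice table; real voice names or IDs would come from PodcastConfig.
DEFAULT_VOICES: Dict[str, str] = {
    "Interviewer": "voice-a",
    "Interviewee": "voice-b",
}


def select_voice(speaker: str, voices: Dict[str, str] = DEFAULT_VOICES) -> str:
    """Return the voice for a speaker, failing loudly on unknown speakers."""
    try:
        return voices[speaker]
    except KeyError:
        raise ValueError(f"No voice configured for speaker {speaker!r}") from None
```

Raising on an unknown speaker, rather than silently falling back to a default voice, makes a typo in the script's speaker labels visible immediately instead of producing audio where both hosts sound the same.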
- podcast_llm.text_to_speech.process_lines_google_multispeaker(config: PodcastConfig, chunks: List)
Process multiple lines of text into speech using Google’s multi-speaker TTS service.
Takes a chunk of conversation lines and generates synthesized speech using Google’s multi-speaker TTS service. Handles up to 6 turns of conversation at once for more natural conversational flow.
- Parameters:
config (PodcastConfig) – Configuration object containing API keys and settings
chunks (List) – List of dictionaries containing conversation lines with structure:
    {
        'speaker': str,  # Speaker identifier
        'text': str      # Line content to convert to speech
    }
- Returns:
Raw audio data in bytes format containing the synthesized speech
- Return type:
bytes
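Because each call handles at most 6 turns, a longer conversation has to be split before it is passed in. A hedged sketch of that batching, assuming simple consecutive slices. `batch_turns` and `MAX_TURNS_PER_REQUEST` are hypothetical names, and whether the module batches exactly this way is not stated above.

```python
from typing import List, TypeVar

T = TypeVar("T")

MAX_TURNS_PER_REQUEST = 6  # Limit stated in the description above.


def batch_turns(lines: List[T], max_turns: int = MAX_TURNS_PER_REQUEST) -> List[List[T]]:
    """Split conversation lines into consecutive batches of at most max_turns."""
    if max_turns < 1:
        raise ValueError("max_turns must be at least 1")
    return [lines[i:i + max_turns] for i in range(0, len(lines), max_turns)]
```

Each batch could then be synthesized with `process_lines_google_multispeaker(config, batch)` and the resulting segments merged in order. Running `combine_consecutive_speaker_chunks` first would make every turn a genuine speaker change, so each 6-turn batch carries as much dialogue as possible.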