How It Works
AI-generated podcast episodes, from research to your ears
Overview
tell-tale is a fully automated podcast factory. Every week it scans the internet for noteworthy topics, proposes episode ideas for human review, then generates a polished two-host conversation complete with audio, show notes, and branded bumpers—all published straight to your podcast app.
  Research       Propose        Generate       Publish
+----------+   +----------+   +----------+   +----------+
|  Weekly  |   | Episode  |   | Script + |   | RSS Feed |
| scanning |-->| approval |-->|  Audio   |-->|  + Site  |
+----------+   +----------+   +----------+   +----------+
     AI           Human            AI            Auto

A single deployment hosts multiple independent shows, each with its own personality, voice pair, artwork, and feed. Episodes are published as standard RSS, compatible with Spotify, Apple Podcasts, and any podcast app.
System Design
The pipeline is organised into discrete stages, each with a clear responsibility. No stage relies on knowledge of the technology behind another—making it easy to swap components as better tools emerge.
Research & Discovery
A scheduled scan reviews recent news, papers, incidents, and announcements relevant to each show’s domain. It distils the findings into 5–7 episode proposals, complete with source links and a short pitch for each. Every suggested source URL is validated automatically; proposals with missing or placeholder links are flagged so editors know which suggestions need manual sourcing. The search engine’s own grounding metadata is extracted and attached so editors can trace every claim back to its origin.
Editorial Control
Proposals surface for human review. Editors can approve a topic as-is, refine the angle with additional guidance that triggers a fresh re-research pass, or reject it entirely. Refinement is iterative—each round incorporates editor feedback and produces a revised proposal with new source material. Approvals can also carry inline adjustments that get baked directly into the generation prompt. Only designated maintainers can trigger generation; non-maintainers are politely redirected. An in-progress episode can be cancelled at any time, and concurrency guards prevent duplicate approvals on the same issue. Nothing is generated without explicit human approval.
Transcript Generation
Approved source material is transformed into a natural two-host conversation—a curious Host and a knowledgeable Expert. Each show carries its own persona that shapes tone, vocabulary, and structure. Transcripts follow a three-act arc: an attention-grabbing teaser, a deep-dive across several themes, and a wrap-up with key takeaways.
Transcript Validation
Before any audio is rendered, the transcript passes through multiple automated quality gates. A sliding-window scan catches verbatim-repeated dialogue blocks—a known failure mode of large language models. Sentence-level duplicate scoring flags transcripts with excessive repetition. Pattern matching detects structural artifacts (prompt labels or segment headings leaking into spoken dialogue), unnatural role-name addressing, and style drift such as self-referential first-person language or stale hype cliches. Finally, an AI review council evaluates the transcript for duplicated content, fabricated claims against the original source material, structural problems, prompt leakage, and tone drift. The council can correct issues and re-evaluate for up to two rounds before the best available version proceeds.
Audio Rendering & Verification
The transcript is rendered into multi-speaker audio with distinct voices for Host and Expert. Long episodes are automatically chunked at speaker-turn boundaries and reassembled into a seamless recording. Each chunk is checked for runaway synthesis (a known text-to-speech loop failure); oversized chunks are automatically split and re-rendered. After synthesis, the audio is transcribed back to text and the word count is compared against the original—catching truncation, looping, and corruption. If the fidelity check fails, the system automatically retries with smaller chunks before proceeding. Keyword coverage analysis provides a second layer of verification, ensuring key terms from the script survive the audio round-trip.
Bumper Assembly
Branded intro, transition sting, and outro clips are spliced around the conversation—giving every episode a polished, consistent feel unique to its show. Asset existence is verified before assembly begins.
Publishing & Safety
The finished audio and episode metadata are published to standard podcast feeds. A strict state machine governs episode progression—no stage can be skipped, and terminal-state guards prevent zombie retries from re-processing completed work. Per-show and aggregate feeds are regenerated automatically, and a companion website updates in sync with episode pages, show notes, and an embedded player.
Deployment & Canary Testing
Every deployment is followed by an automated end-to-end canary test: a deterministic source prompt is ingested, the full pipeline runs, and the output is validated for status, audio presence, duration, and transcript existence. If the canary fails, a tracking issue is created automatically. Health gates verify all deployed services are responsive before declaring success.
Multi-Show Architecture
A single deployment powers many independent shows. Each show has its own editorial voice, speaker pair, artwork, bumper assets, and dedicated feed. New shows pass through multiple human approval gates—persona, logo, intro and outro scripts, and all audio assets must be signed off before the show goes live. Episodes can be moved between shows or regenerated with updated configuration at any time.
Technical Details
Under the hood, tell-tale combines GitHub-native workflows with Google Cloud infrastructure and Gemini-family models.
Workflow Automation
GitHub Issues serve as the editorial ledger—every topic proposal, approval decision, and generation request is tracked as an issue with show-specific labels. GitHub Actions orchestrates the end-to-end lifecycle: scheduled weekly research, slash-command triggers (/research, /approve, /refine, /cancel), label-based episode generation, and automatic issue closure once an episode is published. Only users listed in MAINTAINERS.md can trigger generation. Issue-level concurrency groups prevent duplicate approvals, and a scheduled sweep closes issues for completed episodes every four hours.
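As a rough illustration of the slash-command handling described above, the following sketch parses a comment and checks the author against a maintainer list. The function names, the rule that the command must sit on the first line of the comment, and the assumption that MAINTAINERS.md lists handles as `@name` are all hypothetical; the text only names the four commands and the file.

```python
import re
from typing import Optional, Tuple

# The four commands named in the text; everything else here is illustrative.
COMMANDS = {"research", "approve", "refine", "cancel"}
COMMAND_RE = re.compile(r"^/(\w+)(?:[ \t]+(.*))?$")


def parse_command(comment_body: str) -> Optional[Tuple[str, str]]:
    """Return (command, guidance) from the first line of a comment, or None."""
    lines = comment_body.strip().splitlines()
    if not lines:
        return None
    match = COMMAND_RE.match(lines[0].strip())
    if match is None or match.group(1) not in COMMANDS:
        return None
    return match.group(1), (match.group(2) or "").strip()


def is_maintainer(username: str, maintainers_md: str) -> bool:
    """Assumes MAINTAINERS.md mentions each maintainer as '@handle' (hypothetical format)."""
    handles = {h.lower() for h in re.findall(r"@([A-Za-z0-9-]+)", maintainers_md)}
    return username.lower() in handles
```

With this shape, `/refine focus on supply chain` yields the command plus free-text guidance, while an unknown command such as `/deploy` is ignored rather than treated as an error.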
Research
Gemini 2.5 Pro with Google Search grounding scans the past two weeks for articles, academic papers, incidents, and announcements. Each show runs its own research pass on a staggered Monday schedule. Results are filed as GitHub Issues tagged per show. All suggested URLs are validated automatically; missing or placeholder links trigger a visible warning banner. Grounding metadata URLs from the search engine are extracted and appended to each issue so editors can trace every claim to its source. The research prompt includes explicit anti-fabrication guardrails requiring real, recently published sources.
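The URL check could look something like the sketch below. It is purely syntactic (no network request), and the placeholder host list, function names, and return labels are assumptions made for illustration; the text does not say how placeholders are recognised.

```python
from typing import List
from urllib.parse import urlparse

# Hypothetical placeholder markers; the real list is not given in the text.
PLACEHOLDER_HOSTS = {"example.com", "example.org", "localhost"}


def classify_url(url: str) -> str:
    """Return 'ok', 'missing', or 'placeholder' for one suggested source URL."""
    if not url or not url.strip():
        return "missing"
    parsed = urlparse(url.strip())
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return "missing"
    host = parsed.hostname.lower()
    if host.startswith("www."):
        host = host[4:]
    if host in PLACEHOLDER_HOSTS:
        return "placeholder"
    return "ok"


def needs_warning(urls: List[str]) -> bool:
    """True when a proposal has no URLs, or any URL is missing or a placeholder."""
    return not urls or any(classify_url(u) != "ok" for u in urls)
```

A proposal for which `needs_warning` returns True would be the one that gets the warning banner.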
Transcript Generation
Gemini 2.5 Flash (1M-token context window) transforms source material into a two-host podcast transcript. Each show carries a persona prompt that shapes tone: energetic and skeptical for Tech Disruptions, measured and evidence-focused for Paper Trail, pragmatic and opinionated for Debug Log. Transcripts target 2,800–3,400 words (~15–18 min) and follow a three-act structure. The system prompt forbids fabricated facts, first-person or source-ownership language, and stale hype cliches.
Transcript Validation
Seven automated checks run before audio rendering:
- Duplicate block detection — a 4-line sliding window catches verbatim-repeated dialogue, a known Gemini failure mode.
- Sentence-level duplicate scoring — flags transcripts with >10% duplicated sentences (20+ chars).
- Structural artifact detection — regex scan catches prompt labels like “Segment 1” or “Opening Teaser” leaking into spoken dialogue.
- Role-name addressing — detects speakers unnaturally calling each other “host” or “expert” in conversation.
- Style drift detection — flags first-person self-reference such as “I”, “we”, “our”, or “let's”, plus stale hype phrases like “paradigm shift” or “game changer.”
- AI review council — Gemini 2.5 Flash reviews the full transcript for duplicates, fabricated claims vs. the original source material, structural problems, style drift, and prompt leakage. Up to 2 correction rounds run automatically.
- Fact-checking — the validation prompt includes the first 4,000 characters of source material so the LLM can flag unsupported or fabricated claims.
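The first two checks in the list above can be sketched as follows. The 4-line window, the 10% threshold, and the 20-character minimum come from the text; the function names, the sentence splitter, the case-insensitive comparison, and the choice to ignore overlapping windows are assumptions.

```python
import re
from typing import Dict, List, Set, Tuple


def find_repeated_blocks(lines: List[str], window: int = 4) -> List[Tuple[int, int]]:
    """Return (first_index, repeat_index) pairs where a `window`-line block recurs verbatim."""
    seen: Dict[Tuple[str, ...], int] = {}
    repeats: List[Tuple[int, int]] = []
    for i in range(len(lines) - window + 1):
        block = tuple(line.strip() for line in lines[i:i + window])
        # Only count a repeat that does not overlap the first occurrence.
        if block in seen and i >= seen[block] + window:
            repeats.append((seen[block], i))
        else:
            seen.setdefault(block, i)
    return repeats


def duplicate_sentence_ratio(text: str, min_chars: int = 20) -> float:
    """Fraction of sentences (at least `min_chars` long) that repeat an earlier one."""
    sentences = [s.strip().lower() for s in re.split(r"(?<=[.!?])\s+", text)]
    sentences = [s for s in sentences if len(s) >= min_chars]
    if not sentences:
        return 0.0
    seen: Set[str] = set()
    duplicates = 0
    for sentence in sentences:
        if sentence in seen:
            duplicates += 1
        seen.add(sentence)
    return duplicates / len(sentences)


def fails_duplicate_gate(text: str, threshold: float = 0.10) -> bool:
    """Flag transcripts whose duplicated-sentence ratio exceeds the threshold."""
    return duplicate_sentence_ratio(text) > threshold
```

Short interjections fall under the length cutoff, so natural repetition such as a repeated "Right." does not count towards the ratio.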
A second Gemini pass generates a 2–3 sentence summary for the RSS feed, plus structured show notes with source links and a glossary of technical terms.
Audio Generation & Verification
Gemini 2.5 Flash Preview TTS renders the transcript into 24 kHz, 16-bit mono WAV audio with distinct voices per speaker. Shows choose from available voices (Puck, Charon, Kore, Fenrir, Aoede, Leda, Orus, Zephyr). Long transcripts are split into 8,000-character chunks at speaker-turn boundaries; raw PCM frames are concatenated and re-wrapped into a single WAV file.
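A minimal sketch of the chunking and re-wrapping steps, using only the Python standard library. The 8,000-character limit and the 24 kHz, 16-bit mono format come from the text; the function names and the greedy packing strategy are assumptions, and the TTS call itself is left out.

```python
import io
import wave
from typing import List


def chunk_by_turns(turns: List[str], max_chars: int = 8000) -> List[str]:
    """Greedily pack whole speaker turns into chunks of at most `max_chars`.

    A single turn longer than `max_chars` becomes its own oversized chunk.
    """
    chunks: List[str] = []
    current = ""
    for turn in turns:
        candidate = turn if not current else current + "\n" + turn
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = turn
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def wrap_pcm_as_wav(pcm_chunks: List[bytes], sample_rate: int = 24000) -> bytes:
    """Concatenate raw 16-bit mono PCM chunks and wrap them in one WAV container."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(b"".join(pcm_chunks))
    return buffer.getvalue()
```

Because chunks only ever break between turns, no speaker's line is split across two synthesis calls.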
After synthesis, three verification layers run:
- TTS loop detection — any chunk exceeding 300 s is flagged as a likely synthesis loop and auto-split in half for re-rendering.
- Round-trip fidelity — the WAV is transcribed back to text via Gemini and the word-count ratio compared (acceptable range: 0.70–1.15). On failure the system retries with smaller 2,000-character chunks.
- Keyword coverage — meaningful keywords (4+ chars, non-stopwords) are extracted from the original transcript and checked against the round-trip transcription (50% coverage threshold).
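The three layers above reduce to a few small comparisons, sketched here. The thresholds (300 s, 0.70–1.15, 4+ characters) are from the text; the direction of the ratio (transcribed words divided by original words), the tokeniser, and the tiny stopword list are assumptions.

```python
import re
from typing import List, Set

# Tiny illustrative stopword list; the real list is not given in the text.
STOPWORDS = {"that", "this", "with", "from", "have", "they", "were", "what", "about"}


def words(text: str) -> List[str]:
    return re.findall(r"[a-z0-9']+", text.lower())


def is_runaway_chunk(duration_seconds: float, limit: float = 300.0) -> bool:
    """A chunk longer than the limit is treated as a likely synthesis loop."""
    return duration_seconds > limit


def word_count_ratio(original: str, transcribed: str) -> float:
    """Assumed direction: transcribed word count divided by original word count."""
    return len(words(transcribed)) / max(len(words(original)), 1)


def fidelity_ok(original: str, transcribed: str,
                low: float = 0.70, high: float = 1.15) -> bool:
    return low <= word_count_ratio(original, transcribed) <= high


def keywords(text: str) -> Set[str]:
    return {w for w in words(text) if len(w) >= 4 and w not in STOPWORDS}


def keyword_coverage(original: str, transcribed: str) -> float:
    """Share of the original's keywords that appear in the round-trip text."""
    expected = keywords(original)
    if not expected:
        return 1.0
    return len(expected & set(words(transcribed))) / len(expected)
```

Under this reading, truncated audio pushes the ratio below 0.70 and looping audio pushes it above 1.15, so one range check covers both failure modes.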
Bumpers & Assembly
Per-show intro, tech-sting, and outro WAV files are spliced in order: intro → sting → body → outro. Asset existence is verified before assembly. Bumper audio is generated once per show using TTS with style prompts and stored as reusable assets.
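The splice order and the asset check might be implemented along these lines. This is a sketch using the standard `wave` module; the function name and the requirement that all four files share one sample format are assumptions.

```python
import wave
from pathlib import Path
from typing import List, Optional, Tuple


def assemble_episode(intro: Path, sting: Path, body: Path, outro: Path, out: Path) -> None:
    """Write intro, sting, body, and outro (in that order) into a single WAV file."""
    parts = [intro, sting, body, outro]

    # Verify every asset exists before doing any work.
    missing = [str(p) for p in parts if not p.is_file()]
    if missing:
        raise FileNotFoundError(f"missing audio assets: {missing}")

    params: Optional[Tuple[int, int, int]] = None
    frames: List[bytes] = []
    for part in parts:
        with wave.open(str(part), "rb") as wav:
            fmt = (wav.getnchannels(), wav.getsampwidth(), wav.getframerate())
            if params is None:
                params = fmt
            elif fmt != params:
                raise ValueError(f"{part} has format {fmt}, expected {params}")
            frames.append(wav.readframes(wav.getnframes()))

    with wave.open(str(out), "wb") as result:
        result.setnchannels(params[0])
        result.setsampwidth(params[1])
        result.setframerate(params[2])
        result.writeframes(b"".join(frames))
```

Checking the format of each part guards against a bumper recorded at a different sample rate silently changing pitch and speed in the final file.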
Logo Generation
Show logos are AI-generated using Imagen via Vertex AI and stored as show assets. Each show can regenerate its artwork at any time with an updated prompt.
Deployment & Canary Testing
Every push to main triggers a parallel deploy of all three Cloud Functions followed by a health gate that verifies each service is responsive. A post-deploy canary test runs the full pipeline end-to-end: a deterministic source prompt is ingested, polled to completion, and the output is validated for job status, audio URI, duration (60–600 s), and transcript existence. On canary failure, a GitHub issue is created automatically with the canary-failure label and debugging instructions.
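The canary's output validation could be as simple as the sketch below. The four checks and the 60–600 s range are from the text; the job record's field names and the expected `PUBLISHED` status value are assumptions about how the job is represented.

```python
from typing import Any, Dict, List


def validate_canary(job: Dict[str, Any]) -> List[str]:
    """Return a list of failure reasons; an empty list means the canary passed."""
    failures: List[str] = []

    # Field names below are hypothetical; the text names the checks, not the schema.
    if job.get("status") != "PUBLISHED":
        failures.append(f"unexpected status: {job.get('status')!r}")
    if not job.get("audio_uri"):
        failures.append("audio URI missing")
    duration = job.get("duration_seconds")
    if not isinstance(duration, (int, float)) or not 60 <= duration <= 600:
        failures.append(f"duration out of range: {duration!r}")
    if not job.get("transcript"):
        failures.append("transcript missing")
    return failures
```

Collecting every failure, rather than stopping at the first, gives the auto-created issue a complete picture of what broke.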
Infrastructure
- Compute — Cloud Functions (2nd gen) for event-driven orchestration; Cloud Run + FastAPI for feed serving.
- Storage — Dual GCS buckets for ingest and published assets.
- Database — Firestore (Native mode) tracks job state:
  QUEUED → PROCESSING → GENERATING_TRANSCRIPT → VALIDATING_TRANSCRIPT → GENERATING_AUDIO → VALIDATING_AUDIO → PUBLISHED. Terminal-state guards prevent zombie retries.
- Queue — Cloud Tasks with exponential backoff for reliable retries; serialised concurrency on the publish queue prevents feed race conditions.
- Triggers — Eventarc fires on GCS object uploads; Cloud Tasks dispatches generation and publishing with serialised concurrency for feed safety.
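The job state progression and its terminal-state guard can be modelled in memory as shown below. The seven pipeline states are from the text; the `FAILED` state, the exception type, and the `advance` function are hypothetical, and a real implementation would presumably apply the same rule inside a Firestore transaction.

```python
from enum import Enum


class JobState(Enum):
    QUEUED = "QUEUED"
    PROCESSING = "PROCESSING"
    GENERATING_TRANSCRIPT = "GENERATING_TRANSCRIPT"
    VALIDATING_TRANSCRIPT = "VALIDATING_TRANSCRIPT"
    GENERATING_AUDIO = "GENERATING_AUDIO"
    VALIDATING_AUDIO = "VALIDATING_AUDIO"
    PUBLISHED = "PUBLISHED"
    FAILED = "FAILED"  # hypothetical: not named in the text


PIPELINE = [
    JobState.QUEUED,
    JobState.PROCESSING,
    JobState.GENERATING_TRANSCRIPT,
    JobState.VALIDATING_TRANSCRIPT,
    JobState.GENERATING_AUDIO,
    JobState.VALIDATING_AUDIO,
    JobState.PUBLISHED,
]
TERMINAL = {JobState.PUBLISHED, JobState.FAILED}


class InvalidTransition(Exception):
    pass


def advance(current: JobState, target: JobState) -> JobState:
    """Allow only the next stage in order; refuse any move out of a terminal state."""
    if current in TERMINAL:
        raise InvalidTransition(f"{current.value} is terminal; ignoring retry")
    if target is JobState.FAILED:
        return target
    expected = PIPELINE[PIPELINE.index(current) + 1]
    if target is not expected:
        raise InvalidTransition(f"{current.value} -> {target.value} skips a stage")
    return target
```

A late task retry that arrives after publication hits the terminal guard and is rejected instead of re-processing the episode.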
Observability
OpenTelemetry spans instrument every pipeline stage with duration metrics, Gemini token usage, and estimated cost. Key counters include audio verification retries and failures, job completions and failures, and per-stage durations. A quarterly pricing staleness check auto-opens an issue if model pricing data falls more than 90 days out of date.
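The cost estimate and the staleness check both come down to simple arithmetic, sketched here. Prices are passed in as parameters because the text gives none; the function names and the per-million-token pricing unit are assumptions.

```python
from datetime import date, timedelta


def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_million: float,
                  output_price_per_million: float) -> float:
    """Estimated cost of one model call, given caller-supplied prices."""
    return (input_tokens / 1_000_000 * input_price_per_million
            + output_tokens / 1_000_000 * output_price_per_million)


def pricing_is_stale(last_updated: date, today: date, max_age_days: int = 90) -> bool:
    """True when the pricing data is more than `max_age_days` old."""
    return (today - last_updated) > timedelta(days=max_age_days)
```

Keeping prices out of the function makes the staleness check meaningful: the estimate is only as current as the pricing table it is fed.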
Feeds
RSS 2.0 with iTunes podcast extensions. Per-show feeds (feed-{show}.xml) plus an aggregate feed.xml. Audio served via time-limited signed URLs. Compatible with Spotify, Apple Podcasts, Pocket Casts, Overcast, and any standard podcast app.