Reference · Updated 2026-04-25 · 1 min read

Video Generation Pipeline Overview

What runs between paste-URL and ready-to-publish: Strategy, Voice + Subtitles, and Edge Render.

Every Clipus video runs through three phases. You can watch each one progress in the dashboard.

The three phases

  • Phase 1 — Strategy + Script (target ≤ 45s p95)

    DOM analysis, marketing strategy, script + evaluator.

  • Phase 2 — Voice + Subtitles + Blueprint (target ≤ 20s p95)

    ElevenLabs VO, Whisper subtitles, render blueprint.

  • Edge — Client Render (target ≤ 25s p95)

    WebCodecs client-side render in browser.

Total target: ≤ 90s p95.
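
The total is the sum of the three per-phase targets. The sketch below is illustrative only: the names `PHASE_TARGETS`, `totalTargetSeconds`, and `phasesOverBudget` are hypothetical and not part of any Clipus API, and it treats the total as a simple budget sum rather than a measured end-to-end p95.

```typescript
// Per-phase p95 targets in seconds, copied from the list above.
interface PhaseTarget {
  name: string;
  p95TargetSeconds: number;
}

const PHASE_TARGETS: PhaseTarget[] = [
  { name: "Phase 1", p95TargetSeconds: 45 },
  { name: "Phase 2", p95TargetSeconds: 20 },
  { name: "Edge", p95TargetSeconds: 25 },
];

// Adds up the per-phase targets to get the overall budget.
function totalTargetSeconds(targets: PhaseTarget[]): number {
  let total = 0;
  for (const t of targets) total += t.p95TargetSeconds;
  return total;
}

// Hypothetical helper: returns the names of phases whose measured
// p95 (in seconds) exceeds the target. Phases with no measurement are skipped.
function phasesOverBudget(
  targets: PhaseTarget[],
  measured: Record<string, number>,
): string[] {
  return targets
    .filter((t) => {
      const m = measured[t.name];
      return m !== undefined && m > t.p95TargetSeconds;
    })
    .map((t) => t.name);
}

console.log(totalTargetSeconds(PHASE_TARGETS)); // 90
```

With made-up measurements such as `{ "Phase 1": 40, "Phase 2": 32, "Edge": 25 }`, `phasesOverBudget` would report only `"Phase 2"`, since a phase that lands exactly on its target is still within budget.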

Phase 1 — Strategy + Script

A planner agent reads your page's DOM and proposes three marketing strategies. An evaluator agent scores them against a rubric (hook strength, specificity, pacing, CTA clarity). The winning script enters Phase 2.
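
One way to picture the evaluator step is as a score-and-select over the candidates. This is a minimal sketch, not the actual evaluator: the 0 to 1 score scale, the equal weighting of the four criteria, and every type and function name here are assumptions made for illustration.

```typescript
// The four rubric criteria named in the text.
type Criterion = "hookStrength" | "specificity" | "pacing" | "ctaClarity";

const CRITERIA: Criterion[] = ["hookStrength", "specificity", "pacing", "ctaClarity"];

// Hypothetical shape of one planner proposal after the evaluator has scored it.
interface ScoredCandidate {
  id: string;
  script: string;
  scores: Record<Criterion, number>; // assumed scale: 0 (weak) to 1 (strong)
}

// Assumption: all criteria are weighted equally, so the rubric score is the mean.
function rubricScore(candidate: ScoredCandidate): number {
  let total = 0;
  for (const criterion of CRITERIA) total += candidate.scores[criterion];
  return total / CRITERIA.length;
}

// Picks the highest-scoring candidate; on a tie the earlier candidate wins.
function pickWinner(candidates: ScoredCandidate[]): ScoredCandidate {
  if (candidates.length === 0) {
    throw new Error("no candidates to evaluate");
  }
  return candidates.reduce((best, c) =>
    rubricScore(c) > rubricScore(best) ? c : best,
  );
}
```

A real rubric may weight criteria differently (for example, favoring hook strength); swapping the mean for a weighted sum would only change `rubricScore`.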

Phase 2 — Voice + Subtitles + Blueprint

ElevenLabs generates the voiceover. Whisper transcribes it back to timed subtitles. A blueprint compiler emits the render plan (scene list, durations, transitions).
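
To make the idea of a render plan concrete, here is a rough sketch of what a blueprint built from timed subtitles could look like. The real blueprint format is not documented here, so the `SubtitleCue`, `Scene`, and `Blueprint` shapes, the one-scene-per-cue mapping, and the transition names are all assumptions.

```typescript
// Hypothetical timed subtitle cue, as a transcription step might return it.
interface SubtitleCue {
  startSec: number;
  endSec: number;
  text: string;
}

// Hypothetical transition names; the actual set is not specified.
type Transition = "cut" | "crossfade";

interface Scene {
  index: number;
  startSec: number;
  durationSec: number;
  caption: string;
  transitionIn: Transition;
}

interface Blueprint {
  scenes: Scene[];
  totalDurationSec: number;
}

// Assumption: one scene per subtitle cue. The first scene starts with a
// hard cut and every later scene crossfades in.
function compileBlueprint(cues: SubtitleCue[]): Blueprint {
  const scenes = cues.map((cue, index): Scene => ({
    index,
    startSec: cue.startSec,
    durationSec: cue.endSec - cue.startSec,
    caption: cue.text,
    transitionIn: index === 0 ? "cut" : "crossfade",
  }));
  // The video ends when the last cue ends; an empty cue list gives 0.
  const totalDurationSec = cues.reduce(
    (latest, cue) => Math.max(latest, cue.endSec),
    0,
  );
  return { scenes, totalDurationSec };
}
```

Because the plan is plain data (scene list, durations, transitions), it can be produced on the server and handed to the browser for rendering.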

Edge — Client Render

Your browser renders the final video with WebCodecs, which uses hardware acceleration where the device supports it. Server-side FFmpeg is not in the critical path. The finished video lands in your dashboard, ready to publish.
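
A browser can be checked for WebCodecs support before rendering starts. The sketch below uses the standard `VideoEncoder.isConfigSupported()` static method; the function name `chooseRenderPath`, the `RenderPath` labels, and the H.264 720p configuration are assumptions for illustration, not what Clipus actually probes for.

```typescript
type RenderPath = "webcodecs" | "server-fallback";

// Minimal structural type for the one WebCodecs call this sketch uses,
// so the example also type-checks outside a browser.
interface VideoEncoderLike {
  isConfigSupported(config: {
    codec: string;
    width: number;
    height: number;
  }): Promise<{ supported?: boolean }>;
}

// Decides where to render. Pass the global VideoEncoder if the browser
// defines it, otherwise undefined.
async function chooseRenderPath(
  encoder: VideoEncoderLike | undefined,
): Promise<RenderPath> {
  if (!encoder) return "server-fallback"; // WebCodecs not available at all
  try {
    const { supported } = await encoder.isConfigSupported({
      codec: "avc1.42001f", // assumed: H.264 Baseline, level 3.1
      width: 1280, // assumed output size
      height: 720,
    });
    return supported ? "webcodecs" : "server-fallback";
  } catch {
    // An invalid config or a failing probe is treated as "not supported".
    return "server-fallback";
  }
}
```

In a browser this could be called as `chooseRenderPath(typeof VideoEncoder !== "undefined" ? VideoEncoder : undefined)`. The `typeof` guard matters because referencing an undeclared global directly throws a `ReferenceError`.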

What can slow it down

  • Heavy SaaS pages (such as Datadog or HubSpot) take longer in Phase 1 because there is more DOM to analyze.
  • ElevenLabs queue spikes can add 5-15 seconds to Phase 2.
  • Older browsers fall back to the FFmpeg server worker (slower, but still completes).