Video Generation Pipeline Overview — Clipus Help

Every Clipus video runs through three phases. You can watch each one progress in the dashboard.

The three phases

Phase 1 — Strategy + Script (target ≤ 45s p95)
DOM analysis, marketing strategy, script + evaluator.
Phase 2 — Voice + Subtitles + Blueprint (target ≤ 20s p95)
ElevenLabs VO, Whisper subtitles, render blueprint.
Edge — Client Render (target ≤ 25s p95)
WebCodecs client-side render in browser.

Total target: ≤ 90s p95.
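
The total is the sum of the three per-phase targets (45 + 20 + 25 = 90). As a rough illustration, the sketch below treats those targets as a simple additive budget and flags phases that ran over. The names PHASE_BUDGET_S, totalBudgetSeconds, and phasesOverBudget are made up for this example and are not part of any Clipus API.

```typescript
// Per-phase p95 targets in seconds, copied from the list above.
// The key names are hypothetical labels chosen for this sketch.
const PHASE_BUDGET_S = {
  strategyAndScript: 45,
  voiceSubtitlesBlueprint: 20,
  clientRender: 25,
};

type PhaseName = keyof typeof PHASE_BUDGET_S;

const PHASE_NAMES = Object.keys(PHASE_BUDGET_S) as PhaseName[];

// Sum of the per-phase targets.
function totalBudgetSeconds(): number {
  return PHASE_NAMES.reduce((sum, name) => sum + PHASE_BUDGET_S[name], 0);
}

// Names of the phases whose measured duration exceeded their target.
function phasesOverBudget(measured: Record<PhaseName, number>): PhaseName[] {
  return PHASE_NAMES.filter((name) => measured[name] > PHASE_BUDGET_S[name]);
}
```

For instance, a run that spent 50 seconds in Phase 1 but stayed within the other two targets would be reported as over budget in strategyAndScript only.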

Phase 1 — Strategy + Script

A planner agent reads your page's DOM and proposes three marketing strategies. An evaluator agent scores them against a rubric (hook strength, specificity, pacing, CTA clarity). The winning script enters Phase 2.
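
To make the evaluator step concrete, here is a minimal sketch of rubric-based selection. Only the four criteria come from the description above. The 0 to 10 scale, the equal weighting, and every type and function name are assumptions for illustration, and the real evaluator may score quite differently.

```typescript
// The four rubric criteria named in the text.
type Criterion = "hookStrength" | "specificity" | "pacing" | "ctaClarity";
type RubricScores = Record<Criterion, number>;

// Hypothetical shape for one scored candidate.
interface Candidate {
  id: string;
  scores: RubricScores; // assumed 0-10 per criterion
}

const CRITERIA: Criterion[] = ["hookStrength", "specificity", "pacing", "ctaClarity"];

// Equal-weight sum across criteria (an assumption for this sketch).
function totalScore(scores: RubricScores): number {
  return CRITERIA.reduce((sum, criterion) => sum + scores[criterion], 0);
}

// Returns the candidate with the highest total; on a tie the earlier one wins.
function pickWinner(candidates: Candidate[]): Candidate {
  if (candidates.length === 0) {
    throw new Error("no candidates to evaluate");
  }
  return candidates.reduce((best, current) =>
    totalScore(current.scores) > totalScore(best.scores) ? current : best
  );
}
```

A weighted sum would be a natural variation if some criteria, such as hook strength, should count more than others.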

Phase 2 — Voice + Music + Subtitles + Blueprint

The voice step generates the voiceover. On Scale and above, AI Music Supervisor can generate a voiceover-safe instrumental background track for the video. Lower plans and quota fallback paths use Clipus static or curated music. The subtitle step transcribes the voiceover back to timed subtitles. A blueprint compiler emits the render plan (scene list, durations, transitions, audio mix).
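
As an illustration of what a render plan with a scene list, durations, transitions, and an audio mix might look like, here is a hypothetical data shape with two small helpers. The field names, the transition values, and the assumption that scenes play back to back without overlap are all invented for this sketch. The actual blueprint format is not documented here.

```typescript
// Hypothetical transition kinds; the real set is not specified in this article.
type Transition = "cut" | "fade" | "slide";

interface Scene {
  id: string;
  durationMs: number;
  transitionIn: Transition;
}

// Assumed linear gains for the two audio layers.
interface AudioMix {
  voiceGain: number;
  musicGain: number;
}

interface Blueprint {
  scenes: Scene[];
  audioMix: AudioMix;
}

// Start offset of each scene, assuming scenes run back to back with no overlap.
function sceneStartTimesMs(blueprint: Blueprint): number[] {
  const starts: number[] = [];
  let elapsed = 0;
  blueprint.scenes.forEach((scene) => {
    starts.push(elapsed);
    elapsed += scene.durationMs;
  });
  return starts;
}

// Total video length under the same no-overlap assumption.
function totalDurationMs(blueprint: Blueprint): number {
  return blueprint.scenes.reduce((sum, scene) => sum + scene.durationMs, 0);
}
```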

For report-backed videos, Studio also shows an Output Capability panel before generation. Use it to confirm the proof posture, allowed polish, audio source, plan status, and risk label.

Phase Edge — Client Render

Your browser renders the final video using WebCodecs (hardware-accelerated). On supported browsers, server-side FFmpeg is not in the critical path. The result lands in your dashboard, ready to publish.
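
One way a page could decide whether the browser can take the WebCodecs path is a simple feature check for the VideoEncoder and VideoFrame globals, which are part of the WebCodecs API. The sketch below is an assumption about how such a check might look, not Clipus's actual detection logic. The function name chooseRenderPath and the "server" label are hypothetical.

```typescript
type RenderPath = "webcodecs" | "server";

// `scope` is the global object to inspect (for example `window` in a browser).
// It is passed in explicitly so the check can also run outside a browser.
function chooseRenderPath(scope: object): RenderPath {
  const globals = scope as { VideoEncoder?: unknown; VideoFrame?: unknown };
  // Both constructors are needed to encode frames with WebCodecs.
  const hasEncoder = typeof globals.VideoEncoder === "function";
  const hasFrame = typeof globals.VideoFrame === "function";
  return hasEncoder && hasFrame ? "webcodecs" : "server";
}
```

In a browser this would be called as chooseRenderPath(window). A more thorough check would probably also call VideoEncoder.isConfigSupported() with the intended codec and resolution, since the presence of the API does not guarantee that a given configuration is supported.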

What can slow it down

Heavy SaaS pages (for example, Datadog or HubSpot) take longer in Phase 1 because the DOM is larger.
Voice generation queue spikes can add 5 to 15 seconds to Phase 2.
AI-generated music is plan-limited. If a plan or quota cap blocks generation, Clipus falls back to static background music instead of blocking the video.
Older browsers fall back to the FFmpeg server worker (slower, but still completes).
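
The music fallback in the list above can be pictured as a small decision function. This is a sketch under stated assumptions: the planAllowsAiMusic flag stands in for "Scale and above", aiMusicQuotaRemaining is a hypothetical counter, and the source labels are invented for illustration.

```typescript
type MusicSource = "ai-generated" | "static";

interface MusicContext {
  // True when the plan includes AI-generated music (Scale and above, per the text).
  planAllowsAiMusic: boolean;
  // Hypothetical counter of AI music generations left in the current quota.
  aiMusicQuotaRemaining: number;
}

// Never blocks the video: any plan or quota limit falls back to static music.
function chooseMusicSource(context: MusicContext): MusicSource {
  if (context.planAllowsAiMusic && context.aiMusicQuotaRemaining > 0) {
    return "ai-generated";
  }
  return "static";
}
```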