claude-code-video-toolkit
This file provides guidance to Claude Code (claude.ai/code) when working with this video production toolkit.
Overview
claude-code-video-toolkit is an AI-native video production workspace. It provides Claude Code with the skills, commands, and tools to create professional videos from concept to final render.
Key capabilities:
- Programmatic video creation with Remotion (React-based)
- AI voiceover generation with ElevenLabs or Qwen3-TTS
- AI music generation with ACE-Step 1.5 (text-to-music, vocals, covers, stems)
- Browser demo recording with Playwright
- Asset processing with FFmpeg
Directory Structure
claude-code-video-toolkit/
├── .claude/
│ ├── skills/ # Domain knowledge for Claude
│ └── commands/ # Guided workflows
├── tools/ # Python CLI automation
├── templates/ # Video templates
│ ├── sprint-review/ # Sprint review video template
│ └── product-demo/ # Marketing/product demo template
├── brands/ # Brand profiles (colors, fonts, voice)
├── projects/ # Your video projects go here (gitignored)
├── examples/ # Curated showcase projects (shared)
├── assets/ # Shared assets (voices, images)
├── playwright/ # Browser recording infrastructure
├── docs/ # Documentation
└── _internal/ # Toolkit metadata & registry
Registry
_internal/toolkit-registry.json is the canonical source for all skills, commands, tools, templates, components, transitions, and cloud endpoints — including their paths, status, options, presets, and env vars. Consult it for structured data. This file focuses on workflow guidance, patterns, and knowledge that the registry can't capture.
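A minimal sketch of inspecting the registry for structured data. The registry schema is not described in this file, so the code assumes only that the file holds a JSON object and summarizes its top-level sections; `summarize_registry` is a hypothetical helper, not part of the toolkit.

```python
import json
from pathlib import Path


def summarize_registry(path):
    """Return {section_name: entry_count} for a JSON registry file.

    Assumes a JSON object at the top level. Sections that are lists or
    objects report their length; scalar values report 1.
    """
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    summary = {}
    for name, value in data.items():
        summary[name] = len(value) if isinstance(value, (list, dict)) else 1
    return summary


if __name__ == "__main__":
    registry = Path("_internal/toolkit-registry.json")
    if registry.exists():
        for section, count in summarize_registry(registry).items():
            print(f"{section}: {count}")
```

Run from the toolkit root, this prints one line per top-level section, which is a quick way to see what the registry covers before reading individual entries.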
Quick Start
First-time setup (optional, ~5 minutes):
/setup
Walks through cloud GPU, file transfer (R2), and voice configuration. Most features are free. Skip this if you just want to render videos with Node.js.
Work on a video project:
/video
This command will:
- Scan for existing projects (resume or create new)
- Choose template (sprint-review, product-demo)
- Choose brand (or create one with /brand)
- Plan scenes interactively
- Create project with VOICEOVER-SCRIPT.md
Multi-session support: Projects span multiple sessions. Run /video to resume where you left off. Each project tracks its phase, scenes, assets, and session history in project.json.
Or manually:
cp -r templates/sprint-review projects/my-video
cd projects/my-video
npm install
npm run studio # Preview
npm run render # Export
Note: After creating or modifying commands/skills, restart Claude Code to load changes.
Templates
Templates live in templates/. Each is a standalone Remotion project. See registry templates section for the full list.
sprint-review
Config-driven sprint review videos with a theme system, content defined in sprint-config.ts, pre-built slides (Title, Overview, Summary, Credits), demo components (single video, split-screen), and audio integration.
product-demo
Marketing/product demo videos with dark tech aesthetic, scene-based composition (title, problem, solution, demo, stats, CTA), animated background, Narrator PiP, browser/terminal chrome, and stats cards with spring animations.
Brand Profiles
Brands live in brands/. Each defines visual identity:
brands/my-brand/
├── brand.json # Colors, fonts, typography
├── voice.json # ElevenLabs voice settings
└── assets/ # Logo, backgrounds
See docs/creating-brands.md for details.
Shared Components
Reusable video components in lib/components/. See registry components section for the full list with descriptions. Import in templates via:
import { AnimatedBackground, SlideTransition, Label } from '../../../../lib/components';
Python Tools
Audio, video, and image tools in tools/. See registry tools section for the full catalog with descriptions, options, presets, and env vars. Every tool supports --help.
# Setup
pip install -r tools/requirements.txt
Important: always invoke tools from the toolkit root directory. When working inside a project (projects/my-video/), tool paths like python3 tools/upscale.py will fail because tools/ is resolved relative to the current directory. Always use:
cd /path/to/claude-code-video-toolkit && python3 tools/upscale.py ...
This is especially critical for background commands where the working directory may not be obvious.
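One way to make tool invocations independent of the current directory is to resolve the toolkit root first. The sketch below walks up from a starting directory until it finds `_internal/toolkit-registry.json`. Treating that file as the root marker is an assumption, and `find_toolkit_root` and `run_tool` are hypothetical helpers, not part of the toolkit.

```python
import subprocess
import sys
from pathlib import Path

MARKER = Path("_internal") / "toolkit-registry.json"  # assumed root marker


def find_toolkit_root(start="."):
    """Walk upward from `start` until a directory containing MARKER is found."""
    current = Path(start).resolve()
    for candidate in (current, *current.parents):
        if (candidate / MARKER).is_file():
            return candidate
    raise FileNotFoundError(f"No toolkit root found above {current}")


def run_tool(script, *args, start="."):
    """Run tools/<script> with the toolkit root as the working directory."""
    root = find_toolkit_root(start)
    return subprocess.run(
        [sys.executable, str(Path("tools") / script), *args],
        cwd=root,
        check=True,
    )
```

From inside projects/my-video/, `run_tool("upscale.py", "--help")` would then behave the same as the `cd ... &&` form above. Relative input and output paths are resolved against the toolkit root in that case, so absolute paths are safer.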
Tool Categories
| Type | Tools | When to Use |
|---|---|---|
| Project tools | voiceover, music, music_gen, sfx, sync_timing | During video creation workflow |
| Utility tools | redub, addmusic, notebooklm_brand, locate_watermark | Quick transformations on existing videos |
| Cloud GPU | image_edit, upscale, dewatermark, sadtalker, qwen3_tts, music_gen, flux2 | AI processing via RunPod or Modal (`--cloud runpod\|modal`) |
Utility tools work on any video file without requiring a project structure.
Voiceover Generation
# Per-scene generation (recommended)
python tools/voiceover.py --scene-dir public/audio/scenes --json
# Using Qwen3-TTS (self-hosted, free alternative to ElevenLabs)
python tools/voiceover.py --provider qwen3 --tone warm --scene-dir public/audio/scenes --json
# Single file (legacy)
python tools/voiceover.py --script SCRIPT.md --output out.mp3
Timing Sync (after voiceover)
python3 tools/sync_timing.py # Dry run comparison
python3 tools/sync_timing.py --apply # Update config (1s default padding)
python3 tools/sync_timing.py --apply --padding 1.5 # Custom padding
python3 tools/sync_timing.py --voiceover-json vo.json # Use voiceover.py output
python3 tools/sync_timing.py --json # Machine-readable output
Qwen3-TTS (Standalone)
python tools/qwen3_tts.py --text "Hello world" --speaker Ryan --output hello.mp3
python tools/qwen3_tts.py --text "Hello world" --tone warm --output hello.mp3
python tools/qwen3_tts.py --text "Hello" --instruct "Speak enthusiastically" --output excited.mp3
python tools/qwen3_tts.py --text "Hello" --ref-audio sample.wav --ref-text "transcript" --output cloned.mp3
python tools/qwen3_tts.py --list-voices # 9 speakers: Ryan, Aiden, Vivian, etc.
python tools/qwen3_tts.py --list-tones # neutral, warm, professional, excited, etc.
Temperature controls expressiveness: --temperature 1.2 (more expressive) or --temperature 0.4 (more consistent).
Cloud GPU Providers
All cloud GPU tools support two providers via --cloud runpod|modal. RunPod is the default. Modal was added as a reliability fallback after RunPod outages, and offers faster cold starts.
# --- RunPod setup (automated, one-time per tool) ---
echo "RUNPOD_API_KEY=your_key_here" >> .env
python tools/image_edit.py --setup
python tools/upscale.py --setup
python tools/qwen3_tts.py --setup
python tools/music_gen.py --setup
# --- Modal setup (deploy each app you need) ---
pip install modal && python3 -m modal setup
modal deploy docker/modal-upscale/app.py # Then save URL to .env
modal deploy docker/modal-image-edit/app.py
# See docs/modal-setup.md for full guide
AI Image Editing
# Image editing (Qwen-Image-Edit)
python tools/image_edit.py --input photo.jpg --prompt "Add sunglasses"
python tools/image_edit.py --input photo.jpg --prompt "Add sunglasses" --cloud modal
python tools/image_edit.py --input photo.jpg --style cyberpunk
python tools/image_edit.py --input photo.jpg --background office
python tools/image_edit.py --list-presets # Full preset list
# Upscaling (RealESRGAN)
python tools/upscale.py --input photo.jpg --output photo_4x.png --cloud runpod
python tools/upscale.py --input photo.jpg --scale 2 --model anime --face-enhance --cloud runpod
See docs/qwen-edit-patterns.md and .claude/skills/qwen-edit/ for prompting guidance.
AI Music Generation (ACE-Step 1.5)
Default provider is acemusic (official cloud API, free key from acemusic.ai/api-key). Uses XL Turbo 4B model with 5Hz LM thinking mode. Falls back to Modal/RunPod for self-hosted 2B model.
# Background music (acemusic cloud API by default)
python tools/music_gen.py --prompt "Upbeat tech corporate" --duration 60 --bpm 128 --key "G Major" --output music.mp3
# Generate 4 variations, pick the best
python tools/music_gen.py --prompt "Subtle corporate tech" --duration 60 --variations 4 --output bg.mp3
# Fast mode (disable thinking)
python tools/music_gen.py --no-thinking --prompt "Quick draft" --duration 30 --output draft.mp3
# Scene presets for video production
python tools/music_gen.py --preset corporate-bg --duration 60 --output bg.mp3
python tools/music_gen.py --preset tension --duration 20 --output problem.mp3
python tools/music_gen.py --preset cta --brand digital-samba --output cta.mp3
# Song with vocals and lyrics (use structure tags for sections)
python tools/music_gen.py \
--prompt "Indie pop anthem, male vocal, bright guitar, studio polish" \
--lyrics "[Verse]\nWalking through the morning light\nCoffee in my hand feels right\n\n[Chorus - anthemic]\nWE KEEP MOVING FORWARD\nThrough the noise and doubt\n\n[Outro - fade]\n(Moving forward...)" \
--duration 60 --bpm 128 --key "G Major" --output song.mp3
# Cover / style transfer
python tools/music_gen.py --cover --reference theme.mp3 --prompt "Jazz piano version" --output cover.mp3
# Repaint a weak section (acemusic only)
python tools/music_gen.py --repaint --input track.mp3 --repaint-start 15 --repaint-end 25 --prompt "Guitar solo" --output fixed.mp3
# Continue from existing audio (acemusic only)
python tools/music_gen.py --continuation --input track.mp3 --prompt "Continue with jazz piano" --output extended.mp3
# Stem extraction
python tools/music_gen.py --extract vocals --input mixed.mp3 --output vocals.mp3
# Fall back to self-hosted
python tools/music_gen.py --cloud modal --prompt "Background music" --duration 60 --output bg.mp3
# List presets
python tools/music_gen.py --list-presets
8 scene presets: corporate-bg, upbeat-tech, ambient, dramatic, tension, hopeful, cta, lofi. See .claude/skills/acestep/ for prompt engineering patterns and video production integration guide.
Watermark Removal
# Locate watermark coordinates
python tools/locate_watermark.py --input video.mp4 --grid --output-dir ./review/
python tools/locate_watermark.py --input video.mp4 --preset notebooklm --verify
# Remove watermark (RunPod)
python tools/dewatermark.py --input video.mp4 --region 1080,660,195,40 --output clean.mp4 --runpod
python tools/dewatermark.py --setup # One-time setup
Workflow: grid overlay → note coordinates → verify with --region → remove with dewatermark.
Local mode requires NVIDIA GPU (8GB+ VRAM). Mac users should use --runpod.
Talking Head Generation (SadTalker)
# Basic usage
python tools/sadtalker.py --image portrait.png --audio voiceover.mp3 --output talking.mp4
# For NarratorPiP integration (recommended settings)
# CRITICAL: --preprocess full preserves image dimensions (otherwise outputs square crop)
python tools/sadtalker.py \
--image presenter_16x9.png \
--audio voiceover.mp3 \
--preprocess full --still --expression-scale 0.8 \
--output narrator.mp4
Key flags for NarratorPiP:
- --preprocess full: Critical! Preserves input dimensions (default crop outputs square)
- --still: Reduces head movement for a professional look
- --expression-scale 0.8: Calmer expression (default 1.0)
Image requirements: Face 30-70% of frame, front-facing, 16:9 for NarratorPiP, 512px+ recommended.
See docs/sadtalker.md for detailed options and troubleshooting.
Redub Sync Mode
python tools/redub.py --input video.mp4 --voice-id VOICE_ID --sync --output dubbed.mp4
The --sync flag enables word-level time remapping, which is essential when the TTS voice's pacing differs from the original. Without it, audio can drift 3-4+ seconds by the end.
How it works: Scribe transcribes original → TTS generates new audio with timestamps → segment mapping (15 words/segment) → FFmpeg variable speed per segment.
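The segment-mapping step can be pictured as follows. This is an illustrative sketch only, not the code in `redub.py`: it assumes word timestamps are available as `(start, end)` pairs in seconds for both the original and the new audio, with the same word count, and it leaves open which stream the resulting factor is applied to. `segment_speed_factors` is a hypothetical name.

```python
def segment_speed_factors(original_words, new_words, words_per_segment=15):
    """Compute one speed factor per segment of `words_per_segment` words.

    Each argument is a list of (start, end) timestamps in seconds, one per
    word, and both lists must have the same length. A factor above 1.0
    means the original segment is longer than the new one.
    """
    if len(original_words) != len(new_words):
        raise ValueError("word lists must be the same length")
    factors = []
    for i in range(0, len(original_words), words_per_segment):
        orig = original_words[i:i + words_per_segment]
        new = new_words[i:i + words_per_segment]
        # Segment duration runs from the first word's start to the last word's end.
        orig_duration = orig[-1][1] - orig[0][0]
        new_duration = new[-1][1] - new[0][0]
        factors.append(orig_duration / new_duration)
    return factors
```

Working per segment rather than over the whole file keeps local pacing differences from accumulating into the multi-second drift described above.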
NotebookLM Branding
Post-processes NotebookLM videos with custom branding. Solves the problem where redubbed TTS audio extends beyond the safe visual trim point.
python tools/notebooklm_brand.py \
--input video_synced.mp4 \
--logo assets/logo.png \
--url "mysite.com" \
--output video_final.mp4
Trims NotebookLM visuals, keeps full audio, bridges with freeze frame, adds branded outro.
Video Production Workflow
- Create/resume project - Run /video, choose template and brand (or resume existing)
- Review script - Edit VOICEOVER-SCRIPT.md to plan content
- Gather assets - Record demos with /record-demo or add external videos
- Scene review - Run /scene-review to verify visuals in Remotion Studio
- Design refinement - Use /design or the "Refine" option in scene-review to improve slide visuals
- Generate audio - Use /generate-voiceover for AI narration
- Sync timing - Run python3 tools/sync_timing.py --apply to update config durations
- Preview - npm run studio in project directory
- Iterate - Adjust timing, content, styling with Claude Code
- Render - npm run render for final MP4
Project Lifecycle
Projects move through phases tracked in project.json:
planning → assets → review → audio → editing → rendering → complete
| Phase | Description |
|---|---|
| planning | Defining scenes, writing script |
| assets | Recording demos, gathering materials |
| review | Scene-by-scene review in Remotion Studio (/scene-review) |
| audio | Generating voiceover, music |
| editing | Adjusting timing, previewing |
| rendering | Final render in progress |
| complete | Done |
See lib/project/README.md for details on the project system.
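The phase order can be modelled as a simple ordered list. This is a sketch for reasoning about the lifecycle, not the project system's implementation; `next_phase` is a hypothetical helper and the way `project.json` stores the phase is not assumed here.

```python
PHASES = [
    "planning", "assets", "review", "audio",
    "editing", "rendering", "complete",
]


def next_phase(current):
    """Return the phase that follows `current`, or None once complete."""
    index = PHASES.index(current)  # raises ValueError for unknown phases
    return PHASES[index + 1] if index + 1 < len(PHASES) else None
```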
Video Timing
Timing is critical. Keep these guidelines in mind:
Pacing Rules
- Voiceover drives timing — Narration length determines scene duration
- Reading pace — ~150 words/minute (2.5 words/second) for standard narration
- Demo pacing — Real-time demos often need 1.5-2x speedup (playbackRate)
- Transitions — Add 1-2s padding between scenes
- FPS — All videos use 30fps (frames = seconds × 30)
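The frame arithmetic is easy to get wrong by a frame when audio lengths are fractional. A minimal sketch, assuming durations are rounded up so narration is never clipped. The rounding rule and both helper names are assumptions, not the behavior of `sync_timing.py`; the 1s default padding mirrors the default mentioned for `--apply`.

```python
import math

FPS = 30  # all videos in this toolkit use 30fps


def seconds_to_frames(seconds, fps=FPS):
    """Convert seconds to a whole number of frames, rounding up."""
    # round() first so float noise (e.g. 0.1 * 30) does not add a frame.
    return math.ceil(round(seconds * fps, 6))


def frames_for_audio(audio_seconds, padding_seconds=1.0, fps=FPS):
    """Scene length in frames: audio duration plus trailing padding."""
    return seconds_to_frames(audio_seconds + padding_seconds, fps)
```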
Speaking Rate Tiers
| Pace | WPM | Use When |
|---|---|---|
| Slow | 120-130 | Technical explanations, complex concepts |
| Standard | 140-160 | General narration, demos, overviews |
| Fast | 160-180 | Energetic intros, recaps, CTAs |
Narration Density by Scene Type
| Scene Type | Duration | Narration Density | Notes |
|---|---|---|---|
| Title | 3-5s | 0-10% | Logo + headline, let visuals breathe |
| Overview | 10-20s | 70-90% | 3-5 bullet points, narration-heavy |
| Demo | 10-30s | 30-50% | Let the demo speak, narrate key moments only |
| Stats | 8-12s | 70-90% | Read out highlights, skip obvious numbers |
| Credits | 5-10s | 0-20% | Quick fade, maybe a closing line |
| Problem/Solution | 10-15s | 80-90% | Narration drives the story |
| CTA | 5-10s | 60-80% | Clear call to action, leave a beat at end |
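Narration density converts directly into a word target for a scene. A rough sketch, assuming density means the fraction of scene time spent on narration and using the standard 150 WPM pace (`narration_words` is a hypothetical helper):

```python
def narration_words(scene_seconds, density, wpm=150):
    """Approximate word count for a scene.

    `density` is the fraction of the scene filled with narration (0.0-1.0).
    """
    if not 0.0 <= density <= 1.0:
        raise ValueError("density must be between 0 and 1")
    spoken_seconds = scene_seconds * density
    return round(spoken_seconds * wpm / 60)
```

For example, a 20s demo scene at 40% density leaves 8s of speech, or about 20 words.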
Word Count Budgeting
Before writing scripts, budget words per scene:
Target duration × 2.5 = word budget (at standard pace)
Pause seconds × 2.5 = words to subtract from budget
Example: 15s scene with a 1s pause
15 × 2.5 = 37.5 → 37 words budget (rounded down)
1 × 2.5 = 2.5 → 3 words for pause (rounded up)
Available: ~34 words of narration
Use [pause 1.0s] markers in scripts. Each second of pause costs ~2-3 words from the budget.
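The same budget arithmetic as a small function. The sketch rounds the budget down and the pause cost up, which reproduces the 34-word result of the worked example; the `word_budget` name and the rounding choice are assumptions.

```python
import math

WORDS_PER_SECOND = 2.5  # standard pace, 150 words per minute


def word_budget(scene_seconds, pause_seconds=0.0, wps=WORDS_PER_SECOND):
    """Words available for narration after subtracting pauses.

    The budget is rounded down and the pause cost rounded up, so the
    result errs toward a slightly shorter script.
    """
    budget = math.floor(scene_seconds * wps)
    pause_cost = math.ceil(pause_seconds * wps)
    return max(budget - pause_cost, 0)
```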
Timing Calculations
Script words ÷ 150 = voiceover minutes (estimate)
Raw demo length ÷ playbackRate = demo duration
Sum of scenes + transitions = total video
When to Check Timing
- During scene planning — Budget word counts per scene before writing
- After writing script — Count words per scene, compare to budget
- After generating audio — Run sync_timing.py to compare actual vs estimated
- Before rendering — Ensure durationInFrames matches actual audio for each scene
TTS Duration Drift (The Real Timing Problem)
TTS engines do NOT consistently produce 150 WPM output. In practice:
- ElevenLabs tends to compress pauses and speed through short sentences. A 50s script may produce 40-45s of audio.
- Qwen3-TTS varies by speaker and tone preset. Ryan at "professional" tone speaks ~10% faster than "warm."
- Short scenes drift more — a 5-second scene might be off by 30%, while a 30-second scene is off by 10%.
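Drift is easiest to compare across scenes as a percentage of the estimate. A minimal sketch (`drift_percent` is a hypothetical helper, not something `sync_timing.py` is known to expose):

```python
def drift_percent(estimated_seconds, actual_seconds):
    """Signed drift of actual audio relative to the estimate, in percent.

    Negative values mean the audio came out shorter than planned.
    """
    if estimated_seconds <= 0:
        raise ValueError("estimated_seconds must be positive")
    return (actual_seconds - estimated_seconds) / estimated_seconds * 100
```

Applied to the ElevenLabs case above, a 50s estimate that yields 40-45s of audio is a drift of -20% to -10%.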
The feedback loop after TTS generation:
- Generate per-scene audio files
- Run python3 tools/sync_timing.py to compare actual vs config durations
- Run python3 tools/sync_timing.py --apply to update config automatically
- For demo scenes: recalculate playbackRate = rawDemoDuration / actualNarrationDuration
- Re-preview in Remotion Studio before rendering
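The demo recalculation in this loop, as a sketch. The formula is the one given in the list; the clamp to a 0.5-2.0 range is an assumption added here to flag demos that would need an extreme speed change, and `demo_playback_rate` is a hypothetical helper.

```python
def demo_playback_rate(raw_demo_seconds, narration_seconds,
                       min_rate=0.5, max_rate=2.0):
    """Return (rate, clamped) so the demo spans the narration.

    rate = raw_demo_seconds / narration_seconds, limited to
    [min_rate, max_rate]. `clamped` is True when the limit was applied,
    which suggests trimming the demo or the script instead.
    """
    if narration_seconds <= 0:
        raise ValueError("narration_seconds must be positive")
    rate = raw_demo_seconds / narration_seconds
    limited = min(max(rate, min_rate), max_rate)
    return limited, limited != rate
```

A 30s recording under 20s of narration gives a rate of 1.5, while a 60s recording under the same narration hits the 2.0 cap and is reported as clamped.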
Common drift patterns and fixes:
| Problem | Symptom | Fix |
|---|---|---|
| Audio shorter than scene | Dead air / awkward silence at end | Reduce durationInFrames to match audio |
| Audio longer than scene | Narration cut off | Increase durationInFrames or trim script |
| Demo too fast for narration | Viewer can't follow | Decrease playbackRate or cut narration |
| Demo too slow for narration | Waiting for demo to catch up | Increase playbackRate (1.5-2x typical) |
| Pauses lost in TTS | Script felt spacious, audio feels rushed | Add explicit <break time="1s"/> in SSML or extend scene padding |
Fixing Mismatches
- Voiceover too long: Speed up demos, trim pauses, cut content
- Voiceover too short: Slow demos, add scenes, expand narration
- Demo too long: Increase playbackRate (1.5x-2x typical)
- Demo too short: Decrease playbackRate, or loop/extend
Audio-Anchored Timelines (the prevention approach)
sync_timing.py is reactive — it fixes drift after the fact. You can prevent drift entirely by generating the audio first, then anchoring visuals to known timestamps instead of estimating durations upfront.
The pattern:
- Write the script and split into per-scene segments
- Generate per-scene VO files: voiceover.py --scene-dir public/audio/scenes --json
- Read the actual durations from the JSON output
- Anchor every visual element to absolute timestamps in the timeline
This is especially clean for Python/moviepy builds where each clip carries its own start= parameter:
# Audio-anchored scene timeline (25s total):
# Scene 1 tired 0.3 → 3.74 (audio 3.44s)
# Scene 2 worries 4.0 → 8.88 (audio 4.88s)
# Scene 3 introduce 9.1 → 11.90 (audio 2.80s)
text_clip("TIRED OF", start=0.5, duration=1.2)
text_clip("THIRD-PARTY", start=1.0, duration=1.8)
vo_clip("01_tired.mp3", start=0.3)
vo_clip("02_worries.mp3", start=4.0)
The comment block at the top is the source of truth. Every start= references it. Drift cannot accumulate because durations are read from the rendered audio rather than estimated.
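Computing the start times from measured durations can be automated. The sketch below is not part of the toolkit: it assumes a fixed lead-in and a fixed gap between scenes (the hand-tuned timeline above uses slightly different gaps), and `anchor_scenes` is a hypothetical helper.

```python
def anchor_scenes(durations, lead_in=0.3, gap=0.25):
    """Assign absolute (start, end) times to scenes from audio durations.

    `durations` maps scene name -> measured audio length in seconds, in
    playback order. Times are rounded to two decimals.
    """
    timeline = {}
    cursor = lead_in
    for name, length in durations.items():
        start = round(cursor, 2)
        end = round(start + length, 2)
        timeline[name] = (start, end)
        cursor = end + gap  # next scene starts after a fixed breathing gap
    return timeline
```

Feeding in the three measured durations from the example (3.44s, 4.88s, 2.80s) yields a timeline close to the hand-written one, which can then be pasted into the comment block and adjusted by hand where a beat needs more room.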
Trade-off vs. <Series>-style auto-chaining:
| Approach | Best for | Downside |
|---|---|---|
| Audio-anchored absolute starts | Tight ad-style edits, sub-30s spots, anything with exact timing | Manual bookkeeping when re-timing a scene |
| <Series> / auto-chained durations | Long-form sprint reviews where adjacent scenes flex | Drift compounds across the timeline; needs sync_timing.py to recover |
For Remotion projects you can mix the two: use <Sequence from={...}> with absolute frames for tight sections and let <Series> handle the rest. For pure-Python builds (build.py + moviepy), audio-anchored is the natural default.
Key Patterns
Animations (Remotion)
const frame = useCurrentFrame();
const opacity = interpolate(frame, [0, 20], [0, 1], { extrapolateRight: 'clamp' });
Sequencing
<Series>
<Series.Sequence durationInFrames={150}><TitleSlide /></Series.Sequence>
<Series.Sequence durationInFrames={900}><DemoClip /></Series.Sequence>
</Series>
Media
Always use <OffthreadVideo>, never <video>: Remotion requires its own video component for frame-accurate rendering. Using a raw <video> tag will not render correctly.
<OffthreadVideo src={staticFile('demo.mp4')} />
<Audio src={staticFile('voiceover.mp3')} volume={1} />
<Audio src={staticFile('music.mp3')} volume={0.15} />
Scene Transitions
The toolkit includes a transitions library at lib/transitions/. See registry transitions section for the full list with options and best-use descriptions.
Using TransitionSeries
import { TransitionSeries, linearTiming } from '@remotion/transitions';
import { glitch, lightLeak, zoomBlur } from '../../../lib/transitions';
<TransitionSeries>
<TransitionSeries.Sequence durationInFrames={90}>
<TitleSlide />
</TransitionSeries.Sequence>
<TransitionSeries.Transition
presentation={glitch({ intensity: 0.8 })}
timing={linearTiming({ durationInFrames: 20 })}
/>
<TransitionSeries.Sequence durationInFrames={120}>
<ContentSlide />
</TransitionSeries.Sequence>
</TransitionSeries>
Transition Options Examples
glitch({ intensity: 0.8, slices: 8, rgbShift: true }) // Tech/cyberpunk
lightLeak({ temperature: 'warm', direction: 'right' }) // Warm celebration
zoomBlur({ direction: 'in', blurAmount: 20 }) // High energy
rgbSplit({ direction: 'diagonal', displacement: 30 }) // Chromatic aberration
Timing Functions
linearTiming({ durationInFrames: 30 }) // Constant speed
springTiming({ config: { damping: 200 }, durationInFrames: 45 }) // Spring physics (high damping, little overshoot)
Transition Duration Guidelines
| Type | Frames | Notes |
|---|---|---|
| Quick cut | 10-15 | Fast, punchy |
| Standard | 20-30 | Most common |
| Dramatic | 40-60 | Slow reveals |
| Glitch effects | 15-25 | Should feel sudden |
| Light leak | 30-45 | Needs time to sweep |
Preview all transitions: cd showcase/transitions && npm run studio
See lib/transitions/README.md for full documentation.
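One timing detail to keep in mind when budgeting scenes: in a TransitionSeries, each transition overlaps the two sequences it joins, so the total length is the sum of the sequence durations minus the sum of the transition durations. A small sketch of that arithmetic, assuming transitions sit only between sequences (`transition_series_frames` is a hypothetical helper, written in Python to match the toolkit's tooling):

```python
def transition_series_frames(sequence_frames, transition_frames):
    """Total frames for sequences joined by overlapping transitions."""
    if len(transition_frames) > max(len(sequence_frames) - 1, 0):
        raise ValueError("at most one transition between adjacent sequences")
    return sum(sequence_frames) - sum(transition_frames)
```

For the TransitionSeries example in this section, 90 + 120 - 20 = 190 frames, or roughly 6.3s at 30fps, rather than the 210 frames a plain Series would take.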
Design Refinement with frontend-design Skill
The frontend-design skill elevates slide visuals from generic to distinctive.
Usage
- During scene review (/scene-review): Choose "Refine" for visual improvements
- Focused sessions (/design): Deep-dive on a specific scene, e.g. /design title, /design cta
When to Use
- Slide scenes that feel generic
- When building visual contrast between scenes (e.g., calm title → harsh problem)
- When animations feel too basic or too busy
Visual Narrative Arc
Consider how visual intensity builds across scenes:
- Title: Set the mood, plant visual seeds
- Problem: Create tension (harsh contrast)
- Solution: Relief and hope return
- Demo: Neutral, content-focused
- Stats: Build credibility
- CTA: Climax - maximum visual energy
Toolkit vs Project Work
Toolkit work (evolves the toolkit itself):
- Skills, commands, templates, tools
- Tracked in _internal/ROADMAP.md
Project work (creates videos):
- Lives in projects/
- Each project has project.json (machine-readable state) and auto-generated CLAUDE.md
Keep these separate. Don't mix toolkit improvements with video production.
Documentation
- docs/getting-started.md - First video walkthrough
- docs/creating-templates.md - Build new templates
- docs/creating-brands.md - Create brand profiles
- docs/optional-components.md - Setup for optional ML-based tools (ProPainter, etc.)