# claude-code-video-toolkit

This file provides guidance to Claude Code (claude.ai/code) when working with this video production toolkit.

## Overview

**claude-code-video-toolkit** is an AI-native video production workspace. It provides Claude Code with the skills, commands, and tools to create professional videos from concept to final render.

**Key capabilities:**

- Programmatic video creation with Remotion (React-based)
- AI voiceover generation with ElevenLabs or Qwen3-TTS
- AI music generation with ACE-Step 1.5 (text-to-music, vocals, covers, stems)
- Browser demo recording with Playwright
- Asset processing with FFmpeg

## Directory Structure

```
claude-code-video-toolkit/
├── .claude/
│   ├── skills/          # Domain knowledge for Claude
│   └── commands/        # Guided workflows
├── tools/               # Python CLI automation
├── templates/           # Video templates
│   ├── sprint-review/   # Sprint review video template
│   └── product-demo/    # Marketing/product demo template
├── brands/              # Brand profiles (colors, fonts, voice)
├── projects/            # Your video projects go here (gitignored)
├── examples/            # Curated showcase projects (shared)
├── assets/              # Shared assets (voices, images)
├── playwright/          # Browser recording infrastructure
├── docs/                # Documentation
└── _internal/           # Toolkit metadata & registry
```

## Registry

`_internal/toolkit-registry.json` is the canonical source for all skills, commands, tools, templates, components, transitions, and cloud endpoints — including their paths, status, options, presets, and env vars. Consult it for structured data. This file focuses on **workflow guidance, patterns, and knowledge** that the registry can't capture.

## Quick Start

**First-time setup (optional, ~5 minutes):**

```
/setup
```

Walks through cloud GPU, file transfer (R2), and voice configuration. Most features are free. Skip this if you just want to render videos with Node.js.

**Work on a video project:**

```
/video
```

This command will:
1. Scan for existing projects (resume or create new)
2. Choose template (sprint-review, product-demo)
3. Choose brand (or create one with `/brand`)
4. Plan scenes interactively
5. Create project with VOICEOVER-SCRIPT.md

**Multi-session support:** Projects span multiple sessions. Run `/video` to resume where you left off. Each project tracks its phase, scenes, assets, and session history in `project.json`.

**Or manually:**

```bash
cp -r templates/sprint-review projects/my-video
cd projects/my-video
npm install
npm run studio   # Preview
npm run render   # Export
```

> **Note:** After creating or modifying commands/skills, restart Claude Code to load changes.

## Templates

Templates live in `templates/`. Each is a standalone Remotion project. See the registry `templates` section for the full list.

### sprint-review

Config-driven sprint review videos with a theme system, content defined in `sprint-config.ts`, pre-built slides (Title, Overview, Summary, Credits), demo components (single video and split-screen), and audio integration.

### product-demo

Marketing/product demo videos with a dark tech aesthetic, scene-based composition (title, problem, solution, demo, stats, CTA), animated background, Narrator PiP, browser/terminal chrome, and stats cards with spring animations.

## Brand Profiles

Brands live in `brands/`. Each defines a visual identity:

```
brands/my-brand/
├── brand.json   # Colors, fonts, typography
├── voice.json   # ElevenLabs voice settings
└── assets/      # Logo, backgrounds
```

See `docs/creating-brands.md` for details.

## Shared Components

Reusable video components live in `lib/components/`. See the registry `components` section for the full list with descriptions. Import them in templates via:

```tsx
import { AnimatedBackground, SlideTransition, Label } from '../../../../lib/components';
```

## Python Tools

Audio, video, and image tools live in `tools/`.
See the registry `tools` section for the full catalog with descriptions, options, presets, and env vars. Every tool supports `--help`.

```bash
# Setup
pip install -r tools/requirements.txt
```

**Important: always invoke tools from the toolkit root directory.** When working inside a project (`projects/my-video/`), tool paths like `python3 tools/upscale.py` will fail because `tools/` is relative. Always use:

```bash
cd /path/to/claude-code-video-toolkit && python3 tools/upscale.py ...
```

This is especially critical for background commands, where the working directory may not be obvious.

### Tool Categories

| Type | Tools | When to Use |
|------|-------|-------------|
| **Project tools** | voiceover, music, music_gen, sfx, sync_timing | During the video creation workflow |
| **Utility tools** | redub, addmusic, notebooklm_brand, locate_watermark | Quick transformations on existing videos |
| **Cloud GPU** | image_edit, upscale, dewatermark, sadtalker, qwen3_tts, music_gen, flux2 | AI processing via RunPod or Modal (`--cloud runpod\|modal`) |

Utility tools work on any video file without requiring a project structure.

### Voiceover Generation

```bash
# Per-scene generation (recommended)
python tools/voiceover.py --scene-dir public/audio/scenes --json

# Using Qwen3-TTS (self-hosted, free alternative to ElevenLabs)
python tools/voiceover.py --provider qwen3 --tone warm --scene-dir public/audio/scenes --json

# Single file (legacy)
python tools/voiceover.py --script SCRIPT.md --output out.mp3
```

### Timing Sync (after voiceover)

```bash
python3 tools/sync_timing.py                           # Dry-run comparison
python3 tools/sync_timing.py --apply                   # Update config (1s default padding)
python3 tools/sync_timing.py --apply --padding 1.5     # Custom padding
python3 tools/sync_timing.py --voiceover-json vo.json  # Use voiceover.py output
python3 tools/sync_timing.py --json                    # Machine-readable output
```
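As a rough sketch of the arithmetic behind `--apply`: take each scene's actual narration length, add the padding, and convert to frames at the toolkit's 30fps. The function and names are illustrative, not the tool's internal API:

```python
def padded_duration_frames(audio_seconds: float, padding: float = 1.0, fps: int = 30) -> int:
    """Scene length in frames: actual narration plus padding,
    rounded to the nearest frame (all toolkit videos are 30fps)."""
    return round((audio_seconds + padding) * fps)

# A scene whose generated voiceover came out at 12.4s:
print(padded_duration_frames(12.4))       # 402 frames (13.4s)
print(padded_duration_frames(12.4, 1.5))  # 417 frames with --padding 1.5
```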
### Qwen3-TTS (Standalone)

```bash
python tools/qwen3_tts.py --text "Hello world" --speaker Ryan --output hello.mp3
python tools/qwen3_tts.py --text "Hello world" --tone warm --output hello.mp3
python tools/qwen3_tts.py --text "Hello" --instruct "Speak enthusiastically" --output excited.mp3
python tools/qwen3_tts.py --text "Hello" --ref-audio sample.wav --ref-text "transcript" --output cloned.mp3
python tools/qwen3_tts.py --list-voices  # 9 speakers: Ryan, Aiden, Vivian, etc.
python tools/qwen3_tts.py --list-tones   # neutral, warm, professional, excited, etc.
```

Temperature controls expressiveness: `--temperature 1.2` (more expressive) or `--temperature 0.4` (more consistent).

### Cloud GPU Providers

All cloud GPU tools support two providers via `--cloud runpod|modal`. RunPod is the default. Modal was added as a reliability fallback after RunPod outages and offers faster cold starts.

```bash
# --- RunPod setup (automated, one-time per tool) ---
echo "RUNPOD_API_KEY=your_key_here" >> .env
python tools/image_edit.py --setup
python tools/upscale.py --setup
python tools/qwen3_tts.py --setup
python tools/music_gen.py --setup

# --- Modal setup (deploy each app you need) ---
pip install modal && python3 -m modal setup
modal deploy docker/modal-upscale/app.py     # Then save the URL to .env
modal deploy docker/modal-image-edit/app.py
# See docs/modal-setup.md for the full guide
```

### AI Image Editing

```bash
# Image editing (Qwen-Image-Edit)
python tools/image_edit.py --input photo.jpg --prompt "Add sunglasses"
python tools/image_edit.py --input photo.jpg --prompt "Add sunglasses" --cloud modal
python tools/image_edit.py --input photo.jpg --style cyberpunk
python tools/image_edit.py --input photo.jpg --background office
python tools/image_edit.py --list-presets  # Full preset list

# Upscaling (RealESRGAN)
python tools/upscale.py --input photo.jpg --output photo_4x.png --cloud runpod
python tools/upscale.py --input photo.jpg --scale 2 --model anime --face-enhance --cloud runpod
```

See `docs/qwen-edit-patterns.md` and `.claude/skills/qwen-edit/` for prompting guidance.

### AI Music Generation (ACE-Step 1.5)

The default provider is **acemusic** (the official cloud API; get a free key from [acemusic.ai/api-key](https://acemusic.ai/api-key)). It uses the XL Turbo 4B model with 5Hz LM thinking mode and falls back to Modal/RunPod for the self-hosted 2B model.

```bash
# Background music (acemusic cloud API by default)
python tools/music_gen.py --prompt "Upbeat tech corporate" --duration 60 --bpm 128 --key "G Major" --output music.mp3

# Generate 4 variations, pick the best
python tools/music_gen.py --prompt "Subtle corporate tech" --duration 60 --variations 4 --output bg.mp3

# Fast mode (disable thinking)
python tools/music_gen.py --no-thinking --prompt "Quick draft" --duration 30 --output draft.mp3

# Scene presets for video production
python tools/music_gen.py --preset corporate-bg --duration 60 --output bg.mp3
python tools/music_gen.py --preset tension --duration 20 --output problem.mp3
python tools/music_gen.py --preset cta --brand digital-samba --output cta.mp3

# Song with vocals and lyrics (use structure tags for sections)
python tools/music_gen.py \
  --prompt "Indie pop anthem, male vocal, bright guitar, studio polish" \
  --lyrics "[Verse]\nWalking through the morning light\nCoffee in my hand feels right\n\n[Chorus - anthemic]\nWE KEEP MOVING FORWARD\nThrough the noise and doubt\n\n[Outro - fade]\n(Moving forward...)" \
  --duration 60 --bpm 128 --key "G Major" --output song.mp3

# Cover / style transfer
python tools/music_gen.py --cover --reference theme.mp3 --prompt "Jazz piano version" --output cover.mp3

# Repaint a weak section (acemusic only)
python tools/music_gen.py --repaint --input track.mp3 --repaint-start 15 --repaint-end 25 --prompt "Guitar solo" --output fixed.mp3

# Continue from existing audio (acemusic only)
python tools/music_gen.py --continuation --input track.mp3 --prompt "Continue with jazz piano" --output extended.mp3

# Stem extraction
python tools/music_gen.py --extract vocals --input mixed.mp3 --output vocals.mp3

# Fall back to self-hosted
python tools/music_gen.py --cloud modal --prompt "Background music" --duration 60 --output bg.mp3

# List presets
python tools/music_gen.py --list-presets
```

Eight scene presets: `corporate-bg`, `upbeat-tech`, `ambient`, `dramatic`, `tension`, `hopeful`, `cta`, `lofi`. See `.claude/skills/acestep/` for prompt engineering patterns and the video production integration guide.

### Watermark Removal

```bash
# Locate watermark coordinates
python tools/locate_watermark.py --input video.mp4 --grid --output-dir ./review/
python tools/locate_watermark.py --input video.mp4 --preset notebooklm --verify

# Remove watermark (RunPod)
python tools/dewatermark.py --input video.mp4 --region 1080,660,195,40 --output clean.mp4 --runpod
python tools/dewatermark.py --setup  # One-time setup
```

**Workflow:** grid overlay → note coordinates → verify with `--region` → remove with dewatermark.

**Local mode** requires an NVIDIA GPU (8GB+ VRAM). Mac users should use `--runpod`.
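The grid step is plain coordinate arithmetic: read a labeled cell off the overlay, convert it to the `x,y,w,h` region string that `dewatermark.py --region` expects, and widen it slightly so inpainting covers the watermark's edges. The cell size and helper below are illustrative assumptions, not `locate_watermark.py`'s actual grid layout:

```python
def cell_to_region(col: int, row: int, cell: int = 120, margin: int = 10) -> str:
    """Map a (col, row) cell on an assumed 120px grid overlay to an
    'x,y,w,h' region string, padded by a small margin on each side."""
    x = max(col * cell - margin, 0)
    y = max(row * cell - margin, 0)
    w = cell + 2 * margin
    h = cell + 2 * margin
    return f"{x},{y},{w},{h}"

# A watermark sitting in cell (9, 5) of a 1920x1080 frame:
print(cell_to_region(9, 5))  # 1070,590,140,140
```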
### Talking Head Generation (SadTalker)

```bash
# Basic usage
python tools/sadtalker.py --image portrait.png --audio voiceover.mp3 --output talking.mp4

# For NarratorPiP integration (recommended settings)
# CRITICAL: --preprocess full preserves image dimensions (otherwise outputs a square crop)
python tools/sadtalker.py \
  --image presenter_16x9.png \
  --audio voiceover.mp3 \
  --preprocess full --still --expression-scale 0.8 \
  --output narrator.mp4
```

**Key flags for NarratorPiP:**

- `--preprocess full` — **Critical!** Preserves input dimensions (the default `crop` outputs a square)
- `--still` — Reduces head movement for a professional look
- `--expression-scale 0.8` — Calmer expression (default 1.0)

**Image requirements:** Face filling 30-70% of the frame, front-facing, 16:9 for NarratorPiP, 512px+ recommended.

See `docs/sadtalker.md` for detailed options and troubleshooting.

### Redub Sync Mode

```bash
python tools/redub.py --input video.mp4 --voice-id VOICE_ID --sync --output dubbed.mp4
```

The `--sync` flag enables word-level time remapping — essential when TTS voice pacing differs from the original. Without it, audio can drift 3-4+ seconds by the end.

**How it works:** Scribe transcribes the original → TTS generates new audio with timestamps → segment mapping (15 words/segment) → FFmpeg applies variable speed per segment.

### NotebookLM Branding

Post-processes NotebookLM videos with custom branding. Solves the problem where redubbed TTS audio extends beyond the safe visual trim point.

```bash
python tools/notebooklm_brand.py \
  --input video_synced.mp4 \
  --logo assets/logo.png \
  --url "mysite.com" \
  --output video_final.mp4
```

Trims the NotebookLM visuals, keeps the full audio, bridges the gap with a freeze frame, and adds a branded outro.

## Video Production Workflow

1. **Create/resume project** - Run `/video`, choose template and brand (or resume existing)
2. **Review script** - Edit `VOICEOVER-SCRIPT.md` to plan content
3. **Gather assets** - Record demos with `/record-demo` or add external videos
4. **Scene review** - Run `/scene-review` to verify visuals in Remotion Studio
5. **Design refinement** - Use `/design` or the "Refine" option in scene review to improve slide visuals
6. **Generate audio** - Use `/generate-voiceover` for AI narration
7. **Sync timing** - Run `python3 tools/sync_timing.py --apply` to update config durations
8. **Preview** - `npm run studio` in the project directory
9. **Iterate** - Adjust timing, content, and styling with Claude Code
10. **Render** - `npm run render` for the final MP4

## Project Lifecycle

Projects move through phases tracked in `project.json`:

```
planning → assets → review → audio → editing → rendering → complete
```

| Phase | Description |
|-------|-------------|
| `planning` | Defining scenes, writing script |
| `assets` | Recording demos, gathering materials |
| `review` | Scene-by-scene review in Remotion Studio (`/scene-review`) |
| `audio` | Generating voiceover, music |
| `editing` | Adjusting timing, previewing |
| `rendering` | Final render in progress |
| `complete` | Done |

See `lib/project/README.md` for details on the project system.

## Video Timing

Timing is critical.
Keep these guidelines in mind:

### Pacing Rules

- **Voiceover drives timing** — Narration length determines scene duration
- **Reading pace** — ~150 words/minute (2.5 words/second) for standard narration
- **Demo pacing** — Real-time demos often need 1.5-2x speedup (`playbackRate`)
- **Transitions** — Add 1-2s padding between scenes
- **FPS** — All videos use 30fps (frames = seconds × 30)

### Speaking Rate Tiers

| Pace | WPM | Use When |
|------|-----|----------|
| Slow | 120-130 | Technical explanations, complex concepts |
| Standard | 140-160 | General narration, demos, overviews |
| Fast | 160-180 | Energetic intros, recaps, CTAs |

### Narration Density by Scene Type

| Scene Type | Duration | Narration Density | Notes |
|------------|----------|-------------------|-------|
| Title | 3-5s | 0-10% | Logo + headline, let visuals breathe |
| Overview | 10-20s | 70-90% | 3-5 bullet points, narration-heavy |
| Demo | 10-30s | 30-50% | Let the demo speak, narrate key moments only |
| Stats | 8-12s | 70-90% | Read out highlights, skip obvious numbers |
| Credits | 5-10s | 0-20% | Quick fade, maybe a closing line |
| Problem/Solution | 10-15s | 80-90% | Narration drives the story |
| CTA | 5-10s | 60-80% | Clear call to action, leave a beat at the end |

### Word Count Budgeting

Before writing scripts, budget words per scene:

```
Target duration × 2.5 = word budget (at standard pace)
Pause seconds × 2.5  = words to subtract from the budget

Example: 15s scene with a 1s pause
15 × 2.5 ≈ 37 words budget
1 × 2.5  ≈ 3 words for the pause
Available: ~34 words of narration
```

Use `[pause 1.0s]` markers in scripts. Each second of pause costs ~2-3 words from the budget.
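The budgeting arithmetic above as a small helper — a sketch; the rounding choices (down for the budget, up for the pause cost) are a deliberately conservative assumption:

```python
import math

def word_budget(duration_s: float, pause_s: float = 0.0, wpm: int = 150) -> int:
    """Words of narration that fit in a scene at the given pace,
    minus the words 'spent' on explicit pauses."""
    wps = wpm / 60                         # 150 wpm -> 2.5 words/s
    budget = math.floor(duration_s * wps)  # round down: better to under-write
    cost = math.ceil(pause_s * wps)        # round up: pauses always cost
    return budget - cost

print(word_budget(15, pause_s=1))  # 34 -- the example above
print(word_budget(10, wpm=120))    # 20 -- slow pace for technical content
```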
### Timing Calculations

```
Script words ÷ 150 = voiceover minutes (estimate)
Raw demo length ÷ playbackRate = demo duration
Sum of scenes + transitions = total video
```

### When to Check Timing

- **During scene planning** — Budget word counts per scene before writing
- **After writing the script** — Count words per scene, compare to budget
- **After generating audio** — Run `sync_timing.py` to compare actual vs. estimated durations
- **Before rendering** — Ensure `durationInFrames` matches the actual audio for each scene

### TTS Duration Drift (The Real Timing Problem)

TTS engines do NOT consistently produce 150 WPM output. In practice:

- **ElevenLabs** tends to compress pauses and speed through short sentences. A 50s script may produce 40-45s of audio.
- **Qwen3-TTS** varies by speaker and tone preset. Ryan at the "professional" tone speaks ~10% faster than at "warm".
- **Short scenes drift more** — a 5-second scene might be off by 30%, while a 30-second scene is off by 10%.

**The feedback loop after TTS generation:**

1. Generate per-scene audio files
2. Run `python3 tools/sync_timing.py` to compare actual vs. configured durations
3. Run `python3 tools/sync_timing.py --apply` to update the config automatically
4. For demo scenes: recalculate `playbackRate = rawDemoDuration / actualNarrationDuration`
5. Re-preview in Remotion Studio before rendering
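Step 4 can be sketched as follows — the clamp bounds are an illustrative assumption (very high rates become unwatchable), not toolkit policy:

```python
def demo_playback_rate(raw_demo_s: float, narration_s: float,
                       min_rate: float = 0.5, max_rate: float = 2.0) -> float:
    """Speed factor that makes a raw demo recording span exactly the
    narration that covers it, clamped to a watchable range."""
    rate = raw_demo_s / narration_s
    return round(min(max(rate, min_rate), max_rate), 2)

# A 45s screen recording narrated by 26s of voiceover:
print(demo_playback_rate(45, 26))  # 1.73
```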
**Common drift patterns and fixes:**

| Problem | Symptom | Fix |
|---------|---------|-----|
| Audio shorter than scene | Dead air / awkward silence at the end | Reduce `durationInFrames` to match the audio |
| Audio longer than scene | Narration cut off | Increase `durationInFrames` or trim the script |
| Demo too fast for narration | Viewer can't follow | Decrease `playbackRate` or cut narration |
| Demo too slow for narration | Waiting for the demo to catch up | Increase `playbackRate` (1.5-2x typical) |
| Pauses lost in TTS | Script felt spacious, audio feels rushed | Add explicit `<break time="1s"/>` in SSML or extend scene padding |

### Fixing Mismatches

- **Voiceover too long**: Speed up demos, trim pauses, cut content
- **Voiceover too short**: Slow demos, add scenes, expand narration
- **Demo too long**: Increase `playbackRate` (1.5x-2x typical)
- **Demo too short**: Decrease `playbackRate`, or loop/extend

### Audio-Anchored Timelines (the prevention approach)

`sync_timing.py` is reactive — it fixes drift after the fact. You can prevent drift entirely by **generating the audio first, then anchoring visuals to known timestamps** instead of estimating durations upfront.

**The pattern:**

1. Write the script and split it into per-scene segments
2. Generate per-scene VO files: `voiceover.py --scene-dir public/audio/scenes --json`
3. Read the actual durations from the JSON output
4. Anchor every visual element to absolute timestamps in the timeline
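Steps 2-4 can be sketched as follows. The JSON shape here is an assumption about the `--json` output — only the per-scene durations matter:

```python
scenes = [  # assumed shape of the per-scene JSON output
    {"file": "01_tired.mp3",     "duration": 3.44},
    {"file": "02_worries.mp3",   "duration": 4.88},
    {"file": "03_introduce.mp3", "duration": 2.80},
]

def anchor_starts(scenes: list, lead_in: float = 0.3, gap: float = 0.25) -> list:
    """Absolute start time per scene: the previous scene's audio end plus
    a small gap. Durations are read, never estimated, so drift can't occur."""
    t = lead_in
    starts = []
    for scene in scenes:
        starts.append(round(t, 2))
        t += scene["duration"] + gap
    return starts

print(anchor_starts(scenes))  # [0.3, 3.99, 9.12]
```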
This is especially clean for Python/moviepy builds, where each clip carries its own `start=` parameter:

```python
# Audio-anchored scene timeline (25s total):
#   Scene 1  tired      0.3 → 3.74   (audio 3.44s)
#   Scene 2  worries    4.0 → 8.88   (audio 4.88s)
#   Scene 3  introduce  9.1 → 11.90  (audio 2.80s)

text_clip("TIRED OF", start=0.5, duration=1.2)
text_clip("THIRD-PARTY", start=1.0, duration=1.8)
vo_clip("01_tired.mp3", start=0.3)
vo_clip("02_worries.mp3", start=4.0)
```

The comment block at the top is the source of truth. Every `start=` references it. Drift is impossible because durations aren't being estimated — they're being read from the rendered audio.

**Trade-off vs. `<Series>`-style auto-chaining:**

| Approach | Best for | Downside |
|----------|----------|----------|
| Audio-anchored absolute starts | Tight ad-style edits, sub-30s spots, anything with exact timing | Manual bookkeeping when re-timing a scene |
| `<Series>` / auto-chained durations | Long-form sprint reviews where adjacent scenes flex | Drift compounds across the timeline; needs `sync_timing.py` to recover |

For Remotion projects you can mix the two: use `<Sequence from={...}>` with absolute frames for tight sections and let `<Series>` handle the rest. For pure-Python builds (`build.py` + moviepy), audio-anchored is the natural default.

## Key Patterns

### Animations (Remotion)

```tsx
const frame = useCurrentFrame();
const opacity = interpolate(frame, [0, 20], [0, 1], { extrapolateRight: 'clamp' });
```

### Sequencing

```tsx
<Series>
  <Series.Sequence durationInFrames={150}><TitleSlide /></Series.Sequence>
  <Series.Sequence durationInFrames={900}><DemoClip /></Series.Sequence>
</Series>
```

### Media

**Always use `<OffthreadVideo>`, never `<video>`** — Remotion requires its own video component for frame-accurate rendering.
Using a raw `<video>` tag will not render correctly.

```tsx
<OffthreadVideo src={staticFile('demo.mp4')} />
<Audio src={staticFile('voiceover.mp3')} volume={1} />
<Audio src={staticFile('music.mp3')} volume={0.15} />
```

## Scene Transitions

The toolkit includes a transitions library at `lib/transitions/`. See the registry `transitions` section for the full list with options and best-use descriptions.

### Using TransitionSeries

```tsx
import { TransitionSeries, linearTiming } from '@remotion/transitions';
import { glitch, lightLeak, zoomBlur } from '../../../lib/transitions';

<TransitionSeries>
  <TransitionSeries.Sequence durationInFrames={90}>
    <TitleSlide />
  </TransitionSeries.Sequence>
  <TransitionSeries.Transition
    presentation={glitch({ intensity: 0.8 })}
    timing={linearTiming({ durationInFrames: 20 })}
  />
  <TransitionSeries.Sequence durationInFrames={120}>
    <ContentSlide />
  </TransitionSeries.Sequence>
</TransitionSeries>
```

### Transition Options Examples

```tsx
glitch({ intensity: 0.8, slices: 8, rgbShift: true })   // Tech/cyberpunk
lightLeak({ temperature: 'warm', direction: 'right' })  // Warm celebration
zoomBlur({ direction: 'in', blurAmount: 20 })           // High energy
rgbSplit({ direction: 'diagonal', displacement: 30 })   // Chromatic aberration
```

### Timing Functions

```tsx
linearTiming({ durationInFrames: 30 })                           // Constant speed
springTiming({ config: { damping: 200 }, durationInFrames: 45 }) // Physics bounce
```

### Transition Duration Guidelines

| Type | Frames | Notes |
|------|--------|-------|
| Quick cut | 10-15 | Fast, punchy |
| Standard | 20-30 | Most common |
| Dramatic | 40-60 | Slow reveals |
| Glitch effects | 15-25 | Should feel sudden |
| Light leak | 30-45 | Needs time to sweep |

Preview all transitions: `cd showcase/transitions && npm run studio`

See `lib/transitions/README.md` for full documentation.
## Design Refinement with the frontend-design Skill

The `frontend-design` skill elevates slide visuals from generic to distinctive.

### Usage

- **During scene review** (`/scene-review`): Choose "Refine" for visual improvements
- **Focused sessions** (`/design`): Deep-dive on a specific scene — `/design title`, `/design cta`

### When to Use

- Slide scenes that feel generic
- When building visual contrast between scenes (e.g., calm title → harsh problem)
- When animations feel too basic or too busy

### Visual Narrative Arc

Consider how visual intensity builds across scenes:

- **Title**: Set the mood, plant visual seeds
- **Problem**: Create tension (harsh contrast)
- **Solution**: Relief and hope return
- **Demo**: Neutral, content-focused
- **Stats**: Build credibility
- **CTA**: Climax - maximum visual energy

## Toolkit vs Project Work

**Toolkit work** (evolves the toolkit itself):

- Skills, commands, templates, tools
- Tracked in `_internal/ROADMAP.md`

**Project work** (creates videos):

- Lives in `projects/`
- Each project has `project.json` (machine-readable state) and an auto-generated `CLAUDE.md`

Keep these separate. Don't mix toolkit improvements with video production.

## Documentation

- `docs/getting-started.md` - First video walkthrough
- `docs/creating-templates.md` - Build new templates
- `docs/creating-brands.md` - Create brand profiles
- `docs/optional-components.md` - Setup for optional ML-based tools (ProPainter, etc.)