Generate talking-head avatar videos from text. Pipeline: ElevenLabs V3 TTS → OmniHuman 1.5 lipsync → Kling v3 motion enhancement.
SKILL.md
---
name: avatar-video-from-text
description: "Generate a talking-head avatar video from text, a character photo, and a voice. Pipeline: ElevenLabs V3 TTS → OmniHuman 1.5 lipsync → Kling v3 pro motion enhancement. Use when you need to create a presenter video from a script without recording."
---

# Avatar Video from Text

Generate a talking-head video from:
- **Text** — what the character says
- **Character photo** — how they look (generated or real)
- **Voice** — any ElevenLabs voice (by ID or name)

## Pipeline

1. **ElevenLabs V3 TTS** — text → speech audio
2. **OmniHuman 1.5** — image + audio → lipsync video (lip movements match the speech)
3. **Kling v3 pro motion control** — image + OmniHuman video → enhanced quality video (optional, improves realism)

## Prerequisites

- `python3`, `ffmpeg`
- `fal-client` Python package (`uv run --with fal-client` or `pip install fal-client`)
- `ELEVENLABS_API_KEY` in `~/.secrets/elevenlabs.env`
- `FAL_AI_KEY` or `FAL_KEY` in environment

## Workflow

### 1. Generate speech

```bash
python3 scripts/tts_elevenlabs_v3.py \
  --text "Your script text here" \
  --voice-name "Celestia" \
  --output /path/to/speech.mp3
```

Or from a file: `--text-file /path/to/script.txt`

Or by voice ID: `--voice-id VaKkxizh5XgA7ihroKqO`

Model defaults to `eleven_v3` (recommended). Voice settings tuned from production:
- `--stability 0.34`
- `--similarity-boost 0.91`
- `--style 0.49`

For more expressive/less clone-like output: `--stability 0.15 --similarity-boost 0.6 --style 0.85`

Use `--list-voices` to see all available voices.

### Audio Tags (v3 only)

ElevenLabs v3 supports emotion control via tags in the text:
```
[excited] This is amazing!
[sigh] I can't believe it...
[serious] Stop doing that.
[whisper] Don't tell anyone.
```
Available: `[excited]`, `[sad]`, `[angry]`, `[nervous]`, `[sigh]`, `[whisper]`, `[happily]`, `[serious]`, `[tired]`, `[frustrated]`

### Important: Image size for OmniHuman

OmniHuman rejects images >5MB. If using Nano Banana 2K images, resize first:
```bash
ffmpeg -y -i big.png -vf "scale=1080:-1" small.png
```

### 2. Generate lipsync video

```bash
uv run --with fal-client python3 scripts/omnihuman_lipsync.py \
  --image /path/to/character.png \
  --audio /path/to/speech.mp3 \
  --output /path/to/lipsync.mp4
```

OmniHuman 1.5 costs ~$0.16/sec. A 60s video ≈ $9.60.

### 3. Enhance with Kling motion control (optional)

```bash
uv run --with fal-client python3 scripts/kling_motion_enhance.py \
  --image /path/to/character.png \
  --video /path/to/lipsync.mp4 \
  --face-image /path/to/character_face.png \
  --prompt "Young woman presenting a product to camera, expressive gestures, photorealistic" \
  --output /path/to/final.mp4
```

Kling v3 pro takes the OmniHuman output as motion reference and re-renders with better quality. Uses `elements` for face identity preservation.

Note: Kling has video length limits. For longer content, split into chunks ≤15s.

## Cost estimate (60s video)

| Step | Model | Cost |
|------|-------|------|
| TTS | ElevenLabs V3 | ~$0.25 |
| Lipsync | OmniHuman 1.5 | ~$9.60 |
| Motion enhance | Kling v3 pro (×4-6 chunks) | ~$4-5 |
| **Total** | | **~$14-15** |

## Guardrails

- Always review the TTS audio before running OmniHuman (it's the most expensive step).
- For long texts, split into segments and generate separately.
- Kling motion enhance is optional — OmniHuman alone may be sufficient for some use cases.
- Compare OmniHuman output vs Kling-enhanced output before committing to the full pipeline.

## Files

- `scripts/tts_elevenlabs_v3.py` — ElevenLabs V3 text-to-speech (any voice)
- `scripts/omnihuman_lipsync.py` — OmniHuman 1.5 image + audio → lipsync video
- `scripts/kling_motion_enhance.py` — Kling v3 pro motion control enhancement
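
For clips other than 60s, the OmniHuman rate (~$0.16/sec) and the Kling chunk limit (≤15s) quoted above can be combined into a quick planning helper. The following is only a sketch: `omnihuman_cost` and `plan_chunks` are hypothetical names that are not part of the bundled scripts, and cutting into equal-length chunks is an assumption, since this document does not say how the chunks should be chosen.

```python
import math

OMNIHUMAN_USD_PER_SEC = 0.16  # approximate rate quoted in the lipsync step
KLING_MAX_CHUNK_SEC = 15.0    # chunk limit quoted in the Kling note


def omnihuman_cost(duration_sec: float) -> float:
    """Approximate OmniHuman cost in USD for a clip of the given length."""
    return round(duration_sec * OMNIHUMAN_USD_PER_SEC, 2)


def plan_chunks(duration_sec: float, max_chunk_sec: float = KLING_MAX_CHUNK_SEC):
    """Split a clip into the fewest equal-length chunks within the limit.

    Returns a list of (start, end) tuples in seconds.
    Equal-length chunks are an assumption made for illustration.
    """
    if duration_sec <= 0:
        return []
    count = math.ceil(duration_sec / max_chunk_sec)
    length = duration_sec / count
    return [(round(i * length, 3), round((i + 1) * length, 3)) for i in range(count)]


if __name__ == "__main__":
    print(f"OmniHuman estimate for 60s: ${omnihuman_cost(60):.2f}")
    for start, end in plan_chunks(60):
        print(f"chunk {start:.1f}s -> {end:.1f}s")
```

A 60s clip yields four 15s chunks, which is the lower end of the "×4-6 chunks" range in the cost table. In practice, cutting at pauses in the speech rather than at fixed offsets may give cleaner joins, at the cost of a few extra chunks.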