---
name: record-transcribe-revoice
description: Capture a talking-head clip with camera and microphone, transcribe it with ElevenLabs word-level timestamps, detect immediate doubled words or stutters, render a de-stutter preview, run ElevenLabs speech-to-speech, and mux the new voice back onto the video. Use when recording short social videos, creator explainers, or motion-transfer source clips that need quick cleanup before voice polish.
---

# Record, Transcribe, Revoice

Use this skill for a practical creator pipeline:

1. record a short take with camera + mic
2. transcribe it with word timestamps
3. cut obvious repeated words or stutters
4. generate a voice-to-voice pass
5. lay the new voice back onto the cleaned video

## Prerequisites

You need:

- `ffmpeg` and `ffprobe`
- `python3`
- an `ELEVENLABS_API_KEY`
- a browser for the bundled recorder UI

If `ffmpeg` is missing, install it first:

- macOS: `brew install ffmpeg`
- Ubuntu/Debian: `sudo apt update && sudo apt install -y ffmpeg`
- Arch: `sudo pacman -S ffmpeg`
- Windows: `winget install Gyan.FFmpeg`

## Capture

Use the bundled recorder page at `assets/camera-recorder.html.txt`.

Because browser camera access needs a local origin, serve the folder first:

```bash
cd assets
python3 -m http.server 8765
```

Then open:

```text
http://127.0.0.1:8765/camera-recorder.html
```

Recorder expectations:

- preview the camera feed
- choose camera and microphone
- record with audio enabled
- save each take locally

## Workflow

### 1. Transcribe with word timings

Run:

```bash
python3 scripts/transcribe_with_elevenlabs.py \
  --input /path/to/take.webm \
  --out-dir /path/to/output
```

This produces:

- `*.elevenlabs.transcript.json`
- `*.clean.txt`
- `*.sentences.json`
- `*.pauses.json`

### 2. Render a de-stutter preview

Run:

```bash
python3 scripts/build_destutter_preview.py \
  --media /path/to/take.webm \
  --transcript /path/to/take.elevenlabs.transcript.json \
  --output /path/to/take.destutter-preview.mp4
```

This only targets immediate doubled words or obvious stutters. It is a preview pass, not a full editorial cut.

### 3. Run speech-to-speech

First extract aligned audio from the de-stutter preview:

```bash
ffmpeg -i /path/to/take.destutter-preview.mp4 -vn -ac 1 /path/to/take.destutter-preview.wav
```

Run:

```bash
python3 scripts/speech_to_speech_elevenlabs.py \
  --input-audio /path/to/take.destutter-preview.wav \
  --voice-name "Celestia 6" \
  --output /path/to/take.v2v.mp3
```

Use `--voice-id` if you already know the exact ElevenLabs voice.

### 4. Lay the new voice back onto the video

Run:

```bash
python3 scripts/mux_audio_to_video.py \
  --video /path/to/take.destutter-preview.mp4 \
  --audio /path/to/take.v2v.mp3 \
  --output /path/to/take.final.mp4
```

## Guardrails

- Do not trust speech-to-speech timing blindly. Compare generated audio duration against the source before muxing.
- Treat the de-stutter pass as a conservative cleanup only. Larger retakes still need manual editorial judgment.
- Keep the original take, the transcript JSON, the de-stutter preview, and the speech-to-speech result as separate artifacts.
- If the generated voice drifts badly in pacing, switch voice or settings rather than forcing a bad output into the video.

## Files

- `assets/camera-recorder.html.txt`: local browser recorder with camera + mic controls
- `scripts/transcribe_with_elevenlabs.py`: extract/transcribe and emit word-level transcript artifacts
- `scripts/build_destutter_preview.py`: find immediate repeated words and render a cleaned preview
- `scripts/speech_to_speech_elevenlabs.py`: call ElevenLabs speech-to-speech
- `scripts/mux_audio_to_video.py`: combine cleaned video with generated audio
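The first guardrail, comparing generated audio duration against the source before muxing, could be checked with a small helper like the one below. This is a minimal sketch and not part of the bundled scripts: the function names `probe_duration`, `duration_drift`, and `within_tolerance` are hypothetical, and the 0.25 second default tolerance is an assumed starting point that you should tune per project. It relies only on `ffprobe`, which is already listed under Prerequisites.

```python
import subprocess


def probe_duration(path: str) -> float:
    """Return the container duration of a media file in seconds, read via ffprobe."""
    result = subprocess.run(
        [
            "ffprobe", "-v", "error",
            "-show_entries", "format=duration",
            "-of", "default=noprint_wrappers=1:nokey=1",
            path,
        ],
        capture_output=True,
        text=True,
        check=True,  # raise if ffprobe fails or the file is unreadable
    )
    return float(result.stdout.strip())


def duration_drift(source_s: float, generated_s: float) -> float:
    """Signed drift in seconds; positive means the generated audio runs long."""
    return generated_s - source_s


def within_tolerance(
    source_s: float,
    generated_s: float,
    tolerance_s: float = 0.25,  # assumed default, not taken from the skill
) -> bool:
    """True when the absolute drift is at or below the tolerance."""
    return abs(duration_drift(source_s, generated_s)) <= tolerance_s
```

One way to use it: call `probe_duration` on `/path/to/take.destutter-preview.wav` and on `/path/to/take.v2v.mp3`, pass both values to `within_tolerance`, and only run the mux step when it returns `True`. When it returns `False`, the sign of `duration_drift` tells you whether the generated voice runs long or short, which helps when deciding whether to switch voice or settings.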