Source from bundle

Record Transcribe Revoice

Capture a talking-head clip with camera and microphone, transcribe it with ElevenLabs word-level timestamps, detect immediate doubled words or stutters, render

Костянтин@Latand

Files

Skill

0.9K

Size

35.6 KB

Entrypoint

SKILL.md

Format

folder

---
name: record-transcribe-revoice
description: Capture a talking-head clip with camera and microphone, transcribe it with ElevenLabs word-level timestamps, detect immediate doubled words or stutters, render a de-stutter preview, run ElevenLabs speech-to-speech, and mux the new voice back onto the video. Use when recording short social videos, creator explainers, or motion-transfer source clips that need quick cleanup before voice polish.
---

# Record, Transcribe, Revoice

Use this skill for a practical creator pipeline:

1. record a short take with camera + mic
2. transcribe it with word timestamps
3. cut obvious repeated words or stutters
4. generate a voice-to-voice pass
5. lay the new voice back onto the cleaned video

## Prerequisites

You need:

- `ffmpeg` and `ffprobe`
- `python3`
- an `ELEVENLABS_API_KEY`
- a browser for the bundled recorder UI

If `ffmpeg` is missing, install it first:

- macOS: `brew install ffmpeg`
- Ubuntu/Debian: `sudo apt update && sudo apt install -y ffmpeg`
- Arch: `sudo pacman -S ffmpeg`
- Windows: `winget install Gyan.FFmpeg`

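A quick preflight check can catch a missing tool or key before a take is recorded. The sketch below is not part of the bundled scripts; the function name `missing_prerequisites` is hypothetical, and it assumes the API key is read from the `ELEVENLABS_API_KEY` environment variable.

```python
import os
import shutil

# Tools and variables named in the prerequisites list above.
REQUIRED_TOOLS = ("ffmpeg", "ffprobe")
REQUIRED_ENV = ("ELEVENLABS_API_KEY",)


def missing_prerequisites(which=shutil.which, env=None):
    """Return the names of required tools and variables that are not available.

    `which` and `env` are injectable so the check can be exercised without
    touching the real PATH or environment.
    """
    env = os.environ if env is None else env
    missing = [tool for tool in REQUIRED_TOOLS if which(tool) is None]
    missing += [name for name in REQUIRED_ENV if not env.get(name)]
    return missing
```

Calling `missing_prerequisites()` with no arguments inspects the real `PATH` and environment; an empty list means the pipeline can start.
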
## Capture

Use the bundled recorder page at `assets/camera-recorder.html.txt`.

Browsers only grant camera and microphone access in a secure context (HTTPS or a loopback address such as `127.0.0.1`), so serve the folder locally first. The recorder ships with a `.txt` suffix; copy it to `camera-recorder.html` so the server returns it as HTML rather than plain text:

```bash
cd assets
cp camera-recorder.html.txt camera-recorder.html
python3 -m http.server 8765
```

Then open:

```text
http://127.0.0.1:8765/camera-recorder.html
```

Recorder expectations:

- preview the camera feed
- choose camera and microphone
- record with audio enabled
- save each take locally

## Workflow

### 1. Transcribe with word timings

Run:

```bash
python3 scripts/transcribe_with_elevenlabs.py \
  --input /path/to/take.webm \
  --out-dir /path/to/output
```

This produces:

- `*.elevenlabs.transcript.json`
- `*.clean.txt`
- `*.sentences.json`
- `*.pauses.json`

### 2. Render a de-stutter preview

Run:

```bash
python3 scripts/build_destutter_preview.py \
  --media /path/to/take.webm \
  --transcript /path/to/take.elevenlabs.transcript.json \
  --output /path/to/take.destutter-preview.mp4
```

This only targets immediate doubled words or obvious stutters. It is a preview pass, not a full editorial cut.

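The idea of "immediate doubled words" can be sketched as follows. This is an illustration under assumptions, not the logic of `build_destutter_preview.py`: it takes a list of spoken words with `text`, `start`, and `end` fields, drops the earlier copy of each adjacent repeat, and returns the time spans of media to keep. The helper names are hypothetical.

```python
import string


def normalize(token):
    """Lowercase a word and strip surrounding whitespace and punctuation."""
    return token.strip().strip(string.punctuation).lower()


def find_immediate_repeats(words):
    """Return indices of words that are repeated by the very next word."""
    repeats = []
    for i in range(1, len(words)):
        current = normalize(words[i]["text"])
        if current and current == normalize(words[i - 1]["text"]):
            repeats.append(i - 1)  # drop the earlier copy, keep the restart
    return repeats


def keep_segments(words, drop_indices, total_duration):
    """Turn dropped word indices into (start, end) spans of media to keep."""
    segments, cursor = [], 0.0
    for i in sorted(drop_indices):
        # Cut from the dropped word's start up to the start of the next word.
        cut_start, cut_end = words[i]["start"], words[i + 1]["start"]
        if cut_start > cursor:
            segments.append((cursor, cut_start))
        cursor = max(cursor, cut_end)
    if cursor < total_duration:
        segments.append((cursor, total_duration))
    return segments


# Hypothetical sample: "I I think this this, works"
sample = [
    {"text": "I", "start": 0.0, "end": 0.2},
    {"text": "I", "start": 0.3, "end": 0.5},
    {"text": "think", "start": 0.6, "end": 1.0},
    {"text": "this", "start": 1.1, "end": 1.4},
    {"text": "this,", "start": 1.5, "end": 1.8},
    {"text": "works", "start": 1.9, "end": 2.4},
]
```

A rule this simple also flags legitimate repetitions such as "had had", which is one reason to review the result as a preview rather than a final cut.
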
### 3. Run speech-to-speech

First extract aligned audio from the de-stutter preview:

```bash
ffmpeg -i /path/to/take.destutter-preview.mp4 -vn -ac 1 /path/to/take.destutter-preview.wav
```

Run:

```bash
python3 scripts/speech_to_speech_elevenlabs.py \
  --input-audio /path/to/take.destutter-preview.wav \
  --voice-name "Celestia 6" \
  --output /path/to/take.v2v.mp3
```

Use `--voice-id` if you already know the exact ElevenLabs voice.

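Passing a voice by name implies a lookup step before the speech-to-speech call. The sketch below shows one way such a lookup could work, assuming the account's voices are available as a list of objects with `voice_id` and `name` fields; how `speech_to_speech_elevenlabs.py` actually resolves `--voice-name` is not documented here, and `resolve_voice_id` and the sample IDs are hypothetical.

```python
def resolve_voice_id(voices, voice_name):
    """Return the voice_id whose name matches, ignoring case and outer whitespace."""
    wanted = voice_name.strip().casefold()
    matches = [v["voice_id"] for v in voices if v["name"].strip().casefold() == wanted]
    if not matches:
        raise LookupError(f"No voice named {voice_name!r}")
    if len(matches) > 1:
        # Refuse to guess when two voices share a name.
        raise LookupError(f"Voice name {voice_name!r} is ambiguous: {matches}")
    return matches[0]


# Hypothetical voice list in the assumed shape; the IDs are placeholders.
sample_voices = [
    {"voice_id": "voice-aaa", "name": "Celestia 6"},
    {"voice_id": "voice-bbb", "name": "Narrator"},
]
```

Failing loudly on a missing or ambiguous name is the reason an explicit `--voice-id` is the safer choice once the voice is known.
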
### 4. Lay the new voice back onto the video

Run:

```bash
python3 scripts/mux_audio_to_video.py \
  --video /path/to/take.destutter-preview.mp4 \
  --audio /path/to/take.v2v.mp3 \
  --output /path/to/take.final.mp4
```

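For reference, replacing a video's audio track with `ffmpeg` generally looks like the sketch below. It is an assumption about the kind of command involved, not a copy of what `mux_audio_to_video.py` runs: it copies the video stream untouched, encodes the new voice as AAC, and stops at the shorter of the two inputs. The function names are hypothetical.

```python
import subprocess


def build_mux_command(video, audio, output):
    """Build an ffmpeg command that swaps the video's audio for `audio`."""
    return [
        "ffmpeg", "-y",          # overwrite the output if it exists
        "-i", str(video),
        "-i", str(audio),
        "-map", "0:v:0",         # first video stream from the first input
        "-map", "1:a:0",         # first audio stream from the second input
        "-c:v", "copy",          # no video re-encode
        "-c:a", "aac",           # MP4-friendly audio codec
        "-shortest",             # end at the shorter stream
        str(output),
    ]


def mux(video, audio, output):
    """Run the mux and raise if ffmpeg exits with an error."""
    subprocess.run(build_mux_command(video, audio, output), check=True)
```

Because `-shortest` silently trims whichever stream is longer, it can hide a timing mismatch, so it is worth checking durations before relying on it.
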
## Guardrails

- Do not trust speech-to-speech timing blindly. Compare generated audio duration against the source before muxing.
- Treat the de-stutter pass as a conservative cleanup only. Larger retakes still need manual editorial judgment.
- Keep the original take, the transcript JSON, the de-stutter preview, and the speech-to-speech result as separate artifacts.
- If the generated voice drifts badly in pacing, switch voice or settings rather than forcing a bad output into the video.

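The duration comparison in the first guardrail can be sketched with `ffprobe`. The 0.25 s tolerance and the function names below are assumptions chosen for illustration; pick a threshold that suits the clip length and how visible lip-sync drift is.

```python
import subprocess


def probe_duration(path):
    """Ask ffprobe for the container duration of `path` in seconds."""
    result = subprocess.run(
        ["ffprobe", "-v", "error",
         "-show_entries", "format=duration",
         "-of", "default=noprint_wrappers=1:nokey=1",
         str(path)],
        capture_output=True, text=True, check=True,
    )
    return float(result.stdout.strip())


def timing_drift(source_seconds, generated_seconds):
    """Positive when the generated audio runs longer than the source."""
    return generated_seconds - source_seconds


def is_safe_to_mux(source_seconds, generated_seconds, tolerance=0.25):
    """True when the two durations differ by no more than `tolerance` seconds."""
    return abs(timing_drift(source_seconds, generated_seconds)) <= tolerance
```

In practice, `probe_duration` would be called on the de-stutter preview audio and on the speech-to-speech result, and the mux step skipped when `is_safe_to_mux` returns `False`.
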
## Files

- `assets/camera-recorder.html.txt`: local browser recorder with camera + mic controls
- `scripts/transcribe_with_elevenlabs.py`: extract/transcribe and emit word-level transcript artifacts
- `scripts/build_destutter_preview.py`: find immediate repeated words and render a cleaned preview
- `scripts/speech_to_speech_elevenlabs.py`: call ElevenLabs speech-to-speech
- `scripts/mux_audio_to_video.py`: combine cleaned video with generated audio
