# fal.ai Video & Image Toolkit

Unified CLI for video generation, motion transfer, speaking avatars, and image editing via fal.ai.
## Quick start

```shell
uv run --with fal-client {baseDir}/scripts/fal_video_toolkit.py \
  --image ./photo.png \
  --model kling-o3 \
  --prompt "a person walking forward confidently" \
  --duration 10 \
  --out ./output.mp4
```

## Available models
| Shortcut | Endpoint | Type | Price |
|---|---|---|---|
| `kling-o3` | `fal-ai/kling-video/o3/standard/image-to-video` | image-to-video | $0.084/s |
| `kling-v3-pro` | `fal-ai/kling-video/v3/pro/image-to-video` | image-to-video | $0.112/s |
| `kling-turbo` | `fal-ai/kling-video/v2.5-turbo/pro/image-to-video` | image-to-video | $0.07/s |
| `seedance` | `fal-ai/bytedance/seedance/v1.5/pro/image-to-video` | image-to-video | ~$0.26/5s |
| `kling-motion` | `fal-ai/kling-video/v3/pro/motion-control` | motion transfer | $0.126/s |
| `kling-motion-std` | `fal-ai/kling-video/v3/standard/motion-control` | motion transfer | cheaper |
| `heygen` | `fal-ai/heygen/avatar4/image-to-video` | speaking avatar | $0.10/s |
| `seedream-v5` | `fal-ai/bytedance/seedream/v5/lite/edit` | image edit/upscale | $0.035/img |
| `seedream-v4` | `fal-ai/bytedance/seedream/v4.5/edit` | image edit/transform | $0.04/img |
| `grok-text` | `xai/grok-imagine-video/text-to-video` | text-to-video | see fal |
| `grok-image` | `xai/grok-imagine-video/image-to-video` | image-to-video | see fal |
| `grok-edit` | `xai/grok-imagine-video/edit-video` | video-to-video edit | see fal |
Or pass any full fal endpoint as `--model fal-ai/...`.
## Credentials

Resolved in order: `--api-key` > `--api-key-file` > `FAL_KEY` > `FAL_API_KEY` > `FAL_AI_KEY` > `FAL_KEY_ID`+`FAL_KEY_SECRET`.
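A minimal Python sketch of that resolution order (illustrative only: the flag handling and the `:`-joined ID/secret pair are assumptions, not the toolkit's actual code):

```python
import os

def resolve_fal_key(api_key=None, api_key_file=None):
    # Explicit --api-key flag wins.
    if api_key:
        return api_key
    # Then a key read from --api-key-file.
    if api_key_file:
        with open(api_key_file) as fh:
            return fh.read().strip()
    # Then the environment variables, in the documented order.
    for var in ("FAL_KEY", "FAL_API_KEY", "FAL_AI_KEY"):
        value = os.environ.get(var)
        if value:
            return value
    # Finally a split ID/secret pair (joined with ':' here -- an assumption).
    key_id = os.environ.get("FAL_KEY_ID")
    key_secret = os.environ.get("FAL_KEY_SECRET")
    if key_id and key_secret:
        return f"{key_id}:{key_secret}"
    raise RuntimeError("No fal.ai credentials found")
```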
## Usage patterns

### Image to video (Kling O3, default)
```shell
uv run --with fal-client {baseDir}/scripts/fal_video_toolkit.py \
  --image ./photo.png \
  --model kling-o3 \
  --prompt "description of desired motion" \
  --duration 10 \
  --no-audio \
  --out ./video.mp4
```

### Image to video (Seedance 1.5 Pro, with audio)
```shell
uv run --with fal-client {baseDir}/scripts/fal_video_toolkit.py \
  --image ./photo.png \
  --model seedance \
  --prompt "description of motion" \
  --duration 10 \
  --resolution 1080p \
  --out ./video.mp4
```

### Motion transfer
```shell
uv run --with fal-client {baseDir}/scripts/fal_video_toolkit.py \
  --image ./person.png \
  --video ./dance_reference.mp4 \
  --model kling-motion \
  --orientation video \
  --prompt "photorealistic person following the motion naturally" \
  --out ./motion_output.mp4
```

### Motion transfer from a source video's frame
```shell
uv run --with fal-client {baseDir}/scripts/fal_video_toolkit.py \
  --extract-first-frame-from-video ./identity-source.mp4 \
  --extract-frame-time 00:00:00.500 \
  --video ./dance_reference.mp4 \
  --model kling-motion-std \
  --orientation image \
  --prompt "photorealistic subject, keep identity from the extracted frame, follow the source performance naturally" \
  --out ./motion_from_video.mp4
```

### Motion transfer + Telegram delivery
```shell
uv run --with fal-client {baseDir}/scripts/fal_video_toolkit.py \
  --extract-first-frame-from-video ./identity-source.mp4 \
  --extract-frame-time 00:00:00.500 \
  --video ./dance_reference.mp4 \
  --model kling-motion-std \
  --orientation image \
  --prompt "photorealistic subject, keep identity from the extracted frame, follow the source performance naturally" \
  --out ./motion_from_video.mp4 \
  --telegram-target @mychat \
  --telegram-thread-id 42 \
  --telegram-message "render ready"
```

### Grok Imagine text-to-video
```shell
uv run --with fal-client {baseDir}/scripts/fal_video_toolkit.py \
  --model grok-text \
  --prompt "cinematic handheld shot of a storm rolling over Kyiv rooftops" \
  --duration 6 \
  --resolution 720p \
  --aspect-ratio 16:9 \
  --out ./grok-text.mp4
```

### Grok Imagine image-to-video
```shell
uv run --with fal-client {baseDir}/scripts/fal_video_toolkit.py \
  --image ./start_frame.png \
  --model grok-image \
  --prompt "subtle camera push-in, hair moving in the wind, natural motion" \
  --duration 6 \
  --resolution 720p \
  --aspect-ratio 9:16 \
  --out ./grok-image.mp4
```

### Grok Imagine from a reference video's first frame
```shell
uv run --with fal-client {baseDir}/scripts/fal_video_toolkit.py \
  --extract-first-frame-from-video ./reference.mp4 \
  --model grok-image \
  --prompt "continue the scene with smooth realistic motion" \
  --duration 6 \
  --out ./grok-from-frame.mp4
```

### Grok Imagine video edit
```shell
uv run --with fal-client {baseDir}/scripts/fal_video_toolkit.py \
  --video ./reference.mp4 \
  --model grok-edit \
  --prompt "convert this clip into a moody neon cyberpunk scene" \
  --resolution 720p \
  --out ./grok-edit.mp4
```

### Speaking avatar (HeyGen)
```shell
uv run --with fal-client {baseDir}/scripts/fal_video_toolkit.py \
  --image ./face.png \
  --model heygen \
  --prompt "Hello! Welcome to the show" \
  --talking-style expressive \
  --resolution 720p \
  --aspect-ratio 9:16 \
  --out ./speaking.mp4
```

### Image upscale (Seedream v5 to 2K)
```shell
uv run --with fal-client {baseDir}/scripts/fal_video_toolkit.py \
  --image ./photo.png \
  --model seedream-v5 \
  --prompt "enhance quality, sharpen details, keep original appearance" \
  --out ./upscaled.png
```

### Image transform (Seedream v4.5, replace character)
```shell
uv run --with fal-client {baseDir}/scripts/fal_video_toolkit.py \
  --image ./scene.png \
  --model seedream-v4 \
  --prompt "Replace the character with a photorealistic person, keep same pose and background" \
  --image-size 2048x2048 \
  --out ./transformed.png
```

### Start + end frame video
```shell
uv run --with fal-client {baseDir}/scripts/fal_video_toolkit.py \
  --image ./start_frame.png \
  --end-image ./end_frame.png \
  --model kling-o3 \
  --prompt "smooth transition between poses" \
  --duration 5 \
  --out ./transition.mp4
```

## Key flags
| Flag | Description |
|---|---|
| `--model` | Model shortcut or full fal endpoint |
| `--duration` | Video length in seconds |
| `--orientation` | `image` or `video` (motion control) |
| `--resolution` | `480p`/`720p`/`1080p` (Seedance, HeyGen) |
| `--aspect-ratio` | `16:9`, `9:16`, `1:1`, etc. |
| `--no-audio` | Disable audio generation |
| `--no-safety` | Disable client safety checker |
| `--negative-prompt` | What to exclude (Kling) |
| `--talking-style` | `stable` or `expressive` (HeyGen) |
| `--image-size` | Output size for Seedream (e.g. `2048x2048`) |
| `--end-image` | End frame for transition videos |
| `--extract-first-frame-from-video` | Extract frame from a reference video and use it as `--image` |
| `--extract-frame-time` | Timestamp for the extracted frame (default `00:00:00`) |
| `--telegram-target` | Send the finished media to Telegram after download |
| `--telegram-thread-id` | Optional Telegram topic/thread id for delivery |
| `--telegram-reply-to` | Optional Telegram message id to reply to |
| `--telegram-message` | Optional Telegram caption/message |
| `--json-out` | Save raw API response for debugging |
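`--extract-frame-time` takes an `HH:MM:SS[.ms]` timestamp like `00:00:00.500`. A small sketch of what parsing such a value into seconds looks like (illustrative; the script may instead pass the string straight through to ffmpeg's seek option):

```python
def timestamp_to_seconds(ts: str) -> float:
    """Convert an 'HH:MM:SS[.ms]' timestamp to seconds."""
    hours, minutes, seconds = ts.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)
```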
## Guardrails
- Start with the best possible image. Models transfer motion better than they fix bad composites.
- For motion control, match the source image pose to the first frame of the reference video.
- Keep examples generic; pass file paths, Telegram targets, and thread ids explicitly instead of baking personal values into the skill.
- Images >5MB must be resized before Kling motion control:
  ```shell
  ffmpeg -y -i big.png -vf "scale=1080:1920:force_original_aspect_ratio=decrease" small.png
  ```
- Multiple motion transfers or image-to-video calls are independent; run them in parallel for multi-scene workflows.
- Keep prompts short, positive, and direct.
- Seedance duration must be a string (`"10"`, not `10`); the script handles this automatically.
- Grok Imagine video endpoints on fal.ai do not currently expose a documented `enable_safety_checker` or NSFW toggle. The script warns instead of silently pretending `--no-safety` works there.
- For full model specs and parameters, see `references/fal-models.md`.
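The Seedance duration rule above can be sketched as a payload normalization step (hypothetical function and key names; the script's real internals may differ):

```python
def normalize_payload(model: str, payload: dict) -> dict:
    # Seedance expects duration as a string ("10", not 10); coerce it
    # without mutating the caller's dict. Other models pass through unchanged.
    if model == "seedance" and isinstance(payload.get("duration"), int):
        return {**payload, "duration": str(payload["duration"])}
    return payload
```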