# fal.ai Video & Image Toolkit

Unified CLI for video generation, motion transfer, speaking avatars, and image editing via fal.ai.
## Quick start

```shell
uv run --with fal-client {baseDir}/scripts/fal_video_toolkit.py \
  --image ./photo.png \
  --model kling-o3 \
  --prompt "a person walking forward confidently" \
  --duration 10 \
  --out ./output.mp4
```

## Available models
| Shortcut | Endpoint | Type | Price |
|---|---|---|---|
| `kling-o3` | `fal-ai/kling-video/o3/standard/image-to-video` | image-to-video | $0.084/s |
| `kling-v3-pro` | `fal-ai/kling-video/v3/pro/image-to-video` | image-to-video | $0.112/s |
| `kling-turbo` | `fal-ai/kling-video/v2.5-turbo/pro/image-to-video` | image-to-video | $0.07/s |
| `seedance` | `fal-ai/bytedance/seedance/v1.5/pro/image-to-video` | image-to-video | ~$0.26/5s |
| `kling-motion` | `fal-ai/kling-video/v3/pro/motion-control` | motion transfer | $0.126/s |
| `kling-motion-std` | `fal-ai/kling-video/v3/standard/motion-control` | motion transfer | cheaper |
| `heygen` | `fal-ai/heygen/avatar4/image-to-video` | speaking avatar | $0.10/s |
| `seedream-v5` | `fal-ai/bytedance/seedream/v5/lite/edit` | image edit/upscale | $0.035/img |
| `seedream-v4` | `fal-ai/bytedance/seedream/v4.5/edit` | image edit/transform | $0.04/img |
| `grok-text` | `xai/grok-imagine-video/text-to-video` | text-to-video | see fal |
| `grok-image` | `xai/grok-imagine-video/image-to-video` | image-to-video | see fal |
| `grok-edit` | `xai/grok-imagine-video/edit-video` | video-to-video edit | see fal |
Or pass any full fal endpoint as `--model fal-ai/...`.
## Credentials

Resolved in order: `--api-key` > `--api-key-file` > `FAL_KEY` > `FAL_API_KEY` > `FAL_AI_KEY` > `FAL_KEY_ID`+`FAL_KEY_SECRET`.
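A minimal Python sketch of that resolution order (illustrative only: the flag handling and the `:`-joined ID/secret pair are assumptions, not the toolkit's actual code):

```python
import os

def resolve_fal_key(api_key=None, api_key_file=None):
    # Explicit --api-key flag wins.
    if api_key:
        return api_key
    # Then a key read from --api-key-file.
    if api_key_file:
        with open(api_key_file) as fh:
            return fh.read().strip()
    # Then the environment variables, in the documented order.
    for var in ("FAL_KEY", "FAL_API_KEY", "FAL_AI_KEY"):
        value = os.environ.get(var)
        if value:
            return value
    # Finally a split ID/secret pair (joined with ':' here -- an assumption).
    key_id = os.environ.get("FAL_KEY_ID")
    key_secret = os.environ.get("FAL_KEY_SECRET")
    if key_id and key_secret:
        return f"{key_id}:{key_secret}"
    raise RuntimeError("No fal.ai credentials found")
```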
## Usage patterns

### Image to video (Kling O3, default)
```shell
uv run --with fal-client {baseDir}/scripts/fal_video_toolkit.py \
  --image ./photo.png \
  --model kling-o3 \
  --prompt "description of desired motion" \
  --duration 10 \
  --no-audio \
  --out ./video.mp4
```

### Image to video (Seedance 1.5 Pro, with audio)
```shell
uv run --with fal-client {baseDir}/scripts/fal_video_toolkit.py \
  --image ./photo.png \
  --model seedance \
  --prompt "description of motion" \
  --duration 10 \
  --resolution 1080p \
  --out ./video.mp4
```

### Motion transfer
```shell
uv run --with fal-client {baseDir}/scripts/fal_video_toolkit.py \
  --image ./person.png \
  --video ./dance_reference.mp4 \
  --model kling-motion \
  --orientation video \
  --prompt "photorealistic person following the motion naturally" \
  --out ./motion_output.mp4
```

### Motion transfer from a source video's frame
```shell
uv run --with fal-client {baseDir}/scripts/fal_video_toolkit.py \
  --extract-first-frame-from-video ./identity-source.mp4 \
  --extract-frame-time 00:00:00.500 \
  --video ./dance_reference.mp4 \
  --model kling-motion-std \
  --orientation image \
  --prompt "photorealistic subject, keep identity from the extracted frame, follow the source performance naturally" \
  --out ./motion_from_video.mp4
```

### Motion transfer + Telegram delivery
```shell
uv run --with fal-client {baseDir}/scripts/fal_video_toolkit.py \
  --extract-first-frame-from-video ./identity-source.mp4 \
  --extract-frame-time 00:00:00.500 \
  --video ./dance_reference.mp4 \
  --model kling-motion-std \
  --orientation image \
  --prompt "photorealistic subject, keep identity from the extracted frame, follow the source performance naturally" \
  --out ./motion_from_video.mp4 \
  --telegram-target @mychat \
  --telegram-thread-id 42 \
  --telegram-message "render ready"
```

### Grok Imagine text-to-video
```shell
uv run --with fal-client {baseDir}/scripts/fal_video_toolkit.py \
  --model grok-text \
  --prompt "cinematic handheld shot of a storm rolling over Kyiv rooftops" \
  --duration 6 \
  --resolution 720p \
  --aspect-ratio 16:9 \
  --out ./grok-text.mp4
```

### Grok Imagine image-to-video
```shell
uv run --with fal-client {baseDir}/scripts/fal_video_toolkit.py \
  --image ./start_frame.png \
  --model grok-image \
  --prompt "subtle camera push-in, hair moving in the wind, natural motion" \
  --duration 6 \
  --resolution 720p \
  --aspect-ratio 9:16 \
  --out ./grok-image.mp4
```

### Grok Imagine from a reference video's first frame
```shell
uv run --with fal-client {baseDir}/scripts/fal_video_toolkit.py \
  --extract-first-frame-from-video ./reference.mp4 \
  --model grok-image \
  --prompt "continue the scene with smooth realistic motion" \
  --duration 6 \
  --out ./grok-from-frame.mp4
```

### Grok Imagine video edit
```shell
uv run --with fal-client {baseDir}/scripts/fal_video_toolkit.py \
  --video ./reference.mp4 \
  --model grok-edit \
  --prompt "convert this clip into a moody neon cyberpunk scene" \
  --resolution 720p \
  --out ./grok-edit.mp4
```

### Speaking avatar (HeyGen)
```shell
uv run --with fal-client {baseDir}/scripts/fal_video_toolkit.py \
  --image ./face.png \
  --model heygen \
  --prompt "Hello! Welcome to the show" \
  --talking-style expressive \
  --resolution 720p \
  --aspect-ratio 9:16 \
  --out ./speaking.mp4
```

### Image upscale (Seedream v5 to 2K)
```shell
uv run --with fal-client {baseDir}/scripts/fal_video_toolkit.py \
  --image ./photo.png \
  --model seedream-v5 \
  --prompt "enhance quality, sharpen details, keep original appearance" \
  --out ./upscaled.png
```

### Image transform (Seedream v4.5, replace character)
```shell
uv run --with fal-client {baseDir}/scripts/fal_video_toolkit.py \
  --image ./scene.png \
  --model seedream-v4 \
  --prompt "Replace the character with a photorealistic person, keep same pose and background" \
  --image-size 2048x2048 \
  --out ./transformed.png
```

### Start + end frame video
```shell
uv run --with fal-client {baseDir}/scripts/fal_video_toolkit.py \
  --image ./start_frame.png \
  --end-image ./end_frame.png \
  --model kling-o3 \
  --prompt "smooth transition between poses" \
  --duration 5 \
  --out ./transition.mp4
```

## Key flags
| Flag | Description |
|---|---|
| `--model` | Model shortcut or full fal endpoint |
| `--duration` | Video length in seconds |
| `--orientation` | `image` or `video` (motion control) |
| `--resolution` | `480p`/`720p`/`1080p` (Seedance, HeyGen) |
| `--aspect-ratio` | `16:9`, `9:16`, `1:1`, etc. |
| `--no-audio` | Disable audio generation |
| `--no-safety` | Disable client safety checker |
| `--negative-prompt` | What to exclude (Kling) |
| `--talking-style` | `stable` or `expressive` (HeyGen) |
| `--image-size` | Output size for Seedream (e.g. `2048x2048`) |
| `--end-image` | End frame for transition videos |
| `--extract-first-frame-from-video` | Extract frame from a reference video and use it as `--image` |
| `--extract-frame-time` | Timestamp for the extracted frame (default `00:00:00`) |
| `--telegram-target` | Send the finished media to Telegram after download |
| `--telegram-thread-id` | Optional Telegram topic/thread id for delivery |
| `--telegram-reply-to` | Optional Telegram message id to reply to |
| `--telegram-message` | Optional Telegram caption/message |
| `--json-out` | Save raw API response for debugging |
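`--extract-frame-time` takes an `HH:MM:SS[.ms]` timestamp like `00:00:00.500`. A small sketch of what parsing such a value into seconds looks like (illustrative; the script may instead pass the string straight through to ffmpeg's seek option):

```python
def timestamp_to_seconds(ts: str) -> float:
    """Convert an 'HH:MM:SS[.ms]' timestamp to seconds."""
    hours, minutes, seconds = ts.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)
```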
## Guardrails
- Start with the best possible image. Models transfer motion better than they fix bad composites.
- For motion control, match the source image pose to the first frame of the reference video.
- Keep examples generic; pass file paths, Telegram targets, and thread ids explicitly instead of baking personal values into the skill.
- Images >5MB must be resized before Kling motion control:
  ```shell
  ffmpeg -y -i big.png -vf "scale=1080:1920:force_original_aspect_ratio=decrease" small.png
  ```
- Multiple motion transfers or image-to-video calls are independent; run them in parallel for multi-scene workflows.
- Keep prompts short, positive, and direct.
- Seedance duration must be a string (`"10"`, not `10`); the script handles this automatically.
- Grok Imagine video endpoints on fal.ai do not currently expose a documented `enable_safety_checker` or NSFW toggle. The script warns instead of silently pretending `--no-safety` works there.
- For full model specs and parameters, see `references/fal-models.md`.
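The Seedance duration rule above can be sketched as a payload normalization step (hypothetical function and key names; the script's real internals may differ):

```python
def normalize_payload(model: str, payload: dict) -> dict:
    # Seedance expects duration as a string ("10", not 10); coerce it
    # without mutating the caller's dict. Other models pass through unchanged.
    if model == "seedance" and isinstance(payload.get("duration"), int):
        return {**payload, "duration": str(payload["duration"])}
    return payload
```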