# Nano Banana 2 — Face-Consistent Image Generation (fal.ai)
Generate images that preserve a person's face identity using the Nano Banana 2 edit model on fal.ai.
## Quick Start

```bash
uv run --with fal-client {baseDir}/scripts/nano_banana_edit.py \
  --face ./my_face.png \
  --prompt "A young man sitting at a desk, thoughtful expression, studio lighting" \
  --output ./generated/scene.png
```

## Credential Resolution
The script resolves fal.ai credentials in this order:
1. `--api-key` CLI flag
2. `FAL_KEY` env var (official primary)
3. `FAL_API_KEY` env var
4. `FAL_AI_KEY` env var
5. `~/.secrets/fal.env` file (lines like `FAL_KEY=xxx`)
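A minimal sketch of that resolution order, assuming a `resolve_fal_key()` helper inside the script (the helper name and the exact file parsing are illustrative):

```python
import os
from pathlib import Path

def resolve_fal_key(cli_key):
    """Return the first fal.ai key found, following the order above."""
    if cli_key:                                              # 1. --api-key CLI flag
        return cli_key
    for var in ("FAL_KEY", "FAL_API_KEY", "FAL_AI_KEY"):     # 2-4. env vars
        if os.environ.get(var):
            return os.environ[var]
    env_file = Path.home() / ".secrets" / "fal.env"          # 5. ~/.secrets/fal.env
    if env_file.exists():
        for line in env_file.read_text().splitlines():
            if line.startswith("FAL_KEY="):
                return line.split("=", 1)[1].strip()
    return None

# fal_client reads FAL_KEY from the environment, so export whatever was found, e.g.:
# os.environ["FAL_KEY"] = resolve_fal_key(args.api_key)
```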
## Parameters

| Flag | Default | Description |
|---|---|---|
| `--face` | required | Face reference image — a portrait, a rotation grid, or any image showing the character |
| `--prompt` | required | Scene/expression description |
| `--output` | required | Output image path (`.png`) |
| `--aspect-ratio` | `9:16` | `1:1`, `16:9`, `9:16`, `4:3`, `3:4` |
| `--resolution` | `2K` | `1K`, `2K`, `4K` |
| `--seed` | random | For reproducibility |
| `--model` | `fal-ai/nano-banana-2/edit` | fal.ai model endpoint. Fallback: `fal-ai/nano-banana/edit` (v1, Gemini 3 Pro) |
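Under the hood these flags presumably map onto a single `fal_client.subscribe` call; a minimal sketch (the `seed` argument name is an assumption; `image_urls`, `aspect_ratio`, and `resolution` match the batch example further down):

```python
import fal_client

# Upload the local face/grid reference, then request one scene.
face_url = fal_client.upload_file("./my_face.png")

result = fal_client.subscribe(
    "fal-ai/nano-banana-2/edit",
    arguments={
        "prompt": "A young man sitting at a desk, thoughtful expression, studio lighting",
        "image_urls": [face_url],    # --face
        "aspect_ratio": "9:16",      # --aspect-ratio
        "resolution": "2K",          # --resolution
        # "seed": 42,                # --seed (argument name assumed)
    },
)
print(result)
```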
## Using Rotation Grids as Face Reference

A rotation grid (multi-angle body shots in one image) is the strongest reference for character identity. It carries face, body proportions, and style all at once. Pass it directly as `--face`.
```bash
# Rotation grid as face reference — strongest identity preservation
uv run --with fal-client {baseDir}/scripts/nano_banana_edit.py \
  --face ./rotation_grid.png \
  --prompt "The girl from the reference, wearing a leather jacket, talking to camera in a dark room" \
  --output ./scene.png
```

If no rotation grid exists yet, generate one first from a face portrait:
```bash
uv run --with fal-client {baseDir}/scripts/nano_banana_edit.py \
  --face ./face_portrait.png \
  --prompt "8-panel rotation grid showing the character from front, 3/4 left, side left, 3/4 back, back, 3/4 right, side right, 3/4 front. Full body, consistent outfit, neutral pose, studio lighting, white background" \
  --aspect-ratio 16:9 \
  --output ./rotation_grid.png
```

Then use the generated grid as `--face` for all subsequent scene generation.
## What does NOT work for in-place replacement

- Passing the source frame (the image you want to replace the character in) as an `image_url` dilutes face identity — the model blends both faces and produces a generic result
- Instead, describe the source scene's environment in the prompt while using the grid or portrait as `--face`
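In API terms, the difference looks roughly like this (the URL variables are placeholders; the argument names follow the batch example below):

```python
# DON'T: including the source frame as a reference dilutes the identity.
bad_arguments = {
    "prompt": "The girl from the reference images, standing at the desk",
    "image_urls": [grid_url, source_frame_url],  # source frame blends both faces
}

# DO: keep only the identity references, and move the source frame's
# environment, lighting, and pose into the prompt text instead.
good_arguments = {
    "prompt": (
        "The girl from the reference images, standing at a desk in a dark room "
        "lit by green terminal monitors, same pose and camera angle as the original frame"
    ),
    "image_urls": [face_url, grid_url],
}
```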
## Prompt Tips for Face-Consistent Generation

### Strict in-place character replacement (from source video frames)
When replacing a character in an existing video frame:

- Extract the first frame from the source video (see the sketch after this list)
- Use face portrait + rotation grid as `image_urls` (NOT the source frame)
- Describe the source frame's exact environment, lighting, pose, and camera angle in the prompt
- Include specific details: monitor colors, neon lighting direction, room layout, character pose/gesture
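A minimal sketch of the frame-extraction step, assuming `ffmpeg` is on the PATH (file names are placeholders):

```python
import subprocess

# Grab only the first frame of the source clip as a reference of the scene to describe.
subprocess.run(
    ["ffmpeg", "-y", "-i", "source.mp4", "-frames:v", "1", "first_frame.png"],
    check=True,
)
```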
Prompt template for scene replacement:
"The [character description] from the reference images, wearing [outfit matching original], [exact pose/gesture from original frame], [expression], [exact environment description from original frame], [lighting description], photorealistic, vertical portrait photo, clean photo no text overlays no UI elements"What works well
- Describe the scene, pose, and expression — the model handles face consistency automatically
- Include background/environment details matching the source frame for best results
- Neutral or thoughtful expressions work best with most face references
- Always end with "photorealistic, vertical portrait photo, clean photo no text overlays no UI elements"
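If you generate many scenes, filling the template programmatically keeps prompts consistent; a small illustrative helper (the function name and field names are hypothetical, the template text is the one above):

```python
def build_prompt(character, outfit, pose, expression, environment, lighting):
    """Fill the scene-replacement template from above."""
    return (
        f"The {character} from the reference images, wearing {outfit}, "
        f"{pose}, {expression}, {environment}, {lighting}, "
        "photorealistic, vertical portrait photo, "
        "clean photo no text overlays no UI elements"
    )

prompt = build_prompt(
    character="blonde girl with black hair bow",
    outfit="a dark leather jacket over a black top",
    pose="talking to camera with both hands gesturing",
    expression="thoughtful expression",
    environment="standing in a dark tech room with glowing green terminal monitors behind her",
    lighting="purple-blue ambient neon lighting",
)
```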
### What may fail (422 errors)
- Prompts asking for extreme expressions (big smiles, wide eyes) that differ greatly from the reference face's natural expression
- The model sometimes rejects prompts it can't reconcile with the input face
- If you get `invalid_request` errors, simplify the expression and retry
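A sketch of that simplify-and-retry loop; the exact exception type `fal_client` raises on a 422 is not pinned down here, so the handler matches on the error text (an assumption):

```python
import fal_client

def generate_with_softening(prompt, softer_prompt, image_urls):
    """Try the prompt as written; if it is rejected (422), retry with a milder expression."""
    for candidate in (prompt, softer_prompt):
        try:
            return fal_client.subscribe(
                "fal-ai/nano-banana-2/edit",
                arguments={"prompt": candidate, "image_urls": image_urls},
            )
        except Exception as exc:  # exact exception class depends on the fal_client version
            if "invalid_request" not in str(exc) and "422" not in str(exc):
                raise
    raise RuntimeError("Prompt rejected even after softening the expression")
```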
### Example prompts that work
"The blonde girl with black hair bow from the reference images, wearing a dark leather jacket open over a black top showing cleavage, excitedly talking to camera with both hands gesturing expressively, mouth open mid-speech, standing in a dark futuristic tech room with multiple large glowing green code terminal monitors behind her, a bright horizontal fluorescent light bar above, dark ceiling with cables, purple-blue ambient neon lighting, photorealistic, vertical portrait photo, clean photo no text overlays no UI elements"
"The blonde girl with black hair bow from the reference images, wearing a dark studded leather jacket, enthusiastically raising both fists in an excited celebration gesture, big smile, standing in a dark hacker room surrounded by multiple glowing green code terminal monitors, bright neon pink rectangular light panel glowing behind her head, photorealistic, vertical portrait photo, clean photo no text overlays no UI elements"Example prompts that may fail
"A young man with excited expression, eyes wide open, big smile, eureka moment"
→ Often rejected if reference face is neutral/seriousWorkaround: Use milder expressions like "looking confident", "leaning forward with interest" instead of extreme emotions.
## Batch Generation
For multiple scenes, upload the face and grid once and reuse the URLs:
```python
import fal_client

face_url = fal_client.upload_file("./face.png")            # upload once
grid_url = fal_client.upload_file("./rotation_grid.png")   # upload once

for scene in scenes:
    result = fal_client.subscribe("fal-ai/nano-banana-2/edit", arguments={
        "prompt": scene["prompt"],
        "image_urls": [face_url, grid_url],
        "aspect_ratio": "9:16",
        "resolution": "2K",
    })
```
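Each response should carry the generated image URL(s). A small helper for saving them, assuming the usual fal.ai response shape of `{"images": [{"url": ...}]}` (the helper name is illustrative; call it inside the loop above):

```python
import urllib.request

def save_images(result, prefix):
    """Download every generated image from one subscribe() response."""
    # Assumed response shape: {"images": [{"url": "..."}], ...}
    for i, image in enumerate(result.get("images", [])):
        urllib.request.urlretrieve(image["url"], f"{prefix}_{i}.png")
```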
## Cost

~$0.15 per image at 2K resolution.
## Fallback: Nano Banana v1
When Nano Banana 2 (`fal-ai/nano-banana-2/edit`) is down (504 / `downstream_service_unavailable`), use v1, which runs on a different backend (Gemini 3 Pro):
```bash
uv run --with fal-client {baseDir}/scripts/nano_banana_edit.py \
  --face ./face.png \
  --prompt "..." \
  --model fal-ai/nano-banana/edit \
  --output ./scene.png
```

Both models accept the same `image_urls` + prompt API. v1 may produce a slightly different style, but face consistency is comparable.
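A sketch of the automatic fallback mentioned in Troubleshooting below; as in the retry example above, the handler matches on the error text because the exact exception class is not pinned down here:

```python
import fal_client

ENDPOINTS = ["fal-ai/nano-banana-2/edit", "fal-ai/nano-banana/edit"]  # v2 first, then v1

def subscribe_with_fallback(arguments):
    """Try Nano Banana 2 first; fall back to v1 if the v2 backend is unavailable."""
    last_error = None
    for endpoint in ENDPOINTS:
        try:
            return fal_client.subscribe(endpoint, arguments=arguments)
        except Exception as exc:  # exact exception class depends on the fal_client version
            if "downstream_service_unavailable" in str(exc) or "504" in str(exc):
                last_error = exc
                continue          # v2 pool overloaded -> try v1
            raise
    raise last_error
```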
## Troubleshooting
| Error | Cause | Fix |
|---|---|---|
| `downstream_service_unavailable` | fal.ai GPU pool overloaded | Auto-fallback to v1; or wait 1-2 min and retry |
| `invalid_request` / 422 | Prompt incompatible with face | Simplify expression, remove extreme emotions |
| `file_too_large` (>5MB) | OmniHuman/Kling rejects large images | Resize: `ffmpeg -i big.png -vf "scale=1080:-1" small.png` |
| `ModuleNotFoundError: fal_client` | Missing dependency | `pip install fal-client` or use `uv run --with fal-client` |
## Important: Image Size for Downstream Pipeline
OmniHuman and Kling reject images >5MB. When generating at 2K resolution, always resize before passing to lipsync/motion:
```bash
# Check and resize if needed (write to a temp file: ffmpeg cannot safely overwrite its own input)
size=$(stat -c%s image.png)
if [ "$size" -gt 5000000 ]; then
  ffmpeg -y -i image.png -vf "scale=1080:1920:force_original_aspect_ratio=decrease" image_resized.png
  mv image_resized.png image.png
fi
```