# Nano Banana 2 — Face-Consistent Image Generation (fal.ai)
Generate images that preserve a person's face identity using the Nano Banana 2 edit model on fal.ai.
## Quick Start

```bash
uv run --with fal-client {baseDir}/scripts/nano_banana_edit.py \
  --face ./my_face.png \
  --prompt "A young man sitting at a desk, thoughtful expression, studio lighting" \
  --output ./generated/scene.png
```

## Credential Resolution
The script resolves fal.ai credentials in this order:
1. `--api-key` CLI flag
2. `FAL_KEY` env var (official primary)
3. `FAL_API_KEY` env var
4. `FAL_AI_KEY` env var
5. `~/.secrets/fal.env` file (lines like `FAL_KEY=xxx`)
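A minimal sketch of that resolution order, assuming a `resolve_fal_key()` helper inside the script (the helper name and the exact file parsing are illustrative):

```python
import os
from pathlib import Path

def resolve_fal_key(cli_key):
    """Return the first fal.ai key found, following the order above."""
    if cli_key:                                              # 1. --api-key CLI flag
        return cli_key
    for var in ("FAL_KEY", "FAL_API_KEY", "FAL_AI_KEY"):     # 2-4. env vars
        if os.environ.get(var):
            return os.environ[var]
    env_file = Path.home() / ".secrets" / "fal.env"          # 5. ~/.secrets/fal.env
    if env_file.exists():
        for line in env_file.read_text().splitlines():
            if line.startswith("FAL_KEY="):
                return line.split("=", 1)[1].strip()
    return None

# fal_client reads FAL_KEY from the environment, so export whatever was found, e.g.:
# os.environ["FAL_KEY"] = resolve_fal_key(args.api_key)
```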
## Parameters

| Flag | Default | Description |
|---|---|---|
| `--face` | required | Face reference image — a portrait, a rotation grid, or any image showing the character |
| `--prompt` | required | Scene/expression description |
| `--output` | required | Output image path (`.png`) |
| `--aspect-ratio` | `9:16` | `1:1`, `16:9`, `9:16`, `4:3`, `3:4` |
| `--resolution` | `2K` | `1K`, `2K`, `4K` |
| `--seed` | random | For reproducibility |
| `--model` | `fal-ai/nano-banana-2/edit` | fal.ai model endpoint. Fallback: `fal-ai/nano-banana/edit` (v1, Gemini 3 Pro) |
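Under the hood these flags presumably map onto a single `fal_client.subscribe` call; a minimal sketch (the `seed` argument name is an assumption; `image_urls`, `aspect_ratio`, and `resolution` match the batch example further down):

```python
import fal_client

# Upload the local face/grid reference, then request one scene.
face_url = fal_client.upload_file("./my_face.png")

result = fal_client.subscribe(
    "fal-ai/nano-banana-2/edit",
    arguments={
        "prompt": "A young man sitting at a desk, thoughtful expression, studio lighting",
        "image_urls": [face_url],    # --face
        "aspect_ratio": "9:16",      # --aspect-ratio
        "resolution": "2K",          # --resolution
        # "seed": 42,                # --seed (argument name assumed)
    },
)
print(result)
```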
## Using Rotation Grids as Face Reference

A rotation grid (multi-angle body shots in one image) is the strongest reference for character identity. It carries face, body proportions, and style all at once. Pass it directly as `--face`.
```bash
# Rotation grid as face reference — strongest identity preservation
uv run --with fal-client {baseDir}/scripts/nano_banana_edit.py \
  --face ./rotation_grid.png \
  --prompt "The girl from the reference, wearing a leather jacket, talking to camera in a dark room" \
  --output ./scene.png
```

If no rotation grid exists yet, generate one first from a face portrait:
```bash
uv run --with fal-client {baseDir}/scripts/nano_banana_edit.py \
  --face ./face_portrait.png \
  --prompt "8-panel rotation grid showing the character from front, 3/4 left, side left, 3/4 back, back, 3/4 right, side right, 3/4 front. Full body, consistent outfit, neutral pose, studio lighting, white background" \
  --aspect-ratio 16:9 \
  --output ./rotation_grid.png
```

Then use the generated grid as `--face` for all subsequent scene generation.
## What does NOT work for in-place replacement

- Passing the source frame (the image you want to replace the character in) as an `image_url` dilutes face identity — the model blends both faces and produces a generic result
- Instead, describe the source scene's environment in the prompt while using the grid or portrait as `--face`
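In API terms, the difference looks roughly like this (the URL variables are placeholders; the argument names follow the batch example below):

```python
# DON'T: including the source frame as a reference dilutes the identity.
bad_arguments = {
    "prompt": "The girl from the reference images, standing at the desk",
    "image_urls": [grid_url, source_frame_url],  # source frame blends both faces
}

# DO: keep only the identity references, and move the source frame's
# environment, lighting, and pose into the prompt text instead.
good_arguments = {
    "prompt": (
        "The girl from the reference images, standing at a desk in a dark room "
        "lit by green terminal monitors, same pose and camera angle as the original frame"
    ),
    "image_urls": [face_url, grid_url],
}
```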
## Prompt Tips for Face-Consistent Generation

### Strict in-place character replacement (from source video frames)
When replacing a character in an existing video frame:

- Extract the first frame from the source video (see the sketch after this list)
- Use face portrait + rotation grid as `image_urls` (NOT the source frame)
- Describe the source frame's exact environment, lighting, pose, and camera angle in the prompt
- Include specific details: monitor colors, neon lighting direction, room layout, character pose/gesture
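A minimal sketch of the frame-extraction step, assuming `ffmpeg` is on the PATH (file names are placeholders):

```python
import subprocess

# Grab only the first frame of the source clip as a reference of the scene to describe.
subprocess.run(
    ["ffmpeg", "-y", "-i", "source.mp4", "-frames:v", "1", "first_frame.png"],
    check=True,
)
```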
Prompt template for scene replacement:
"The [character description] from the reference images, wearing [outfit matching original], [exact pose/gesture from original frame], [expression], [exact environment description from original frame], [lighting description], photorealistic, vertical portrait photo, clean photo no text overlays no UI elements"What works well
- Describe the scene, pose, and expression — the model handles face consistency automatically
- Include background/environment details matching the source frame for best results
- Neutral or thoughtful expressions work best with most face references
- Always end with "photorealistic, vertical portrait photo, clean photo no text overlays no UI elements"
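If you generate many scenes, filling the template programmatically keeps prompts consistent; a small illustrative helper (the function name and field names are hypothetical, the template text is the one above):

```python
def build_prompt(character, outfit, pose, expression, environment, lighting):
    """Fill the scene-replacement template from above."""
    return (
        f"The {character} from the reference images, wearing {outfit}, "
        f"{pose}, {expression}, {environment}, {lighting}, "
        "photorealistic, vertical portrait photo, "
        "clean photo no text overlays no UI elements"
    )

prompt = build_prompt(
    character="blonde girl with black hair bow",
    outfit="a dark leather jacket over a black top",
    pose="talking to camera with both hands gesturing",
    expression="thoughtful expression",
    environment="standing in a dark tech room with glowing green terminal monitors behind her",
    lighting="purple-blue ambient neon lighting",
)
```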
### What may fail (422 errors)
- Prompts asking for extreme expressions (big smiles, wide eyes) that differ greatly from the reference face's natural expression
- The model sometimes rejects prompts it can't reconcile with the input face
- If you get `invalid_request` errors, simplify the expression and retry
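A sketch of that simplify-and-retry loop; the exact exception type `fal_client` raises on a 422 is not pinned down here, so the handler matches on the error text (an assumption):

```python
import fal_client

def generate_with_softening(prompt, softer_prompt, image_urls):
    """Try the prompt as written; if it is rejected (422), retry with a milder expression."""
    for candidate in (prompt, softer_prompt):
        try:
            return fal_client.subscribe(
                "fal-ai/nano-banana-2/edit",
                arguments={"prompt": candidate, "image_urls": image_urls},
            )
        except Exception as exc:  # exact exception class depends on the fal_client version
            if "invalid_request" not in str(exc) and "422" not in str(exc):
                raise
    raise RuntimeError("Prompt rejected even after softening the expression")
```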
### Example prompts that work
"The blonde girl with black hair bow from the reference images, wearing a dark leather jacket open over a black top showing cleavage, excitedly talking to camera with both hands gesturing expressively, mouth open mid-speech, standing in a dark futuristic tech room with multiple large glowing green code terminal monitors behind her, a bright horizontal fluorescent light bar above, dark ceiling with cables, purple-blue ambient neon lighting, photorealistic, vertical portrait photo, clean photo no text overlays no UI elements"
"The blonde girl with black hair bow from the reference images, wearing a dark studded leather jacket, enthusiastically raising both fists in an excited celebration gesture, big smile, standing in a dark hacker room surrounded by multiple glowing green code terminal monitors, bright neon pink rectangular light panel glowing behind her head, photorealistic, vertical portrait photo, clean photo no text overlays no UI elements"Example prompts that may fail
"A young man with excited expression, eyes wide open, big smile, eureka moment"
→ Often rejected if reference face is neutral/seriousWorkaround: Use milder expressions like "looking confident", "leaning forward with interest" instead of extreme emotions.
## Batch Generation
For multiple scenes, upload the face and grid once and reuse the URLs:
```python
import fal_client

face_url = fal_client.upload_file("./face.png")            # upload once
grid_url = fal_client.upload_file("./rotation_grid.png")   # upload once

for scene in scenes:
    result = fal_client.subscribe("fal-ai/nano-banana-2/edit", arguments={
        "prompt": scene["prompt"],
        "image_urls": [face_url, grid_url],
        "aspect_ratio": "9:16",
        "resolution": "2K",
    })
```
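Each response should carry the generated image URL(s). A small helper for saving them, assuming the usual fal.ai response shape of `{"images": [{"url": ...}]}` (the helper name is illustrative; call it inside the loop above):

```python
import urllib.request

def save_images(result, prefix):
    """Download every generated image from one subscribe() response."""
    # Assumed response shape: {"images": [{"url": "..."}], ...}
    for i, image in enumerate(result.get("images", [])):
        urllib.request.urlretrieve(image["url"], f"{prefix}_{i}.png")
```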
## Cost

~$0.15 per image at 2K resolution.
## Fallback: Nano Banana v1
When Nano Banana 2 (`fal-ai/nano-banana-2/edit`) is down (504 / `downstream_service_unavailable`), use v1, which runs on a different backend (Gemini 3 Pro):
```bash
uv run --with fal-client {baseDir}/scripts/nano_banana_edit.py \
  --face ./face.png \
  --prompt "..." \
  --model fal-ai/nano-banana/edit \
  --output ./scene.png
```

Both models accept the same `image_urls` + prompt API. v1 may produce a slightly different style, but face consistency is comparable.
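A sketch of the automatic fallback mentioned in Troubleshooting below; as in the retry example above, the handler matches on the error text because the exact exception class is not pinned down here:

```python
import fal_client

ENDPOINTS = ["fal-ai/nano-banana-2/edit", "fal-ai/nano-banana/edit"]  # v2 first, then v1

def subscribe_with_fallback(arguments):
    """Try Nano Banana 2 first; fall back to v1 if the v2 backend is unavailable."""
    last_error = None
    for endpoint in ENDPOINTS:
        try:
            return fal_client.subscribe(endpoint, arguments=arguments)
        except Exception as exc:  # exact exception class depends on the fal_client version
            if "downstream_service_unavailable" in str(exc) or "504" in str(exc):
                last_error = exc
                continue          # v2 pool overloaded -> try v1
            raise
    raise last_error
```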
## Troubleshooting
| Error | Cause | Fix |
|---|---|---|
| `downstream_service_unavailable` | fal.ai GPU pool overloaded | Auto-fallback to v1; or wait 1-2 min and retry |
| `invalid_request` / 422 | Prompt incompatible with face | Simplify expression, remove extreme emotions |
| `file_too_large` (>5MB) | OmniHuman/Kling rejects large images | Resize: `ffmpeg -i big.png -vf "scale=1080:-1" small.png` |
| `ModuleNotFoundError: fal_client` | Missing dependency | `pip install fal-client` or use `uv run --with fal-client` |
## Important: Image Size for Downstream Pipeline
OmniHuman and Kling reject images >5MB. When generating at 2K resolution, always resize before passing to lipsync/motion:
```bash
# Check and resize if needed (write to a temp file: ffmpeg cannot safely overwrite its own input)
size=$(stat -c%s image.png)
if [ "$size" -gt 5000000 ]; then
  ffmpeg -y -i image.png -vf "scale=1080:1920:force_original_aspect_ratio=decrease" image_resized.png
  mv image_resized.png image.png
fi
```