Generates and iterates ad copy at scale for Google Ads, Meta, LinkedIn, TikTok, and Twitter/X with performance data analysis.
references/generative-tools.md
# Generative AI Tools for Ad Creative

Reference for using AI image generators, video generators, and code-based video tools to produce ad visuals at scale.

---

## When to Use Generative Tools

| Need | Tool Category | Best Fit |
|------|---------------|----------|
| Static ad images (banners, social) | Image generation | Nano Banana Pro, Flux, Ideogram |
| Ad images with text overlays | Image generation (text-capable) | Ideogram, Nano Banana Pro |
| Short video ads (6-30 sec) | Video generation | Veo, Kling, Runway, Sora, Seedance |
| Video ads with voiceover | Video gen + voice | Veo/Sora (native), or Runway + ElevenLabs |
| Voiceover tracks for ads | Voice generation | ElevenLabs, OpenAI TTS, Cartesia |
| Multi-language ad versions | Voice generation | ElevenLabs, PlayHT |
| Brand voice cloning | Voice generation | ElevenLabs, Resemble AI |
| Product mockups and variations | Image generation + references | Flux (multi-image reference) |
| Templated video ads at scale | Code-based video | Remotion |
| Personalized video (name, data) | Code-based video | Remotion |
| Brand-consistent variations | Image gen + style refs | Flux, Ideogram, Nano Banana Pro |

---

## Image Generation

### Nano Banana Pro (Gemini)

Google DeepMind's image generation model, available through the Gemini API.

**Best for:** High-quality ad images, product visuals, text rendering
**API:** Gemini API (Google AI Studio, Vertex AI)
**Pricing:** ~$0.04/image (Gemini 2.5 Flash Image), ~$0.24/4K image (Nano Banana Pro)

**Strengths:**
- Strong text rendering in images (logos, headlines)
- Native image editing (modify existing images with prompts)
- Available through the same Gemini API used for text generation
- Supports both generation and editing in one model

**Ad creative use cases:**
- Generate social media ad images from text descriptions
- Create product mockup variations
- Edit existing ad images (swap backgrounds, change colors)
- Generate images with headline text baked in

**API example:**
```bash
# Using the Gemini API for image generation
curl -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-image:generateContent" \
  -H "Content-Type: application/json" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -d '{
    "contents": [{"parts": [{"text": "Create a clean, modern social media ad image for a project management tool. Show a laptop with a kanban board interface. Bright, professional, 16:9 ratio."}]}],
    "generationConfig": {"responseModalities": ["TEXT", "IMAGE"]}
  }'
```

**Docs:** [Gemini Image Generation](https://ai.google.dev/gemini-api/docs/image-generation)

---

### Flux (Black Forest Labs)

Open-weight image generation models with API access through Replicate and BFL's native API.

**Best for:** Photorealistic images, brand-consistent variations, multi-reference generation
**API:** Replicate, BFL API, fal.ai
**Pricing:** ~$0.01-0.06/image depending on model and resolution

**Model variants:**

| Model | Speed | Quality | Cost | Best For |
|-------|-------|---------|------|----------|
| Flux 2 Pro | ~6 sec | Highest | $0.015/MP | Final production assets |
| Flux 2 Flex | ~22 sec | High + editing | $0.06/MP | Iterative editing |
| Flux 2 Dev | ~2.5 sec | Good | $0.012/MP | Rapid prototyping |
| Flux 2 Klein | Fastest | Good | Lowest | High-volume batch generation |

**Strengths:**
- Multi-image reference (up to 8 images) for consistent identity across ads
- Product consistency: same product in different contexts
- Style transfer from reference images
- Open-weight Dev model for self-hosting

**Ad creative use cases:**
- Generate 50+ ad variations with consistent product/person identity
- Create product-in-context images (your SaaS on different devices)
- Style-match to existing brand assets using reference images
- Rapid A/B test image variations

**Docs:** [Replicate Flux](https://replicate.com/black-forest-labs/flux-2-pro), [BFL API](https://docs.bfl.ml/)

---

### Ideogram

Specialized in typography and text rendering within images.

**Best for:** Ad banners with text, branded graphics, social ad images with headlines
**API:** Ideogram API, Runware
**Pricing:** ~$0.06/image (API), ~$0.009/image (subscription)

**Strengths:**
- Best-in-class text rendering (~90% accuracy vs ~30% for most tools)
- Style reference system (upload up to 3 reference images)
- 4.3 billion style presets for consistent brand aesthetics
- Strong at logos and branded typography

**Ad creative use cases:**
- Generate ad banners with headline text directly in the image
- Create social media graphics with branded text overlays
- Produce multiple design variations with consistent typography
- Generate promotional materials without needing a designer for each iteration

**Docs:** [Ideogram API](https://developer.ideogram.ai/), [Ideogram](https://ideogram.ai/)

---

### Other Image Tools

| Tool | Best For | API Status | Notes |
|------|----------|------------|-------|
| **DALL-E 3** (OpenAI) | General image generation | Official API | Integrated with ChatGPT, good text rendering |
| **Midjourney** | Artistic, high-aesthetic images | No official public API | Discord-based; unofficial APIs exist but risk bans |
| **Stable Diffusion** | Self-hosted, customizable | Open source | Best for teams with GPU infrastructure |

---

## Video Generation

### Google Veo

Google DeepMind's video generation model, available through the Gemini API and Vertex AI.

**Best for:** High-quality video ads with native audio, vertical video for social
**API:** Gemini API, Vertex AI
**Pricing:** ~$0.15/sec (Veo 3.1 Fast), ~$0.40/sec (Veo 3.1 Standard)

**Capabilities:**
- Up to 60 seconds at 1080p
- Native audio generation (dialogue, sound effects, ambient)
- Vertical 9:16 output for Stories/Reels/Shorts
- Upscale to 4K
- Text-to-video and
image-to-video

**Ad creative use cases:**
- Generate short video ads (15-30 sec) from text descriptions
- Create vertical video ads for TikTok, Reels, Shorts
- Produce product demos with voiceover
- Generate multiple video variations from the same prompt with different styles

**Docs:** [Veo on Vertex AI](https://cloud.google.com/vertex-ai/generative-ai/docs/video/overview)

---

### Kling (Kuaishou)

Video generation with simultaneous audio-visual generation and camera controls.

**Best for:** Cinematic video ads, longer-form content, audio-synced video
**API:** Kling API, PiAPI, fal.ai
**Pricing:** ~$0.09/sec (via fal.ai third-party)

**Capabilities:**
- Up to 3 minutes at 1080p/30-48fps
- Simultaneous audio-visual generation (Kling 2.6)
- Text-to-video and image-to-video
- Motion and camera controls

**Ad creative use cases:**
- Longer product explainer videos
- Cinematic brand videos with synchronized audio
- Animate product images into video ads

**Docs:** [Kling AI Developer](https://klingai.com/global/dev/model/video)

---

### Runway

Video generation and editing platform with strong controllability.

**Best for:** Controlled video generation, style-consistent content, editing existing footage
**API:** Runway Developer Portal

**Capabilities:**
- Gen-4: Character/scene consistency across shots
- Motion brush and camera controls
- Image-to-video with reference images
- Video-to-video style transfer

**Ad creative use cases:**
- Generate video ads with consistent characters/products across scenes
- Style-transfer existing footage to match brand aesthetics
- Extend or remix existing video content

**Docs:** [Runway API](https://docs.dev.runwayml.com/)

---

### Sora 2 (OpenAI)

OpenAI's video generation model with synchronized audio.

**Best for:** High-fidelity video with dialogue and sound
**API:** OpenAI API
**Pricing:** Free tier available; Pro from $0.10-0.50/sec depending on resolution

**Capabilities:**
- Up to 60 seconds with synchronized audio
- Dialogue, sound effects, and ambient audio
- sora-2 (fast) and sora-2-pro (quality) variants
- Text-to-video and image-to-video

**Ad creative use cases:**
- Video testimonials and talking-head style ads
- Product demo videos with narration
- Narrative brand videos

**Docs:** [OpenAI Video Generation](https://platform.openai.com/docs/guides/video-generation)

---

### Seedance 2.0 (ByteDance)

ByteDance's video generation model with simultaneous audio-visual generation and multimodal inputs.

**Best for:** Fast, affordable video ads with native audio, multimodal reference inputs
**API:** BytePlus (official), Replicate, WaveSpeedAI, fal.ai (third-party); OpenAI-compatible API format
**Pricing:** ~$0.10-0.80/min depending on resolution (estimated 10-100x cheaper than Sora 2 per clip)

**Capabilities:**
- Up to 20 seconds at up to 2K resolution
- Simultaneous audio-visual generation (Dual-Branch Diffusion Transformer)
- Text-to-video and image-to-video
- Up to 12 reference files for multimodal input
- OpenAI-compatible API structure

**Ad creative use cases:**
- High-volume short video ad production at low cost
- Video ads with synchronized voiceover and sound effects in one pass
- Multi-reference generation (feed product images, brand assets, style references)
- Rapid iteration on video ad concepts

**Docs:** [Seedance](https://seed.bytedance.com/en/seedance2_0)

---

### Higgsfield

Full-stack video creation platform with cinematic camera controls.

**Best for:** Social video ads, cinematic style, mobile-first content
**Platform:** [higgsfield.ai](https://higgsfield.ai/)

**Capabilities:**
- 50+ professional camera movements (zooms, pans, FPV drone shots)
- Image-to-video animation
- Built-in editing, transitions, and keyframing
- All-in-one workflow: image gen, animation, editing

**Ad creative use cases:**
- Social media video ads with cinematic feel
- Animate product images into dynamic video
- Create multiple video variations with different camera styles
- Quick-turn video content for social campaigns

---

### Video Tool Comparison

| Tool | Max Length | Audio | Resolution | API | Best For |
|------|-----------|-------|------------|-----|----------|
| **Veo 3.1** | 60 sec | Native | 1080p/4K | Gemini | Vertical social video |
| **Kling 2.6** | 3 min | Native | 1080p | Third-party | Longer cinematic |
| **Runway Gen-4** | 10 sec | No | 1080p | Official | Controlled, consistent |
| **Sora 2** | 60 sec | Native | 1080p | Official | Dialogue-heavy |
| **Seedance 2.0** | 20 sec | Native | 2K | Official + third-party | Affordable high-volume |
| **Higgsfield** | Varies | Yes | 1080p | Web-based | Social, mobile-first |

---

## Voice & Audio Generation

For layering realistic voiceovers onto video ads, adding narration to product demos, or generating audio for Remotion-rendered videos. These tools turn ad scripts into natural-sounding voice tracks.

### When to Use Voice Tools

Many video generators (Veo, Kling, Sora, Seedance) now include native audio.
Use standalone voice tools when you need:

- **Voiceover on silent video**: Runway Gen-4 and Remotion produce silent output
- **Brand voice consistency**: Clone a specific voice for all ads
- **Multi-language versions**: Same ad script in 20+ languages
- **Script iteration**: Re-record voiceover without reshooting video
- **Precise control**: Exact timing, emotion, and pacing

---

### ElevenLabs

The market leader in realistic voice generation and voice cloning.

**Best for:** Most natural-sounding voiceovers, brand voice cloning, multilingual
**API:** REST API with streaming support
**Pricing:** ~$0.12-0.30 per 1,000 characters depending on plan; starts at $5/month

**Capabilities:**
- 29+ languages with natural accent and intonation
- Voice cloning from short audio clips (instant) or longer recordings (professional)
- Emotion and style control
- Streaming for real-time generation
- Voice library with hundreds of pre-built voices

**Ad creative use cases:**
- Generate voiceover tracks for video ads
- Clone your brand spokesperson's voice for all ad variations
- Produce the same ad in 10+ languages from one script
- A/B test different voice styles (authoritative vs. friendly vs. urgent)

**API example:**
```bash
curl -X POST "https://api.elevenlabs.io/v1/text-to-speech/{voice_id}" \
  -H "xi-api-key: $ELEVENLABS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Stop wasting hours on manual reporting. Try DataFlow free for 14 days.",
    "model_id": "eleven_multilingual_v2",
    "voice_settings": {"stability": 0.5, "similarity_boost": 0.75}
  }' --output voiceover.mp3
```

**Docs:** [ElevenLabs API](https://elevenlabs.io/docs/api-reference/text-to-speech)

---

### OpenAI TTS

Simple, affordable text-to-speech built into the OpenAI API.

**Best for:** Quick voiceovers, cost-effective at scale, simple integration
**API:** OpenAI API (same SDK as GPT/DALL-E)
**Pricing:** $15/million chars (standard), $30/million chars (HD); ~$0.015/min with gpt-4o-mini-tts

**Capabilities:**
- 13 built-in voices (no custom cloning)
- Multiple languages
- Real-time streaming
- HD quality option
- Simple API: same SDK you already use for GPT

**Ad creative use cases:**
- Fast, cheap voiceover for draft/test ad versions
- High-volume narration at low cost
- Prototype ad audio before investing in premium voice

**Docs:** [OpenAI TTS](https://platform.openai.com/docs/guides/text-to-speech)

---

### Cartesia Sonic

Ultra-low latency voice generation built for real-time applications.

**Best for:** Real-time voice, lowest latency, emotional expressiveness
**API:** REST + WebSocket streaming
**Pricing:** Starts at $5/month; pay-as-you-go from $0.03/min

**Capabilities:**
- 40ms time-to-first-audio (fastest in class)
- 15+ languages
- Nonverbal expressiveness: laughter, breathing, emotional inflections
- Sonic Turbo for even lower latency
- Streaming API for real-time generation

**Ad creative use cases:**
- Real-time ad preview during creative iteration
- Interactive demo videos with dynamic narration
- Ads requiring natural laughter, sighs, or emotional reactions

**Docs:** [Cartesia Sonic](https://docs.cartesia.ai/build-with-cartesia/tts-models/latest)

---

### Voicebox (Open Source)

Free, local-first voice synthesis studio powered by Qwen3-TTS.
The open-source alternative to ElevenLabs.

**Best for:** Free voice cloning, local/private generation, zero-cost batch production
**API:** Local REST API at `http://localhost:8000`
**Pricing:** Free (MIT license). Runs entirely on your machine.
**Stack:** Tauri (Rust) + React + FastAPI (Python)

**Capabilities:**
- Voice cloning from short audio samples via Qwen3-TTS
- Multi-language support (English, Chinese, more planned)
- Multi-track timeline editor for composing conversations
- 4-5x faster inference on Apple Silicon via MLX Metal acceleration
- Local REST API for programmatic generation
- No cloud dependency: all processing on-device

**Ad creative use cases:**
- Free voice cloning for brand spokesperson across all ad variations
- Batch generate voiceovers without per-character costs
- Private/local generation when ad content is sensitive or pre-launch
- Prototype voice variations before committing to a paid service

**API example:**
```bash
curl -X POST http://localhost:8000/generate \
  -H "Content-Type: application/json" \
  -d '{"text": "Stop wasting hours on manual reporting.", "profile_id": "abc123", "language": "en"}'
```

**Install:** Desktop apps for macOS and Windows at [voicebox.sh](https://voicebox.sh), or build from source:
```bash
git clone https://github.com/jamiepine/voicebox.git
cd voicebox && make setup && make dev
```

**Docs:** [GitHub](https://github.com/jamiepine/voicebox)

---

### Other Voice Tools

| Tool | Best For | Differentiator | API |
|------|----------|---------------|-----|
| **PlayHT** | Large voice library, low latency | 900+ voices, <300ms latency, ultra-realistic | [play.ht](https://play.ht/) |
| **Resemble AI** | Enterprise voice cloning | On-premise deployment, real-time speech-to-speech | [resemble.ai](https://www.resemble.ai/) |
| **WellSaid Labs** | Ethical, commercial-safe voices | Voices from compensated actors, safe for commercial use | [wellsaid.io](https://www.wellsaid.io/) |
| **Fish Audio** | Budget-friendly, emotion control | ~50-70% cheaper than ElevenLabs, emotion tags | [fish.audio](https://fish.audio/) |
| **Murf AI** | Non-technical teams | Browser-based studio, 200+ voices | [murf.ai](https://murf.ai/) |
| **Google Cloud TTS** | Google ecosystem, scale | 220+ voices, 40+ languages, enterprise SLAs | [Google TTS](https://cloud.google.com/text-to-speech) |
| **Amazon Polly** | AWS ecosystem, cost | Neural voices, SSML control, cheap at volume | [Amazon Polly](https://aws.amazon.com/polly/) |

---

### Voice Tool Comparison

| Tool | Quality | Cloning | Languages | Latency | Price (per 1K chars unless noted) |
|------|---------|---------|-----------|---------|-----------------------------------|
| **ElevenLabs** | Best | Yes (instant + pro) | 29+ | ~200ms | $0.12-0.30 |
| **OpenAI TTS** | Good | No | 13+ | ~300ms | $0.015-0.030 |
| **Cartesia Sonic** | Very good | No | 15+ | ~40ms | ~$0.03/min |
| **PlayHT** | Very good | Yes | 140+ | <300ms | ~$0.10-0.20 |
| **Fish Audio** | Good | Yes | 13+ | ~200ms | ~$0.05-0.10 |
| **WellSaid** | Very good | No (actor voices) | English | ~300ms | Custom pricing |
| **Voicebox** | Good | Yes (local) | 2+ | Local | Free (open source) |

### Choosing a Voice Tool

```
Need voiceover for ads?
├── Need to clone a specific brand voice?
│   ├── Best quality → ElevenLabs
│   ├── Enterprise/on-premise → Resemble AI
│   └── Budget-friendly → Fish Audio, PlayHT
├── Need multilingual (same ad, many languages)?
│   ├── Most languages → PlayHT (140+)
│   └── Best quality → ElevenLabs (29+)
├── Need free / open source / local?
│   └── Voicebox (MIT, runs on your machine)
├── Need cheap, fast, good-enough?
│   └── OpenAI TTS ($0.015/min)
├── Need commercially-safe licensing?
│   └── WellSaid Labs (actor-compensated voices)
└── Need real-time/interactive?
    └── Cartesia Sonic (40ms TTFA)
```

### Workflow: Voice + Video

```
1. Write ad script (use ad-creative skill for copy)
2. Generate voiceover with ElevenLabs/OpenAI TTS
3. Generate or render video:
   a. Silent video from Runway/Remotion → layer voice track
   b. Or use Veo/Sora/Seedance with native audio (skip separate VO)
4. Combine with ffmpeg if layering separately:
   ffmpeg -i video.mp4 -i voiceover.mp3 -c:v copy -c:a aac output.mp4
5. Generate variations (different scripts, voices, or languages)
```

---

## Code-Based Video: Remotion

For templated, data-driven video ads at scale, Remotion is the best option. Unlike AI video generators that produce unique video from prompts, Remotion uses React code to render deterministic, brand-perfect video from templates and data.

**Best for:** Templated ad variations, personalized video, brand-consistent production
**Stack:** React + TypeScript
**Pricing:** Free for individuals/small teams; commercial license required for 4+ employees
**Docs:** [remotion.dev](https://www.remotion.dev/)

### Why Remotion for Ads

| AI Video Generators | Remotion |
|---------------------|----------|
| Unique output each time | Deterministic, pixel-perfect |
| Prompt-based, less control | Full code control over every frame |
| Hard to match brand exactly | Exact brand colors, fonts, spacing |
| One-at-a-time generation | Batch render hundreds from data |
| No dynamic data insertion | Personalize with names, prices, stats |

### Ad Creative Use Cases

**1. Dynamic product ads**
Feed a JSON array of products and render a unique video ad for each:
```tsx
// Simplified Remotion component for product ads
import React from 'react';
import {AbsoluteFill, Img} from 'remotion';

export const ProductAd: React.FC<{
  productName: string;
  price: string;
  imageUrl: string;
  tagline: string;
}> = ({productName, price, imageUrl, tagline}) => {
  return (
    <AbsoluteFill style={{backgroundColor: '#fff'}}>
      <Img src={imageUrl} style={{width: 400, height: 400}} />
      <h1>{productName}</h1>
      <p>{tagline}</p>
      <div className="price">{price}</div>
      <div className="cta">Shop Now</div>
    </AbsoluteFill>
  );
};
```

**2. A/B test video variations**
Render the same template with different headlines, CTAs, or color schemes:
```tsx
const variations = [
  {headline: "Save 50% Today", cta: "Get the Deal", theme: "urgent"},
  {headline: "Join 10K+ Teams", cta: "Start Free", theme: "social-proof"},
  {headline: "Built for Speed", cta: "Try It Now", theme: "benefit"},
];
// Render all variations programmatically
```

**3. Personalized outreach videos**
Generate videos addressing prospects by name for cold outreach or sales.

**4. Social ad batch production**
Render the same content across different aspect ratios:
- 1:1 for feed
- 9:16 for Stories/Reels
- 16:9 for YouTube

### Remotion Workflow for Ad Creative

```
1. Design template in React (or use AI to generate the component)
2. Define data schema (products, headlines, CTAs, images)
3. Feed data array into template
4. Batch render all variations
5. Upload to ad platform
```

### Getting Started

```bash
# Create a new Remotion project
npx create-video@latest

# Render a single video
npx remotion render src/index.ts MyComposition out/video.mp4

# Batch render from data
npx remotion render src/index.ts MyComposition --props='{"data": [...]}'
```

---

## Choosing the Right Tool

### Decision Tree

```
Need video ads?
├── Templated, data-driven (same structure, different data)
│   └── Use Remotion
├── Unique creative from prompts (exploratory)
│   ├── Need dialogue/voiceover? → Sora 2, Veo 3.1, Kling 2.6, Seedance 2.0
│   ├── Need consistency across scenes? → Runway Gen-4
│   ├── Need vertical social video? → Veo 3.1 (native 9:16)
│   ├── Need high volume at low cost? → Seedance 2.0
│   └── Need cinematic camera work? → Higgsfield, Kling
└── Both → Use AI gen for hero creative, Remotion for variations

Need image ads?
├── Need text/headlines in image? → Ideogram
├── Need product consistency across variations? → Flux (multi-ref)
├── Need quick iterations on existing images? → Nano Banana Pro
├── Need highest visual quality? → Flux Pro, Midjourney
└── Need high volume at low cost? → Flux Klein, Nano Banana
```

### Cost Comparison for 100 Ad Variations

| Approach | Tool | Approximate Cost |
|----------|------|-----------------|
| 100 static images | Nano Banana Pro | ~$4-24 |
| 100 static images | Flux Dev | ~$1-2 |
| 100 static images | Ideogram API | ~$6 |
| 100 × 15-sec videos | Veo 3.1 Fast | ~$225 |
| 100 × 15-sec videos | Remotion (templated) | ~$0 (self-hosted render) |
| 10 hero videos + 90 templated | Veo + Remotion | ~$22 + render time |

### Recommended Workflow for Scaled Ad Production

1. **Generate hero creative** with AI (Nano Banana, Flux, Veo): high-quality, exploratory
2. **Build templates** in Remotion based on winning creative patterns
3. **Batch produce variations** with Remotion using data (products, headlines, CTAs)
4. **Iterate**: use AI tools for new angles, Remotion for scale

This hybrid approach gives you the creative exploration of AI generators and the consistency and scale of code-based rendering.

---

## Platform-Specific Image Specs

When generating images for ads, request the correct dimensions:

| Platform | Placement | Aspect Ratio | Recommended Size |
|----------|-----------|-------------|-----------------|
| Meta Feed | Single image | 1:1 | 1080x1080 |
| Meta Stories/Reels | Vertical | 9:16 | 1080x1920 |
| Meta Carousel | Square | 1:1 | 1080x1080 |
| Google Display | Landscape | 1.91:1 | 1200x628 |
| Google Display | Square | 1:1 | 1200x1200 |
| LinkedIn Feed | Landscape | 1.91:1 | 1200x627 |
| LinkedIn Feed | Square | 1:1 | 1200x1200 |
| TikTok Feed | Vertical | 9:16 | 1080x1920 |
| Twitter/X Feed | Landscape | 16:9 | 1200x675 |
| Twitter/X Card | Landscape | 1.91:1 | 800x418 |

Include these dimensions in your generation prompts to avoid needing to crop or resize.
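
One way to keep those dimensions out of hand-written prompts is to store the spec table as data and append the size to each prompt programmatically. The following is a minimal sketch under stated assumptions: `AdSpec`, `AD_SPECS`, `findSpec`, and `withDimensions` are hypothetical names made up for illustration, not part of any ad platform or image-generation SDK, and the wording of the appended sentence is only a suggestion that a given model may or may not honor.

```typescript
// Hypothetical helper types and names; nothing here belongs to a real SDK.
interface AdSpec {
  platform: string;
  placement: string;
  aspectRatio: string;
  width: number;
  height: number;
}

// Values copied from the Platform-Specific Image Specs table.
const AD_SPECS: AdSpec[] = [
  {platform: "Meta Feed", placement: "Single image", aspectRatio: "1:1", width: 1080, height: 1080},
  {platform: "Meta Stories/Reels", placement: "Vertical", aspectRatio: "9:16", width: 1080, height: 1920},
  {platform: "Meta Carousel", placement: "Square", aspectRatio: "1:1", width: 1080, height: 1080},
  {platform: "Google Display", placement: "Landscape", aspectRatio: "1.91:1", width: 1200, height: 628},
  {platform: "Google Display", placement: "Square", aspectRatio: "1:1", width: 1200, height: 1200},
  {platform: "LinkedIn Feed", placement: "Landscape", aspectRatio: "1.91:1", width: 1200, height: 627},
  {platform: "LinkedIn Feed", placement: "Square", aspectRatio: "1:1", width: 1200, height: 1200},
  {platform: "TikTok Feed", placement: "Vertical", aspectRatio: "9:16", width: 1080, height: 1920},
  {platform: "Twitter/X Feed", placement: "Landscape", aspectRatio: "16:9", width: 1200, height: 675},
  {platform: "Twitter/X Card", placement: "Landscape", aspectRatio: "1.91:1", width: 800, height: 418},
];

// Look up one row of the table; fail loudly on an unknown platform/placement pair.
function findSpec(platform: string, placement: string): AdSpec {
  const spec = AD_SPECS.find(
    (s) => s.platform === platform && s.placement === placement
  );
  if (!spec) {
    throw new Error(`No spec for ${platform} / ${placement}`);
  }
  return spec;
}

// Append the aspect ratio and pixel size to a base prompt.
function withDimensions(basePrompt: string, spec: AdSpec): string {
  return `${basePrompt.trim()} Aspect ratio ${spec.aspectRatio}, ${spec.width}x${spec.height} pixels.`;
}

const linkedInPrompt = withDimensions(
  "Clean, modern ad image for a project management tool.",
  findSpec("LinkedIn Feed", "Landscape")
);
console.log(linkedInPrompt);
// → Clean, modern ad image for a project management tool. Aspect ratio 1.91:1, 1200x627 pixels.
```

Keeping the specs in one array means a batch job can loop over `AD_SPECS` to produce one prompt per placement, and a platform changing its recommended size becomes a one-line data edit rather than a prompt rewrite.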