Effective prompts for AI video ads follow a predictable structure: shot type, subject action, environment, lighting, camera movement, and mood. The syntax and the weight you give each element vary by model, and getting them right is the difference between a usable hero asset and an unusable blur.
This guide covers the five models we use daily at Adsome for DTC ad production: Kling 3.0, Runway Gen-4, Veo 3, Sora 2, and Pika 2.2.
What prompt structure works across all AI video models?
Every model responds to a core framework. Think of it as six slots you fill in order:
- Shot type (close-up, medium, wide, overhead)
- Subject and action ("a woman applies serum to her cheek")
- Environment ("minimalist bathroom, marble countertop")
- Lighting ("soft morning light from a window camera-left")
- Camera movement ("slow dolly in" or "static locked tripod")
- Mood or grade ("warm tones, film grain" or "clean, bright, editorial")
Order matters because most models give more weight to the opening tokens. Front-load the shot type and subject action, and push stylistic qualifiers to the end.
A real example for a skincare ad:
Close-up of a woman's hand lifting a glass dropper from a dark amber serum bottle, golden liquid catching soft window light, shallow depth of field, slow push-in, warm editorial tone
That prompt works across Kling 3.0, Gen-4, and Veo 3 with minor tuning per model.
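The six slots can also be assembled programmatically so that every prompt keeps the same order. The following is a minimal Python sketch, not a tool from any of these platforms; the `ShotPrompt` class and its field names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ShotPrompt:
    """Hypothetical container for the six slots, in priority order."""

    shot_type: str
    subject_action: str
    environment: str = ""
    lighting: str = ""
    camera_movement: str = ""
    mood: str = ""

    def render(self) -> str:
        # Emit slots in the order listed above, so shot type and subject
        # action always land at the front of the prompt.
        slots = [
            self.shot_type,
            self.subject_action,
            self.environment,
            self.lighting,
            self.camera_movement,
            self.mood,
        ]
        # Skip empty slots rather than leaving stray commas.
        return ", ".join(s.strip() for s in slots if s.strip())


prompt = ShotPrompt(
    shot_type="Close-up",
    subject_action="a woman applies serum to her cheek",
    environment="minimalist bathroom, marble countertop",
    lighting="soft morning light from a window camera-left",
    camera_movement="slow dolly in",
    mood="warm tones, film grain",
)
print(prompt.render())
```

Keeping the slots as separate fields makes per-model tuning easier: you can shorten or drop the later slots without disturbing the front-loaded shot type and action.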
How to adapt prompts for each model
Kling 3.0
Kling 3.0 handles physics and fluid motion better than most competitors, making it strong for product interaction shots (pouring, spraying, applying). Use its Master tier for hero shots where motion fidelity matters.
- Specify exact hand positions and object interactions. Kling responds well to granular physical descriptions like "fingers grip the bottle cap and twist counter-clockwise."
- Camera movement descriptions should be specific about speed. "Very slow dolly in over 5 seconds" outperforms a vague "dolly in."
- Avoid stacking more than two camera movements. Kling 3.0 will attempt all of them, and none will look intentional.
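The two-movement guideline is easy to check before spending credits. The sketch below is one rough way to do it in Python; the keyword list is an assumed vocabulary rather than anything Kling defines, and simple keyword matching will miss moves described in other words.

```python
import re

# Assumed vocabulary of camera-move keywords; extend it for your own prompts.
CAMERA_MOVE_PATTERN = re.compile(
    r"\b(dolly|pan|tilt|zoom|orbit|push-in|crane)s?\b", re.IGNORECASE
)


def camera_moves(prompt: str) -> set[str]:
    """Return the distinct camera-move keywords found in the prompt."""
    return {match.lower() for match in CAMERA_MOVE_PATTERN.findall(prompt)}


def too_many_moves(prompt: str, limit: int = 2) -> bool:
    """True when the prompt names more distinct moves than the limit."""
    return len(camera_moves(prompt)) > limit


print(camera_moves("Very slow dolly in over 5 seconds"))
print(too_many_moves("slow dolly in, then pan left and tilt up"))
```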
Runway Gen-4
Gen-4 and Gen-4 Turbo excel at stylistic consistency and respect reference images well. For advertising, this means you can feed a brand's existing look and get outputs that match.
- Gen-4 responds strongly to cinematic references. Adding "anamorphic lens, 35mm" or "shot on ARRI" influences the output more reliably than in other models.
- Keep prompts shorter than Kling prompts. Gen-4 tends to ignore details past roughly 60-70 words, prioritizing early tokens aggressively.
- Gen-4 Turbo is faster but less precise on complex camera moves. Use standard Gen-4 for anything involving a rack focus or combined pan-and-dolly.
Veo 3
Veo 3 generates video with native audio, which changes the prompting approach for ads where ambient sound matters (ASMR product reveals, food and beverage).
- Include audio cues in the prompt: "the sound of liquid pouring into a glass, ice clinking" produces synchronized audio without post-production.
- Veo 3 interprets spatial descriptions well. "Camera-left," "background," and "foreground" are parsed accurately.
- For dialogue-driven UGC-style ads, write the spoken line directly in the prompt. Veo 3 can generate lip-synced speech.
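If you build prompts from templates, the audio cues and spoken line can be appended as separate pieces. This is a small Python sketch under stated assumptions: `add_audio` is a hypothetical helper, and the `says: "..."` phrasing for dialogue is an illustrative convention, not documented Veo 3 syntax.

```python
from typing import Optional, Sequence


def add_audio(
    visual_prompt: str,
    sound_cues: Sequence[str] = (),
    spoken_line: Optional[str] = None,
    speaker: str = "the speaker",
) -> str:
    """Append ambient sound cues and an optional spoken line to a prompt."""
    # Trim trailing punctuation so the joined prompt reads cleanly.
    parts = [visual_prompt.strip().rstrip(".,")]
    if sound_cues:
        parts.append("the sound of " + ", ".join(sound_cues))
    if spoken_line:
        # Assumed phrasing for dialogue; adjust to whatever works in practice.
        parts.append(f'{speaker} says: "{spoken_line}"')
    return ", ".join(parts)


print(
    add_audio(
        "Close-up of a glass on a bar",
        sound_cues=["liquid pouring into a glass", "ice clinking"],
    )
)
```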
Sora 2
Sora 2 produces longer coherent clips and handles scene transitions within a single generation.
- You can describe a two-part scene in one prompt: "starts with a wide shot of the kitchen, then cuts to a close-up of hands opening the product box." Sora 2 will generate an internal cut.
- Lighting descriptions need to be explicit. Sora 2 defaults to flat, even lighting when unspecified, which kills the editorial feel most DTC brands want.
- Add texture words: "film grain," "subtle lens flare," "natural skin texture." Sora 2 tends toward a clean digital look that reads as stock footage without these.
Pika 2.2
Pika 2.2 is best suited for short motion graphics-style clips and stylized product shots rather than realistic hero video.
- Pika responds to style keywords more than physical descriptions. "Stop-motion," "claymation," or "flat illustration" yield strong results.
- For product ads, use image-to-video mode with a clean packshot as input and prompt the motion: "bottle rotates 45 degrees, label facing camera."
- Keep prompts under 40 words. Pika 2.2 loses coherence with long descriptions.
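The length guidance in this section (roughly 60-70 words for Gen-4, under 40 for Pika 2.2) can be enforced with a simple word count. The Python sketch below assumes whitespace-separated words are a good enough proxy for how the models read a prompt; the dictionary keys are hypothetical labels, not official model identifiers, and no limit is assumed for models where this guide gives none.

```python
from typing import Optional

# Inclusive word limits taken from the guidance in this guide.
# Gen-4: the lower end of "roughly 60-70 words"; Pika 2.2: "under 40 words".
MAX_WORDS = {"gen-4": 60, "pika-2.2": 39}


def word_count(prompt: str) -> int:
    """Count whitespace-separated words."""
    return len(prompt.split())


def words_over_budget(model: str, prompt: str) -> Optional[int]:
    """Return how many words to cut, 0 if within budget, None if no limit is known."""
    limit = MAX_WORDS.get(model)
    if limit is None:
        return None
    return max(0, word_count(prompt) - limit)


print(words_over_budget("pika-2.2", "bottle rotates 45 degrees, label facing camera"))
```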
Common prompting mistakes that waste render credits
Three patterns consistently produce unusable output across all models:
- Contradictory directions. "Fast-paced and calm" or "cinematic and raw UGC" confuses the model. Pick one tone.
- Unspecified hands and faces. If your shot includes hands touching a product, describe the grip. If a face is in frame, specify the expression. Leaving these ambiguous causes distortion artifacts in every current model.
- Overloading a single generation. Requesting a full 15-second ad narrative in one prompt degrades quality. Generate 3-5 second clips with focused prompts and edit them together.
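Two of these mistakes lend themselves to a quick pre-flight check. The Python sketch below flags the contradictory pairs named above and splits a target ad length into equal clips of at most five seconds. Both helpers are hypothetical, the pair list covers only the examples from this section, and very short totals can produce clips under three seconds.

```python
import math

# Only the example pairs from this section; add your own as you find them.
CONTRADICTORY_PAIRS = [("fast-paced", "calm"), ("cinematic", "raw ugc")]


def find_contradictions(prompt: str) -> list[tuple[str, str]]:
    """Return each known contradictory pair where both terms appear."""
    text = prompt.lower()
    return [(a, b) for a, b in CONTRADICTORY_PAIRS if a in text and b in text]


def plan_clips(total_seconds: float, max_clip_seconds: float = 5.0) -> list[float]:
    """Split a total duration into equal clips no longer than the maximum."""
    if total_seconds <= 0 or max_clip_seconds <= 0:
        raise ValueError("durations must be positive")
    clip_count = math.ceil(total_seconds / max_clip_seconds)
    return [total_seconds / clip_count] * clip_count


print(find_contradictions("Fast-paced and calm product reveal"))
print(plan_clips(15))
```

Each planned clip then gets its own focused prompt, and the clips are cut together in the edit.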
