Designers and video editors who can translate the picture in their head into a clear written prompt are pulling ahead in creative AI work, according to Hichame Assi, CEO of stock media marketplace Envato. Most creative training builds visual skills but never teaches precise written description, the only input AI generation models can act on. "Once you've mastered the ability to explain what you need in written language, these models can do remarkable things," Assi said.
Across Envato's community, Assi sees the same shift shaping AI for creatives: the tools are capable, but the bottleneck is knowing what to ask for.
Many designers can picture a shot in full detail but struggle to put it into words. That second skill, describing an image exactly, is what AI tools now run on, and it stays off most creative syllabi. Assi, who took over as Envato's CEO in October 2020 after a decade at HotelsCombined, keeps seeing the same pattern across the platform's millions of creatives. The tools are mostly capable; what holds people back is knowing what they want and how to ask for it. Without that clarity, the same models hand back near-misses while time and budget drain into a result that never quite arrives.
The missing skill in most design programs
Years of training go into the eye and the hand, and almost none goes into writing a spec. "Many of them have been trained to use visual language rather than written language," Assi said. Because the tool only acts on the words it is given, a sharp idea can take a stack of attempts to surface the way it looked in someone's head. That clumsy stretch is where most creatives are right now.
For the general creative pro, it's still a learning journey. Many are tinkering with AI, and it does help with ideation along the way. Like any craft, prompting is learned by doing, and the skill of translating pictures into precise language (what the industry calls prompt engineering) is rarely taught in design school.
Starting from an image instead of a blank page
If writing a perfect prompt is the hard part, one practical move is to lean on it less. Guides to AI video consistently point out that generating video from a text prompt alone is the least predictable way to work: a small change in wording can reshape the entire frame. Beginning from an existing image or clip narrows the job to adding motion, which is more contained and reliable.
For ad teams, the starting point is usually footage pulled from a library. "You look for stock footage to use in a YouTube ad, and it's rarely going to be exactly perfect for what you need. Now with AI, you can riff on it and make small edits to customize it the way you want," Assi said. A few edits to a clip that already exists require only a sentence or two, whereas building the same shot from scratch needs the description to carry everything.
That source clip was also shot by a person, so a human stays in the chain even after AI edits it. Most production guides suggest using each method where it works best: a text prompt for early ideas when speed matters and direction is still loose, and a reference image for the final version when the look and brand need to stay consistent. A common approach is to combine the two: sketching the idea with a prompt and then nailing it down by working from a chosen image.
Where the prompting premium goes next
The pressure on creatives to become perfect prompt writers may ease on its own. The models are getting better at reading a rough prompt and filling in what it leaves out, so the work of spelling out every detail gets a little lighter with each release. Assi sees the next step as tools that take on more of the process, moving a project from one stage to the next automatically. "If you're building a particular type of content, there might be two or three steps in the process," he said, describing tools that carry a creative from a reference through to a finished piece.
The skill premium holds for now, in the stretch before the software gets better at guessing what people mean, but the trend points toward less friction over time.
Why this matters for creatives
A small investment in learning to describe images precisely, or in developing a workflow that pairs a rough prompt with an existing visual reference, can sharply reduce the number of failed generations and re-renders that eat into project budgets. For creatives who build that habit now, the competitive advantage will be strongest while the tools still rely on human wording to close the gap between a mental picture and a usable asset.