Canva prototypes faster with GPT-5

In a short clip released by OpenAI, I—CJ Jones, Canva’s Global Head of Design, GenAI—shared how GPT-5 is accelerating our ability to prototype, iterate, and ship creative features at scale. This announcement is more than a technical update; it marks a practical and measurable shift in how we translate natural language intent into functional design features for millions of users worldwide.

As someone who leads design and generative AI efforts at Canva, I’m often asked how emerging models change day-to-day workflows. In this article I’ll report on what we’ve learned using GPT-5, the concrete results we saw in early experiments, and how these advances feed directly into new product experiences like Magic Formulas, polls, and quizzes. I’ll also explain what this means for building across 100+ languages, and how teams can think about leveraging similar capabilities responsibly and effectively.

📰 Executive summary

Canva has used AI from the beginning to democratize creativity. With GPT-5, we've observed a significant lift in our ability to complete complex, multi-step tasks. In early experiments we noticed a 44% improvement in successful completions for those kinds of tasks. Practically, that meant we could convert a natural-language sentence into a sophisticated formula in Magic Formulas, and prototype interactive experiences like polls and quizzes more rapidly. These improvements are enabling us to iterate faster for diverse communities across 100+ languages.

“GPT-5 definitely represents a step change in what we're able to do.” — CJ Jones

🧭 Why this matters: democratizing creativity at scale

I’ve spent years focused on removing friction from creative work so more people can bring ideas to life. Canva’s mission has always been about making design accessible to everyone, not just designers. That mission demands tools that are intuitive, fast, and forgiving. Models like GPT-5 help us make those tools smarter and more capable.

When I say “democratize creativity,” I mean tools that let a classroom teacher create a visually engaging worksheet without wrestling with a design app, or a small business owner produce professional social media assets in minutes. The more capable our AI becomes, the less users need to be design experts to produce high-quality outcomes.

GPT-5 is especially impactful because it improves performance on multi-step tasks—those workflows where a user's intent must be translated into a precise sequence of operations. That's where the real product value is: turning a vague idea into a repeatable, reliable result.

📈 The numbers that caught our attention

Numbers matter when you’re making product decisions. In our early experiments with GPT-5 we focused on measurable outcomes that map directly to user satisfaction and product reliability. One statistic stood out: a 44% improvement in successful completions of complex, multi-step tasks. That’s not a small uptick—it’s a game-changer for the kinds of features we prototype and ship.

To be clear, “successful completion” in our tests meant the model produced outputs that fulfilled the user’s intent in a way the product could reliably act upon. For Magic Formulas, for example, successful completion meant producing a valid, working formula from a user's natural-language prompt. Reaching a 44% improvement in that measure meant fewer fallbacks, fewer clarifying prompts, and a smoother user experience overall.

We don’t celebrate raw model metrics for their own sake; we care about how those improvements translate to product velocity and user satisfaction. The 44% lift directly reduced iteration time for engineers and designers, and reduced the cognitive load for end users who now receive better results from a single prompt.

🧪 What we prototyped: Magic Formulas, polls, and quizzes

We experimented with a handful of new experiences that demonstrate GPT-5’s strengths in handling complexity and nuance. Two examples I highlighted were Magic Formulas and interactive content like polls and quizzes. Both are representative of a broader class of features that demand multi-step reasoning and precise outputs.

Magic Formulas: turning language into precise logic

Magic Formulas is a product aimed at making advanced spreadsheet-like formulas accessible to people who don’t speak the syntax of formulas. A typical user might write, “Show me the percentage change between last quarter and this quarter for rows with sales above $10,000” and expect a correct formula to appear. That’s a non-trivial mapping from natural language to formal logic.

With GPT-5, we were able to translate natural language into complex formulas that previously would have required specialist knowledge. This is the kind of interaction that converts intent into action: from a user’s sentence to a working formula that the product can execute.

What made this possible was the model’s improved ability to keep track of multiple steps and constraints while producing syntactically correct and semantically meaningful formulas. The 44% improvement in successful complex-task completion was particularly visible here—users got working formulas more often, reducing the need for manual corrections or follow-up clarification prompts.
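The intent-to-formula flow can be sketched as a thin pipeline: prompt the model, then run a cheap structural check before the product accepts the output. In this sketch the model call is stubbed with a canned response, and the function names and prompt template are illustrative, not Canva's actual API.

```python
# Sketch of the intent-to-formula flow. The model call is a stub that
# returns a canned formula; all names here are illustrative.

def call_model(prompt: str) -> str:
    """Stand-in for a real GPT-5 request."""
    return '=IF(C2 > 10000, (B2 - A2) / A2, "")'

def generate_formula(user_intent: str) -> str:
    prompt = (
        "Translate the request into a single spreadsheet formula.\n"
        f"Request: {user_intent}\n"
        "Return only the formula, starting with '='."
    )
    formula = call_model(prompt).strip()
    # Minimal structural checks before the product accepts the output;
    # a real validator would parse the full formula grammar.
    if not formula.startswith("="):
        raise ValueError("model output is not a formula")
    if formula.count("(") != formula.count(")"):
        raise ValueError("unbalanced parentheses")
    return formula
```

Even checks this shallow matter: rejecting malformed output early is what lets the product fall back or re-prompt instead of executing something broken.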

Polls and quizzes: rapid prototyping of interactive experiences

Another area where GPT-5 shone was enabling rapid prototyping for dynamic content—polls, quizzes, and other interactive elements. These experiences require more than just a single correct output; they require structured outputs, sensible default options, and contextual diversity when generating choices or suggested phrasing.

Because GPT-5 better handles multi-step generation, we could prototype poll and quiz flows faster. For instance, the model could propose a set of poll options that are balanced and relevant to the user’s prompt, or create quiz questions with plausible distractors—critical for building engaging learning tools.
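A structured output like a quiz item only becomes useful once the product can trust its shape: a question, a correct answer, and plausible distractors with no duplicates. A minimal validation pass, with illustrative field names rather than our actual schema, might look like:

```python
import json

def validate_quiz_item(raw: str) -> dict:
    """Check a model-generated quiz item for the structure the UI needs."""
    item = json.loads(raw)
    if not isinstance(item.get("question"), str) or not item["question"].strip():
        raise ValueError("missing or empty question")
    options = item.get("options")
    if not isinstance(options, list) or len(options) < 3:
        raise ValueError("need an answer plus at least two distractors")
    if len(set(options)) != len(options):
        raise ValueError("duplicate options")
    if item.get("answer") not in options:
        raise ValueError("answer must appear among the options")
    return item
```

Failing items can be re-prompted or discarded, so testers only ever see quiz content the UI can actually render.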

That speed and quality of prototyping shortens the feedback loop between designers, engineers, and users. Instead of weeks to test an idea, teams could iterate in days or hours, testing real interactions with representative content generated by the model.

🌍 Building for 100+ languages

One of Canva’s defining characteristics is our global reach. We serve users in more than 100 languages, and product decisions must respect diverse linguistic and cultural contexts. GPT-5's improved multilingual capabilities made it easier and faster for us to extend new features across markets.

Previously, supporting a new language for an AI-driven feature often required significant engineering effort to ensure quality and consistency. With GPT-5, the model’s stronger performance across languages meant we spent less time on language-specific engineering and more time on product experience: UX copy, local content, and cultural nuance.

There are still cases where language-specific data or fine-tuning helps. But the baseline multilingual competence of GPT-5 reduced the barrier to prototyping and delivering features to non-English-speaking users. That’s crucial for equitable access to Canva’s tools worldwide.

🔎 What “step change” looks like in practice

When I describe GPT-5 as a “step change,” I mean that its improvements produce qualitatively different outcomes. Small model improvements often yield incremental benefits: slightly better wording, fewer syntax errors, or minor speed gains. A step change, by contrast, opens up new possibilities—features that previously weren’t viable now become practical to build and ship.

Here are the practical ways that manifested for us:

  • More reliable single-pass completions: Users get a valid, usable result more often on the first try.
  • Lower engineering overhead: We spent less time building brittle rule-based systems around the model.
  • Faster prototyping cycle: Designers and PMs could produce testable prototypes quickly.
  • Broader language coverage: We could prioritize product experience over language plumbing.

Each of these improvements multiplies across teams: design, product, engineering, and localization. The net effect is a meaningful acceleration of feature development and a better experience for end users.

🧩 How we changed our prototyping workflow

With a more capable model, we didn’t just replace one component with another; we refined how we prototype. A better model changes tradeoffs across the entire workflow. I’ll describe the changes we made and why they matter.

From heavy rule engines to model-first design

Historically, when models weren’t reliable enough on their own, we leaned heavily on rule engines, templates, and complex pipelines to ensure predictable outputs. That approach worked, but it increased code complexity and slowed iteration. With GPT-5’s higher fidelity, we shifted to a model-first design where the ML model handles much of the heavy lifting and deterministic logic is applied only where necessary.

That doesn’t mean we removed all safeguards. For production features, we still use schema validation, guardrails, and hybrid approaches to ensure safety and correctness. But the threshold for relying on the model rose—meaning we could prototype with fewer scaffolds and expose more realistic experiences to testers early in the design process.
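The model-first-with-guardrails pattern reduces to a small piece of control flow: accept the model's output when it validates, and drop to deterministic logic when it doesn't. This is a generic sketch with illustrative names, not our production code.

```python
def model_first(intent, model_call, validate, fallback):
    """Prefer the model's output; fall back deterministically on failure."""
    try:
        candidate = model_call(intent)
        validate(candidate)  # schema / safety checks raise on bad output
        return candidate, "model"
    except (ValueError, RuntimeError):
        return fallback(intent), "fallback"
```

The returned tag ("model" vs "fallback") is worth logging: the fallback rate in production is a direct measure of how far you can trust the model without scaffolding.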

More exploratory experiments, earlier

One of the most tangible benefits was being able to test more ideas, earlier. Instead of delaying experiments until we had a robust backend or a polished UI, we could produce plausible outputs with GPT-5 and use those outputs in user tests. This led to faster qualitative insights and helped prioritize which prototypes merited engineering investment.

Tighter feedback loops

Because prototypes were higher fidelity sooner, the feedback we received from stakeholders and users was more actionable. Rather than critiquing a concept, testers could assess real interactions—whether a proposed poll option was clear, whether an autogenerated formula solved the right problem, or whether quiz distractors felt plausible.

⚖️ Responsible deployment and safety considerations

Improved capabilities bring increased responsibility. As we integrated GPT-5 into Canva’s features, we applied the same safety and ethical thinking we use across our product development. Better models are powerful, but they also require careful guardrails to prevent misuse, bias, and poor user outcomes.

Key considerations we focused on included:

  • Output validation: For features like Magic Formulas, it’s crucial that generated formulas are syntactically correct and safe to execute. We implemented validators that check formula structure and test outcomes in a sandbox.
  • Bias and fairness: For content generation and multilingual support, we monitored the model’s outputs to detect and address biased or culturally insensitive language.
  • Privacy: We ensured that user data is handled according to our privacy standards and that generation does not inadvertently leak sensitive information.
  • Explainability: Where possible, we provided users with context about how a result was generated and clear affordances to edit or override generated content.

Responsible deployment is not an afterthought. It’s an integral part of product design when using potent generative models.

🔧 Engineering implications: reliability, latency, and costs

From an engineering perspective, introducing a more powerful model alters tradeoffs in reliability, latency, and cost. We evaluated GPT-5 not just for quality, but also for how it fits into our system constraints.

Key technical considerations we addressed included:

  1. Latency optimization: Users expect snappy experiences. We worked on batching, caching, and progressive rendering techniques to hide model latency and keep interactions fluid.
  2. Cost management: Stronger models often come with higher compute costs. We balanced when to call the model directly versus when to use lighter-weight approaches or cached outputs.
  3. Fallbacks and hybrid logic: For critical operations, we implemented deterministic fallbacks and validation to ensure reliability even if the model fails to produce an acceptable result.
  4. Monitoring and telemetry: We instrumented model responses to monitor quality drift, detect error modes, and gather usage patterns that inform further iteration.
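Point 2 above, calling the heavy model only when it adds value, often starts with something as simple as a cache keyed on a normalized prompt. This is a naive sketch with a stand-in model call; a production cache would also handle TTLs, per-user and per-language scoping, and invalidation.

```python
import hashlib

_cache: dict = {}

def cached_call(prompt: str, model_call) -> str:
    """Return a cached response for prompts that normalize to the same key."""
    # Naive normalization: lowercase and collapse whitespace.
    normalized = " ".join(prompt.lower().split())
    key = hashlib.sha256(normalized.encode()).hexdigest()
    if key not in _cache:
        _cache[key] = model_call(prompt)  # the expensive request
    return _cache[key]
```

Two requests that differ only in casing or spacing then cost one model call instead of two, which also removes one source of user-visible latency.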

These engineering practices ensure that higher-quality outputs don’t come at the expense of predictable performance or runaway costs.

🧠 Designing user experiences around generative outputs

When a model can produce high-quality outputs, product designers must think differently about the user interface. Generative outputs are probabilistic; users need control and clarity. I’ll share practical design patterns we adopted to make interfaces that feel trustworthy.

Surface intent, not internals

Users care about results, not how the sausage is made. Our interfaces surface the generated output and an editable representation of the underlying logic when it helps the user. For example, Magic Formulas shows a readable natural-language explanation alongside the actual formula, helping users understand and refine the result.

Progressive disclosure

For multi-step outputs like quizzes or formulas, we use progressive disclosure. Present a concise, high-confidence result first, and let users expand to see alternatives, explanations, or the exact syntax if they want to dive deeper.

Editable outputs and human-in-the-loop

Generated content should be a starting point, not an endpoint. We prioritize letting users edit outputs directly—change a quiz question, tweak a poll option, or refine a formula. That empowers users to adapt generated content to their context and reduces frustration when the model gets the gist but not the nuance.

📊 Real-world impact: faster prototyping, happier teams, better products

The combination of model improvements, design patterns, and engineering practices produced tangible outcomes for our teams and users. Faster prototyping leads to more experiments, which leads to better product-market fit. A few concrete impacts I observed:

  • Reduced design-to-prototype time: Teams could take a concept to a working prototype in a fraction of the prior time.
  • Higher-quality early user feedback: Because prototypes were closer to final experiences, feedback was more specific and actionable.
  • Lowered barriers for non-technical users: Features like Magic Formulas enabled users unfamiliar with formula syntax to accomplish complex tasks.
  • Faster international rollouts: Improved multilingual performance helped us reach more users quickly and equitably.

These benefits compound across product lines—each acceleration reduces time-to-value for users and increases our ability to innovate.

🔮 What comes next: opportunities and open questions

GPT-5’s improvements are exciting, but they also raise important questions and open up new opportunities. Here are a few directions we’re actively exploring.

Broader interactive features

As models become better at multi-step reasoning, we can imagine a wave of interactive features that previously felt out of reach: guided document creation, intelligent templates that adapt to user context, and more advanced educational tools that generate adaptive quizzes and explanations on the fly.

Personalized generative assistants

One promising area is personalized assistants that adapt to a user’s style, domain, and history. With the right privacy and control mechanisms, assistants could generate content that matches brand voice, learning level, or classroom objectives—improving quality and saving time.

Tool integration and composability

Generative models can be more effective when combined with tools—calculators, validators, or domain-specific APIs. We’re experimenting with composable architectures where the model orchestrates specialized tools to produce more reliable results.

Human-AI collaboration workflows

Finally, I’m excited about improving human-AI collaboration. Better models make it easier to design workflows where humans and AI amplify each other’s strengths: the model proposes, the human curates, and the system learns from those decisions. Designing for that loop is both a product and a research challenge.

🛠️ Practical advice for teams evaluating GPT-5

If you’re part of a team considering GPT-5 or similarly capable models, here are pragmatic recommendations from our experience:

  • Start with measurable success criteria: Define what “successful completion” looks like for your feature, and build tests that measure it.
  • Prototype with high-fidelity outputs: Use the model early to generate realistic content for user tests, but keep validation layers in place.
  • Prioritize output validation: Especially for executable outputs like formulas, ensure syntactic and semantic checks before exposing results to users.
  • Design editability: Always allow users to edit generated outputs; treat the model as a collaborator, not an oracle.
  • Monitor production quality: Instrument model responses and set up alerts for drift, bias, or failure modes.
  • Balance cost and value: Consider hybrid strategies where the heavy model is used selectively while lighter approaches handle lower-risk requests.
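The first recommendation, measurable success criteria, can start as simply as logging each attempt and computing a completion rate over the log; comparing that rate across model versions is how a lift like our 44% becomes visible. A minimal sketch (the numbers in any comparison are whatever your own runs produce, not ours):

```python
def completion_rate(attempts) -> float:
    """attempts: iterable of (prompt, succeeded) pairs from an eval run."""
    attempts = list(attempts)
    if not attempts:
        return 0.0
    return sum(1 for _, ok in attempts if ok) / len(attempts)

def relative_lift(baseline: float, candidate: float) -> float:
    """Relative improvement of candidate over baseline, e.g. 0.44 for +44%."""
    return (candidate - baseline) / baseline
```

The hard part is upstream of this code: deciding what counts as "succeeded" for your feature, and building an automated check for it.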

🔗 Example: a prototyping playbook we used

To make this concrete, here’s a condensed playbook that reflects how we approached prototyping with GPT-5.

  1. Define the user intent: Capture the user’s goal in simple language. Example: “Create a formula to compute percent change for items where sales are above threshold.”
  2. Create a minimal API contract: Define inputs and expected outputs for early tests—what fields will the model return? What schema must those fields follow?
  3. Generate diverse examples: Use the model to produce a variety of plausible outputs across use cases and languages.
  4. Validate syntactically: Run generated outputs through validators and sandboxed executions.
  5. UI-first testing: Place generated outputs into a mock UI for user testing. Observe editing behavior and failure modes.
  6. Iterate and instrument: Use telemetry to track success metrics and iterate on prompts, model settings, and UI affordances.
  7. Plan production safeguards: Before rollout, add rate limits, fallback content, and monitoring dashboards for ongoing quality assurance.

This playbook helped us move from concept to tested prototype quickly while managing risk and ensuring usefulness.

📚 Lessons learned and surprises

Every new model brings surprises. Some lessons from our GPT-5 experience were expected; others were instructive.

  • Prompt engineering remains valuable: Even with stronger models, clear prompts and contextual examples improve outcomes.
  • Quality is uneven but improving: Multilingual performance and complex logic handling increased dramatically, but corner cases still exist.
  • User editing is powerful: Many users prefer to start with a good draft and refine it; supporting easy edits increased satisfaction.
  • Domain specificity matters: For specialized use cases, combining the model with domain-specific rules or tools produced the best results.

These lessons shaped our approach to productizing GPT-5-powered features and will inform future integrations of generative models across Canva.

🧭 How this aligns with Canva’s mission

All of this work aligns tightly with Canva’s broader mission: to empower everyone to design and communicate effectively. Generative models serve that mission by lowering barriers to entry and enabling feature parity across diverse user bases. Whether someone is creating a social post, crafting educational material, or analyzing data, the goal is the same: provide tools that make complex tasks feel simple.

By integrating GPT-5 into prototyping and product development, we’re expanding who can do what and how quickly. That acceleration isn’t just about speed—it’s about unlocking creative potential for people and teams who previously lacked the resources or expertise to produce professional results.

📣 Final thoughts and an invitation

GPT-5 has already been a meaningful upgrade for us at Canva. It represents a step change in our ability to translate natural language into actionable, reliable outputs, enabling faster prototyping and broader language support. The 44% improvement in complex multi-step task completion is a concrete indicator of that shift, and the practical benefits—like better Magic Formulas and more rapid experimentation with polls and quizzes—are visible in our workflows.

As we continue to experiment, my invitation to other teams is simple: treat models like GPT-5 as collaborators. Use them to expand what’s possible, but design systems and interfaces that give users control and transparency. Measure outcomes that matter to users, prioritize safety, and iterate quickly based on real feedback.

“In our early experiments with GPT-5, we noticed a 44% improvement in successful completions of complex, multi-step tasks.” — CJ Jones

If you’re curious about the specific ways generative models can change your product development cycle, I’m happy to share more examples and learnings from our experiments. The pace of innovation is fast, and collaboration across teams and companies will help ensure these tools are used to expand access to creativity responsibly and effectively.

📝 Recap — What to take away

Here are the key takeaways I want you to leave with:

  • GPT-5 provides a step change: Improved multi-step reasoning means we can build features that were previously impractical.
  • Meaningful metric improvement: We observed a 44% improvement in successful completions for complex tasks in our early experiments.
  • Real product wins: Magic Formulas and interactive content like polls and quizzes are concrete examples of faster prototyping and higher-quality outputs.
  • Global impact: Stronger multilingual performance makes it easier to deliver features across 100+ languages.
  • Design and safety matter: Output validation, user editability, and monitoring are essential for responsible deployment.

🔍 Additional resources and next steps

If you want to explore these ideas further, consider starting with a small, measurable experiment: pick a critical multi-step workflow in your product, define success criteria, and prototype using a capable model. Use the playbook I outlined earlier to structure your work, instrument outcomes, and iterate based on user feedback.

Finally, stay curious and collaborative. The most valuable advances come when cross-functional teams—design, engineering, product, and safety—work together to translate model capabilities into user value.

🙏 Acknowledgments

I want to acknowledge the engineering, design, and product teams at Canva who contributed to the experiments and prototypes I discussed. Their work in validating, iterating, and ensuring safe deployment is what turns model performance into product impact. Thanks also to the broader community of researchers and engineers advancing generative models; their progress is what opens up these new product possibilities.

