Notion’s rebuild for agentic AI: How GPT‑5 helped unlock autonomous workflows


⚡ The moment AI became a different gear

At Notion, our mission has always been to help people build beautiful tools for their life’s work. Over the past few years, that mission shifted in a single, decisive way: artificial intelligence stopped being a feature and started being the infrastructure. Working with OpenAI and their evolving models — from GPT‑4 and GPT‑4o to GPT‑4o mini and now GPT‑5 — we rebuilt Notion to be natively AI‑powered. That rebuild changed not just how the product works, but how people think about software itself.

I remember a particular company trip after COVID when this shift went from abstract to real. My co‑founders, Ivan and Simon, had early access to GPT‑4, and they locked themselves in a room to build. By the end of a week they had a prototype that felt like an entirely new gear for Notion. What struck me most was not how fast they moved, but how the prototype made it possible for someone to speak in plain English and have software emerge on the other side.

"We've gone from people needing to learn software to just speaking in English and seeing software emerge on the other side of that."

That line captures the tectonic shift. AI gave us a new form of electricity for product design — a way to translate natural language into actions, structure, and value. From that point forward, the work was about scaling that experience, making it reliable, and weaving it into the fabric of everyday work.

🧭 From prototype to native AI workspace

Turning an inspiring prototype into a product that millions trust means solving many problems at once. It’s not enough for the model to be smart. The system around the model must be fast, reliable, and integrated with the tools people already use.

We focused on three things simultaneously:

  • Speed and latency. AI actions need to be quick and predictable. If a feature feels slow, people stop relying on it for day‑to‑day work.
  • Context and retrieval. The model must access the right information from where work actually lives: documents, code, design files, tickets, messages.
  • Safety and control. As models gain more autonomy, users must retain control and trust over what happens in their workspace.

The path to a natively AI‑powered workspace required a deep technical partnership with OpenAI. We integrated GPT‑4o for high‑quality generation, GPT‑4o mini for speed‑sensitive tasks, embeddings for retrieval, and finally GPT‑5 to unlock more agentic, autonomous workflows. Each model brought different tradeoffs, and we built the layers to orchestrate them effectively.

🔗 Building connectors that matter

People do their work across many systems. If Notion AI could only read content that lived in Notion, it would be a neat trick. The real promise is when AI can answer questions and take actions across your entire tool set.

To deliver that promise we built connectors to GitHub, Confluence, Slack, and more. That meant designing secure, performant pipelines that pull the relevant content into our retrieval systems and represent it in a way the model can reason about.
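A connector pipeline like the one described above boils down to normalizing very different payloads into one record shape the retrieval layer can index. Here is a minimal sketch; the field names (`number`, `html_url`, `ts`, `permalink`) and the `Chunk` record are simplified, hypothetical stand‑ins, not Notion's actual schema:

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    source: str      # e.g. "github", "slack"
    doc_id: str
    text: str
    url: str

def normalize_github_pr(pr: dict) -> Chunk:
    # Map a simplified PR payload into the common record
    # the retrieval layer indexes.
    return Chunk(
        source="github",
        doc_id=f"pr-{pr['number']}",
        text=f"{pr['title']}\n{pr['body']}",
        url=pr["html_url"],
    )

def normalize_slack_message(msg: dict) -> Chunk:
    # Slack messages carry a timestamp id and a permalink.
    return Chunk(
        source="slack",
        doc_id=msg["ts"],
        text=msg["text"],
        url=msg["permalink"],
    )

pr = {"number": 412, "title": "Add export endpoint",
      "body": "Implements CSV export.",
      "html_url": "https://example.test/pr/412"}
chunk = normalize_github_pr(pr)
print(chunk.doc_id, chunk.source)
```

Keeping the source and URL on every chunk is what later lets the product show provenance for each answer.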

Here’s why connectors matter in practice. As an engineering manager, I want to ask a single question and get a coherent answer. I want to know who wrote the code, where the design lives in Figma, and what the product requirements document says about a particular feature. Instead of hunting through multiple apps, Notion AI can surface the relevant PR, the design link, and the context from the PRD in a single conversational reply.

⚙️ How embeddings made retrieval fast and reliable

Retrieval is the unsung hero of large language model systems. Embeddings let us transform documents, comments, PR descriptions, and design notes into vector representations that the system can search quickly. Pairing those embeddings with efficient indexes and thoughtful caching gave us retrieval that is both fast and high quality.
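The shape of that retrieval loop can be sketched in a few lines. This toy version uses bag‑of‑words vectors in place of model embeddings (a real system would call an embeddings API), but the structure — embed once at index time, score by cosine similarity at query time — is the same:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy embedding: a term-frequency vector. In production this
    # would be a model-based embedding, not word counts.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def build_index(docs: dict) -> dict:
    # Precompute one vector per document so queries only pay
    # for similarity scoring, not re-embedding the corpus.
    return {doc_id: embed(text) for doc_id, text in docs.items()}

def search(index: dict, query: str, top_k: int = 2) -> list:
    qv = embed(query)
    scored = [(cosine(qv, dv), doc_id) for doc_id, dv in index.items()]
    scored.sort(reverse=True)
    return [doc_id for score, doc_id in scored[:top_k] if score > 0]

docs = {
    "prd": "product requirements for the rollout of the export feature",
    "pr-412": "pull request adding the export endpoint and tests",
    "notes": "meeting notes about hiring plans",
}
index = build_index(docs)
print(search(index, "export feature rollout"))
```

The caching point from the text shows up here too: `build_index` runs once per corpus update, so query latency is dominated by scoring, which is easy to make fast with a proper vector index.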

Speed biases behavior. The transition from an early internal prototype to general availability shaved about 50 percent off response times. That reduction wasn’t just a vanity metric. It enabled people to depend on AI in the middle of their workflows rather than seeing it as a novelty. When answers come back quickly and consistently, teams start to build processes around the AI.

"From internal usage to GA, it's 50% faster. And that speed actually allows our users to use it reliably."

🤖 Agentic AI and autonomous workflows

At this point you’re probably asking what "agentic AI" means in practice. For me, agentic AI refers to systems that do more than answer questions. They can plan, act, and coordinate across tools to carry out multi‑step tasks on behalf of a user. GPT‑5 took us further in this direction by enabling reliable orchestration and decision making across multiple services.

Autonomous workflows are workflows where the AI can:

  • Identify the steps required to achieve a goal.
  • Take actions across connected systems (create tasks, open PRs, update docs).
  • Ask for clarification when necessary and manage follow ups.

Imagine asking Notion to "prepare the rollout plan for the new feature." A non‑agentic system might return a checklist or a draft. An agentic system can gather relevant docs, summarize the PRD, surface outstanding dependencies, create a draft timeline in your calendar, and even create follow‑up tasks in your task tracker — all while keeping you in the loop.

📈 The adoption story: usage, retention, and trust

The adoption numbers were the clearest validation of the approach. Customers who used Notion AI were nearly two‑thirds more active on Notion than customers who didn’t. That’s a huge usage delta. Beyond that, about three quarters of users said they would never go back — that AI had become integral to how they work.

Those numbers come from product behavior, not survey optimism. People open Notion more often, lean on it for diverse work, and in many cases have reshaped their team's processes around AI‑augmented workflows.

"Unsurprisingly, when customers are using Notion AI, they're almost two‑thirds more active on Notion than customers that don't. And from that, three‑quarters of them say that they would never go back."

🧩 Design principles that guided the rebuild

There were design and engineering principles we kept returning to:

  1. Make the AI a collaborator, not a black box. Present sources, let users see provenance, and give them easy ways to accept, modify, or reject suggestions.
  2. Prioritize useful defaults. Automations should solve common problems with minimal configuration, while remaining customizable for power users.
  3. Fail gracefully. When the model is uncertain, surface options instead of fabricating answers. Let the system ask follow‑up questions.
  4. Measure trust. Track when users accept AI suggestions, how often they need to correct them, and whether the AI reduces time to complete tasks.

We built the product with continuous feedback cycles. We used Notion internally — we were our own earliest and toughest customers. That approach uncovered real production edge cases: messy docs, conflicting signals across tools, and human expectations for accountability.

🔒 Safety, control, and enterprise needs

Enterprise customers have higher expectations around security, auditability, and control. To meet those needs we introduced guardrails and controls at several levels:

  • Data controls. Administrators can decide what sources the AI can index and whether certain connectors are enabled.
  • Access controls. Actions initiated by an agent can require approvals or be restricted to specific roles.
  • Provenance and traceability. For every generated suggestion, we show the sources used, so teams can verify claims.

Working closely with OpenAI, we also made choices about model selection by task. For tasks that require high reliability and creative reasoning we might use GPT‑4o or GPT‑5. For latency‑sensitive or lower‑risk tasks we might use GPT‑4o mini. That multi‑model approach balanced cost, speed, and capability.
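A task‑based router like the one just described can be as simple as a policy function. The sketch below is illustrative, not Notion's actual routing rules; the task taxonomy (`kind`, `sensitive`) is an assumption, while the model names come from the models discussed in this article:

```python
from dataclasses import dataclass

@dataclass
class Task:
    kind: str        # e.g. "reasoning", "chat", "autocomplete"
    sensitive: bool  # does the task warrant the most capable model?

def pick_model(task: Task) -> str:
    # Illustrative policy: the most capable model for reasoning-rich
    # or sensitive work, the smallest model for latency-critical
    # paths, and a balanced default for everything else.
    if task.kind == "reasoning" or task.sensitive:
        return "gpt-5"
    if task.kind == "autocomplete":
        return "gpt-4o-mini"
    return "gpt-4o"

print(pick_model(Task(kind="autocomplete", sensitive=False)))
```

Centralizing the policy in one function makes the cost/speed/capability tradeoff explicit and easy to revise as models change.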

🛠️ Implementation: orchestration, routing, and fallbacks

Under the hood, building agentic capabilities is largely an orchestration challenge. A single user request can trigger:

  • Retrieval of relevant documents via embeddings.
  • Routing to the proper model depending on cost and latency requirements.
  • Execution steps that touch external APIs with appropriate permissions.
  • Human approval flows for sensitive actions.

We designed a layered architecture:

  1. Index and retrieve. Convert content to embeddings and keep them indexed for quick similarity searches.
  2. Plan. Use a reasoning model to generate a plan of actions and checks.
  3. Act. Execute non‑destructive actions (create a draft, suggest a change) and request approval for destructive ones (merge a PR, change permissions).
  4. Confirm. Provide a summary of actions taken and the evidence used.

This flow helps avoid hallucinations and gives users a clear audit trail of what the system did and why.
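The act‑and‑confirm layers above can be sketched as a small executor. This is a minimal illustration under assumed names (`Action`, `run_plan`, an approvals set), not the production system: safe actions execute immediately, destructive ones are held for approval, and every decision lands in an audit trail:

```python
from dataclasses import dataclass

@dataclass
class Action:
    name: str
    destructive: bool  # e.g. merging a PR, changing permissions

@dataclass
class AuditEntry:
    action: str
    status: str  # "executed" or "awaiting_approval"

def run_plan(actions: list, approvals: set) -> list:
    # Act + Confirm: execute non-destructive actions right away;
    # hold destructive ones unless explicitly approved. The
    # returned trail is the user-facing audit record.
    trail = []
    for a in actions:
        if a.destructive and a.name not in approvals:
            trail.append(AuditEntry(a.name, "awaiting_approval"))
        else:
            trail.append(AuditEntry(a.name, "executed"))
    return trail

plan = [Action("create_draft", False),
        Action("merge_pr", True),
        Action("update_doc", False)]
trail = run_plan(plan, approvals=set())
for entry in trail:
    print(entry.action, entry.status)
```

Once the user approves `merge_pr`, re‑running the plan with `approvals={"merge_pr"}` executes it — the approval gate, not the model, decides when a destructive step proceeds.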

💡 Examples of autonomous workflows in practice

To make this concrete, here are a few autonomous workflows teams have adopted.

Product launch assistant

Ask Notion to assemble a launch plan. It pulls in the PRD, current roadmap, outstanding engineering tasks from GitHub, and design assets from Figma. It produces a timeline, identifies owners, and creates a checklist of release tasks in your tracker. It then opens a draft announcement and routes it to the communications owner for review.

Engineering handoff

When a feature is ready for engineering, Notion can generate a succinct developer brief: links to the relevant PR, a summary of the UX decisions, open questions, and a prioritized task list. It can also create a Slack thread and ping the right engineers to get their attention.

Meeting summarization to action items

Record a meeting and let Notion produce a structured summary, extract decisions, and create follow‑up tasks. Those tasks can be assigned automatically to participants based on context and past responsibilities.

📊 Measuring success: what to track

For teams building agentic features, I recommend tracking a mix of adoption, quality, and trust metrics:

  • Activation. How many users try AI features within the first week?
  • Frequency. How often do users interact with AI in their day‑to‑day work?
  • Retention. Are AI users more likely to return and use the product?
  • Acceptance rate. How often do users accept AI suggestions without modification?
  • Correction rate. How often do suggestions need significant edits?
  • Time saved. Reduction in time to complete common tasks.

For us, those measurements validated that this was not a gimmick: users who engaged with Notion AI became more active and more loyal. That drove further investment in integrations and agentic capabilities.

🤝 The partnership with OpenAI

Building this system was a partnership. OpenAI provided the models and APIs; we built the product experience, connectors, and orchestration specific to the needs of knowledge teams. That collaboration allowed us to move faster than we could have alone. It also helped shape product decisions where model behavior and product expectations intersect.

Working with a model provider, rather than building models from scratch, meant we could focus on the places where product design and integrations deliver the most value: retrieval strategies, user controls, and human workflows. The APIs gave us primitives; the product made them useful.

🔎 Practical recommendations for teams

If you’re building agentic features or integrating AI into a product, here are the lessons I’d share from our work:

  1. Prototype quickly. A small, focused prototype will surface real user expectations and failure modes faster than a long theoretical design process.
  2. Be explicit about sources. Always show provenance for model outputs so users can verify and trust results.
  3. Start with retrieval. Adding embeddings and high quality retrieval often yields the biggest improvement in factuality and usefulness.
  4. Design for speed. Latency matters. Faster responses lead to higher reliance and more integrated workflows.
  5. Layer permissions and approvals. Treat automation as powerful but conditional. Let users deputize the system incrementally.
  6. Measure the right things. Track not only usage but whether AI reduces time to value and increases retention.

🌱 Why being your own customer matters

One principle I keep advocating for is building what you use. We are each other’s customers. We lived in the product, which created a ruthlessly useful feedback loop. When our teams started relying on Notion AI for everyday work, we encountered the same friction our customers did and fixed it quickly.

"I think maybe the most fun part about this whole journey is that we are each other's customers. And so when OpenAI tells us that our Q&A product is really good, that gives us confidence that we've built something that is truly world‑class."

Being our own customer meant we held ourselves to a higher standard. If a workflow saved us minutes or reduced cognitive load, we knew it had product market fit. If it confused us, we iterated.

🔮 Where agentic AI goes next

GPT‑5 and similar advances are making agentic systems more capable and more reliable. That unlocks a few clear directions I expect to accelerate:

  • Deeper cross‑tool orchestration. Agents will coordinate across more services, reducing the need for manual context switching.
  • Richer team workflows. Agents will choreograph multi‑person processes, not just single‑user automation.
  • Smarter, context‑aware defaults. Systems will anticipate needs and present sensible actions without being intrusive.
  • Expanded workplace automation. Repetitive operational tasks across HR, finance, and engineering will become automated end‑to‑end with oversight.

That future relies on continued attention to safety, user control, and clear auditability. The technology is powerful, but progress is meaningful only if it earns and preserves user trust.

📌 Closing thoughts

AI has given us a new kind of product electricity. For Notion, integrating GPT‑4o, GPT‑4o mini, embeddings, and GPT‑5 transformed the workspace from a place where people store knowledge into a place that actively helps people do work. The journey from a locked‑room prototype to a globally adopted product taught us that capability alone is not enough. Speed, integrations, provenance, and human control are what make agentic AI genuinely useful.

As teams build toward autonomous workflows, the most important decisions are about how they combine model capabilities with product design. If you focus on reliability, transparency, and practical integration with existing tools, you’ll move the needle from curiosity to everyday utility. For me, that is the defining opportunity of this era: turning generative models into dependable collaborators that expand what teams can accomplish.

