OpenAI on OpenAI: Applying AI to Our Own Workflows


By Scotty — OpenAI Go-to-Market Innovation Team

In this article I report on how OpenAI is using its own frontier AI to transform internal workflows across sales, people (HR), and support. This piece is based on a presentation I gave for OpenAI’s developer audience and expands on the technical and organizational decisions that made three internal agent-driven applications successful. The goal of this report is to show a practical, repeatable approach you can take to build internal applications that amplify expertise, accelerate outcomes, and scale organizational knowledge.

Throughout this report I’ll walk through the problems we faced, the architectures we used, the human workflows we modeled, the results we saw, and the step-by-step playbook I recommend for any team that wants to deploy agent-driven workflows inside their own business. I’ll also spotlight specific team members whose operational craft we encoded—Sophie, Max, Joe, Maggie—and explain how real people shaped the system.

📣 Executive summary

Companies are asking, “How do we use AI to drive efficiency?” I believe the more powerful question is: “How do we use AI to amplify expertise?” Every organization has top operators—people who consistently win, troubleshoot, and ship. The work I led aimed to capture that craft into agentic workflows so every employee can operate more like those top operators.

We built three internal applications as proof points:

  • Go-to-Market Assistant (Sales) — an agentic assistant embedded into Slack and ChatGPT that prepares reps for meetings, generates demos, handles follow-ups, and continuously improves from rep feedback.
  • Open House (People / HR) — a company-wide knowledge and directory assistant that answers onboarding, travel, and local-office questions and helps employees find colleagues with the right expertise.
  • Support Agent System (Support) — a self-improving ticket processing system that codifies Standard Operating Procedures (SOPs) and handles a large fraction of inbound support tickets autonomously.

Across these efforts we followed a shared architecture: (1) get your data right, (2) build a semantic layer and vectorized knowledge, (3) design skills (agents) that reflect expert workflows, (4) embed into familiar tools, and (5) set up eval-driven feedback loops so the system improves over time.

Key outcomes I’ll cite in this report include adoption metrics (hundreds of team members, weekly active rates), productivity wins (about one full day per week saved for reps), support deflection (about 70% of tickets now handled autonomously), and quality ratings (roughly 80% positive QA for handled tickets). These numbers illustrate how agent-driven internal apps can shift capacity, response time, and quality at scale.

🧭 Why “amplify expertise” and not just “automate”?

When we started, many teams framed the opportunity as “automation to drive efficiency.” That’s a useful objective, but I wanted to push a different frame: AI as an amplifier of human craft.

Every company has people who do things better than anyone else. At OpenAI we have Sophie, a top-performing salesperson who scales her craft by following repeatable habits. We have Ken, a support specialist who untangles complex technical issues. We have Alex, an engineer who ships quickly. The premise that guided our work was simple: if we can capture the patterns of those top operators and embed them into systems, the whole organization becomes more effective.

This reframing shaped three core design choices:

  • Start with a top operator: Build alongside someone who can articulate how they win and is willing to be the system’s initial trainer.
  • Encode craft as skills: Model workflows as discrete agent “skills” that replicate the expert’s behavior for specific tasks (meeting prep, demo creation, ticket triage).
  • Embed, don’t replace: Put the agent where people already work—Slack, ChatGPT, internal platforms—so the experience is natural and feedback loops are frictionless.

Those principles informed the projects I’ll describe next.

🛠️ Go-to-Market Assistant — Sales (meet Sophie’s approach)

Problem: Our sales organization was growing fast, introducing frequent new features, and facing high conversational volume. Reps were stretched thin: they needed to do quick customer research, produce technical answers in real time, run demos, and follow up—all while preserving a great customer experience.

Approach: I paired with our best rep, Sophie. I sat next to her, observed how she worked, and asked her to show me exactly how she prepared for meetings, created product champions, developed demos, and followed up. Sophie’s method was scrappy, practical, and repeatable: she knew exactly which signals to look at before a call, which product examples to show, and how to structure post-call outreach so momentum wasn’t lost.

From those sessions we built a focused agent that embodied “Sophie’s version of excellence.” We started with a core set of four skills—meetings, product knowledge, custom demos, and customer research—and iterated to about ten live skills today.

How the system is structured

The architecture has three layers:

  1. Data foundations: We created a simplified customer data model and a semantic layer so GPT-5 could “understand” the customer context. We vectorized key documents (account notes, product docs, strategy materials).
  2. Agent spine: Agents SDK + GPT-5 + Responses API form the agent core. Individual skills are specialized agents, each tuned to replicate a specific expert workflow (a minimal sketch follows this list).
  3. Surfaces: The assistant is available in ChatGPT via an MCP connector and in Slack, where reps naturally operate. Backend actions are executed on the OpenAI platform.
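To make the agent spine concrete, here is a minimal sketch of how one skill could be expressed with the Agents SDK. The retrieval tool, its stubbed output, and the model identifier are assumptions for illustration rather than our production code.

```python
# Minimal sketch of a "skill" as a focused agent (illustrative, not production code).
from agents import Agent, Runner, function_tool

@function_tool
def search_account_notes(account: str, query: str) -> str:
    """Hypothetical retrieval tool over vectorized account notes and product docs."""
    # In practice this would query the vector index; here we stub it out.
    return f"Top notes for {account} matching '{query}' (stubbed)."

meeting_prep = Agent(
    name="Meeting Prep",
    instructions=(
        "You prepare sales reps for customer meetings. Pull account context, "
        "summarize usage patterns, and propose an agenda and demo focus."
    ),
    model="gpt-5",  # model identifier assumed from the model referenced in this post
    tools=[search_account_notes],
)

result = Runner.run_sync(meeting_prep, "Prep me for tomorrow's first call with Acme.")
print(result.final_output)
```

Keeping each skill this small is what makes the later eval-driven iteration tractable: a skill can be tested, tuned, and rolled out on its own.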

In-practice example: Prepping for a day of meetings

I’ll walk you through a typical morning as if I were a mid-market rep who just joined the team.

I open Slack and see the Go-to-Market Assistant’s Tuesday briefing. It lists my meetings for the day—Acme, Brookfield, and Redwood—and provides a short headline for each meeting: which are first calls, which are follow-ups, and what to prioritize. The assistant also posts a detailed meeting prep doc in a thread: attendee LinkedIn links (pulled from a web browse connector), OpenAI internal data (usage patterns, self-serve adoption), and synthesized hypotheses and opportunities.

For Acme, the assistant recommends a demo focused on coding and deep research. That recommendation isn’t generic: it’s grounded in the patterns of Max, our best Solutions Engineer, who has built over 100 demos. We vectorized Max’s demos and combined them with the customer’s usage data to produce a tailored demo script. I can copy the generated prompt into ChatGPT to render a dynamic web demo page that walks the customer through a real coding example.

After the meeting, the assistant produces a recap and action items, and it flags an unanswered technical question about chat completions and the Responses API. The system routes that question to the product knowledge skill, which returns a well-formed answer I can take directly to the customer.

If the assistant misses a key follow-up—for example, the customer requested a prompt tuning workshop for GPT-5—I correct it in the thread by clicking “Was it helpful? No” and providing feedback. That feedback triggers a regeneration of my rep-specific output and, critically, feeds into our evals platform and a developer-facing channel where developers review and approve prompt improvements. The change gets merged, and the entire organization benefits from the refinement.
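To illustrate the correction step, here is a rough sketch of regenerating a follow-up with the rep’s feedback folded in as extra context. The prompt shape and the sample strings are assumptions; the production flow runs through the assistant’s Slack surface rather than a standalone script.

```python
# Sketch: regenerate a follow-up after a rep flags "Was it helpful? No" (assumed flow).
from openai import OpenAI

client = OpenAI()

def regenerate_followup(original_recap: str, rep_feedback: str) -> str:
    response = client.responses.create(
        model="gpt-5",  # model identifier assumed from the model referenced in this post
        input=(
            "You draft post-meeting follow-ups for sales reps.\n"
            f"Original recap and action items:\n{original_recap}\n\n"
            f"Rep feedback on what was missed or wrong:\n{rep_feedback}\n\n"
            "Regenerate the follow-up, incorporating the feedback."
        ),
    )
    return response.output_text

# Example: the rep notes that a requested prompt-tuning workshop was missed.
print(regenerate_followup(
    "Recap: discussed coding demo; action: send pricing.",
    "Customer also asked for a GPT-5 prompt tuning workshop next week.",
))
```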

Why this worked

  • Built with a top operator: Sophie’s routines were explicit training data for the assistant’s behaviors.
  • Familiar surfaces: Slack and ChatGPT are already part of the reps’ daily workflows, so adoption was natural.
  • Modular skills: Skills encapsulate distinct workflows, allowing focused iteration and incremental rollout.
  • Closed-loop improvement: Rep feedback becomes prompt improvements via evals, so quality gets better over time.

Impact and adoption

Today the Go-to-Market Assistant supports our sales org—over 400 team members—and shows strong engagement: reps exchange about 20 messages per week with the assistant and report saving roughly one full workday per week. That’s a meaningful capacity shift: a day back each week per rep means more time for strategic work and customer-facing activities.

🏠 Open House — People (HR and onboarding assistant)

Problem: Rapid hiring across multiple offices meant employees had trouble finding institutional knowledge. New employees couldn’t quickly find events, policies, local office logistics, or whom to contact for specific help. Slack channels can be chaotic, and information gets lost.

Approach: We built Open House—a knowledge-first experience combining directory data, announcements, office guides, videos, and team resources into a single conversational surface. The concept is simple: capture the small bits of company information (local guides, announcements, policies) into a centralized CMS, vectorize those documents, link them to HR systems (Workday), and expose them through ChatKit and Slack as a conversational assistant.

Design highlights

  • CMS for micro content: We capture bite-sized updates—office guide changes, announcements, policy edits—in a CMS so the assistant can cite sources and show the most relevant items.
  • Directory integration: Employee profiles are enriched by HR systems and by self-reported skills captured during onboarding, enabling skill-based search (who can help me with go-to-market demos in New York?); a minimal search sketch follows this list.
  • Chat and citation UX: Each chat answer includes citations. Users can click into reports, an office guidebook, or the Slack channel for a particular office for real-time context (like what’s for lunch in the office today).
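As a minimal sketch of that skill-based people search, the snippet below embeds a question and a handful of directory profiles and ranks them by cosine similarity. The profile records, the embedding model choice, and the in-memory search are illustrative assumptions; the real system searches the enriched directory and internal wiki.

```python
# Sketch: skill-based people search over enriched directory profiles (illustrative only).
import numpy as np
from openai import OpenAI

client = OpenAI()

profiles = [
    {"name": "Joe", "office": "New York", "skills": "demo building, go-to-market engineering"},
    {"name": "Priya", "office": "London", "skills": "benefits policy, onboarding"},
]

def embed(text: str) -> np.ndarray:
    out = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(out.data[0].embedding)

def find_experts(question: str, top_k: int = 3) -> list[dict]:
    q = embed(question)
    scored = []
    for p in profiles:
        # In practice profile embeddings would be precomputed and indexed, not embedded per query.
        v = embed(f"{p['name']} ({p['office']}): {p['skills']}")
        similarity = float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v)))
        scored.append((similarity, p))
    return [p for _, p in sorted(scored, key=lambda s: s[0], reverse=True)[:top_k]]

print(find_experts("Who in New York can help build a go-to-market demo?"))
```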

Real scenario: Visiting the New York office

Last month I visited the New York office for an onsite meeting. Instead of sending a few messages to colleagues or digging through files, I used Open House. I asked: “I’m visiting the New York office—what’s the travel policy, how do I access the office, and who can help with a go-to-market demo?”

The assistant returned a concise answer with citations. I clicked the office guidebook and found practical details: entry procedures, where to sit, and local Slack channels. Then I asked whether anyone on my team in New York could help build a go-to-market demo. Open House searched the directory and the internal wiki, surfaced five candidate profiles, and highlighted Joe—who, per his profile, has demo-building expertise.

I clicked “Message in Slack” directly from the profile card, sent Joe context about the visit and demo plan, and he replied that he had time. That sequence—question, answer, find expert, connect—took minutes. Without Open House it might have taken hours of searching, pings, and guesswork.

Adoption and results

Open House has become a primary entry point for organizational knowledge. About 75% of employees use it weekly. The reasons for high adoption are straightforward: accurate data, designed around real user questions, and a simple conversational interface. Employees don’t need to be trained since the assistant is embedded into familiar tools.

🛎️ Support system — Self-improving, SOP-driven ticket automation

Problem: Support faces both volume and velocity challenges. We have hundreds of millions of users and millions of support tickets every year. New product launches can generate sudden, massive surges. For example, the ImageGen launch generated several orders of magnitude more tickets than normal and added over 100 million users in a matter of days—great growth, but a major operational hurdle for support.

Approach: We treated the problem as an engineering + design + operations challenge. Instead of trying to handcraft responses for every possible ticket, we codified our support knowledge—SOPs—and trained agents to follow those SOPs. The system was built to be self-improving: gold standards and evals drive continuous updates to SOPs whenever novel patterns or automation failures occur.

Architecture and workflow

  • Data inputs: Ticket streams, Help Center articles, and vectorized SOPs.
  • Core skills: Ticket classification and actioning agents that decide when to respond automatically, when to escalate, and how to tag complex issues (a minimal triage sketch follows this list).
  • Self-improvement: When automation fails or novel ticket patterns appear, the system routes examples into evals and knowledge updates so SOPs can be updated and pushed back to the agents.
  • Surfaces: The Help Center frontline and a real-time API (alpha) that can handle live conversational support sessions and multimodal inputs.
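Here is a minimal sketch of a triage skill built with the Agents SDK and a structured output type. The schema fields, category labels, and model identifier are assumptions for illustration; the production agents consult the full SOP library before drafting any reply.

```python
# Sketch: a ticket-triage skill that classifies a ticket and decides whether to auto-respond
# (schema, labels, and model identifier are assumptions for illustration).
from pydantic import BaseModel
from agents import Agent, Runner

class TicketTriage(BaseModel):
    category: str           # e.g. "billing", "api_error", "account_access"
    can_auto_respond: bool  # respond automatically vs. escalate to a human
    relevant_sop: str       # the SOP the response should follow
    draft_reply: str

triage_agent = Agent(
    name="Ticket Triage",
    instructions=(
        "Classify the support ticket, pick the relevant SOP, and decide whether it "
        "can be answered automatically. Draft a reply only when an SOP clearly applies."
    ),
    model="gpt-5",  # model identifier assumed from the model referenced in this post
    output_type=TicketTriage,
)

result = Runner.run_sync(triage_agent, "I was charged twice for my Plus subscription this month.")
triage: TicketTriage = result.final_output
if not triage.can_auto_respond:
    print("Escalating to a human agent:", triage.category)
```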

Why SOPs matter

Support is a rules-first environment. Humans have historically used SOPs to ensure consistent, auditable, and high-quality responses. Those same SOPs are exactly the right thing to teach an agent. We translated human SOPs into structured knowledge that the model can consult. When the model’s output deviates, QA reviewers or evals identify the gap and the SOP is updated. That creates a virtuous cycle.

Outcomes

The support system achieved substantial impact immediately and at scale:

  • About 70% of tickets are now deflected or handled autonomously by the system.
  • The agentic system outperforms our legacy solution by roughly 30% on quantitative performance metrics.
  • Approximately 80% of agent-created ticket responses, when manually reviewed by QA, receive a high satisfaction rating.

These results show that when you pair SOP-driven knowledge with modern generative models and a solid feedback loop, you can drive both scale and quality in support operations.

🔁 Shared engineering patterns across all three use cases

Across sales, HR, and support, several common patterns emerged. These are practical, repeatable principles you can apply when you build internal agentic applications.

1. Get your data right

Foundational quality matters more than clever modeling. For each use case we built a simplified data model, standardized sources of truth, and vectorized documents that matter (product docs, account notes, SOPs, people profiles).

Key actions:

  • Identify authoritative sources for each problem domain.
  • Normalize fields across systems so the agent can reason about “customer state,” “employee role,” or “ticket category.”
  • Vectorize and index the most relevant artifacts to support retrieval-augmented generation (RAG); see the sketch after this list.
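As a minimal sketch of the retrieval-augmented step, the snippet below stitches retrieved chunks into a grounded model call. The retrieve() helper is a stand-in for whatever vector index you build over docs, notes, and SOPs; the model identifier and prompt shape are assumptions.

```python
# Sketch of the RAG step: retrieved chunks are stitched into a grounded model call.
from openai import OpenAI

client = OpenAI()

def retrieve(query: str, k: int = 4) -> list[str]:
    # Stand-in for a query against the vector index of docs, account notes, and SOPs.
    return [
        "The Responses API returns structured outputs and supports tool calls...",
        "Acme account notes: self-serve adoption growing, interested in coding workflows...",
    ][:k]

def grounded_answer(question: str) -> str:
    chunks = retrieve(question)
    context = "\n\n".join(f"[{i + 1}] {chunk}" for i, chunk in enumerate(chunks))
    response = client.responses.create(
        model="gpt-5",  # model identifier assumed from the model referenced in this post
        input=(
            "Answer using only the context below and cite chunk numbers.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}"
        ),
    )
    return response.output_text

print(grounded_answer("How should I position the Responses API for Acme?"))
```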

2. Model specific skills, not a monolithic agent

We intentionally separated behaviors into skills—meeting prep, demo generation, ticket classification, directory and people search—so each skill can be tuned, evaluated, and iterated independently. That reduces complexity and increases reliability.

3. Embed into familiar tools

We did not build a separate, new app for users. We embedded capabilities in Slack, ChatGPT, and the Help Center. Embedding matters for adoption because people keep their context and the feedback loop is immediate. If reps see an answer in Slack, they can react and correct it in-thread.

4. Create self-improvement loops

We used evals to measure quality and route feedback into knowledge updates and prompt optimization flows. The loop looks like this (step 2 is sketched in code after the list):

  1. User interacts with agent and optionally flags issues.
  2. Examples go to evals for triage and scoring.
  3. High-impact changes are delivered to developers and knowledge owners for approval.
  4. Updated prompts/SOPs are deployed, improving future performance.
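A minimal sketch of step 2, turning a flagged interaction into an eval case, might look like the following. The JSONL dataset path and the notify_developers() hook are hypothetical; in our setup this feeds the evals platform and a developer review channel.

```python
# Sketch: turning a flagged interaction into an eval case for developer review
# (the JSONL dataset path and notify_developers() helper are hypothetical).
import json
import pathlib

EVAL_DATASET = pathlib.Path("evals/flagged_cases.jsonl")

def notify_developers(summary: str) -> None:
    # Hypothetical hook into the developer review channel.
    print("Posted to review channel:", summary)

def record_eval_case(skill: str, user_input: str, agent_output: str, feedback: str) -> None:
    case = {
        "skill": skill,
        "input": user_input,
        "output": agent_output,
        "feedback": feedback,
        "status": "needs_review",
    }
    EVAL_DATASET.parent.mkdir(parents=True, exist_ok=True)
    with EVAL_DATASET.open("a") as f:
        f.write(json.dumps(case) + "\n")
    notify_developers(f"New {skill} eval case flagged: {feedback[:80]}")

record_eval_case(
    skill="meeting_followup",
    user_input="Recap the Acme call.",
    agent_output="Recap without the workshop request.",
    feedback="Missed the GPT-5 prompt tuning workshop follow-up.",
)
```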

5. Make top operators the teachers

Identify a “Sophie” in your organization: a top operator who’s willing to formalize how they work. Build the first agent by reproducing their workflows and heuristics. They act as both the initial training source and the ongoing quality gate.

⚙️ Technical components and integration notes

Below I summarize the primary components and how they were used. This section is intended for engineers and product builders who want an actionable view of the platform choices and tradeoffs.

Model and agent layer

  • GPT-5: Used as the primary reasoning and generative model. The model handles synthesis, few-shot behaviors, and chain-of-thought style reasoning when needed.
  • Agents SDK: Orchestrates multi-skill flows, allowing skills to call each other and external APIs.
  • Responses API: Delivers structured responses and enables dynamic content generation for chat and demo experiences.

Data and vector storage

Vectorization is central to retrieval-augmented behavior. We vectorized product docs, demo scripts, SOPs, account notes, and people profiles. The retrieval system feeds the model with the context it needs to be precise and grounded.

Connectors and surfaces

  • MCP connector: Exposes ChatGPT integration for internal assistants.
  • Slack integration: We built a private-channel assistant experience so agents can post briefings, prep docs, and recaps inline with the rep’s workflow (a posting sketch follows this list).
  • ChatKit / Chat UI: Embedded conversational UI used for Open House and for Help Center responses.
  • Internal platform execution: For backend tasks and ticket actions we invoked agents via the OpenAI platform.
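For the Slack surface, the briefing-plus-thread pattern can be sketched with the standard Slack SDK as below. The channel ID, token handling, and briefing text are placeholders; the actual assistant posts richer, model-generated content.

```python
# Sketch: posting a morning briefing into a rep's private channel, with the detailed
# prep doc in a thread (channel ID and text are placeholders; requires a bot token).
import os
from slack_sdk import WebClient

slack = WebClient(token=os.environ["SLACK_BOT_TOKEN"])

def post_briefing(channel_id: str, headline: str, prep_doc: str) -> None:
    parent = slack.chat_postMessage(channel=channel_id, text=headline)
    # Detailed prep goes in the thread so the channel stays scannable.
    slack.chat_postMessage(
        channel=channel_id,
        thread_ts=parent["ts"],
        text=prep_doc,
    )

post_briefing(
    channel_id="C0123456789",
    headline="Tuesday briefing: 3 meetings (Acme first call, Brookfield, Redwood).",
    prep_doc="Acme prep: attendees, usage patterns, suggested demo focus...",
)
```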

Quality and safety

We layered guardrails around high-stakes outputs. For support and product knowledge, rigorous SOPs, human-in-the-loop QA checks, and eval-driven thresholds ensured we only automated tasks when confidence and compliance were acceptable.
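One simple way to express that kind of guardrail is a confidence gate in front of automated sends. The threshold, the confidence score, and the queue_for_human() helper below are illustrative assumptions, not our actual policy.

```python
# Sketch: gate automated support replies behind a confidence threshold
# (the threshold, scoring, and queue_for_human() helper are illustrative).
AUTO_SEND_THRESHOLD = 0.85

def queue_for_human(ticket_id: str, draft: str) -> None:
    print(f"Ticket {ticket_id}: draft routed to human QA.")

def maybe_auto_send(ticket_id: str, draft: str, confidence: float, sop_applies: bool) -> bool:
    # Automate only when an SOP clearly applies and confidence is high enough.
    if sop_applies and confidence >= AUTO_SEND_THRESHOLD:
        print(f"Ticket {ticket_id}: reply sent automatically.")
        return True
    queue_for_human(ticket_id, draft)
    return False

maybe_auto_send("T-1042", "Here is how to resolve the API key error...", confidence=0.91, sop_applies=True)
```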

📊 Measured impact — numbers that mattered

When talking with engineers, product leaders, and executives, numbers help make the case. Here are the key metrics and outcomes we tracked and what they signaled:

  • Sales productivity: Reps saved about one full day per week on routine prep and follow-up tasks. Reps interact ~20 times per week with the assistant, indicating habitual use.
  • Adoption: Over 400 sales team members have access; ~75% of employees use Open House weekly.
  • Support deflection and quality: ~70% of tickets are deflected/handled autonomously; system performance is ~30% better than legacy; QA rates ~80% highly positive.
  • Agent count: We launched with four core skills in sales and iterated to about ten live skills. This progressive rollout allowed safe scaling and focused improvements.
  • Surge resilience: During ImageGen’s launch we handled several orders-of-magnitude increases in tickets by relying on SOP-driven agents and rapid knowledge updates.

🔒 Governance, evals, and continuous improvement

One of the biggest operational challenges is maintaining quality and alignment as agents scale. We focused on two mechanisms to manage this:

1. Evals as the operational backbone

Evals enable us to measure agent performance continuously. Whenever the assistant’s output is scored as low confidence or a user flags a problem, we create an eval case that goes into a developer review channel. That review can result in a prompt tweak, an SOP update, or a UX change.

The process ensures that improvements come from measured failures and that they propagate to all users after approval. This is what turns a one-off fix into a system-wide capability improvement.

2. SOPs, owners, and approval flows

For support operations, SOPs are the canonical source of truth. SOPs have owners—people accountable for the content and updates. When a proposed change comes from evals, the SOP owner reviews and approves. This creates auditability and traceability for what the agent can and cannot do.

📈 A practical playbook: Three things to start this week

If you want to replicate these wins inside your company, here are three practical actions you can take right now. I framed them as a short sprint to get traction in days, not months.

  1. Find your Sophie. Identify a top operator in the team you want to support. Spend time with them. Map their workflow and the heuristics they use to make decisions. The goal is to produce a replicable specification for a skill.
  2. Embed into familiar tools. Don’t build a separate app. Put the assistant where people already work—Slack, your chat platform, or your help center—and make feedback natural and immediate.
  3. Pick a scalable platform. Use an agents toolkit (like Agent Kit / Agents SDK), a chat component (ChatKit), and an evals framework. These platforms dramatically reduce friction and let your team iterate fast.

Those steps form a short loop: observe an expert, build a focused skill, deploy into the flow of work, gather feedback, and iterate. Repeat and expand across other skills and teams.

🧩 Common pitfalls and how we avoided them

When teams try to build internal agentic applications, certain pitfalls commonly appear. Here are the ones we saw and how we addressed them:

Pitfall: Poor data hygiene

Relying on scattered, inconsistent data produces shaky agent outputs. We invested time up front to standardize customer and employee data and to vectorize high-value documents.

Pitfall: Over-automation too fast

Trying to automate a deeply human judgment without clear SOPs leads to missteps. We focused on automating deterministic tasks first (meeting prep, routing, FAQ answers) and kept humans in the loop for higher-risk decisions.

Pitfall: Building isolated tools

A separate “agent app” creates friction. Embedding into Slack and ChatGPT allowed adoption to be organic and feedback to flow naturally into the system.

Pitfall: No ownership for content

Knowledge without ownership rots. We assigned owners for SOPs, people profiles, and the CMS content. Owners are responsible for updates, accuracy, and approval of changes triggered by evals.

🧱 The cultural change: teams acting like product development

A recurring observation is that each department—sales, HR, support—started operating more like a software team. They picked product owners, defined success metrics, prioritized features, and performed iterative releases. That shift is powerful: it means organizational knowledge is treated as a product with users, a roadmap, and quality metrics.

That transition is an enabling cultural change. Departments that embrace it can move faster, ship better internal tools, and create a feedback loop between operators and developers that improves both code and process.

🔭 What’s next and where this is heading

The initial projects have matured into platform-level thinking. Here are a few directions I’m excited about:

  • Broader agent templates: Create reusable skill templates for common workflows (meeting prep, FAQ resolution, demo generation) so teams can instantiate new agents quickly.
  • More multimodal experiences: Extend the real-time API to support multimodal inputs—images, audio—so agents can handle richer support scenarios and live demos.
  • Cross-team orchestration: Chain skills across departments so, for example, a sales meeting recap can automatically trigger a support follow-up or a product feedback ticket.
  • Advanced personalization: Deepen per-user personalization so agents can learn preferences, communication styles, and decision patterns to be even more helpful.

🧾 Final observations and lessons learned

After building these internal agents, a few lessons stand out:

  • Start small and valuable: Solve a concrete pain point for a team that will regularly rely on the assistant.
  • Operationalize feedback: Make it as easy as possible for users to correct outputs and route those corrections into a structured improvement workflow.
  • Design around people, not features: Model workflows around how people actually work and what decisions they make in context.
  • Measure impact: Track adoption and measurable outcomes—time saved, deflection rates, QA scores—and use those to prioritize work.

📣 Closing call to action

I challenge everyone reading this to go back to your teams and build something your colleagues can’t live without. Start with a “Sophie,” embed in familiar tools, and pick platforms that give you rapid iteration and safe guardrails. If you want to replicate our approach, Agent Kit (including Agent Builder, ChatKit, and Evals) is an excellent starting point to accelerate development.

These agentic applications are not just automation: they are a way to distribute human expertise across an entire organization. When you capture the craft of top operators and embed it into your systems, you multiply the impact of every expert on your team.

“How do we use AI to amplify expertise?” — This question guided our work, and it can guide yours. Find your top operator, model their craft, and build a system that learns and improves with use.

📬 Where to learn more

If you want to dive deeper into how we built these systems, who helped build them, and the tools we used, I’m available for follow-up discussions with builders and teams. I’ve presented these projects publicly and the Agent Kit we used is the quickest way to try the same patterns in your own environment.

Thank you for reading. I look forward to hearing what you build.

