Intro to Agent Builder

Overview of Agent Builder
I'm Christina Huang from OpenAI, and in this report I introduce Agent Builder, a visual tool designed to make building agentic workflows fast, accessible, and practical for product teams and creators. In the demonstration that accompanied this release, I walked through building a travel assistant that can either produce an itinerary or fetch flight information. My goal here is to expand on that demonstration in a news-style account: what Agent Builder does, how it works, the components you use to assemble an agent, and how you can ship the resulting workflow into a real product using ChatKit or the Agents SDK.
Agent Builder addresses a clear need: teams want to assemble multi-step AI behaviors without having to write and maintain orchestration code. With a drag-and-drop canvas, node types for agents, conditionals, and connectors to tools like web search, you can construct complex agentic flows while keeping visibility into how inputs transform into outputs. As I said in the demo, "Agent Builder is a new visual tool for building AI workflows." That sentence summarizes the core promise: visual, composable, and purpose-built for agentic experiences.
How Agent Builder works
At its heart, Agent Builder offers a node-based interface that models the logical structure of an agentic workflow. Each workflow starts with a start node where you define inputs and optionally state variables. From there, you connect nodes that perform classification, call specialized agents, run conditionals, and output results. The visual canvas is both a design surface and the single source of truth: when you're done designing, you can export the workflow as code or drop it directly into your product via ChatKit using the workflow ID.
To understand the building blocks, think in terms of three categories of nodes:
- Control nodes: start nodes, if/else branching, and other flow-control elements that determine how messages progress through the graph.
- Agent nodes: specialized agents you author on the canvas. Each agent node has a role prompt that defines behavior (for example, a travel assistant that recommends flights or builds itineraries). Agent nodes can be augmented with tools (the demo used web search as an example), and they produce structured outputs.
- Output and presentation nodes: formats and widgets that determine how results are returned to the user (plain text, JSON, or custom rich widgets from Widget Studio).
One of the distinguishing features is the ability to define the output format on an agent. For instance, when I created a classifier agent to split user queries between itinerary requests and flight information requests, I explicitly set the output format to JSON with a property called classification which could take the values "flight_info" or "itinerary". That explicit structure makes downstream routing trivial and reliable.
"You connect nodes and create agents without writing any code." â Christina Huang
Building the travel agent step-by-step
I went step-by-step in the live demonstration to show how quickly you can assemble a functioning travel assistant. Below I reconstruct that process in a journalistic summary with added detail so you can reproduce and extend the pattern.
Step 1: Start node and inputs
Every workflow begins with a start node. This is where you declare what inputs the workflow will receive, and whether there are any persistent state variables to keep across steps. For the travel assistant I used the defaults provided by Agent Builder because they matched the use case. Typical inputs for a travel assistant might include:
- user_message (string): the phrase the user types or speaks
- user_id (string): optional identifier for personalization
- context/state variables: optional fields such as preferred airlines or past searches
Keeping the input model simple makes it easier to test and iterate. If you need more complexity later, the start node lets you expand the schema.
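As a rough illustration, the start node's input model for this assistant could be written down as in the sketch below. The field names follow the list above and are hypothetical; Agent Builder does not prescribe them.

```python
# Hypothetical input model for the travel assistant's start node.
# Field names mirror the list above; they are illustrative, not prescribed.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class TravelAssistantInput:
    user_message: str                           # the phrase the user types or speaks
    user_id: Optional[str] = None               # optional identifier for personalization
    state: dict = field(default_factory=dict)   # e.g. preferred airlines, past searches

example = TravelAssistantInput(user_message="SFO to Tokyo on October 7th")
```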
Step 2: Classification
Next, I added a classifier agent to determine whether the user's message was asking for an itinerary or for flight information. In the canvas I named this node classifier and gave it the role prompt:
"You are a helpful travel assistant for classifying whether a message is about an itinerary or a flight."
Crucially, I set the agent's output to structured JSON and defined the property classification. Explicit outputs reduce ambiguity: downstream nodes can programmatically read parsed.classification instead of trying to parse free text. In the demo, the classifier returned either flight_info or itinerary.
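To make that contract concrete, here is roughly what the classifier's output format corresponds to when written out as a JSON Schema. This is an illustrative sketch, not the exact schema Agent Builder generates; only the property name and allowed values come from the demo.

```python
# Illustrative sketch of the classifier's output contract (not the exact
# schema Agent Builder generates). The property and its allowed values
# mirror the demo: "classification" is "flight_info" or "itinerary".
CLASSIFIER_OUTPUT_SCHEMA = {
    "type": "object",
    "properties": {
        "classification": {
            "type": "string",
            "enum": ["flight_info", "itinerary"],
        }
    },
    "required": ["classification"],
}

# A sample structured response a downstream node can rely on:
sample_output = {"classification": "flight_info"}
assert sample_output["classification"] in CLASSIFIER_OUTPUT_SCHEMA["properties"]["classification"]["enum"]
```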
Step 3: Conditional branching
With a classifier in place, I used an if-else node to branch the flow. The conditional checked parsed.classification. The logic was simple:
- If parsed.classification is "flight_info", route to the flight agent.
- Otherwise, route to the itinerary agent.
This pattern, classify then route, is a common architectural motif when building multitask agents. Modeling intent explicitly lowers the risk of an agent producing a response that is misaligned with the user's goal.
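Outside the canvas, the same classify-then-route logic reduces to a few lines. The sketch below assumes the classifier has already returned the structured JSON described in Step 2; the two handler functions are stand-ins for the flight and itinerary agents.

```python
import json

def handle_flight_request() -> str:
    return "flight agent output"       # stand-in for the flight agent

def handle_itinerary_request() -> str:
    return "itinerary agent output"    # stand-in for the itinerary agent

def route(classifier_response: str) -> str:
    """Mirror the if/else node: branch on the classifier's structured output."""
    parsed = json.loads(classifier_response)   # e.g. '{"classification": "flight_info"}'
    if parsed["classification"] == "flight_info":
        return handle_flight_request()
    return handle_itinerary_request()
```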
Step 4: Flight agent
For the flight agent, I created a new agent node with a role prompt that instructs it to always recommend a specific flight using airport codes. The exact prompt in the demo was:
"You are a travel assistant. Always recommend a specific flight to go to; use airport codes."
To ensure the flight data was current, I attached a web search tool to the flight agent. Agent Builder's tools can be connected to nodes to augment them with external information (web search, calendar access, or custom APIs), enabling agents to make decisions based on the latest data.
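If you later express this node in code, the flight agent with its attached web search tool looks roughly like the sketch below using the Python Agents SDK (openai-agents). Treat it as an illustration; the exact code Agent Builder exports may differ.

```python
# Rough Agents SDK equivalent of the flight agent node (a sketch; the exact
# export from Agent Builder may differ). Requires: pip install openai-agents
from agents import Agent, WebSearchTool

flight_agent = Agent(
    name="Flight agent",
    instructions=(
        "You are a travel assistant. Always recommend a specific flight to go to; "
        "use airport codes."
    ),
    tools=[WebSearchTool()],  # hosted web search keeps flight data current
)
```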
Step 5: Itinerary agent
The itinerary agent was designed for high-level travel planning. I added another agent node with the prompt:
"You are a travel assistant, so build a concise itinerary."
The itinerary agent's responsibility is to interpret the user's constraints (e.g., city, duration, preferences) and produce a compact, practical one-day or multi-day plan. For the demo I asked, "What should I do in a day in Tokyo?" and the itinerary agent returned a concise, well-structured set of suggestions for a single day.
Step 6: Preview and iterate
With the nodes connected, I used Run Preview to test the workflow. Run Preview shows the message traveling through each node so you can observe decisions, intermediate outputs, and final responses. During testing, I submitted both a question about an itinerary and a flight lookup (SFO to Tokyo on October 7th) to demonstrate both branches.
Run Preview is essential for rapid iteration. It not only verifies the logic but also surfaces where prompts need refinement, where additional structured fields are required, or where external tools need to be connected for better accuracy.
Designing rich outputs with widgets
Text is powerful, but sometimes you want a richer presentation. In the demo I showcased how to use Widget Studio to produce interactive, visually pleasing outputs. I built a flight widget to present flight search results as a compact, helpful card with departure and arrival times, duration, airline, and price, all designed for quick comprehension.
Here's how the widget workflow works in Agent Builder:
- Design the widget layout and artwork in Widget Studio. The studio provides templates and a visual editor.
- Export or download the widget template as a single package.
- Upload the widget to the agent node and set it as the agent's output format.
- Optionally, include dynamic styling instructions. For the flight widget example, I asked the agent to "choose a background color creatively based on the destination" and to "include time zones, AM or PM".
That final step, instructing the agent to stylize the widget, demonstrates the blending of content generation and presentation control. The agent not only decides what flight to recommend but also how the result should be visually expressed. In the SFO to Tokyo example, the agent picked yellow as Tokyo's color and applied it as the widget background.
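What the agent hands to the widget is just structured data. A payload for the flight card might look something like the sketch below; the field names and values are hypothetical placeholders, since the real fields are whatever your Widget Studio template defines.

```python
# Hypothetical payload for the flight widget. Field names and values are
# illustrative placeholders; the actual fields come from the Widget Studio template.
flight_widget_payload = {
    "airline": "Example Air",            # placeholder value
    "departure": "SFO 11:05 AM (PDT)",
    "arrival": "HND 2:15 PM (JST)",
    "duration": "11h 10m",
    "price": "$890",
    "background_color": "yellow",        # the demo's agent picked yellow for Tokyo
}
```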
Testing and debugging with Run Preview
Effective testing is a cornerstone of launching robust agents. Agent Builder's Run Preview provides the playback you need to understand the flow of information. I used a couple of canonical examples during the demo to show the preview in action:
- Itinerary test: "What should I do in a day in Tokyo?" The classifier identified this as an itinerary request and routed it to the itinerary agent, which returned a concise day plan.
- Flight test: "SFO to Tokyo on October 7th". The classifier identified the intent as flight info and routed it to the flight agent, which performed a web search and produced a widget-based flight recommendation.
Run Preview displays the following useful artifacts when you run a test:
- Message trace: the sequence of nodes the input passed through.
- Node outputs: what each agent or node returned, including structured JSON where applicable.
- Tool calls: evidence of external tools invoked, like web search queries and their results.
- Widget preview: how the final output will appear to an end user when rendered with a widget.
These artifacts give you visibility not only into whether the workflow produced a correct final result, but also into why it made the decisions it did. That transparency is invaluable when calibrating prompts, changing the JSON schema, or deciding whether to connect more tools.
Publishing and integrating your agent
When you're satisfied with the agent, Agent Builder lets you publish it directly and expose it to your product. I named my example workflow "travel agent" and demonstrated the two main integration paths:
Option A: Agents SDK
The Agents SDK provides a programmatic interface for teams who want to manage deployment, logging, and customization directly in their codebase. In the demo I displayed the SDK snippet that would be required to embed the agent in a product. While the SDK affords maximum control, it also requires more code to manage, including authentication, error handling, and lifecycle management.
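For illustration, a code-managed version of the travel workflow might look like the sketch below using the Python Agents SDK (openai-agents). This is not the snippet shown in the demo, and the handoff-based structure is one of several ways to express the classify-then-route design in that SDK.

```python
# Sketch of the travel workflow in the Python Agents SDK (openai-agents).
# Illustrative only; not the exact snippet displayed in the demo.
from agents import Agent, Runner, WebSearchTool

flight_agent = Agent(
    name="Flight agent",
    instructions="You are a travel assistant. Always recommend a specific flight; use airport codes.",
    tools=[WebSearchTool()],
)

itinerary_agent = Agent(
    name="Itinerary agent",
    instructions="You are a travel assistant, so build a concise itinerary.",
)

# A triage agent plays the role of the classifier plus the if/else node by
# handing off to the specialist that matches the user's intent.
triage_agent = Agent(
    name="Travel triage",
    instructions="Decide whether the user wants flight information or an itinerary, then hand off.",
    handoffs=[flight_agent, itinerary_agent],
)

result = Runner.run_sync(triage_agent, "SFO to Tokyo on October 7th")
print(result.final_output)
```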
Option B: ChatKit with workflow ID
The simpler path for many teams is to use the workflow ID generated by Agent Builder and plug it into ChatKit. With ChatKit, you can embed the decisioning and conversational behavior without managing an extensive SDK. You get the benefit of the visual workflow while offloading runtime and UI concerns to ChatKit. During the demo, I highlighted that you can either manage the code yourself or "simply take this workflow ID and put it in my product directly using ChatKit."
Publishing produces a workflow ID that becomes the handle for inference in your application. The ID points to the canonical workflow hosted by the platform, meaning updates you make in Agent Builder can be rolled out by updating the published workflow, rather than redeploying code across multiple services.
Best practices and tips
Designing agents that are reliable, understandable, and useful requires deliberate choices. Below are practical tips I use and recommend based on the travel agent example and the broader patterns that emerge when building agentic workflows.
1. Use structured outputs whenever possible
When you want other nodes to consume results programmatically, return JSON rather than free-form text. For instance, a classifier that returns {"classification":"flight_info"} is easier to route and less error-prone than embedding the decision in a sentence.
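In code form, the same discipline can be enforced by giving the agent a typed output. The sketch below uses a Pydantic model with the Python Agents SDK's output_type parameter, assuming you are working in that SDK rather than on the canvas, where you would configure the output format on the node instead.

```python
# Enforcing the classifier's structured output in code (a sketch using the
# Python Agents SDK and Pydantic; on the canvas you set the node's output
# format instead).
from typing import Literal
from pydantic import BaseModel
from agents import Agent

class Classification(BaseModel):
    classification: Literal["flight_info", "itinerary"]

classifier = Agent(
    name="Classifier",
    instructions=(
        "You are a helpful travel assistant for classifying whether a message "
        "is about an itinerary or a flight."
    ),
    output_type=Classification,
)
```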
2. Keep prompts focused and specific
Define the agent's role clearly. Short role prompts like "You are a travel assistant. Always recommend a specific flight; use airport codes." help the model behave consistently and reliably. The more precise the instruction, the less likely it is to drift into irrelevant content.
3. Attach the right tools
Augmentation matters. If your agent is expected to provide up-to-date facts (flight times, stock quotes, weather), connect it to a web search tool or other external APIs. Without external tools, the agent's knowledge is bounded by its training data.
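Beyond the hosted web search tool, you can wire in your own APIs. A minimal custom tool in the Python Agents SDK looks roughly like this; the function name and its body are hypothetical stand-ins for whatever internal API you would actually call.

```python
# Minimal custom tool sketch in the Python Agents SDK. The function name and
# body are hypothetical stand-ins for a real flight-status API call.
from agents import Agent, function_tool

@function_tool
def get_flight_status(flight_number: str) -> str:
    """Return the latest status for a flight (placeholder implementation)."""
    return f"Flight {flight_number}: on time"  # replace with a real API lookup

flight_agent = Agent(
    name="Flight agent",
    instructions="You are a travel assistant. Use tools for live flight data.",
    tools=[get_flight_status],
)
```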
4. Use conditional branching for clarity
Explicitly separate intents or tasks with a classifier + if/else nodes. This separation keeps each agent small and focused. It also makes maintenance easier: if you need to change flight logic, you only edit the flight agent.
5. Design for observability
Use Run Preview early and often. Validate not just the final answer but the intermediate outputs. If a classifier makes a mistake, examine its prompt or the examples it needs to disambiguate tricky inputs.
6. Start with templates and iterate
Agent Builder comes with templates that capture common patterns. Beginning with a template and customizing is faster than building from scratch, especially when assembling standard flows like support triage, travel planning, or appointment booking.
7. Consider user experience and visual design
When the output matters visually (flight cards, itineraries, pricing tables), use Widget Studio to design a rich front-end presentation. Combine content generation with design instructions so the agent not only produces the data but also prescribes how it should be shown.
8. Use evals to measure performance
Agent Builder integrates evaluation tools so you can run tests that simulate user inputs and measure correctness. Leverage these evals to quantify classifier accuracy, intent routing precision, and end-to-end task success rates.
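Even before reaching for the built-in evals, a plain loop over labeled examples gives you a quick read on classifier accuracy. The sketch below assumes a classify(message) helper, a hypothetical function that calls your classifier and returns its parsed JSON.

```python
# Quick offline accuracy check for the classifier. Assumes a classify(message)
# helper (hypothetical) that calls the classifier and returns parsed JSON.
labeled_examples = [
    ("What should I do in a day in Tokyo?", "itinerary"),
    ("SFO to Tokyo on October 7th", "flight_info"),
]

def evaluate(classify) -> float:
    correct = 0
    for message, expected in labeled_examples:
        result = classify(message)  # e.g. {"classification": "flight_info"}
        if result.get("classification") == expected:
            correct += 1
    return correct / len(labeled_examples)
```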
Advanced features and extensibility
Agent Builder is designed to be extensible. The demo used relatively straightforward nodes and a single external tool (web search), but the platform supports more advanced capabilities that power production systems.
Tool integrations
You can attach specialized tools to agent nodes. Typical integrations include:
- Web search for fresh information.
- Calendar or booking APIs to enable actual reservations and checks.
- Databases and internal APIs to personalize responses and respect organizational context.
These integrations let agents go beyond static suggestions and take action or fetch real-time data when necessary.
Widget Studio and custom renderers
Widget Studio allows you to create reusable UI components for presenting structured results. Once a widget is uploaded to an agent node, the agent can return structured data that maps directly into the widget's fields. This separation of content and presentation enables front-end teams to iterate on UI without changing agent logic, and vice versa.
Exporting as code
For teams that need the workflow as code, Agent Builder supports exporting the full workflow. This export contains role prompts, node wiring, tool connections, and output schemas: everything required to reproduce the behavior programmatically. The export is handy when you want to embed the logic in automated CI/CD pipelines or connect it to custom backend services.
Built-in evals and testing
Agent Builder includes built-in evaluation infrastructure. You can author tests that run a set of inputs through the workflow and measure metrics like accuracy, response appropriateness, or business KPIs. Running these evals before publication gives you confidence that changes won't regress critical behavior.
Real-world use cases and examples
Although I illustrated Agent Builder with a travel assistant, the same visual design approach applies to numerous domains. Below are concrete examples you might build as a team.
Customer support triage
- Classifier: route queries to billing, technical, or general support.
- Agents: specialized nodes that produce troubleshooting steps, escalation messages, or information lookups.
- Tools: access to order systems, CRM, and knowledge bases for facts and account data.
- Widgets: ticket summaries with suggested next steps and escalation buttons.
Sales assistant
- Classifier: detect intent (pricing, demo request, product features).
- Agents: generate tailored pitch decks, look up the latest discounts, recommend add-ons.
- Tools: CRM and product catalog integrations to fetch customer history and real-time pricing.
- Widgets: product cards and customized quotes for immediate presentation to prospects.
Internal knowledge assistant
- Classifier: distinguish between policy questions, document lookups, and scheduling tasks.
- Agents: retrieve snippets from internal docs, draft responses, or schedule meetings.
- Tools: connections to internal search, LDAP for user directory, and calendar APIs.
- Widgets: summarized policy cards that surface the relevant document excerpt and link back to sources.
Educational tutor or course assistant
- Classifier: recognize question type (explanation, practice problem, feedback request).
- Agents: produce step-by-step explanations, generate practice quizzes, or grade user-submitted answers.
- Tools: access to curriculum content and student performance data.
- Widgets: interactive quizzes and progress dashboards.
Each of these examples benefits from the clean separation of concerns that Agent Builder encourages: small agents with focused responsibilities, explicit schemas, connected tools, and polished presentation layers.
Security, ethics, and operational considerations
Deploying agentic workflows brings responsibilities beyond functional correctness. You must consider security, privacy, and the potential for unintended behaviors. Here are the key areas I emphasize in my deployments.
Data privacy and access control
Be deliberate about which tools you attach to an agent and what data those tools can access. If an agent calls internal APIs or databases, enforce strict least-privilege credentials and auditing. When working with personal data, make sure you comply with relevant regulations and organizational policies.
Prompt safety
Role prompts and system messages control agent behavior. Craft them with safety in mind: constrain the agent from making unauthorized assumptions, instruct it to refuse requests that require sensitive action, and provide fallback guidance when data is insufficient.
Monitoring and logging
Observability is critical. Log agent interactions, tool calls, and decision traces so you can diagnose problems and understand user behavior. Built-in evals help during development; runtime monitoring helps you detect drift or abuse once the agent is live.
Human-in-the-loop and escalation
For high-risk actions (e.g., making purchases, canceling subscriptions, or modifying user accounts), design the workflow to require human confirmation or an explicit verification step. The visual canvas makes it easy to insert approval nodes or escalation paths.
Bias and fairness
Any content-generation system can reflect biases present in training data. Evaluate outputs across demographics and user segments. Use evals and targeted testing to uncover problematic patterns, and iterate on prompts and data sources to mitigate issues.
FAQs and troubleshooting
Below are some frequently asked questions and practical troubleshooting steps based on the travel agent demo and the kinds of issues I commonly encounter while building agentic workflows.
Q: My classifier is misrouting inputs. What should I do?
A: Start with the prompt. Make the classification task explicit and include examples if necessary. Switch to structured outputs (JSON) and verify in Run Preview what the classifier is returning. If misroutes persist, collect failing examples and iterate on the prompt or add a secondary check before final routing.
Q: The agentâs web search returns inconsistent flight information.
A: Web results can vary. Ensure your search tool configuration includes reputable sources and consider adding post-processing steps to normalize results (for example, parsing timestamps into a canonical timezone). Also verify that your agent extracts and formats data consistently, and consider caching or validation layers for crucial fields like prices or availability.
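For the timestamp normalization mentioned above, the Python standard library is enough. This sketch converts a parsed local departure time into UTC so downstream comparisons stay consistent; the input string format is an assumption.

```python
# Normalize a parsed departure time into a canonical timezone (UTC) using only
# the standard library. The input string format here is an assumption.
from datetime import datetime
from zoneinfo import ZoneInfo

def to_utc(local_time: str, tz_name: str) -> datetime:
    """Parse a 'YYYY-MM-DD HH:MM' string in the given timezone and return UTC."""
    naive = datetime.strptime(local_time, "%Y-%m-%d %H:%M")
    return naive.replace(tzinfo=ZoneInfo(tz_name)).astimezone(ZoneInfo("UTC"))

departure_utc = to_utc("2025-10-07 11:05", "America/Los_Angeles")
```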
Q: My widget doesn't render correctly in the final product.
A: Verify the widget schema matches the agent output. Use Run Preview's widget preview to test input-output pairing. If you exported the widget, ensure the front-end renderer uses the same version. When in doubt, simplify the widget fields and reintroduce complexity progressively.
Q: How do I roll back a broken workflow?
A: Publishing creates versions. If a new change introduces regression, revert to a previous workflow version or republish the last known-good configuration. Maintain a change log for who changed what and when, so you can identify the root cause quickly.
Q: Can I test with real user traffic safely?
A: Use canary deployments or staged rollouts. Start with a small subset of users and closely monitor logs and key metrics. Implement rate limits and safe-fail behavior so that if something starts to go wrong the agent returns a neutral fallback message or routes to a human agent.
Conclusion and call to action
Agent Builder is a practical step toward simplifying the creation, testing, and deployment of agentic workflows. During my walkthrough, I demonstrated how a single, visual canvas can be used to assemble a travel assistant that classifies intent, uses web search for fresh data, returns structured outputs, and renders results in a custom widget. The platform's emphasis on explicit schemas, connected tools, and easy publishing paths, whether through the Agents SDK or by embedding a workflow ID into ChatKit, makes it easier for teams to move from prototype to production.
As someone who builds and ships these kinds of systems, I find the combination of visual clarity and programmatic export particularly compelling. It frees teams from fragile orchestration code while preserving the flexibility to integrate deep, custom logic when needed.
If you want to get started, I recommend these immediate next steps:
- Create a new workflow in Agent Builder and start from a travel template (or a use-case template close to your needs).
- Define clear input schemas and prefer structured JSON outputs for any data that other nodes must consume.
- Use Run Preview to iterate quickly and attach tools only when you need fresh or external data.
- Design a simple widget for your most important outputs so stakeholders can visualize results before integration.
- Publish to a limited audience first, using ChatKit or the Agents SDK, and run evals to measure success.
Finally, I welcome your feedback. I'm continually refining prompts, examples, and templates based on what developers tell me works best in the real world. If you try Agent Builder, test a few variations of the classifier prompt, explore Widget Studio, and let me know how it goes; your experiences help shape improvements.
"Basically, it's your all-in-one space to design, test, and launch AI agents visually and fast." â Christina Huang
Where to learn more
To explore Agent Builder, visit the platform documentation and try the visual canvas with a template relevant to your domain. If you're integrating into a product, compare the Agents SDK with ChatKit to choose the integration path that best matches your engineering preferences.
Thank you for reading this report-style summary. I'm excited to see what you build with Agent Builder and how teams use visual design to unlock richer, more reliable agentic experiences.