Build Hour: Codex — One Agent for Everywhere You Code

🧭 Introduction: Why I’m writing this and who I’m with

I'm Dominik Kundel, Developer Experience at OpenAI, and I joined Pranav Deshpande for a live Build Hour to walk through Codex — our software engineering agent — and the recent updates that make it feel like a single agent you can use everywhere you code. In that session, Pranav and I covered what’s new, how Codex now fits across IDE, CLI, GitHub, mobile, and the cloud, and we demonstrated real-world workflows by maintaining the Agents SDK in TypeScript. This post recaps that session in depth and expands on the practical patterns, best practices, and mental models you can adopt to get the most out of Codex.

I wrote this article to be a hands-on guide: I’ll explain how Codex behaves now, show how I use it in the IDE and cloud, and give concrete recommendations for structuring a repository so Codex becomes a productive teammate rather than a fragile tool. If you watched the original session (hosted by OpenAI), this article reinforces the same ideas. If you didn’t, it’s a standalone walkthrough you can follow when you’re ready to try Codex on your codebase.

✨ Executive summary — what changed and why it matters

Over the last few months we’ve unified several experiences into a coherent Codex product. The major pieces that came together are:

  • IDE extension (VS Code-compatible and Cursor): Codex is now embedded directly alongside your code.
  • Revamped Codex CLI: Open source, regularly updated, scriptable, and useful in terminals and automation.
  • Cloud workspaces and handoffs: You can delegate tasks to secure cloud sandboxes and pull changes back into your local environment.
  • Automatic code review in GitHub: Codex can run reviews on PRs, execute tests if needed, and propose fixes.
  • GPT-5 integration: Codex leverages GPT-5 as its coding model by default for higher-quality reasoning, tool usage, and instruction following.

Put together and connected by your ChatGPT account, these updates mean Codex is less like a collection of separate tools and more like one intelligent software engineering assistant you can access from the editor, terminal, web, and mobile.

🔎 How to think about Codex now: Where you use it, where it runs

To form a working mental model, separate two dimensions: where you invoke Codex and where Codex executes.

Where you invoke Codex

  • IDE — If you live in your editor, use the Codex extension (VS Code-compatible). It brings the CLI harness into a familiar graphical context: highlights, in-place suggestions, and one-click apply of Git diffs.
  • CLI / Terminal — If you prefer the terminal or want scriptable automation, the Codex CLI continues to be the lightweight open-source agent harness many of you already use.
  • GitHub — Code review automation runs on GitHub and attaches reviews to PRs automatically.
  • Web / ChatGPT — For scheduling asynchronous tasks and long-running jobs you can review later.
  • Mobile (ChatGPT iOS app) — Great for capturing inspiration and kicking off tasks while away from the keyboard.

Where Codex executes

  • Local execution — Codex runs inside your machine (or a local environment) when you're pairing or iterating rapidly. This is the lowest-latency mode and it can directly execute commands, run tests, and modify your local files.
  • Cloud sandbox — Codex can run asynchronous tasks in secure containers in the cloud. These environments download your repository, run code, compile, run tests, and create PRs against your branch.

These two axes let you pick a workflow that fits the task and your comfort level. Pair programming and quick iterations are ideal locally; long-running refactors, migrations, or tasks you want to fire-and-forget are great candidates for cloud handoffs.
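
To make the two axes concrete, here is a minimal sketch of the choice between local and cloud execution as a small decision helper. The `Task` fields and the 15-minute threshold are assumptions made up for illustration; Codex does not expose anything like this, and in practice the choice is a judgment call.

```python
from dataclasses import dataclass
from enum import Enum


class Execution(Enum):
    LOCAL = "local"
    CLOUD = "cloud"


@dataclass
class Task:
    # All fields are hypothetical descriptors used only for this illustration.
    description: str
    expected_minutes: int       # rough guess at how long the work takes
    needs_fast_feedback: bool   # pairing or tight iteration
    parallel_attempts: int = 1  # more than 1 means fire-and-forget exploration


def pick_execution(task: Task, long_running_minutes: int = 15) -> Execution:
    """Tight loops stay local; long-running or parallel work goes to the cloud."""
    if task.parallel_attempts > 1:
        return Execution.CLOUD
    if task.needs_fast_feedback:
        return Execution.LOCAL
    if task.expected_minutes >= long_running_minutes:
        return Execution.CLOUD
    return Execution.LOCAL
```

For example, `pick_execution(Task("rename a prop", 2, True))` picks local execution, while a one-hour migration with no need for fast feedback is sent to the cloud.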

🛠 Demonstration highlights — how I use Codex on the Agents SDK

In the session I demonstrated an end-to-end flow using the Agents SDK (an open source monorepo we maintain). Below I’ll restate and expand on several live demos I performed and the lessons you can adopt immediately.

1) Quick repo exploration with chat mode

I started by asking Codex, in chat mode, "What is this repo about?" and more nuanced questions like "Which example demonstrates the most comprehensive Real-Time API demo?" Chat mode is read-only and uses the repository context to traverse files and surface summaries. It behaves like an engineer who skims the codebase and then answers based on multiple files it inspected.

Key takeaway: chat mode is excellent for onboarding, understanding code layout, and discovering which example or package is most relevant. If you want Codex to make changes, switch to agent mode.

2) Editing UI to capture continuous image input

I wanted one of our demos to capture an image every second instead of a single static snapshot (to simulate video input). Using the IDE extension in agent mode, I told Codex to "update the page file to allow for continuous image input at one frame per second, configurable to fall back to static mode." I didn’t prescribe the exact implementation details — I let Codex traverse the React components to figure out where to add a useEffect hook and how to wire the configuration.

What happened:

  • Codex modified the single file I targeted, adding state and a configurable FPS variable.
  • It produced a concise summary of the changes and gave me a diff I could inspect and apply.
  • I asked a follow-up to change the implementation approach (I didn't like the initial solution), and Codex iterated on the same file.

Tip: Start by restricting Codex’s edits to a small scope. That keeps changes focused and reduces surprise diffs. If you do need larger changes, use a plan (more below) and offload the task to the cloud if you expect long runtime or multiple attempts.
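
The timing rule behind "one frame per second, configurable to fall back to static mode" can be sketched independently of React. This is not the code Codex generated in the demo. The function names are hypothetical, and I am assuming that a non-positive FPS value means static mode. In the component, the interval would drive a timer inside the useEffect hook.

```python
from typing import Optional


def capture_interval_ms(fps: float) -> Optional[int]:
    """Return the delay between captures in milliseconds, or None for static mode.

    Assumption for this sketch: a non-positive fps means "static mode",
    that is, capture a single snapshot and start no timer.
    """
    if fps <= 0:
        return None
    return round(1000 / fps)


def capture_times_ms(fps: float, duration_ms: int) -> list[int]:
    """List the timestamps at which a frame would be captured within duration_ms."""
    interval = capture_interval_ms(fps)
    if interval is None:
        return [0]  # static mode: one snapshot at the start
    return list(range(0, duration_ms, interval))
```

With `fps=1` over three seconds this yields captures at 0, 1000, and 2000 ms. With `fps=0` it yields a single capture at 0 ms.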

3) Running multiple tasks in the cloud simultaneously

Next I demonstrated kicking off several cloud tasks to update different demo apps to support MCP (the Model Context Protocol) with the Real-Time API. In the cloud handoff, I selected an environment (a container that codifies available tools and dependencies), targeted the main branch, and launched tasks. I used "best of N" mode, where multiple attempts run in parallel and you can pick the best resulting PR.

Why parallel attempts?

  • It reduces time spent prompt-engineering — instead of refining one prompt until perfect, fire off several and pick the best result.
  • It gives you options: different implementations, level of refactor, or API usage patterns.

When the tasks finished, each generated a PR that I could inspect, apply locally, or merge. You can also tell Codex to apply local changes when kicking off a cloud task (so the cloud worker starts from your exact working state).
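
The shape of a best-of-N run can be sketched as follows. Nothing here calls Codex: `run_attempt` is a stand-in you would supply, and the ranking rule (passing tests first, then the smallest diff) is my own assumption. In the real workflow you pick the winner by reading the PRs.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable


@dataclass
class Attempt:
    index: int
    tests_passed: bool
    lines_changed: int


def run_best_of_n(run_attempt: Callable[[int], Attempt], n: int) -> list[Attempt]:
    """Run n independent attempts in parallel and return them in attempt order."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        return list(pool.map(run_attempt, range(n)))


def pick_best(attempts: list[Attempt]) -> Attempt:
    """Prefer attempts whose tests pass; among those, prefer the smallest diff."""
    return min(attempts, key=lambda a: (not a.tests_passed, a.lines_changed))
```

The point of the design is that attempts are independent, so a weak attempt costs nothing beyond the time it takes to discard it.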

4) Writing plans and breaking down complex UI changes

For larger shifts — for example a polished UI migration — I used Codex to create a plan.md. I fed an image (a mock or wireframe) and asked it for a step-by-step plan. Codex produced a comprehensive plan that I could iterate on as a review artifact. When satisfied, I kicked off a cloud task to implement the plan and let Codex create the Git diffs.

This plan-first approach is one of the most reliable ways to produce higher-quality work from an agent: it lets you validate the architecture before coding, and it enables parallelization of implementation tasks.
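
To show how a reviewed plan can turn into parallel work, here is a minimal sketch that reads a plan and groups the open steps by area. The checkbox format with an area tag is an assumption invented for this example. Codex does not require any particular plan.md layout.

```python
import re
from collections import defaultdict

# Assumed convention for this sketch only: each step is a checkbox line such as
# "- [ ] [examples] Add the settings panel".
STEP_PATTERN = re.compile(r"^- \[(?P<done>[ xX])\] \[(?P<area>[^\]]+)\] (?P<text>.+)$")


def parse_plan(plan_text: str) -> list[dict]:
    """Extract steps (done flag, area, text) from the plan; other lines are ignored."""
    steps = []
    for line in plan_text.splitlines():
        match = STEP_PATTERN.match(line.strip())
        if match:
            steps.append({
                "done": match["done"].lower() == "x",
                "area": match["area"],
                "text": match["text"],
            })
    return steps


def open_steps_by_area(steps: list[dict]) -> dict[str, list[str]]:
    """Group unfinished steps by area; each group could become its own task."""
    grouped = defaultdict(list)
    for step in steps:
        if not step["done"]:
            grouped[step["area"]].append(step["text"])
    return dict(grouped)
```

Steps that land in different areas are natural candidates for separate cloud tasks, since they are less likely to touch the same files.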

5) Generating Rust code from a Python spec

One memorable example: I had a late-night idea to stream tokens in our Harmony parser (a Rust project with a Python wrapper). I wrote a short Python spec describing the desired behavior and had Codex implement it in Rust. By morning, a PR existed with a working implementation. I reviewed the Rust code, made minor adjustments, and merged. The result saved me from having to become an expert Rust programmer — Codex handled the mechanical translation and integration.

Lessons:

  • Codex is excellent at cross-language implementation if you provide a clear spec.
  • Short planning prompts and a well-scoped desired interface work best.
  • Mobile capture of ideas is powerful — I used the ChatGPT iOS app to record the idea at 11:30 PM and queued the work remotely.
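
As an illustration of what a short, well-scoped Python spec can look like, here is a hypothetical one. It is not the spec I wrote for Harmony and it does not reflect Harmony's format or API. It describes a generic streaming line parser, with the docstring stating the behavior and the body serving as a reference implementation that a port to another language can be checked against.

```python
from typing import Optional


class StreamingLineParser:
    """Hypothetical spec: feed text in arbitrary chunks, get complete lines out.

    Reference behavior for an implementation in another language:
    - feed() returns every line completed by this chunk, without the newline.
    - Incomplete trailing text is buffered until a later chunk completes it.
    - finish() returns the remaining buffered text, or None if nothing is buffered.
    """

    def __init__(self) -> None:
        self._buffer = ""

    def feed(self, chunk: str) -> list[str]:
        self._buffer += chunk
        # Everything before the last newline is complete; the rest stays buffered.
        *complete, self._buffer = self._buffer.split("\n")
        return complete

    def finish(self) -> Optional[str]:
        rest, self._buffer = self._buffer, ""
        return rest or None
```

A spec in this style gives the agent both the interface and concrete expected behavior, which is what makes a cross-language implementation verifiable.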

🧩 Modes and controls: chat, agent, and full access

Codex offers different modes, each suitable for a different level of trust and autonomy:

  • Chat mode — read-only. Great for getting context, asking questions, and discovering the right files and examples.
  • Agent mode — makes changes but asks for approval. This is my recommended day-to-day for pairing and iterating.
  • Full access (YOLO) — gives the agent permission to execute commands beyond the workspace (e.g., global environment changes). Use full access sparingly; it’s powerful but increases risk.

When starting with Codex in a codebase you don’t fully trust or know, start in chat mode, then move to agent mode with limited scopes. Only grant full access when you need the agent to run end-to-end integration or environment-wide commands and you’re confident in the configuration.
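
The escalation rule above amounts to choosing the least-privileged mode that still covers what the task needs. A minimal sketch, assuming the three modes behave as described in this section; the names and fields are illustrative and are not Codex's actual configuration schema:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModePolicy:
    can_edit_files: bool
    can_act_outside_workspace: bool


# Summary of the modes as described in this article, ordered from least to most access.
MODES = {
    "chat": ModePolicy(can_edit_files=False, can_act_outside_workspace=False),
    "agent": ModePolicy(can_edit_files=True, can_act_outside_workspace=False),
    "full_access": ModePolicy(can_edit_files=True, can_act_outside_workspace=True),
}


def least_privileged_mode(needs_edits: bool, needs_outside_workspace: bool) -> str:
    """Return the first mode, in order of increasing access, that covers the needs."""
    for name, policy in MODES.items():
        if needs_edits and not policy.can_edit_files:
            continue
        if needs_outside_workspace and not policy.can_act_outside_workspace:
            continue
        return name
    raise ValueError("no mode satisfies the requested capabilities")
```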

📁 Structuring your repository for Codex success

Codex is much more useful when your repository is structured for collaboration. I treat repository structure the same way I would optimize it for new human contributors. The critical design considerations are:

1) Clear, small packages and monorepo hygiene

If you run a monorepo, clearly name and separate packages (e.g., core, real-time, examples). Small, well-defined packages let Codex work on parallel tasks in different packages without creating merge conflicts. When tasks are confined to a package or small set of files, Codex can produce diffs that are easier to review and integrate.

2) Use TypeScript or typed languages for examples and SDKs

Codex can rely on compilation or type-checking as a kind of test. For TypeScript projects, asking Codex to write tests and to run the TypeScript compiler as a check increases the likelihood that the proposed diffs are mergeable. Static checks are a great way to reduce trial-and-error loops.

3) Agents.md and repository-level conventions

Documenting how agents should behave in an agents.md file is one of the highest-impact investments you can make. Codex will look for agents.md to get repo-specific conventions, coding style, test expectations, and instructions on how to handle common patterns.

Make agents.md explicit about:

  • Which branches or prefixes to use for PRs
  • Whether to create tests for new features
  • Preferred libraries and workflows for reliability
  • Any destructive or environment-sensitive rules (e.g., do not run migrations on production DB)

Tip: You can nest agents.md files in subpackages or folders for different stacks within a monorepo. For complex codebases, multiple agents.md files tailored per package are extremely helpful to guide Codex’s behavior.
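
To reason about which nested files apply to a given source file, the following sketch walks from the repository root down to the file's directory and collects every agents.md on the way. The ordering (general first, most specific last) is an assumption for illustration. This article does not spell out the exact precedence Codex applies.

```python
from pathlib import Path


def applicable_agent_files(repo_root: Path, target: Path, name: str = "agents.md") -> list[Path]:
    """Collect instruction files from the repo root down to the target's directory."""
    repo_root = repo_root.resolve()
    target = target.resolve()
    directory = target if target.is_dir() else target.parent
    # relative_to raises ValueError if the target lies outside the repository.
    relative = directory.relative_to(repo_root)

    directories = [repo_root]
    for part in relative.parts:
        directories.append(directories[-1] / part)
    return [d / name for d in directories if (d / name).is_file()]
```

Running this for a file in a package that has its own agents.md returns the root file followed by the package file, which makes it easy to check that the guidance for a package is what you expect.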

4) Tests and verification harnesses

Codex often writes its own tests where applicable. If your repository has tests, linters, and formatters, Codex can run them to check and fix issues. That gives you strong signals about whether a PR will be mergeable.

Practical measures:

  • Keep unit tests fast for the parts Codex is likely to modify.
  • Provide a dev environment that can be executed in the cloud sandbox.
  • Create small helper scripts that Codex can call to run smoke tests.
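
A helper script of the kind described in the last bullet can be very small. The sketch below runs a list of named commands and reports which ones failed. The commands in `EXAMPLE_CHECKS` are placeholders (a TypeScript type check and an npm test run); swap in whatever your repository actually uses.

```python
import subprocess
import sys

# Placeholder commands for illustration; replace them with your repo's real checks.
EXAMPLE_CHECKS = [
    ("type-check", ["npx", "tsc", "--noEmit"]),
    ("unit tests", ["npm", "test"]),
]


def run_checks(checks: list[tuple[str, list[str]]]) -> list[str]:
    """Run each named command and return the names of the checks that failed."""
    failed = []
    for name, command in checks:
        result = subprocess.run(command, capture_output=True, text=True)
        status = "ok" if result.returncode == 0 else "FAILED"
        print(f"[{status}] {name}")
        if result.returncode != 0:
            failed.append(name)
    return failed
```

To use it as a script, end the file with `sys.exit(1 if run_checks(EXAMPLE_CHECKS) else 0)` so that the exit code tells the agent, or CI, whether the smoke test passed.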

🚦 Practical workflows and best practices I follow

As I work with Codex day-to-day, I've developed several reliable workflows that keep the balance between delegation and control. Here’s my checklist and reasoning behind each step.

Workflow: The Delegation-First Loop

  1. Capture the idea immediately. When inspiration hits (mobile app, terminal, or editor), create a short plan or task description. I often use the ChatGPT mobile app to capture the idea and queue the work.
  2. Pick the execution context. Decide whether it’s a small change (IDE agent mode), a multi-file migration (cloud handoff), or just research (chat mode).
  3. Specify the constraints. Add important constraints in a short agent prompt or in agents.md: tests required, files to avoid, coding style rules.
  4. Set verification steps. Tell Codex to run tests, type-check, or run integration checks so it has an objective success criterion.
  5. Use best-of-N for exploratory or brittle tasks. If you’re unsure what the right implementation looks like, run multiple attempts (best-of-N). Choose the most appropriate PR and iterate.
  6. Pull down and review locally. Use git apply or the IDE extension to bring changes into your workspace and run the final checks before merging.
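
Steps 3 and 4 of the loop are easier to apply consistently if the task description always has the same shape. Here is a minimal sketch that renders a goal, its constraints, and its verification steps into a prompt. The `TaskBrief` structure and the wording of the prompt are my own assumptions, not a format Codex requires.

```python
from dataclasses import dataclass, field


@dataclass
class TaskBrief:
    goal: str
    constraints: list[str] = field(default_factory=list)
    verification: list[str] = field(default_factory=list)


def render_prompt(brief: TaskBrief) -> str:
    """Render the brief as plain text, omitting sections that are empty."""
    lines = [f"Goal: {brief.goal}"]
    if brief.constraints:
        lines.append("Constraints:")
        lines.extend(f"- {item}" for item in brief.constraints)
    if brief.verification:
        lines.append("Before you finish, verify:")
        lines.extend(f"- {item}" for item in brief.verification)
    return "\n".join(lines)
```

Listing the verification steps explicitly gives the agent an objective success criterion instead of leaving "done" open to interpretation.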

Guidelines that save time and reduce risk

  • Scope narrowly at first: Ask Codex to change one file or small feature and only broaden scope when you trust the output.
  • Use plans for multi-stage work: Create plan.md artifacts for complex tasks, iterate on the plan, then ask Codex to implement the plan.
  • Require tests for substantial features: Agents.md can instruct Codex to write unit tests whenever it adds a substantial feature.
  • Prefer cloud for long-running or parallel tasks: Local tasks are great for tight loops; cloud tasks let you parallelize and don’t block your machine.
  • Keep your branches tidy: Use personal branches and Git workflows that make merging straightforward.

🤝 Code review automation — how Codex reviews and fixes PRs

One of the features I'm most excited about is Codex code review. When enabled in GitHub, Codex automatically reviews PRs. But it does more than static analysis — it can run code, understand the diff in the context of your codebase, and validate whether a PR actually implements its stated intent.

What Codex does in a code review:

  • Reads the PR title and description
  • Inspects diffs and the impacted areas of the codebase
  • Runs tests and type checks if necessary
  • Generates meaningful review comments with fix suggestions
  • Optionally creates a follow-up PR that applies fixes described in the review

Example from my workflow: A colleague opened a PR, and before I had reviewed it Codex had already posted comments identifying an edge case. Instead of the issue lingering until I got to the PR, a follow-up Codex task created a fix branch. The colleague and I reviewed the fix and merged. It saved time and caught something I might have missed.

📲 Mobile-first workflows: capturing inspiration anywhere

I often capture ideas away from my desk and kick off tasks from my phone. The ChatGPT mobile app is ideal for quick prompts like:

  • "Add streaming tokens support for Harmony parser — spec attached."
  • "Create a plan to add FPS configuration and continuous image capture to demo X."
  • "Run a security-focused code review of PR #1234."

Because Codex can run tasks asynchronously in the cloud, I can queue a job on my commute and arrive at the office with a draft PR to review. This reduces cognitive load (no “don’t forget” mental bookmarks) and leverages otherwise dead time productively.

🔐 Security, review scope, and limiting agent access

Security and correctness are top of mind. A few practices I recommend:

  • Use agent modes carefully: Start with chat and agent mode; grant full access only when needed and audited.
  • Pin environments: Use reproducible environments and container images for cloud tasks so the agent’s runtime is stable.
  • Explicitly request security reviews: Codex is flexible; you can ask it to focus on security in a code review. The automatic reviewer typically infers security-related checks, but explicitly tagging the request improves focus.
  • Limit external side effects: Use sandboxed cloud runs and avoid giving agents credentials to production systems.

📚 Real examples and anecdotes from my work

Three short, concrete case studies from the session and my day-to-day work show how these ideas play out:

Case study 1 — Real-Time demo parity across examples

Problem: We added MCP support for the Real-Time API but hadn’t updated all demo apps to reflect parity. Instead of manually changing each demo, I launched one cloud task to update the Twilio demo and another to update the other demos. I used best-of-N to get multiple candidate PRs and applied the best one. Result: All demos were updated in a fraction of the time it would have taken me to edit and test each demo individually.

Case study 2 — Continuous image capture for a UI demo

Problem: A demo used single-frame image capture; I wanted a mode that captures a frame per second for near-video analysis. I used the IDE extension in agent mode to scope edits to the page file, asked for a configurable FPS variable, and iterated with a follow-up change. I kept changes confined initially, tested locally, and only when satisfied did I create a PR. Result: The demo gained a new feature with minimal manual editing.

Case study 3 — Harmony parser streaming addition

Problem: Add a streamable token wrapper to the Harmony parser (Rust with Python wrapper). I wrote a Python-level spec into the mobile app at night, then kicked off a cloud task. Codex implemented the Rust changes, compiled, and opened a PR. I reviewed and merged the PR in the morning. Result: I avoided learning Rust in depth and still shipped a robust change.

🧠 A necessary mindset shift: think like an EM or architect

One important cultural change I recommend is rethinking your role when using Codex. Instead of being a developer who writes each line, think like an engineering manager or architect who:

  • Creates well-scoped tasks
  • Chooses the right verification steps
  • Orchestrates multiple parallel tasks
  • Validates and reviews the outputs

Why this matters: Codex excels at executing well-defined tasks. If you’re only thinking about the immediate implementation, you might miss opportunities to parallelize, offload, or create reusable plans. By raising your abstraction level — specifying constraints, verification, and desired interfaces — you multiply your impact.

🔧 Practical CLI and IDE integration notes

A couple of operational tips to make switching between IDE and CLI smoother:

  • The Codex CLI is open source and regularly updated. It supports a flag to change the model (e.g., -m) if you want to experiment with alternatives to GPT-5. In general, GPT-5 gives the best results for tool usage and coding tasks.
  • The IDE extension and CLI share configuration, so you don’t need to duplicate settings between them.
  • Use codex exec in scripts for automation: the CLI can be scripted to run commands and act as part of a CI pipeline or developer tooling.
  • If you use local MCP servers for verification, you can configure the cloud environment to call those servers so the agent has access to the same verification utilities.
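
For the scripting case, a thin wrapper around codex exec might look like the sketch below. It uses only the subcommand and the -m flag mentioned above. The argument order and the model name in the usage example are assumptions, so check `codex exec --help` for the options your installed version supports.

```python
import shutil
import subprocess
from typing import Optional


def build_codex_exec_command(prompt: str, model: Optional[str] = None) -> list[str]:
    """Build the argument list for a non-interactive `codex exec` run."""
    command = ["codex", "exec"]
    if model:
        command += ["-m", model]  # optional model override
    command.append(prompt)
    return command


def run_codex_exec(prompt: str, model: Optional[str] = None) -> int:
    """Run the command and return its exit code; raises if the CLI is not installed."""
    if shutil.which("codex") is None:
        raise FileNotFoundError("codex CLI not found on PATH")
    return subprocess.run(build_codex_exec_command(prompt, model)).returncode
```

Keeping command construction separate from execution makes the wrapper easy to test without the CLI installed. For example, `build_codex_exec_command("Summarize this repo", model="gpt-5")` returns the argument list without running anything.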

🔎 Frequently asked questions I covered live

Below are distilled answers to common questions that surfaced during the live Q&A.

Can multiple agents work on the same codebase concurrently without causing chaos?

Yes, if you structure the repo smartly. Keep changes confined to specific packages or files, use branches or git worktrees locally, and prefer cloud tasks for parallel jobs. Codex tends to limit changes to the files you ask it to change, and Git workflows (small PRs, clear package boundaries) minimize merge conflicts.
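
For the local case, here is a minimal sketch of the two habits mentioned above: giving each task its own git worktree, and checking up front whether two tasks plan to touch the same package. The branch prefix, the worktree directory, and the task names are hypothetical.

```python
from itertools import combinations


def worktree_command(task_name: str, base: str = "main") -> list[str]:
    """Build a `git worktree add` command giving one task its own branch and directory."""
    branch = f"codex/{task_name}"         # hypothetical branch naming scheme
    path = f"../worktrees/{task_name}"    # hypothetical location outside the main checkout
    return ["git", "worktree", "add", "-b", branch, path, base]


def overlapping_tasks(task_paths: dict[str, set[str]]) -> list[tuple[str, str]]:
    """Return pairs of tasks that plan to touch the same package or file."""
    return [
        (a, b)
        for a, b in combinations(sorted(task_paths), 2)
        if task_paths[a] & task_paths[b]
    ]
```

If `overlapping_tasks` reports a pair, run those two tasks one after the other or narrow their scope before starting them in parallel.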

How does code review handle security checks?

Codex’s review feature is flexible. The automatic reviewer infers what to check, but you can explicitly request a security review (e.g., tag a PR with "@Codex review for security"). The reviewer can run tests, static analysis, or targeted checks as needed.

Can I run Codex with models other than GPT-5?

Yes. In the CLI you can pass a model flag (-m) to use other models, including o3 or self-hosted OSS models. That said, GPT-5 is recommended for most coding tasks due to quality and tool usage enhancements.

Does Codex provide hooks or notifications when a cloud task finishes?

At the time of writing, pre- and post-hooks and OpenTelemetry support are not universally available, though the team has heard requests for these features. For now, you can watch tasks in the web UI or connect the results through GitHub PR workflows.

Is the IDE extension backed by the same harness as the CLI?

Yes. The IDE extension uses the same agent harness as the CLI. The extension mainly provides a richer UI with buttons and drop-downs, while the CLI remains scriptable and lean.

📘 Where to start: a checklist to adopt Codex today

If you want to adopt Codex for your project, here’s a practical checklist that mirrors how I onboard repositories:

  1. Create a small agents.md in the repo root (or use the CLI helper to initialize one).
  2. Ensure the repo has basic test coverage and a quick smoke test script for examples.
  3. Install the Codex CLI and sign in with your ChatGPT account (Plus/Pro/Business/Edu access may be required).
  4. Add the Codex IDE extension in your editor of choice (VS Code-compatible or Cursor).
  5. Run a few read-only chat mode queries to validate repo access and agent context.
  6. Try a tiny agent mode change: ask Codex to implement a small feature in a single file and review the diff.
  7. Experiment with a cloud task for a medium-sized job that you don’t want to block on locally.
  8. Enable Codex code review in GitHub and watch how it reviews incoming PRs.

📎 Resources and artifacts I recommend

To make this practical, here are resources and patterns I pointed to during the session and that I recommend you consult:

  • Agents SDK repo — A real codebase with public PRs created by Codex. Inspect the PRs with "Codex" tags to see real examples of generated diffs and fixes.
  • Codex docs — The official developer docs for installing and configuring the CLI and IDE extension.
  • agents.md — The standard for repository-level agent instructions and conventions.
  • Codex CLI repo — The open-source CLI harness, useful for advanced scripting and automation.
  • Build Hours sessions — Regular live events that dive into OpenAI APIs and developer tooling.

🔁 Closing thoughts: what I see next for developer workflows

Working with Codex has shown me how software development can change when you treat an AI agent as part of the engineering team rather than an exotic tool. A few observations from my day-to-day:

  • Lower barriers to cross-language work: I can sketch behavior in Python and let Codex implement in Rust or TypeScript. This dramatically reduces the friction of multi-language projects.
  • Faster iteration and fewer forgotten todos: Instead of writing a TODO comment and hoping to get to it later, I can kick off a Codex task and check it off my mental list. "Out of sight, out of mind" becomes "out of mind, into Codex".
  • More inclusive contribution model: Designers, PMs, and non-core engineers can propose changes that Codex implements as first drafts, which speeds discovery and reduces bottlenecks.
  • New role patterns: The best operator is one who orchestrates: they plan, set verification criteria, and review outputs — rather than typing each line themselves.

Codex isn’t here to replace engineers. It’s here to expand what teams can accomplish and to let engineers operate at a higher abstraction level. If you treat Codex as a teammate — with clear instructions, tests, and repo structure — it becomes a multiplier.

📣 Final notes and next steps

If you’re curious to try these ideas yourself, start with a small pilot: pick an example or demo, add an agents.md with a couple of guidelines, and try the IDE extension’s agent mode for a small change. Then experiment with a cloud task for a medium-sized migration and enable Codex code review to see how it surfaces issues on PRs.

We’ll be running more Build Hours and live events that dive into the Responses API, GPT-5 patterns, and deeper Codex workflows. If you’ve already experimented with Codex, I’d love to hear about what worked and what didn’t: the tools are evolving fast and real-world feedback continues to shape the features teams are building next.

"Codex should now feel more and more like one agent for everywhere you code." — Pranav Deshpande

Thanks for reading. If you try any of these workflows and want help shaping your agents.md or scoping your first cloud task, reach out or capture the idea in the mobile app and let Codex run the first draft — you might be surprised how often it gets you 80% of the way there.

