I'm Mark McDonald from the DeepMind Developer Relations team, and I put together an in-depth walk-through of the Gemini CLI for Google for Developers. In this piece I report on what Gemini CLI is, how it works, and how you can adopt it in your daily dev workflow. If you've ever wished your terminal could think like a collaborator—search your repo, run commands, edit files, check tests, and iterate until a task is done—this is for you.
Treat this article as a live news-style briefing and hands-on guide. I'll cover the core ideas, the concrete commands and options, the safety features and guardrails, how we built the tool, and practical scenarios where Gemini CLI shines. I'll also answer frequently asked questions based on what I hear most from the community.
Table of Contents
- 🧭 What is Gemini CLI — quick briefing
- 🔐 Getting started and authentication
- 📁 Gemini MD files and project-centered configuration
- 🛠️ Tools and built-in commands
- ✍️ Editing, diffs, and the agent loop
- 🛡️ Sandboxing and safety features
- 🔌 Extensions, MCP servers, and custom tooling
- 🤖 Headless mode and automation
- 📈 Telemetry and observability
- 💾 Checkpointing, chat save & restore
- 🧠 Context compression and large-window strategies
- ⚙️ How the Gemini CLI is built
- 🌐 Open source, community and roadmap
- 💡 Practical workflows and example scenarios
- 🔥 Tips, best practices, and pitfalls
- ❓ FAQ
- 🔚 Wrapping up — my final report
🧭 What is Gemini CLI — quick briefing
Gemini CLI is an interactive command-line tool that brings the Gemini model directly into your terminal. At its heart it's an agentic system: you give it a goal and it iterates toward completion by calling a structured set of tools. Unlike a stateless single-prompt request, the CLI runs an agent loop so the model can plan, act, observe, and continue until a task is complete.
The CLI is designed for software engineering workflows. It is project-oriented (you run it from a project directory), context-aware (it knows about your OS, can run binaries, and read files in the current project), and multimodal (it can process text, images, PDFs, audio, and video). In practice that means I can ask it to find the code that sets my terminal TTY window title, change it to include a Unicode star, run tests, and prepare a PR—all from the terminal where my repo lives.
Here are the main capabilities in one take:
- Agent loop: iterative planning and execution until a goal is met.
- Context awareness: knowledge of OS, project files, and the ability to run local binaries like ls, jq, ffmpeg, and more.
- Project-oriented: by default it only exposes the current project files and directory structure.
- Multimodal: supports text, source code, PDFs, images, audio, and video for richer tasks.
- Extensible: slash commands, MCP integrations, and user extensions allow you to add custom tools and workflows.
- Safety features: sandboxing, checkpointing, and request approvals to limit risky operations.
🔐 Getting started and authentication
Getting the CLI running is straightforward. From a fresh install, run gemini in your terminal. The first time you run it you'll see an authentication dialog. Most users sign in with Google (using the browser flow), which also includes a generous free tier: up to 1,000 model requests per day with Gemini 2.5 Pro.
If you'd rather use a dedicated API key, you can set a Gemini API key (the CLI reads it from the GEMINI_API_KEY environment variable). This is handy for centralized build or CI accounts so you can keep model access consistent across automation. If your organization already uses Google Cloud and Vertex AI, you can configure the CLI to use your Vertex credentials so model billing goes through your existing account.
The authentication choice mostly affects billing and backend. The model, prompts, and tooling experience are the same regardless of whether you use a Google sign-in, Gemini API key, or Vertex AI integration.
First run
- Open a terminal inside your project directory (a workspace or a checked-out Git repo).
- Run gemini.
- Follow the browser-based sign-in flow or configure the Gemini API key / Vertex credentials.
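The API-key path from the steps above can be sketched as a couple of shell lines. The variable name GEMINI_API_KEY is what the CLI reads; the value here is a placeholder, not a real key:

```shell
# Point the CLI at a dedicated API key instead of the browser sign-in flow.
# Useful for CI or shared build accounts. The value below is a placeholder.
export GEMINI_API_KEY="your-key-here"

# Then launch the CLI as usual; the key is picked up automatically:
# gemini
```

If you use the browser flow instead, just run gemini with no variable set and follow the sign-in prompt.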
What you get on login
Once authenticated, the CLI introspects the directory and notes any Gemini MD files that exist. These are reusable instruction files you can check into a repo to capture your engineering conventions, build commands, test instructions, and team preferences. More on those in the next section.
📁 Gemini MD files and project-centered configuration
One of the most powerful patterns with the CLI is the use of Gemini MD files. These are plain text files that contain instructions and conventions the model should follow when working on that project.
I keep a Gemini MD at the root of the Gemini CLI repository so every contributor and every session starts with the same expectations: how to build, how to run tests, which linters to use, commit style, reviewing conventions, and so on. The model reads those instructions and applies them. The practical benefit is consistency—PRs generated by the model should match your project's engineering style.
Gemini MD files are tiered and hierarchical. The CLI looks up the directory tree for them, and will also consider Gemini MD files in subdirectories. Typical patterns:
- Root Gemini MD: project-wide conventions (build, test, Git rules).
- frontend/Gemini.md: front-end specific guidance (React testing approach, linters).
- backend/Gemini.md: backend specific instructions (dependencies, database migration steps).
- ~/.gemini/Gemini.md: global user preferences (emails, slide generation defaults, CLI tools I prefer).
Why tiering matters: if I'm working exclusively in frontend/components, I don't want backend instructions to clutter the model's context. By placing a Gemini MD file inside frontend/, the model only sees the frontend rules when running from that folder.
Example content in a Gemini MD
- Build commands: npm run preflight, make test, etc.
- Testing rules: which frameworks to use, mocking strategy, React testing patterns.
- Commit style: conventional commits, PR message requirements.
- Personal preferences: which tools to call (gh CLI vs raw git), how I like emails written, slide templates to use.
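To make the list above concrete, here is a minimal, illustrative project-level Gemini MD written out via a shell heredoc (the temp path and every rule are examples of the kind of convention you might record, not requirements):

```shell
# Write a sketch of a project-level GEMINI.md into a demo directory.
mkdir -p /tmp/demo-project
cat > /tmp/demo-project/GEMINI.md <<'EOF'
# Project conventions
- Build: npm run preflight
- Tests: mock network calls; never hit real APIs
- Commits: use conventional commit syntax (feat:, fix:, chore:)
- PRs: prefer the gh CLI over raw git remotes
EOF
```

Run gemini from that directory and the model starts every session with those rules already in context.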
I also use Gemini MD to persist small memories or preferences. The CLI exposes a "save memory" tool the model can call. If I tell it "remember that I like to use conventional commit syntax," the model will actually append that to my global Gemini MD. From that moment onward the model keeps that preference in future sessions. It's a simple but effective way to preserve your personal engineering style.
🛠️ Tools and built-in commands
The Gemini CLI bundles a set of tools that the agent can call. Tools are invoked by the model during the agent loop and they encapsulate common operations. Many are the things you'd expect—read files, edit files, grep/search—but there are also web tools, memory tools, and integrations.
Important built-in tools:
- search_text: effectively a repo-wide grep to find candidate files or code snippets.
- read_file and write_file: read or mutate files in the current project directory.
- edit: generate diffs and propose edits which you can approve, modify in an external editor, or reject.
- web_search: uses Gemini's Google search feature to fetch web results for up-to-date library documentation.
- web_fetch: fetch arbitrary URLs; falls back to local curl or wget when necessary.
- save_memory: append items to your Gemini MD files for later retrieval.
Commands in the interactive interface start with a slash ("/") and there are many helper commands. Type /help inside the session to get the list—this is extensible, so you or your team can define custom slash commands.
Shell mode—exclamation prefix (!)
Use the exclamation prefix to run local shell commands without sending that text to the model as a prompt. For example, !ls -la or !git status. The output is displayed in the session and captured into the model's context. That means if I list a directory or cat a file, the model can reference that output later in the conversation.
This is very useful for bridging the "manual" command line world with the agent's mental model. Rather than copying outputs by hand into the chat, execute the command and let the model see it. The output is part of the context for subsequent planning or edits.
File referencing using @ syntax
You can reference local files explicitly with an @ prefix so the CLI will include file contents in the prompt. For example, asking "Explain @docs/design.md" will cause the file to be read and its content included. The CLI provides file autocomplete for convenience because it already knows the directory structure.
✍️ Editing, diffs, and the agent loop
The editing experience is central to the CLI. The agent loop was designed particularly for code changes: identify the relevant files, propose edits as diffs, ask for approval, and then apply or iterate on the edits.
A typical editing flow:
- You ask a question or give the agent a task, for example: "How is the TTY window title set?"
- The model runs search_text to locate candidate files and loads them with read_file for context.
- The model answers your question using that file context. If you ask it to make a change (e.g., "Add a Unicode star to the title"), it generates a diff and proposes the edits.
- You review the suggested diff. You can approve, decline, or ask for a rework. Approving applies the change; declining provides feedback to refine the suggestion.
- If you want to tweak the proposed change manually, you can open it in your editor (VS Code, Vim, etc.) and then save. The CLI will pick up the result and continue.
One of the design priorities is to make review easy. The model tries to explain its changes and calls out assumptions. If anything is ambiguous, it asks clarifying questions instead of making dangerous guesses.
You have the option to "always allow edits" for sessions you trust. This is useful when you want a highly automated workflow, but I generally recommend caution: guardrails like sandboxing and checkpointing are valuable for avoiding accidental damage.
🛡️ Sandboxing and safety features
When you let a model run commands and edit files, safety matters. The Gemini CLI provides several layers of defense to limit possible risks:
- Project-level soft sandbox: by default the CLI restricts the model tools to operate within the project directory the tool was launched from. This prevents casual access to your entire file system.
- OS-level sandboxing: on macOS we integrate with the system's Seatbelt (sandbox-exec) mechanism to contain the process; this prevents it from calling other binaries or accessing arbitrary files outside the permitted set.
- Container sandboxing: Docker and Podman are supported for containerized sessions. You can run Gemini inside a container with an explicitly configured set of tools and files.
- Tool permission requests: for mutating operations, the CLI asks for user approval unless you explicitly enable an auto-approve flag.
- Checkpointing: automatic snapshots of file changes create a safety net so you can roll back undesirable edits.
Combine sandboxing and checkpointing for a strong safety posture. Sandboxing limits what the agent can access, while checkpointing gives you a reliable way to restore state if the agent does something unexpected.
How to enable sandboxing
- Run gemini --sandbox to enable the default sandbox mode.
- On macOS the CLI uses seatbelt by default for a tight OS-level sandbox.
- For broader customization, set sandbox options in the global settings JSON file (more on settings.json below).
- To use containerization, set the GEMINI_SANDBOX environment variable to docker (or podman), or configure the sandbox option in settings.json.
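The settings-file route from the list above looks like this. I'm writing to a temp directory for illustration; in practice the file lives at ~/.gemini/settings.json, and the key name follows the CLI's settings schema as I understand it, so verify against the docs for your version:

```shell
# Illustrative settings.json selecting Docker-based sandboxing.
mkdir -p /tmp/demo-home/.gemini
cat > /tmp/demo-home/.gemini/settings.json <<'EOF'
{
  "sandbox": "docker"
}
EOF
```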
🔌 Extensions, MCP servers, and custom tooling
One of the CLI's strengths is extensibility. There are two main extension patterns I use: MCP servers and Gemini extensions.
MCP (Model Context Protocol) servers let the agent fetch dynamic, up-to-date information or domain-specific tools. For example, a Context 7 MCP can retrieve fresh API docs on demand. I use several MCPs in my demo:
- Context 7 — pulls API documentation and keeps the agent from guessing outdated APIs.
- Google Slides MCP — lets the agent create and update Google Slides via the Slides API.
- My personal MCP that returns Gemini API docs to the agent.
MCPs are configured in your settings.json and can be invoked by the agent. They are a great way to plug in vendor APIs, internal doc sources, or domain-specific knowledge that the model shouldn't have to memorize or infer.
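An mcpServers entry in settings.json can be sketched like this. The server name and the package it launches are invented for illustration; substitute whichever MCP you actually use:

```shell
# Sketch of an mcpServers block in settings.json (written to a demo dir).
mkdir -p /tmp/demo-mcp/.gemini
cat > /tmp/demo-mcp/.gemini/settings.json <<'EOF'
{
  "mcpServers": {
    "docs-server": {
      "command": "npx",
      "args": ["-y", "my-internal-docs-mcp"]
    }
  }
}
EOF
```

Once configured, the agent can call the server's tools during its loop like any built-in tool.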
Gemini extensions
Extensions are a bundle of configuration and optional helper scripts. An extension file includes metadata, MCP server definitions, customizable slash commands, and optional context files. You can share extensions with your colleagues or publish them for your team to adopt.
Examples of what an extension can provide:
- Custom slash commands for deployment flows.
- Predefined prompt templates for recurring tasks (e.g., "Create a PR for this issue").
- Saved context files that describe internal APIs or security rules for the agent to follow.
Extensions use the same JSON structure as the settings file, which makes them easy to compose and maintain.
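A minimal extension file, following the structure described above (metadata, MCP server definitions, an optional context file), might look like this; the extension name, server, and package are invented for illustration:

```shell
# Sketch of a gemini-extension.json for a shareable team extension.
mkdir -p /tmp/demo-ext/my-extension
cat > /tmp/demo-ext/my-extension/gemini-extension.json <<'EOF'
{
  "name": "my-extension",
  "version": "1.0.0",
  "mcpServers": {
    "internal-docs": {
      "command": "npx",
      "args": ["-y", "internal-docs-mcp"]
    }
  },
  "contextFileName": "GEMINI.md"
}
EOF
```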
🤖 Headless mode and automation
Headless mode lets you run a request non-interactively from the command line. This is perfect for cron jobs or scheduled automation: generate weekly status emails, produce changelogs, or run repository health checks automatically.
Example use case: I run a shell command that says "Summarize my last week's Git logs and write an email to team@company.com." The CLI will create the summary and save or send the email wherever you configured it to write. The headless mode still respects the same safeguards—if a step requires file writes or tool permissions the CLI will prompt (or you'll use an auto-approve flag to allow it).
I sometimes turn on a "YOLO" auto-approve in demos to show the full automation path, but in production you'd normally be more conservative.
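As a concrete automation sketch, a weekly headless run could be wired up as a crontab fragment like the one below. The -p/--prompt flag sends a single non-interactive request; the schedule, repo path, and prompt text are all placeholders:

```shell
# Hypothetical crontab entry: every Monday at 09:00, produce a status summary.
0 9 * * 1  cd /home/me/project && gemini -p "Summarize last week's git log as a status email" >> weekly-status.md
```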
📈 Telemetry and observability
If you want to monitor the agent's usage across your organization, the CLI supports OTEL (OpenTelemetry) endpoints. You can configure it to write telemetry to a local OTEL server or to a cloud-hosted backend like Google Cloud's telemetry services.
The telemetry payload contains useful information for auditing and analysis:
- Which model instance was used for each request.
- Timing and latency statistics.
- Token usage per request.
- Tool calls and status.
- Full request logs if you enable them (useful for debugging, but sensitive—treat logs like source code).
Running a local collector during demos shows JSON records of the interactions. For teams, configuring telemetry into a centralized observability system is a good idea for accountability and performance tuning.
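Pointing the CLI at a local OTEL collector is again a settings.json affair. The key names below follow the CLI's telemetry settings as I understand them, and the endpoint is the standard local OTLP gRPC port; treat this as a sketch and check the docs for your version:

```shell
# Illustrative telemetry block: send records to a local OTLP collector.
mkdir -p /tmp/demo-otel/.gemini
cat > /tmp/demo-otel/.gemini/settings.json <<'EOF'
{
  "telemetry": {
    "enabled": true,
    "target": "local",
    "otlpEndpoint": "http://localhost:4317"
  }
}
EOF
```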
💾 Checkpointing, chat save & restore
Checkpointing is a safety mechanism that automatically snapshots file states into a shadow Git repository whenever a mutating tool (like write_file) is invoked. This gives you a reliable fallback so you can restore files to their prior state if the agent's changes are unwanted.
Checkpoints are particularly useful during long agent sessions that alter multiple files. If a later step fails or produces bad behavior, restore the checkpoint and re-run the step with adjusted instructions.
The CLI also supports saving chat sessions and restoring earlier conversations. This is important for workflows with branching plans—try Plan A, then revert and try Plan B without losing the prior discussion. It lets you move between alternatives and keep a persistent audit trail of the agent's thinking.
Practical checkpoint example
- Enable checkpointing with gemini --checkpointing or set "checkpointing": true in settings.json.
- Ask the agent to "replace README with jokes". The agent will propose changes and then take a checkpoint before mutating files.
- If you don't like the result, run the restore command to bring the previous file back.
- Re-run the agent with a different instruction.
Checkpoints and sandboxing together are a powerful defensive combo—sandboxing limits access and checkpointing gives you the ability to recover from mistakes.
🧠 Context compression and large-window strategies
The Gemini 2.5 models provide very large context windows—up to a million tokens—so the CLI can work with big codebases and long conversations. Even so, code and file systems can create very large contexts quickly. To avoid hitting practical limits, the CLI implements context compression.
When the transcript and file contents would exceed the model's context budget, the CLI can compress older parts of the conversation and less relevant file contents. The compression process attempts to preserve the high-level meaning—plans, intent, and important constraints—while discarding low-value implementation detail. A dedicated compression prompt guides that process.
I usually recommend taking a manual snapshot of the chat (save a restore point) before compressing, since the compression can be fairly aggressive. If you need details the compression removed, restore the snapshot and do a refined compression strategy or selectively re-read files.
Compression workflow
- The CLI shows context usage in the UI (tokens used in the bottom-right).
- When approaching limits, the CLI prompts to compress automatically, or you can run compress manually.
- Compression runs in the background and reports token savings.
- If compression removes something important, restore a saved chat snapshot and try selective compression.
⚙️ How the Gemini CLI is built
One surprise for some people: the CLI is a Node.js app built with React. We use a library called Ink to render React components as terminal UI elements. That lets us build a rich, dynamic TUI (terminal user interface) using familiar React paradigms—components, props, and state—but output it as text in your terminal.
The codebase is split into two layers:
- core package: contains the agent loop, tool orchestration, and general logic for interacting with Gemini.
- cli package: contains the terminal interface and user-facing components. This separation makes it easy to reuse the core agent in other interfaces—desktop apps, web UIs, or integrated IDE plugins.
Everything is open source on GitHub, and that includes the system prompts and the compression prompt. If you're curious about the exact language used to guide the model, the repo exposes the core system prompt and the first-turn context prompt so the community can learn and propose improvements.
System prompt highlights
The main system prompt contains the agent's mission and rules. Key points include:
- You're an interactive agent that does software engineering tasks.
- Confirm ambiguity—ask clarifying questions instead of guessing.
- Explain your changes and confirm design decisions.
- Prefer existing project conventions; do not re-architect unless asked.
- Follow sandbox and git-usage rules when present.
The first-turn context prompt tells the model about the environment: today's date, OS, and a trimmed view of the directory structure. The CLI intentionally avoids dumping large, noisy directories (node_modules, virtualenvs, gitignored files) to keep the first-pass context meaningful and light.
🌐 Open source, community and roadmap
We launched the repo as open source and the reception was massive. Stars and pull requests surged—it's been exciting and a little overwhelming. The project roadmap is public so contributors and teams can see what we're planning and provide feedback.
Community channels include GitHub Issues, Discussions, and a public roadmap file. We're very receptive to contributions and to community-driven extensions. Because we see the CLI as a platform for agent-based engineering, we expect to see many third-party MCPs and extensions that connect to internal APIs, CI systems, or specialized domain knowledge stores.
💡 Practical workflows and example scenarios
Here are concrete workflows where Gemini CLI adds immediate value. I include the steps and the expected behavior so you can reproduce these in your environment.
Scenario A — Fix a bug across multiple files
- Run gemini in your project's root.
- Ask: "There's a bug where the login form loses the CSRF token. Find the cause and propose a fix."
- The agent uses search_text to locate relevant code and loads candidate files using read_file.
- The agent proposes a diff and explains the reasoning in a comment. You review.
- Approve the change and the agent applies it. Optionally run tests: !npm test or ask the agent to run them for you.
- If tests fail, the agent will propose further edits or ask for guidance.
This is the sort of loop I use frequently: identify, edit, test, iterate.
Scenario B — Create a feature branch and PR
- In the CLI: "Add a 'dark mode' toggle to the settings panel and create a PR."
- The agent reads project conventions from Gemini MD (how to name branches, commit message formats).
- It generates the necessary code changes, runs local linters/tests, commits with the right message, and creates a PR via the GitHub CLI (if authorized).
Because I prefer gh CLI over raw git remotes, I include that preference in my Gemini MD. The agent follows my chosen workflow.
Scenario C — Generate slides and an email summary
- Ask: "Summarize our last sprint and create a 5-slide deck for the stakeholder meeting."
- The agent retrieves relevant notes and PRs, synthesizes an executive summary, and calls the Google Slides MCP to create the deck.
- It produces an email draft and writes it to a file or places it on the clipboard depending on your config.
This demonstrates the multimodal and MCP-driven power—slides and emails are practical outputs for non-code tasks.
🔥 Tips, best practices, and pitfalls
Over time, I developed a set of guidelines that make using Gemini CLI productive and safe:
- Run from project directories: the CLI is project-aware. Don't run it from your home root if you don't want your entire disk considered.
- Use Gemini MD to encode conventions: the more you capture as machine-readable instructions, the less the model has to infer.
- Prefer sandboxing for unknown repos: when experimenting on a repo you didn't write, enable sandboxing and checkpointing.
- Save common tasks as slash commands or extensions: repetitive flows (deploy, changelog, test report) become single commands.
- Use MCPs for API docs: don't let the agent guess interfaces—point it at fresh docs via a context MCP.
- Audit telemetry carefully: telemetry is helpful, but logs can contain secrets. Manage those outputs like you manage code.
- Keep an audit trail: save chat sessions when doing large changes so you can revert or analyze decisions later.
Common pitfalls to watch for:
- Overly permissive auto-approve in production can lead to accidental changes.
- Forgetting to configure sandboxing on shared machines exposes files unintentionally.
- Compressed contexts can remove details you later need—take snapshots before aggressive compression.
- Relying on model memory without explicitly saving critical rules in Gemini MD may lead to inconsistent behavior across sessions.
❓ FAQ
How is the Gemini CLI different from gemini.google.com or other Gemini tools?
We built the CLI as an agent that can act on your local files and shell. Unlike web-based interfaces, the CLI is context-aware, can run local binaries, and is designed to be project-oriented. It includes agent tooling—tools, sandboxing, checkpointing, MCP integration—which tailors it to software engineering tasks directly in your development environment.
Which model does the CLI use?
By default the CLI uses Gemini 2.5 Pro, falling back to Gemini 2.5 Flash when needed. The settings allow you to specify the exact model you want to use, and billing depends on whether you're using a Gemini API key, Google account, or Vertex AI integration.
How are my files protected? Can the agent access my whole machine?
By default, the CLI limits access to the current project directory. You can use sandboxing (OS-level seatbelt on macOS, or Docker/Podman containers) for additional containment. I also recommend enabling checkpointing for mutating actions so you can always restore a previous state.
Can the agent commit changes to my Git repo and open PRs?
Yes. If you configure credentials or use the GitHub CLI (gh) and allow the agent to run it, the agent can create branches, commit with your conventions, and open PRs. These actions still ask for approval unless you enable auto-approve—be mindful of permissions in automated environments.
What about private company APIs and data?
Use MCP servers to put internal docs behind a controlled service. The agent will call the MCP to retrieve the data rather than having to embed private documentation into the model. Keep telemetry and logs inside your own observability systems to avoid leaking sensitive prompts to external services.
Can I run the CLI in CI/CD?
Yes. Headless mode is designed for non-interactive runs. For CI you can provide the API key or Vertex credentials and configure auto-approvals for specific tool calls. Still, prefer sandboxing and checkpointing in CI runs, and limit the scope of repository changes the agent can make.
Is the CLI open source? Can I see the system prompts?
Yes. The codebase is open source and includes the system prompts, compression prompt, and the first-turn context prompt. You can examine and propose changes via GitHub.
How do I add my own slash commands or extensions?
Create an extension file containing your commands, MCP config, and optional context files. Drop it into the extensions directory or reference it in your settings.json. Slash commands are flexible: they can run scripts, call MCPs, or orchestrate complex multi-step flows.
How does context compression work and what should I watch out for?
Compression summarizes older turns and unimportant details to reclaim tokens. It tries to preserve the intent and plan while dropping implementation-level noise. Because it can be aggressive, take a chat snapshot before compressing if you might need full history later.
What about billing and limits?
Billing depends on the authentication method: Google sign-in (with included free calls), Gemini API key, or Vertex AI. Token usage is recorded in telemetry; watch token consumption for large multi-file workflows. Use context compression to manage extremely large sessions efficiently.
🔚 Wrapping up — my final report
The Gemini CLI is a practical step toward agentic developer tooling. It isn't a replacement for engineers—it's a trusted collaborator that excels at repetitive tasks, context-aware searches, pattern-driven changes, and orchestrating multi-step flows that humans used to do manually. Because it can run local binaries, access your project files, and persist preferences in Gemini MD files, it can speed up feature development, bug fixes, and content generation.
But with that power comes responsibility. Use sandboxing, checkpointing, and cautious auto-approve settings to reduce risk. Store sensitive docs behind MCP servers and audit telemetry appropriately. Treat the model as an assistant that needs clear instructions and guardrails, and it will repay that clarity by being more predictable and useful.
We're committed to keeping the project open and evolving with community feedback. If you're curious, check the public roadmap and the GitHub repo—there's a lot to explore and many ways to contribute.
"Hi everybody, my name is Mark. Yeah, I'm in the DeepMind Developer Relations team." — Mark McDonald
If you want to get hands-on, try this: clone a small test repo, add a simple Gemini MD with a few preferences, and run gemini. Ask it to find a string in your codebase and propose a one-line change. Review the diff, approve it, and then enable checkpointing to practice the restore flow. You'll quickly see how agentic loops change your dev workflow.
Thanks for reading my report. If you try the CLI or build an interesting extension or MCP, I'd love to hear how you used it and what improvements you'd like to see next.