Introduction
The browser was the bottleneck
Claude Code gave us the magic to collapse days of work into hours, as long as you stay in the terminal. But the moment you open a browser, the magic stops. You’re back to doing things by hand, clicking through pages, pretending this is fine.
The AI was never the problem. The browser was. Existing browsers weren’t built for agents to control. So we forked Chromium from source and built one that is.
Compared to automation tools
The ecosystem has good tools. Each solves part of the problem.
| Tool | What it does | Where it stops |
|---|---|---|
| Playwright | Best-in-class browser automation. Headless or headed, fast, reliable. Can persist storage state between runs with manual setup. | No AI agent loop. No MCP. Extension support requires headed Chromium and manual configuration. You write every script yourself. |
| Puppeteer | Chrome DevTools Protocol wrapper. Lightweight, scriptable. | Narrower API than Playwright. No agent integration. Same manual-scripting model. |
| Browser Use | Connects LLMs to a browser via Playwright. AI decides what to click. Can run headed or headless. | No persistent sessions by default. No extension support. No built-in human handoff. Limited real-time visibility. |
| Stagehand | AI-powered browser automation framework by Browserbase. Smart element targeting. Can use local Playwright or Browserbase cloud. | No MCP. No persistent local identity. No extension interaction. No human handoff. |
| Selenium | The original. WebDriver-based. Massive ecosystem. | Brittle selectors. Detectable by anti-bot (navigator.webdriver). No AI integration. |
These tools share a fundamental architecture: the AI agent sits outside the browser and pokes it through an automation API. The browser is a black box that receives synthetic events.
That works for simple page reads and form fills. It doesn’t work when you need real sessions, real extensions, native input control, or long-running tasks with human collaboration.
Compared to AI browsers
There’s a newer category: browsers that add AI features. Atlas, Dia, and Chromate build AI into a browser designed for humans.
UseBrowser is different. It’s not a browser with an agent. It’s a browser for an agent.
| | AI browsers (Atlas, Dia, Chromate) | UseBrowser |
|---|---|---|
| Primary user | You. AI assists your browsing. | The AI agent. You assist when needed. |
| Architecture | Standard browser + AI sidebar/copilot | Chromium fork with agent control at the engine level |
| Agent control | Limited — AI suggests, you execute | Full — agent controls cursor, keyboard, navigation, extensions |
| MCP / external agents | No. Their AI, their interface. | Yes. Any MCP-compatible agent can connect. |
| Programmability | Use the features they ship | Skills, CLAUDE.md, Playwright scripts — you shape how the agent works |
| Human handoff | Not applicable — you’re already there | Built in. Agent pings you via Telegram when stuck. |
| Terminal | No | Claude Code built in, pre-configured |
AI browsers add intelligence to your browsing experience. UseBrowser gives the AI agent a browser it can actually operate. You watch, you guide, you step in when needed — but the agent drives.
What we built
UseBrowser is a Chromium fork where agent control is built into the engine: not bolted on, not wrapped around, not injected via extension.
Native control
This is the core difference. UseBrowser gives agents access to the browser’s internals that wrappers and extensions can’t reach:
- Input simulation at multiple levels: agents can control the cursor and keyboard through CDP `Input.dispatchMouseEvent` for in-page interactions, or through native OS events (CoreGraphics on macOS) for system-level input. This is what makes CAPTCHA solving work: a Bezier-curve mouse drag with realistic noise and velocity that closely mimics human hand movement.
- Full CDP access: 22 curated MCP tools cover navigation, interaction, extraction, recording, and human handoff. For anything beyond those, `execute_playwright` gives agents programmatic access to the full Playwright API with `page`, `browser`, and `context` pre-bound.
- Extension access: agents interact with extensions (MetaMask, password managers) the same way you do, through their actual UI. Not mocked, not stubbed.
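To make the Bezier-curve drag described above more concrete, here is a minimal sketch of how such a path could be generated. It is not UseBrowser's actual implementation: the function names, the smoothstep easing, the control-point placement, and the Gaussian jitter are all illustrative assumptions.

```python
import random


def bezier_point(p0, p1, p2, p3, t):
    """Evaluate a cubic Bezier curve at parameter t in [0, 1]."""
    u = 1.0 - t
    x = u**3 * p0[0] + 3 * u**2 * t * p1[0] + 3 * u * t**2 * p2[0] + t**3 * p3[0]
    y = u**3 * p0[1] + 3 * u**2 * t * p1[1] + 3 * u * t**2 * p2[1] + t**3 * p3[1]
    return (x, y)


def ease_in_out(t):
    """Smoothstep easing: slow start, faster middle, slow finish."""
    return t * t * (3.0 - 2.0 * t)


def drag_path(start, end, steps=40, jitter=1.5, seed=None):
    """Return steps + 1 points from start to end along a bowed, noisy curve.

    All parameters here are hypothetical tuning knobs, not UseBrowser settings.
    """
    rng = random.Random(seed)
    dx, dy = end[0] - start[0], end[1] - start[1]
    # Push both control points sideways (perpendicular to the straight line)
    # so the path bows instead of running perfectly straight.
    bow = rng.uniform(-0.2, 0.2)
    c1 = (start[0] + dx * 0.3 - dy * bow, start[1] + dy * 0.3 + dx * bow)
    c2 = (start[0] + dx * 0.7 - dy * bow, start[1] + dy * 0.7 + dx * bow)

    points = []
    for i in range(steps + 1):
        # Easing spaces the samples unevenly, which varies the cursor speed.
        t = ease_in_out(i / steps)
        x, y = bezier_point(start, c1, c2, end, t)
        if 0 < i < steps:
            # Small noise on interior points only; endpoints stay exact.
            x += rng.gauss(0, jitter)
            y += rng.gauss(0, jitter)
        points.append((x, y))
    return points
```

Each point could then be sent as a CDP `Input.dispatchMouseEvent` of type `mouseMoved`, or passed to Playwright's `page.mouse.move(x, y)`, with a short delay between steps.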
Architecture
The browser launches a Python MCP server as a child process. The server bridges MCP (SSE on port 9225) to the browser’s CDP interface (port 9222):

    AI Agent ──MCP/SSE──→ localhost:9225 ──CDP──→ UseBrowser (Chromium fork)

The MCP server is managed by the browser: launched on startup, killed on shutdown. Claude Code in the built-in terminal is pre-configured to connect. Any external MCP-compatible agent can connect to the same endpoint.
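As a rough illustration of the two ports described above, the sketch below builds the local endpoint URLs and probes the CDP side. The port numbers come from the text. The `/sse` path is an assumption (a common default for MCP SSE servers), and `/json/version` is the standard Chromium remote-debugging endpoint, which UseBrowser is assumed, not confirmed, to expose. The helper names are hypothetical.

```python
import http.client
import json
import urllib.error
import urllib.request

MCP_PORT = 9225  # MCP over SSE, per the architecture description
CDP_PORT = 9222  # Chrome DevTools Protocol, per the architecture description


def endpoints(host="localhost"):
    """Build the URLs an external agent or health check might use."""
    return {
        # The "/sse" path is an assumption, not documented here.
        "mcp_sse": f"http://{host}:{MCP_PORT}/sse",
        # Standard Chromium remote-debugging metadata endpoint.
        "cdp_version": f"http://{host}:{CDP_PORT}/json/version",
    }


def cdp_version(url, timeout=2.0):
    """Return the parsed version payload, or None if the browser is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.load(resp)
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        return None
```

Calling `cdp_version(endpoints()["cdp_version"])` before connecting an agent is one simple way to confirm the browser is running.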
What that means in practice
- Persistent sessions: real cookies, real logins, real browsing history. Your agent picks up where you left off tomorrow.
- Built-in terminal: Claude Code runs inside the browser with pre-configured MCP access. No setup, no wiring.
- Real-time visibility: you watch the agent work in a real browser window. Not a headless void.
- Human handoff: when the agent gets stuck (CAPTCHA, 2FA, ambiguous choice), it pings you via Telegram. CDP screencast streams the viewport to a remote viewer, and the viewer sends input events back. You unblock it, the agent continues.
- Skills that compound: every task can become a reusable Playwright script. Your library grows over time.
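To give a feel for what a reusable skill might look like, here is a minimal sketch. The skill format is not specified here, so the function name and shape are hypothetical. It assumes `page` is a synchronous Playwright `Page`, such as the one described as pre-bound for `execute_playwright`.

```python
def collect_titles(page, urls):
    """Visit each URL in order and record the page title.

    `page` is assumed to be a Playwright sync-API Page. Because it is the
    browser's real page, any logged-in session is reused as-is.
    """
    results = {}
    for url in urls:
        # Waiting for DOMContentLoaded is enough to read the title.
        page.goto(url, wait_until="domcontentloaded")
        results[url] = page.title()
    return results
```

A saved function like this can be rerun with a new list of URLs instead of having the agent rediscover the steps each time.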
When UseBrowser shines
UseBrowser is not the right tool for everything. Here’s where it earns its keep, and where you might not need it.
Built for this
- Multi-site research: “Compare pricing across Amazon, Lazada, and Shopee” involves navigating three sites, handling different layouts, extracting structured data, and compiling results. This is where an agent with a real browser pulls ahead. See Prompting for how to describe these tasks.
- Tasks behind authentication: anything involving your logged-in accounts (Gmail, Shopee, GitHub, banking dashboards). Your agent inherits your real browser sessions.
- Long-running autonomous work: research that takes 20-30 minutes across dozens of pages. The agent runs, pings you if it hits a wall, and continues. You don’t babysit.
- Extension-dependent workflows: crypto (MetaMask), password managers, ad blockers. If the task requires a Chrome extension, you need a real browser with real extensions loaded.
- Repetitive workflows worth capturing: tasks you do weekly that take 10+ minutes. Record once, run as a one-liner forever. The value compounds.
- Developer testing loops: write code in the terminal, test in the real browser, iterate. Same browser, same session, no context switching. See MCP Overview for the developer workflow.
Probably overkill for
- A single API call: if the data is available via API, just call the API.
- One-off scraping of a public page: Playwright or `curl` is simpler and faster.
- Tasks that don’t involve a browser: Claude Code in a normal terminal handles file operations, coding, and CLI tasks fine on its own.
The pattern: if the task involves browsing — navigating, interacting, reading pages that require JavaScript rendering, dealing with auth — UseBrowser is where the value is. The more complex and repetitive the task, the bigger the payoff.
Why not just…
Why not a Chrome extension?
Extensions run in a sandboxed content script environment. They can modify pages, but they can’t:
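- Access the full Chrome DevTools Protocol (the `chrome.debugger` API exposes only a restricted set of CDP domains and shows a warning banner while attached)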
- Control other extensions (e.g. MetaMask)
- Simulate native input (OS-level cursor, keyboard)
- Run a terminal or shell process
- Intercept network traffic at the protocol level
- Modify browser chrome (tab bar, URL bar, navigation)
An extension is a guest in someone else’s house. A fork means you own the house.
Why not headless?
Headless browsers have no window, no GPU rendering, and limited extension support. Anti-bot systems detect them more easily (missing browser fingerprints, the navigator.webdriver flag). There’s no way to do human handoff, because there’s nothing to show. And the agent works blind: you can’t watch it or redirect it in real time.
That said — UseBrowser can run headlessly on a cloud instance for stateless scraping at scale. Public pages, no login required, high throughput. Headless is the right mode for that.
But for stateful work — anything involving your accounts, your cookies, your extensions, your identity — you want a headful browser. That’s where you log into Gmail, interact with MetaMask, and build up sessions that persist across days. That’s UseBrowser’s home turf.
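The detection signals mentioned above can be inspected from a script. The sketch below is illustrative only: it assumes a synchronous Playwright `Page`, the helper names are hypothetical, and real anti-bot systems look at far more than these two values. The `HeadlessChrome` user-agent token is what headless Chrome reports by default, but it can be overridden.

```python
def automation_signals(page):
    """Read two common automation indicators from the page's JavaScript context."""
    return page.evaluate(
        "() => ({ webdriver: navigator.webdriver === true,"
        " userAgent: navigator.userAgent })"
    )


def looks_automated(signals):
    """Return True if either indicator suggests a scripted or headless browser."""
    return bool(signals["webdriver"]) or "HeadlessChrome" in signals["userAgent"]
```

Running `looks_automated(automation_signals(page))` in a headless session and in a regular window is a quick way to see the difference for yourself.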
Why not a cloud browser?
Cloud browsers add latency on every interaction and don’t carry your local identity. But they have a place: stateless scraping at scale, CI/CD test pipelines, parallel execution across regions.
UseBrowser runs on your machine by default — native speed, your extensions, your data stays local. For the scraping-at-scale use case, you can deploy UseBrowser on a cloud VM too. Install it on an instance, connect via MCP over SSH tunnel, and you get the full feature set remotely.
The difference from hosted services: you control the instance. Your data doesn’t pass through someone else’s infrastructure.
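For the cloud VM setup mentioned above, the SSH tunnel could be opened with a standard local port forward. The sketch below only builds the command. The host and user names are placeholders, and it assumes the MCP server on the VM listens on port 9225 as in the local architecture.

```python
import shlex


def tunnel_command(host, user, remote_port=9225, local_port=9225):
    """Build an ssh command that forwards a local port to the VM's MCP port.

    -N opens the tunnel without running a remote command.
    -L forwards local_port on this machine to localhost:remote_port on the VM.
    """
    return [
        "ssh",
        "-N",
        "-L",
        f"{local_port}:localhost:{remote_port}",
        f"{user}@{host}",
    ]


def tunnel_command_string(host, user):
    """Shell-safe string form, handy for logging or copy-paste."""
    return shlex.join(tunnel_command(host, user))
```

With the tunnel open, an agent on your machine would connect to `localhost:9225` exactly as it does for a local install. The list form can be passed directly to `subprocess.Popen`.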
Why not Playwright/Puppeteer directly?
They’re great automation libraries. We use Playwright ourselves: every skill UseBrowser generates is a Playwright script. But on their own, they’re tools for you to write automation. There’s no agent loop, no MCP, no skill compounding, no human handoff. You script everything manually.
UseBrowser puts Playwright in the hands of the AI agent, with all the infrastructure to make that actually work.
Why a fork
We didn’t want to build a wrapper. Wrappers are limited by what the browser chooses to expose. We needed to:
- Expose internal APIs to agents beyond what CDP offers by default
- Run and manage an MCP server as part of the browser lifecycle
- Add native CAPTCHA solving at the engine level
- Build a terminal directly into the browser chrome
- Control tab isolation, extension access, and page capture from source
- Support both CDP and native OS input simulation
You can’t do that with an extension or a Playwright script. You need source-level access to Chromium. So that’s what we did.
What’s next
- Installation: Download and set up UseBrowser
- Prompting: How to describe tasks so Claude gets them right
- Controls: Command Palette, Agent Mode, and AI Drawer