Introduction

The browser was the bottleneck

Claude Code gave us the magic to collapse days of work into hours — in the terminal. But the moment you open a browser, the magic stops. You’re back to doing things by hand, clicking through pages, pretending this is fine.

The AI was never the problem. The browser was. Existing browsers weren’t built for agents to control. So we forked Chromium from source and built one that is.

Compared to automation tools

The ecosystem has good tools. Each solves part of the problem.

Tool	What it does	Where it stops
Playwright	Best-in-class browser automation. Headless or headed, fast, reliable. Can persist storage state between runs with manual setup.	No AI agent loop. No MCP. Extension support requires headed Chromium and manual configuration. You write every script yourself.
Puppeteer	Chrome DevTools Protocol wrapper. Lightweight, scriptable.	Narrower API than Playwright. No agent integration. Same manual-scripting model.
Browser Use	Connects LLMs to a browser via Playwright. AI decides what to click. Can run headed or headless.	No persistent sessions by default. No extension support. No built-in human handoff. Limited real-time visibility.
Stagehand	AI-powered browser automation framework by Browserbase. Smart element targeting. Can use local Playwright or Browserbase cloud.	No MCP. No persistent local identity. No extension interaction. No human handoff.
Selenium	The original. WebDriver-based. Massive ecosystem.	Brittle selectors. Detectable by anti-bot systems (`navigator.webdriver`). No AI integration.

These tools share a fundamental architecture: the AI agent sits outside the browser and pokes it through an automation API. The browser is a black box that receives synthetic events.

That works for simple page reads and form fills. It doesn’t work when you need real sessions, real extensions, native input control, or long-running tasks with human collaboration.

Compared to AI browsers

There’s a newer category: browsers that add AI features. Atlas, Dia, Chromate — they build AI into a browser designed for humans.

UseBrowser is different. It’s not a browser with an agent. It’s a browser for an agent.

	AI browsers (Atlas, Dia, Chromate)	UseBrowser
Primary user	You. AI assists your browsing.	The AI agent. You assist when needed.
Architecture	Standard browser + AI sidebar/copilot	Chromium fork with agent control at the engine level
Agent control	Limited — AI suggests, you execute	Full — agent controls cursor, keyboard, navigation, extensions
MCP / external agents	No. Their AI, their interface.	Yes. Any MCP-compatible agent can connect.
Programmability	Use the features they ship	Skills, CLAUDE.md, Playwright scripts — you shape how the agent works
Human handoff	Not applicable — you’re already there	Built in. Agent pings you via Telegram when stuck.
Terminal	No	Claude Code built in, pre-configured

AI browsers add intelligence to your browsing experience. UseBrowser gives the AI agent a browser it can actually operate. You watch, you guide, you step in when needed — but the agent drives.

What we built

UseBrowser is a Chromium fork where agent control is built into the engine — not bolted on, not wrapped around, not injected via extension.

Native control

This is the core difference. UseBrowser gives agents access to the browser’s internals that wrappers and extensions can’t reach:

Input simulation at multiple levels: agents can control the cursor and keyboard through CDP (Input.dispatchMouseEvent and Input.dispatchKeyEvent) for in-page interactions, or through native OS events (CoreGraphics on macOS) for system-level input. This is what makes CAPTCHA solving work: a Bezier-curve mouse drag with realistic noise and velocity closely mimics human hand movement.
Full CDP access — 22 curated MCP tools cover navigation, interaction, extraction, recording, and human handoff. For anything beyond those, execute_playwright gives agents programmatic access to the full Playwright API with page, browser, and context pre-bound.
Extension access — agents interact with extensions (MetaMask, password managers) the same way you do — through their actual UI. Not mocked, not stubbed.
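To make the Bezier-curve drag mentioned above more concrete, here is a minimal sketch of how such a path could be generated and wrapped as parameters for CDP's Input.dispatchMouseEvent. It is not UseBrowser's actual implementation: the function names, the jitter and bow values, and the smoothstep easing are all assumptions chosen for illustration, and the timing between events is left out.

```python
import math
import random


def bezier_drag_path(start, end, steps=40, jitter=1.5, seed=None):
    """Return (x, y) points along a cubic Bezier curve from start to end.

    Hypothetical helper: control points, easing, and jitter are illustrative.
    """
    rng = random.Random(seed)
    (x0, y0), (x3, y3) = start, end
    dx, dy = x3 - x0, y3 - y0
    dist = math.hypot(dx, dy) or 1.0
    # Unit vector perpendicular to the straight line, used to bow the curve.
    nx, ny = -dy / dist, dx / dist
    bow1 = rng.uniform(-0.2, 0.2) * dist
    bow2 = rng.uniform(-0.2, 0.2) * dist
    x1, y1 = x0 + dx / 3 + nx * bow1, y0 + dy / 3 + ny * bow1
    x2, y2 = x0 + 2 * dx / 3 + nx * bow2, y0 + 2 * dy / 3 + ny * bow2

    points = []
    for i in range(steps + 1):
        u = i / steps
        # Smoothstep easing: slow start, fast middle, slow finish.
        t = u * u * (3 - 2 * u)
        a, b, c, d = (1 - t) ** 3, 3 * (1 - t) ** 2 * t, 3 * (1 - t) * t ** 2, t ** 3
        x = a * x0 + b * x1 + c * x2 + d * x3
        y = a * y0 + b * y1 + c * y2 + d * y3
        if 0 < i < steps:  # keep the two endpoints exact
            x += rng.gauss(0, jitter)
            y += rng.gauss(0, jitter)
        points.append((round(x, 1), round(y, 1)))
    return points


def to_cdp_mouse_events(points):
    """Wrap a path as Input.dispatchMouseEvent parameter dicts for a left-button drag."""
    first, last = points[0], points[-1]
    events = [{"type": "mousePressed", "x": first[0], "y": first[1],
               "button": "left", "buttons": 1, "clickCount": 1}]
    events += [{"type": "mouseMoved", "x": x, "y": y, "button": "left", "buttons": 1}
               for x, y in points[1:]]
    events.append({"type": "mouseReleased", "x": last[0], "y": last[1],
                   "button": "left", "buttons": 0, "clickCount": 1})
    return events
```

Each dict would be sent as the params of one Input.dispatchMouseEvent call. The easing step spaces points closer together near the start and end of the drag, which is one simple way to approximate the varying velocity of a hand movement.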

Architecture

The browser launches a Python MCP server as a child process. The server bridges MCP (SSE on port 9225) to the browser’s CDP interface (port 9222):

AI Agent ──MCP/SSE──→ localhost:9225 ──CDP──→ UseBrowser (Chromium fork)

The MCP server is managed by the browser — launched on startup, killed on shutdown. Claude Code in the built-in terminal is pre-configured to connect. Any external MCP-compatible agent can connect to the same endpoint.
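As a quick way to see this layout from the outside, the sketch below checks whether the two ports are listening and builds the URLs a client would likely use. The port numbers come from the text above. The `/sse` path is an assumption (a common default for Python MCP servers), and `/json/version` is the standard Chromium remote-debugging endpoint, which UseBrowser is assumed to keep.

```python
import socket

MCP_PORT = 9225  # MCP over SSE, per the text above
CDP_PORT = 9222  # Chrome DevTools Protocol, per the text above


def endpoints(host="localhost"):
    """Return the URLs a client would probably connect to (paths are assumptions)."""
    return {
        "mcp_sse": f"http://{host}:{MCP_PORT}/sse",  # assumed SSE path
        "cdp_version": f"http://{host}:{CDP_PORT}/json/version",  # stock Chromium endpoint
    }


def port_is_open(host, port, timeout=0.5):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for name, port in (("MCP", MCP_PORT), ("CDP", CDP_PORT)):
        state = "listening" if port_is_open("localhost", port) else "not reachable"
        print(f"{name} port {port}: {state}")
```

If both ports report as listening while the browser is running, an external MCP-compatible agent should be able to use the same MCP endpoint as the built-in terminal.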

What that means in practice

Persistent sessions — real cookies, real logins, real browsing history. Your agent picks up where you left off tomorrow.
Built-in terminal — Claude Code runs inside the browser with pre-configured MCP access. No setup, no wiring.
Real-time visibility — you watch the agent work in a real browser window. Not a headless void.
Human handoff — when the agent gets stuck (CAPTCHA, 2FA, ambiguous choice), it pings you via Telegram. CDP screencast streams the viewport to a remote viewer, and the viewer sends input events back. You unblock it, the agent continues.
Skills that compound — every task can become a reusable Playwright script. Your library grows over time.
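The human handoff loop described above can be sketched at the protocol level. The CDP methods used here (Page.startScreencast, Page.screencastFrame, Page.screencastFrameAck, Input.dispatchMouseEvent) are standard, but the helper names and the overall structure are assumptions for illustration, not UseBrowser's actual relay code. The sketch only builds and parses the JSON messages; the WebSocket transport and the remote viewer are left out.

```python
import base64
import itertools
import json

_ids = itertools.count(1)


def cdp_command(method, params=None):
    """Serialize one CDP command as the JSON text sent over the DevTools WebSocket."""
    return json.dumps({"id": next(_ids), "method": method, "params": params or {}})


def start_screencast(quality=60, max_width=1280, max_height=800):
    """Ask the browser to stream the viewport as JPEG frames."""
    return cdp_command("Page.startScreencast", {
        "format": "jpeg", "quality": quality,
        "maxWidth": max_width, "maxHeight": max_height,
    })


def handle_message(raw):
    """Process one incoming CDP message.

    Returns (jpeg_bytes, ack_command) for a screencast frame, else (None, None).
    """
    msg = json.loads(raw)
    if msg.get("method") != "Page.screencastFrame":
        return None, None
    params = msg["params"]
    frame = base64.b64decode(params["data"])
    # Acknowledge the frame so the browser keeps sending new ones.
    ack = cdp_command("Page.screencastFrameAck", {"sessionId": params["sessionId"]})
    return frame, ack


def viewer_click(x, y):
    """Translate a click from the remote viewer into press and release commands."""
    common = {"x": x, "y": y, "button": "left", "clickCount": 1}
    return [
        cdp_command("Input.dispatchMouseEvent", {"type": "mousePressed", **common}),
        cdp_command("Input.dispatchMouseEvent", {"type": "mouseReleased", **common}),
    ]
```

A relay built this way would forward each decoded frame to the viewer, send the acknowledgement back to the browser, and pass the viewer's clicks through `viewer_click` until the person hands control back to the agent.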

When UseBrowser shines

UseBrowser is not the right tool for everything. Here’s where it earns its keep — and where you might not need it.

Built for this

Multi-site research — “Compare pricing across Amazon, Lazada, and Shopee” involves navigating three sites, handling different layouts, extracting structured data, and compiling results. This is where an agent with a real browser pulls ahead. See Prompting for how to describe these tasks.
Tasks behind authentication — anything involving your logged-in accounts (Gmail, Shopee, GitHub, banking dashboards). Your agent inherits your real browser sessions.
Long-running autonomous work — research that takes 20-30 minutes across dozens of pages. The agent runs, pings you if it hits a wall, and continues. You don’t babysit.
Extension-dependent workflows — crypto (MetaMask), password managers, ad blockers. If the task requires a Chrome extension, you need a real browser with real extensions loaded.
Repetitive workflows worth capturing — tasks you do weekly that take 10+ minutes. Record once, run as a one-liner forever. The value compounds.
Developer testing loops — write code in the terminal, test in the real browser, iterate. Same browser, same session, no context switching. See MCP Overview for the developer workflow.

Probably overkill for

A single API call — if the data is available via API, just call the API.
One-off scraping of a public page — Playwright or curl is simpler and faster.
Tasks that don’t involve a browser — Claude Code in a normal terminal handles file operations, coding, and CLI tasks fine on its own.

The pattern: if the task involves browsing — navigating, interacting, reading pages that require JavaScript rendering, dealing with auth — UseBrowser is where the value is. The more complex and repetitive the task, the bigger the payoff.

Why not just…

Why not a Chrome extension?

Extensions run in a sandboxed content script environment. They can modify pages, but they can’t:

Access the full Chrome DevTools Protocol (the chrome.debugger API exposes only a restricted subset of domains)
Control other extensions (e.g. MetaMask)
Simulate native input (OS-level cursor, keyboard)
Run a terminal or shell process
Intercept network traffic at the protocol level
Modify browser chrome (tab bar, URL bar, navigation)

An extension is a guest in someone else’s house. A fork means you own the house.

Why not headless?

Headless browsers have no window, no GPU rendering, and limited extension support. Anti-bot systems detect them more easily (missing browser fingerprints, navigator.webdriver flag). There’s no way to do human handoff — there’s nothing to show. And the agent works blind — you can’t watch it or redirect it in real time.

That said — UseBrowser can run headlessly on a cloud instance for stateless scraping at scale. Public pages, no login required, high throughput. Headless is the right mode for that.

But for stateful work — anything involving your accounts, your cookies, your extensions, your identity — you want a headful browser. That’s where you log into Gmail, interact with MetaMask, and build up sessions that persist across days. That’s UseBrowser’s home turf.

Why not a cloud browser?

Cloud browsers add latency on every interaction and don’t carry your local identity. But they have a place: stateless scraping at scale, CI/CD test pipelines, parallel execution across regions.

UseBrowser runs on your machine by default: native speed, your own extensions, and data that stays local. For the scraping-at-scale use case, you can deploy UseBrowser on a cloud VM too. Install it on an instance, connect via MCP over an SSH tunnel, and you get the full feature set remotely.

The difference from hosted services: you control the instance. Your data doesn’t pass through someone else’s infrastructure.
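For the remote setup, a plain SSH local port forward is probably enough. The sketch below builds that command, assuming the MCP server on the VM listens on localhost port 9225 as in the Architecture section. The `agent@vm.example.com` address and the helper names are hypothetical placeholders.

```python
import shlex
import subprocess

MCP_PORT = 9225  # MCP/SSE port named in the Architecture section


def tunnel_command(remote, local_port=MCP_PORT, remote_port=MCP_PORT):
    """Build an ssh command that forwards a local port to the remote MCP server.

    `remote` is a hypothetical user@host string for your own VM.
    """
    return [
        "ssh",
        "-N",  # no remote command, only port forwarding
        "-L", f"{local_port}:localhost:{remote_port}",
        remote,
    ]


def open_tunnel(remote):
    """Start the tunnel in the background; call .terminate() on the result to close it."""
    return subprocess.Popen(tunnel_command(remote))


if __name__ == "__main__":
    # Print the command instead of running it, so it can be inspected first.
    print(shlex.join(tunnel_command("agent@vm.example.com")))
```

With the tunnel open, a local agent would connect to localhost:9225 exactly as it does for a local install, and the traffic stays inside the SSH session.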

Why not Playwright/Puppeteer directly?

They’re great automation libraries. We use Playwright ourselves — every skill UseBrowser generates is a Playwright script. But on their own, they’re tools for you to write automation. There’s no agent loop, no MCP, no skill compounding, no human handoff. You script everything manually.

UseBrowser puts Playwright in the hands of the AI agent, with all the infrastructure to make that actually work.

Why a fork

We didn’t want to build a wrapper. Wrappers are limited by what the browser chooses to expose. We needed to:

Expose internal APIs to agents beyond what CDP offers by default
Run and manage an MCP server as part of the browser lifecycle
Add native CAPTCHA solving at the engine level
Build a terminal directly into the browser chrome
Control tab isolation, extension access, and page capture from source
Support both CDP and native OS input simulation

You can’t do that with an extension or a Playwright script. You need source-level access to Chromium. So that’s what we did.

What’s next

Installation — Download and set up UseBrowser
Prompting — How to describe tasks so Claude gets them right
Controls — Command Palette, Agent Mode, and AI Drawer