Introduction

Claude Code gave us the magic to collapse days of work into hours — in the terminal. But the moment you open a browser, the magic stops. You’re back to doing things by hand, clicking through pages, pretending this is fine.

The AI was never the problem. The browser was. Existing browsers weren’t built for agents to control. So we forked Chromium from source and built one that is.

The ecosystem has good tools. Each solves part of the problem.

| Tool | What it does | Where it stops |
| --- | --- | --- |
| Playwright | Best-in-class browser automation. Headless or headed, fast, reliable. Can persist storage state between runs with manual setup. | No AI agent loop. No MCP. Extension support requires headed Chromium and manual configuration. You write every script yourself. |
| Puppeteer | Chrome DevTools Protocol wrapper. Lightweight, scriptable. | Narrower API than Playwright. No agent integration. Same manual-scripting model. |
| Browseruse | Connects LLMs to a browser via Playwright. AI decides what to click. Can run headed or headless. | No persistent sessions by default. No extension support. No built-in human handoff. Limited real-time visibility. |
| Stagehand | AI-powered browser automation framework by Browserbase. Smart element targeting. Can use local Playwright or Browserbase cloud. | No MCP. No persistent local identity. No extension interaction. No human handoff. |
| Selenium | The original. WebDriver-based. Massive ecosystem. | Brittle selectors. Detectable by anti-bot (navigator.webdriver). No AI integration. |

These tools share a fundamental architecture: the AI agent sits outside the browser and pokes it through an automation API. The browser is a black box that receives synthetic events.

That works for simple page reads and form fills. It doesn’t work when you need real sessions, real extensions, native input control, or long-running tasks with human collaboration.

There’s a newer category: browsers that add AI features. Atlas, Dia, Chromate — they build AI into a browser designed for humans.

UseBrowser is different. It’s not a browser with an agent. It’s a browser for an agent.

| | AI browsers (Atlas, Dia, Chromate) | UseBrowser |
| --- | --- | --- |
| Primary user | You. AI assists your browsing. | The AI agent. You assist when needed. |
| Architecture | Standard browser + AI sidebar/copilot | Chromium fork with agent control at the engine level |
| Agent control | Limited: AI suggests, you execute | Full: agent controls cursor, keyboard, navigation, extensions |
| MCP / external agents | No. Their AI, their interface. | Yes. Any MCP-compatible agent can connect. |
| Programmability | Use the features they ship | Skills, CLAUDE.md, Playwright scripts: you shape how the agent works |
| Human handoff | Not applicable: you’re already there | Built in. Agent pings you via Telegram when stuck. |
| Terminal | No | Claude Code built in, pre-configured |

AI browsers add intelligence to your browsing experience. UseBrowser gives the AI agent a browser it can actually operate. You watch, you guide, you step in when needed — but the agent drives.

UseBrowser is a Chromium fork where agent control is built into the engine — not bolted on, not wrapped around, not injected via extension.

This is the core difference. UseBrowser gives agents access to the browser’s internals that wrappers and extensions can’t reach:

  • Input simulation at multiple levels — agents can control the cursor and keyboard through CDP Input.dispatchMouseEvent for in-page interactions, or through native OS events (CoreGraphics on macOS) for system-level input. This is what makes CAPTCHA solving work — a Bezier-curve mouse drag with realistic noise and velocity that closely mimics human hand movement.
  • Full CDP access — 22 curated MCP tools cover navigation, interaction, extraction, recording, and human handoff. For anything beyond those, execute_playwright gives agents programmatic access to the full Playwright API with page, browser, and context pre-bound.
  • Extension access — agents interact with extensions (MetaMask, password managers) the same way you do — through their actual UI. Not mocked, not stubbed.
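
The Bezier-curve drag mentioned in the first bullet can be illustrated with a short sketch. This is not UseBrowser’s actual implementation; it only shows the general technique of bowing a path with a cubic Bezier curve, easing the velocity, and adding small noise. The function names (`human_like_path`, `drag_events`) and all parameter values are hypothetical. The event dictionaries follow the shape of CDP’s `Input.dispatchMouseEvent` parameters.

```python
import math
import random


def bezier_point(p0, p1, p2, p3, t):
    """Evaluate a cubic Bezier curve at parameter t in [0, 1]."""
    u = 1.0 - t
    x = u**3 * p0[0] + 3 * u**2 * t * p1[0] + 3 * u * t**2 * p2[0] + t**3 * p3[0]
    y = u**3 * p0[1] + 3 * u**2 * t * p1[1] + 3 * u * t**2 * p2[1] + t**3 * p3[1]
    return (x, y)


def ease_in_out(s):
    """Smoothstep easing: slow start, faster middle, slow stop."""
    return s * s * (3.0 - 2.0 * s)


def human_like_path(start, end, steps=40, jitter=1.5, seed=None):
    """Return steps + 1 points from start to end along a noisy curved path."""
    rng = random.Random(seed)
    dx, dy = end[0] - start[0], end[1] - start[1]
    dist = math.hypot(dx, dy)
    # Unit vector perpendicular to the straight line, used to bow the curve.
    nx, ny = (-dy / dist, dx / dist) if dist else (0.0, 0.0)
    bow1 = rng.uniform(-0.2, 0.2) * dist
    bow2 = rng.uniform(-0.2, 0.2) * dist
    c1 = (start[0] + 0.3 * dx + nx * bow1, start[1] + 0.3 * dy + ny * bow1)
    c2 = (start[0] + 0.7 * dx + nx * bow2, start[1] + 0.7 * dy + ny * bow2)

    points = []
    for i in range(steps + 1):
        # Even time steps mapped through easing give uneven spacing (velocity).
        x, y = bezier_point(start, c1, c2, end, ease_in_out(i / steps))
        if 0 < i < steps:  # keep the endpoints exact, add noise in between
            x += rng.gauss(0.0, jitter)
            y += rng.gauss(0.0, jitter)
        points.append((x, y))
    return points


def drag_events(path):
    """Wrap a path as press, move, release payloads for Input.dispatchMouseEvent."""
    def event(kind, point, buttons, click_count=0):
        return {
            "method": "Input.dispatchMouseEvent",
            "params": {
                "type": kind,
                "x": round(point[0], 1),
                "y": round(point[1], 1),
                "button": "left",
                "buttons": buttons,  # bitfield: 1 means left button held
                "clickCount": click_count,
            },
        }

    events = [event("mousePressed", path[0], 1, click_count=1)]
    events += [event("mouseMoved", point, 1) for point in path]
    events.append(event("mouseReleased", path[-1], 0, click_count=1))
    return events
```

Calling `drag_events(human_like_path((100, 300), (420, 310)))` yields a list that a CDP client could send in order, with short delays between moves. The native OS input path described above would need a platform-specific backend instead.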

The browser launches a Python MCP server as a child process. The server bridges MCP (SSE on port 9225) to the browser’s CDP interface (port 9222):

AI Agent ──MCP/SSE──→ localhost:9225 ──CDP──→ UseBrowser (Chromium fork)

The MCP server is managed by the browser — launched on startup, killed on shutdown. Claude Code in the built-in terminal is pre-configured to connect. Any external MCP-compatible agent can connect to the same endpoint.
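
A quick way to confirm that both sides of this bridge are up is to probe the two ports. The sketch below uses only the port numbers given above; the helper names are hypothetical, and it checks TCP reachability only, not whether the services behind the ports speak MCP or CDP.

```python
import socket

MCP_PORT = 9225  # MCP over SSE, per the description above
CDP_PORT = 9222  # browser CDP interface, per the description above


def port_is_open(host, port, timeout=0.5):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def bridge_status(host="localhost"):
    """Report whether the MCP and CDP ports accept connections."""
    return {
        "mcp": port_is_open(host, MCP_PORT),
        "cdp": port_is_open(host, CDP_PORT),
    }


if __name__ == "__main__":
    for name, up in bridge_status().items():
        print(f"{name}: {'listening' if up else 'not reachable'}")
```

If the browser is running, both entries should report as listening. If only `cdp` is reachable, the MCP child process is the likely place to look.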

  • Persistent sessions — real cookies, real logins, real browsing history. Your agent picks up where you left off tomorrow.
  • Built-in terminal — Claude Code runs inside the browser with pre-configured MCP access. No setup, no wiring.
  • Real-time visibility — you watch the agent work in a real browser window. Not a headless void.
  • Human handoff — when the agent gets stuck (CAPTCHA, 2FA, ambiguous choice), it pings you via Telegram. CDP screencast streams the viewport to a remote viewer, and the viewer sends input events back. You unblock it, the agent continues.
  • Skills that compound — every task can become a reusable Playwright script. Your library grows over time.
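
To make the human handoff bullet concrete, here is a sketch of the CDP messages such a screencast loop exchanges. `Page.startScreencast`, `Page.screencastFrameAck`, and `Input.dispatchMouseEvent` are standard CDP methods; the helper names, the default quality and size values, and the way the remote viewer delivers clicks are assumptions made for illustration, not UseBrowser’s actual code.

```python
import itertools
import json

_ids = itertools.count(1)  # CDP commands carry a unique, increasing id


def cdp_command(method, params=None):
    """Build one CDP command message as a plain dict."""
    return {"id": next(_ids), "method": method, "params": params or {}}


def start_screencast(quality=60, max_width=1280, max_height=720):
    """Ask the browser to stream JPEG frames of the viewport."""
    return cdp_command("Page.startScreencast", {
        "format": "jpeg",
        "quality": quality,
        "maxWidth": max_width,
        "maxHeight": max_height,
    })


def ack_frame(frame_event):
    """Acknowledge a Page.screencastFrame event so the next frame is sent."""
    session_id = frame_event["params"]["sessionId"]
    return cdp_command("Page.screencastFrameAck", {"sessionId": session_id})


def remote_click(x, y):
    """Translate one viewer click into press and release input events."""
    base = {"x": x, "y": y, "button": "left", "clickCount": 1}
    return [
        cdp_command("Input.dispatchMouseEvent", {"type": "mousePressed", **base}),
        cdp_command("Input.dispatchMouseEvent", {"type": "mouseReleased", **base}),
    ]


if __name__ == "__main__":
    print(json.dumps(start_screencast()))
    for message in remote_click(200, 150):
        print(json.dumps(message))
```

Each `Page.screencastFrame` event carries a base64 image in `params.data`. The viewer displays it, the bridge acknowledges it, and any click the human makes travels back as the pair of messages from `remote_click`.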

UseBrowser is not the right tool for everything. Here’s where it earns its keep — and where you might not need it.

  • Multi-site research — “Compare pricing across Amazon, Lazada, and Shopee” involves navigating three sites, handling different layouts, extracting structured data, and compiling results. This is where an agent with a real browser pulls ahead. See Prompting for how to describe these tasks.
  • Tasks behind authentication — anything involving your logged-in accounts (Gmail, Shopee, GitHub, banking dashboards). Your agent inherits your real browser sessions.
  • Long-running autonomous work — research that takes 20-30 minutes across dozens of pages. The agent runs, pings you if it hits a wall, and continues. You don’t babysit.
  • Extension-dependent workflows — crypto (MetaMask), password managers, ad blockers. If the task requires a Chrome extension, you need a real browser with real extensions loaded.
  • Repetitive workflows worth capturing — tasks you do weekly that take 10+ minutes. Record once, run as a one-liner forever. The value compounds.
  • Developer testing loops — write code in the terminal, test in the real browser, iterate. Same browser, same session, no context switching. See MCP Overview for the developer workflow.

Where you might not need it:

  • A single API call: if the data is available via API, just call the API.
  • One-off scraping of a public page: Playwright or curl is simpler and faster.
  • Tasks that don’t involve a browser: Claude Code in a normal terminal handles file operations, coding, and CLI tasks fine on its own.

The pattern: if the task involves browsing — navigating, interacting, reading pages that require JavaScript rendering, dealing with auth — UseBrowser is where the value is. The more complex and repetitive the task, the bigger the payoff.

Extensions run in a sandboxed content script environment. They can modify pages, but they can’t:

  • Access Chrome DevTools Protocol
  • Control other extensions (e.g. MetaMask)
  • Simulate native input (OS-level cursor, keyboard)
  • Run a terminal or shell process
  • Intercept network traffic at the protocol level
  • Modify browser chrome (tab bar, URL bar, navigation)

An extension is a guest in someone else’s house. A fork means you own the house.

Headless browsers have no window, no GPU rendering, and limited extension support. Anti-bot systems detect them more easily (missing browser fingerprints, navigator.webdriver flag). There’s no way to do human handoff — there’s nothing to show. And the agent works blind — you can’t watch it or redirect it in real time.

That said — UseBrowser can run headlessly on a cloud instance for stateless scraping at scale. Public pages, no login required, high throughput. Headless is the right mode for that.

But for stateful work — anything involving your accounts, your cookies, your extensions, your identity — you want a headful browser. That’s where you log into Gmail, interact with MetaMask, and build up sessions that persist across days. That’s UseBrowser’s home turf.

Cloud browsers add latency on every interaction and don’t carry your local identity. But they have a place: stateless scraping at scale, CI/CD test pipelines, parallel execution across regions.

UseBrowser runs on your machine by default — native speed, your extensions, your data stays local. For the scraping-at-scale use case, you can deploy UseBrowser on a cloud VM too. Install it on an instance, connect via MCP over SSH tunnel, and you get the full feature set remotely.
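
The SSH tunnel step could look like the sketch below, which builds a standard OpenSSH local port forward for the MCP port. The remote address is a placeholder, and the sketch assumes the MCP server on the VM listens on `localhost:9225` as described earlier; the helper name is hypothetical.

```python
import shlex


def tunnel_command(remote, mcp_port=9225, local_port=9225):
    """Build an ssh command that forwards a local port to the VM's MCP port.

    -N opens the tunnel without running a remote command;
    -L local:host:remote sets up the local port forward.
    """
    return ["ssh", "-N", "-L", f"{local_port}:localhost:{mcp_port}", remote]


if __name__ == "__main__":
    # "agent@vm.example.com" is a placeholder, not a real host.
    print(shlex.join(tunnel_command("agent@vm.example.com")))
```

With the tunnel open, an MCP client on your machine connects to `localhost:9225` exactly as it would for a local install. Passing the list to `subprocess.run` would start the tunnel from a script.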

The difference from hosted services: you control the instance. Your data doesn’t pass through someone else’s infrastructure.

Playwright and Puppeteer are great automation libraries. We use Playwright ourselves: every skill UseBrowser generates is a Playwright script. But on their own, they’re tools for you to write automation. There’s no agent loop, no MCP, no skill compounding, no human handoff. You script everything manually.

UseBrowser puts Playwright in the hands of the AI agent, with all the infrastructure to make that actually work.
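
For a sense of what a small Playwright script against a running browser looks like, here is a sketch in Playwright’s Python API. It assumes the browser accepts external CDP connections on port 9222 and that attaching this way is permitted, which the text above does not confirm. The function names are hypothetical, and real generated skills may look quite different.

```python
def cdp_endpoint(host="localhost", port=9222):
    """URL that Playwright's connect_over_cdp expects for a running browser."""
    return f"http://{host}:{port}"


def page_title(url, endpoint=None):
    """Open url in the already-running browser and return the page title."""
    # Imported lazily so cdp_endpoint stays usable without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.connect_over_cdp(endpoint or cdp_endpoint())
        # Reuse the existing context so cookies and logins carry over.
        context = browser.contexts[0] if browser.contexts else browser.new_context()
        page = context.new_page()
        try:
            page.goto(url)
            return page.title()
        finally:
            page.close()  # close only the tab this script opened


if __name__ == "__main__":
    print(page_title("https://example.com"))
```

The script attaches to the existing session rather than launching a fresh browser, so it sees the same cookies and logins that a human user of that profile would.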

We didn’t want to build a wrapper. Wrappers are limited by what the browser chooses to expose. We needed to:

  • Expose internal APIs to agents beyond what CDP offers by default
  • Run and manage an MCP server as part of the browser lifecycle
  • Add native CAPTCHA solving at the engine level
  • Build a terminal directly into the browser chrome
  • Control tab isolation, extension access, and page capture from source
  • Support both CDP and native OS input simulation

You can’t do that with an extension or a Playwright script. You need source-level access to Chromium. So that’s what we did.

  • Installation — Download and set up UseBrowser
  • Prompting — How to describe tasks so Claude gets them right
  • Controls — Command Palette, Agent Mode, and AI Drawer