Browser Harness Review: Letting an LLM Drive Your Browser for Real

browser-use/browser-harness is a 10k-star open-source project that connects LLMs directly to your real browser via CDP. I spent two days with it — here's what actually works.

Tags: browser-automation, llm, cdp, ai-agent, python

I caught wind of a new project from the browser-use team called browser-harness. It’s sitting at around 10k stars and makes a bold claim: “Self-healing harness that enables LLMs to complete any task.” In plain English — let a large language model directly control your browser, and it can fix itself when things break.

I’m already pretty familiar with the main browser-use project, which is probably the most mature browser-automation AI agent out there right now. This harness takes a completely different approach. Instead of wrapping everything in pre-defined actions, it hands the LLM a direct WebSocket connection to Chrome via the DevTools Protocol and essentially says: “Figure it out.”

How it actually works

The core idea is surprisingly radical. Traditional automation stacks give agents a fixed playbook of actions to execute. The harness flips that — it provides an ultra-thin connection layer (about 1,000 lines across 4 core files) and lets the LLM talk directly to the browser over CDP. When the agent notices a missing helper for a task, it writes one on the spot.
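To make "talking directly over CDP" a bit more concrete, here is a minimal sketch of the message framing involved. CDP commands are JSON objects with an `id`, a `method`, and `params`; replies echo the `id`, while events carry a `method` and no `id`. The `CdpSession` class is a hypothetical name for illustration and is not taken from the harness source. The sketch only builds and classifies frames, so it runs without a browser.

```python
import itertools
import json


class CdpSession:
    """Hypothetical helper: builds CDP command frames and pairs replies by id."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.pending = {}  # id -> method name of commands awaiting a reply

    def command(self, method, **params):
        """Return the JSON text frame for one CDP command."""
        msg_id = next(self._ids)
        self.pending[msg_id] = method
        return json.dumps({"id": msg_id, "method": method, "params": params})

    def handle(self, raw):
        """Classify an incoming frame as a command reply or a browser event."""
        msg = json.loads(raw)
        if "id" in msg:
            method = self.pending.pop(msg["id"])
            if "error" in msg:
                raise RuntimeError(f"{method} failed: {msg['error'].get('message')}")
            return ("reply", method, msg.get("result", {}))
        return ("event", msg["method"], msg.get("params", {}))


session = CdpSession()
frame = session.command("Page.navigate", url="https://example.com")
print(frame)
```

In a real session the frame would be sent over the WebSocket and `handle` would be fed whatever comes back. The point is how little sits between the model and the browser: a method name and a dictionary of parameters.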

Here’s a concrete example: the agent wants to upload a file but agent_helpers.py doesn’t have a helper for that. So it writes one, adds it to the file, and proceeds. Next time the same action comes up, it just reuses the helper. That’s the “self-healing” part — every run accumulates domain knowledge.
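The write-once, reuse-later loop can be pictured roughly like this. This is a sketch under assumptions, not the harness's actual mechanism: `ensure_helper`, `load_helpers`, and the `upload_file_command` helper are hypothetical names. The only real API referenced is the CDP method `DOM.setFileInputFiles`, which attaches local files to a file input.

```python
import ast
import importlib.util
from pathlib import Path

# Hypothetical helper source, of the kind an agent might write for uploads.
UPLOAD_HELPER = '''
def upload_file_command(node_id, file_path):
    """Build the CDP command that attaches a local file to a file input."""
    return {"method": "DOM.setFileInputFiles",
            "params": {"nodeId": node_id, "files": [file_path]}}
'''


def has_function(path: Path, name: str) -> bool:
    """True if the helpers file already defines a top-level function `name`."""
    if not path.exists():
        return False
    tree = ast.parse(path.read_text(encoding="utf-8"))
    return any(isinstance(n, ast.FunctionDef) and n.name == name for n in tree.body)


def ensure_helper(path: Path, name: str, source: str) -> bool:
    """Append `source` unless `name` exists. Returns True if something was written."""
    if has_function(path, name):
        return False
    ast.parse(source)  # refuse to persist code that does not even parse
    with path.open("a", encoding="utf-8") as fh:
        fh.write("\n\n" + source.strip() + "\n")
    return True


def load_helpers(path: Path):
    """Import the helpers file as a module so its functions can be called."""
    spec = importlib.util.spec_from_file_location("agent_helpers", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module
```

The first call to `ensure_helper` writes the function; every later call finds it already defined and leaves the file alone, which is the "accumulates knowledge" behaviour in miniature.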

The architecture breaks down like this:

  • install.md — first-time setup and browser bootstrap
  • SKILL.md — day-to-day usage patterns
  • src/browser_harness/ — protected core package the agent can’t touch
  • agent-workspace/agent_helpers.py — helper code the agent edits freely
  • agent-workspace/domain-skills/ — reusable site-specific skills
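One way to picture the protected/editable split in the layout above is a simple path guard. The project may enforce the boundary differently (or purely by instruction), so treat this as an illustration only: `agent_may_write` is a hypothetical function, and the file names `core.py` and `skill.md` in the usage lines are made up.

```python
from pathlib import Path

# Directory the agent is allowed to edit, per the layout described above.
WRITABLE = Path("agent-workspace")


def agent_may_write(repo_root: Path, target: Path) -> bool:
    """True only if `target` resolves to a location inside agent-workspace/."""
    root = repo_root.resolve()
    resolved = (root / target).resolve()  # also collapses any ".." segments
    return resolved.is_relative_to(root / WRITABLE)  # Python 3.9+


root = Path("/tmp/repo")
print(agent_may_write(root, Path("agent-workspace/agent_helpers.py")))
print(agent_may_write(root, Path("src/browser_harness/core.py")))
```

Resolving the path before the check matters: without it, a target like `agent-workspace/../src/browser_harness/core.py` would look like it sits inside the workspace.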

My real-world experience

Setup is genuinely simple. The README gives you a setup prompt you can paste straight into Claude Code. Claude Code then opens chrome://inspect/#remote-debugging so you can tick the remote debugging checkbox, and you click Allow on the permission popup that follows.

I tested two tasks:

Auto-filling a form: I asked the agent to send a message on LinkedIn. The first run got stuck at the file upload step, but about 30 seconds later the agent had written its own helper, and the second attempt went through smoothly. Honestly, the self-healing process impressed me more than the end result.

Batch data extraction — I needed to scrape order lists from an old dashboard with no API. The agent figured out the login flow, pagination logic, and even wrote a domain-skills/legacy-dashboard/ skill to persist that knowledge. Next time I ran the same site, it basically worked with zero extra configuration.
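The pagination logic the agent worked out generally boils down to a loop like the one below. This is a generic sketch, not the code the agent produced: `iter_rows` and the page shape (`{"rows": [...], "next": token}`) are assumptions, and `fetch_page` stands in for whatever browser interaction actually loads a page of orders.

```python
from typing import Callable, Iterator, Optional


def iter_rows(
    fetch_page: Callable[[Optional[str]], dict],
    max_pages: int = 100,
) -> Iterator[dict]:
    """Yield rows page by page until a page reports no `next` token."""
    token = None
    for _ in range(max_pages):
        page = fetch_page(token)
        yield from page["rows"]
        token = page.get("next")
        if token is None:
            return
    # Guard against dashboards whose "next" link never runs out.
    raise RuntimeError("pagination did not terminate")


# Fake two-page dashboard, keyed by page token, for demonstration only.
fake_pages = {
    None: {"rows": [{"id": 1}, {"id": 2}], "next": "p2"},
    "p2": {"rows": [{"id": 3}], "next": None},
}
print([row["id"] for row in iter_rows(fake_pages.__getitem__)])
```

The `max_pages` cap is a deliberate choice for old dashboards, where a broken "next" link that loops forever is a realistic failure mode.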

Quick start

git clone https://github.com/browser-use/browser-harness.git
cd browser-harness
# Follow install.md or paste the setup prompt into Claude Code

They also integrate with Browser Use Cloud, which gives you 3 concurrent browsers on the free tier, plus proxies and CAPTCHA solving. Worth trying if your local network is unreliable.
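If you fan tasks out across a limited pool such as the 3 concurrent browsers mentioned above, it helps to cap concurrency on your side as well. The sketch below uses only the standard library and deliberately does not call any Browser Use Cloud API; `run_with_limit` is a hypothetical helper, and each job is any zero-argument coroutine function you supply.

```python
import asyncio


async def run_with_limit(jobs, limit=3):
    """Run coroutine factories concurrently, never more than `limit` at once."""
    sem = asyncio.Semaphore(limit)

    async def guarded(job):
        async with sem:  # wait here while `limit` jobs are already running
            return await job()

    # gather preserves the order of `jobs` in the returned list
    return await asyncio.gather(*(guarded(job) for job in jobs))


async def demo_job():
    # Stand-in for "drive one browser session"; replace with real work.
    await asyncio.sleep(0.01)
    return "done"


print(asyncio.run(run_with_limit([demo_job] * 5, limit=3)))
```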

Pros and cons, straight up

What I liked:

  • Extremely lightweight at ~1k core lines — you can actually read and modify the source
  • Self-healing isn’t just marketing; the agent genuinely writes and reuses helpers
  • Controls a real Chrome instance rather than a headless one, so sites behave the way they do for a normal user
  • Domain skills accumulate over time — the same site gets faster with each run

What frustrated me:

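  • You need Chrome running with remote debugging enabled, which is a real security consideration: anything that can reach the debugging endpoint can drive the browser, including your logged-in sessions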
  • First runs on new sites are slow because the agent has to “explore” first
  • Heavily geared toward Claude Code or similar AI coding tools — manual setup is more involved
  • Documentation is still lean; edge cases often require reading the source code
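On the remote-debugging point, one cheap safeguard is to refuse any DevTools WebSocket URL that is not on the loopback interface before connecting. This is a sketch of that check, not something the harness is known to do; `is_local_debug_endpoint` is a hypothetical name, and port 9222 in the usage lines is just the conventional example port.

```python
import ipaddress
from urllib.parse import urlparse


def is_local_debug_endpoint(ws_url: str) -> bool:
    """True if the URL is a ws/wss endpoint on localhost or a loopback address."""
    parsed = urlparse(ws_url)
    if parsed.scheme not in ("ws", "wss") or parsed.hostname is None:
        return False
    if parsed.hostname == "localhost":
        return True
    try:
        return ipaddress.ip_address(parsed.hostname).is_loopback
    except ValueError:
        # Any other hostname is treated as remote and rejected.
        return False


print(is_local_debug_endpoint("ws://127.0.0.1:9222/devtools/browser/abc"))
print(is_local_debug_endpoint("ws://192.168.1.20:9222/devtools/browser/abc"))
```

The check does not make remote debugging safe on its own, but it does catch the common mistake of pointing an agent at a debugging port exposed on the local network.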

Browser-use vs. browser-harness: which one?

If you want something that works out of the box without thinking about internals, stick with the main browser-use project. If you want maximum freedom for the agent, need it to evolve its own workflows, or you’re dealing with complex browser tasks that pre-defined actions can’t cover, harness is the better fit. Think of it as automatic vs. manual transmission — harness gives you full control over the clutch.

Bottom line

Browser Harness is one of those projects that feels underwhelming at first glance but grows on you the more you use it. The self-healing design is genuinely forward-thinking. I’m planning to migrate a few recurring data-scraping tasks over to it and see how much efficiency improves after a few weeks of accumulated domain skills. I’ll report back when I have numbers.


About the Author

Liudingyu is a full-stack developer and heavy GitHub user who has starred 900+ repos over the past 3 years. This site only covers tools the author has actually used or deeply researched.
