Open source computer use agents for devs: the part nobody writes up
Most articles on this topic show a single agent in a fresh sandbox completing one task, then exiting. That is not how a real production fleet looks. This is what mine looks like: three Playwright MCP browser agents, each pinned to its own persistent Chrome profile, driven from cron, never sharing a cookie. Six JSON files do all of the work.
The anchor fact
On my disk there are exactly six files that define the fleet. Three Playwright configs and three MCP wrapper configs. They live under ~/.claude/browser-agent-configs/. Here is the smaller of the two reddit files, verbatim:
{
"browser": {
"userDataDir": "__HOME__/.claude/browser-profiles/reddit",
"launchOptions": {
"args": [
"--window-position=150,150",
"--window-size=911,1016"
]
},
"contextOptions": {
"viewport": { "width": 911, "height": 1016 }
}
},
"outputMode": "file",
"imageResponses": "omit",
"outputDir": "__HOME__/.playwright-mcp/reddit-agent"
}
Sixteen lines. The only field that matters for the multi-agent story is browser.userDataDir. That is a directory Playwright hands to Chromium as the persistent on-disk profile. Cookies, IndexedDB rows, service workers, the browser cache, all of it lives there. Every subsequent run resumes from that state instead of booting a blank Chromium. That is the difference between a demo and a fleet.
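After the first interactive run, that directory fills with ordinary Chromium profile state. A typical layout looks something like this (the exact files vary by Chromium version, so treat the listing as illustrative):

```
$ ls ~/.claude/browser-profiles/reddit
Default/      First Run     Local State   ...
```

Everything the agent "remembers" between runs is in there; back it up and you have backed up the agent's identity.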
The matching wrapper file reddit-agent-mcp.json turns that config into an MCP server any LLM can call:
{
"mcpServers": {
"reddit-agent": {
"type": "stdio",
"command": "npx",
"args": [
"@playwright/mcp@latest",
"--config",
"__HOME__/.claude/browser-agent-configs/reddit-agent.json"
],
"env": {
"PATH": "__NODE_BIN__:/opt/homebrew/bin:/usr/local/bin:/usr/bin:/bin"
}
}
}
}
That is the entire integration surface. command: npx, args: [@playwright/mcp@latest, --config, ...]. No package to vendor, no Dockerfile, no patched Chromium. The twitter and linkedin pairs are identical except for the paths and window position.
You can list the whole fleet on disk in one shell:
$ ls ~/.claude/browser-profiles
linkedin  reddit  twitter
$ ls ~/.claude/browser-agent-configs
linkedin-agent.json      reddit-agent.json      twitter-agent.json
linkedin-agent-mcp.json  reddit-agent-mcp.json  twitter-agent-mcp.json
Why every other open source agent guide stops here
Pull up any of the popular open source computer use repos and run their default example. You will get a screenshot of an agent completing one task, beautifully. Then you will close the process, and everything the agent learned about being signed into a real account, dismissing real banners, and clicking real cookie modals evaporates with it. The next run starts blank.
That is fine for a benchmark. It is useless for a developer who actually wants to leave an agent on their machine for a week and come back to find that it has done something useful while staying logged into three different sites without any of them tripping a security check.
One demo run vs. a real fleet
A single agent that boots a fresh Chromium, completes one demo (book a flight, fill a form, browse a site), then closes. The screenshot looks impressive. The session is gone the moment the process exits.
- Fresh Chromium every run, no logins survive
- Single agent, single platform, single task
- Auth is hand-waved or done with brittle automation
- Re-running the demo means signing in again
Build the fleet, step by step
The whole pattern is five moves. None of them require writing Playwright code yourself. None of them require a custom MCP server. Everything lives in JSON.
Pick the wrapper, not the model
The expensive part of a computer use agent is not the LLM. It is the live browser, the cookies, and the keep-alive. @playwright/mcp@latest handles all three and ships as a stdio MCP server, so any LLM that speaks MCP can drive it.
One JSON config per agent
Each agent gets a config like reddit-agent.json. The only field that matters for parallelism is browser.userDataDir. Point it at a directory that nothing else writes to, and that agent's cookies, IndexedDB, and CDP cache live there forever.
Wrap each config as an MCP server
A second tiny file, reddit-agent-mcp.json, declares the MCP server: command npx, args @playwright/mcp@latest plus a path to the agent config. This is the file your MCP client (Claude, Cursor, your own runner) registers.
Log in once, manually
Open each profile by hand the first time, sign in, solve any 2FA. The userDataDir captures the session. From then on every cron run reuses it. No headless OAuth flow to babysit, no automated login to maintain.
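One way to do that first login is to point a stock Chrome at the profile directory by hand. A sketch, assuming the default macOS binary path (the path is an assumption; use whichever Chromium build your agents actually run, since cookie encryption can differ between builds):

```shell
# One-time interactive login; the Chrome path below is an assumption.
PROFILE="$HOME/.claude/browser-profiles/reddit"
CHROME="/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"
mkdir -p "$PROFILE"
# Launch, sign in by hand, solve 2FA, then quit.
# The session lands in $PROFILE and every later run inherits it.
"$CHROME" --user-data-dir="$PROFILE" "https://www.reddit.com/login"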
Run them in parallel from cron
Three platforms times one userDataDir each times one MCP server each equals three independent Chromiums that can run at the same minute. They never share cookies, so none of them can log each other out, throw a session conflict, or mix up which account posted what.
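Concretely, the schedule can be as dumb as three crontab lines. These entries are illustrative; run-agent.sh is a hypothetical runner script that drives exactly one MCP agent per invocation:

```
# Illustrative crontab: three agents, same minute, three Chromiums.
0 9 * * * $HOME/bin/run-agent.sh reddit
0 9 * * * $HOME/bin/run-agent.sh twitter
0 9 * * * $HOME/bin/run-agent.sh linkedin
```

Because each runner touches a different userDataDir, firing all three at 09:00 is safe.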
Register the agents with your LLM
The MCP wrapper file is what your driver registers. With Claude Code, the call is one line per agent. The same JSON works as input to any other MCP client (Cursor, Continue, a Python or TypeScript runner using the official MCP SDK).
claude mcp add-json reddit-agent \
  --transport stdio \
  "$(cat ~/.claude/browser-agent-configs/reddit-agent-mcp.json | jq '.mcpServers."reddit-agent"')"
# Repeat for twitter-agent and linkedin-agent.
# Now Claude (or any MCP client) sees three browser agents,
# each pinned to its own Chrome profile.
After registration, the LLM sees three browser agents as three distinct MCP servers, each with its own toolset (browser_navigate, browser_click, browser_type, browser_snapshot, browser_evaluate, and so on). A tool call to reddit-agent.browser_navigate is unambiguously routed to the Chromium that has the reddit cookies. A tool call to linkedin-agent.browser_navigate is unambiguously routed to a different Chromium.
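On the wire, each of those tool calls is an ordinary MCP JSON-RPC request written to that agent's stdin. A sketch of the navigate call (the exact envelope is produced by your MCP client, not by you):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "browser_navigate",
    "arguments": { "url": "https://www.reddit.com" }
  }
}
```

The routing is nothing cleverer than process plumbing: this request goes down reddit-agent's stdin, so it can only ever reach the Chromium holding the reddit profile.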
What actually happens on one run
Walk through a single reddit post. Cron fires the runner script. The runner asks the LLM for the next action. The LLM calls reddit-agent. The MCP server spawns Chromium with the persistent profile. Chromium loads reddit.com with cookies that have been sitting on disk for weeks. The page comes back already logged in. The LLM clicks, types, posts. The Chromium closes, the cookies stay on disk for the next run.
One run through reddit-agent
What each agent gets that single-agent setups miss
The shape of the fleet is six small things, all of them mundane, all of them load-bearing. Each item below is a property a single one-shot computer use demo does not have, and the reason a demo becomes a real production fleet the moment you add it.
Persistent profile per agent
userDataDir = ~/.claude/browser-profiles/reddit (or twitter, or linkedin). Cookies, IndexedDB, service workers, and the browser cache live in that one folder. A run does not start with a blank slate; it starts where the previous run left off.
MCP as the wire format
Each agent is just an MCP stdio server. Your driver speaks MCP, not Playwright API. Swap the LLM, swap the runner, swap the orchestrator without touching the browser layer.
Zero shared state
Three userDataDirs, three outputDirs, three MCP server names. No accidental cross-platform login bleed, no race on a single Chrome user-data folder.
One npx, no install
command: npx, args: [@playwright/mcp@latest, --config, agent.json]. No package to vendor, no Dockerfile to maintain. The wrapper version pin lives in your config repo.
Window position per agent
launchOptions.args sets a different --window-position per agent (150,150 / 100,100 / 200,200). When you watch the fleet run live you can tell at a glance which Chromium is which.
First-run login is human
You open Chrome against the userDataDir once, sign in, accept any captcha, solve 2FA. After that, the agent inherits the session. No headless OAuth dance to keep alive.
The minimal stack
There is nothing under the hood that is not on this list. No custom Chromium build, no patched Playwright, no proprietary orchestrator. Each piece is a thing a dev already has installed.
Playwright MCP
@playwright/mcp@latest from npm. The stdio server that exposes a Chrome instance as MCP tools (snapshot, click, type, evaluate, navigate).
Claude Code
Registers each agent JSON as an MCP server. The LLM drives the fleet by calling tools on whichever agent name matches the platform.
Cron / launchd
Fires runs on a schedule. Each scheduled job sources a runner script that talks to one MCP agent at a time.
Chromium
Stock Chromium under Playwright. No patched build, no headful flag tricks. Each profile is a normal Chrome on-disk userDataDir.
MCP clients
Anything that speaks MCP works as a driver: Claude Code, Cursor, Continue, your own Python or TS runner using the MCP SDK.
jq + bash
Registering the agents with the LLM is a one-line jq + claude mcp add-json. Nothing fancier is needed.
The isolation rules, in one place
If your fleet ever does something weird (one platform forgets it is logged in, two agents post the same content, a Chromium hangs on launch), one of these rules has been broken. They are short and load-bearing.
What to verify on every agent
- @playwright/mcp@latest is the only runtime dependency. No custom fork, no patched Chromium build.
- Every agent has its own userDataDir under ~/.claude/browser-profiles/<platform>. Nothing else on disk writes there.
- Every agent has its own outputDir under ~/.playwright-mcp/<agent>-agent so snapshots from one agent never leak into another's transcript.
- Each agent is registered as an MCP server with a unique name, so the LLM can route a tool call to the right Chromium without ambiguity.
- First-run logins are interactive and human, not automated. The session lives in userDataDir and is reused on every subsequent run.
- When two pipelines need the same browser profile in the same minute, only one Chromium is allowed to attach to that userDataDir at a time. Mutex this in your runner, not in MCP.
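A portable way to get that mutex in the runner is an atomic mkdir lock. This is a sketch; flock(1) is equally fine where it exists, and the lock path here is an assumption:

```shell
# Per-profile mutex: mkdir is atomic, so only one runner can win.
LOCK="${TMPDIR:-/tmp}/browser-profile-reddit.lock.d"
if ! mkdir "$LOCK" 2>/dev/null; then
  echo "another run holds the reddit profile; skipping" >&2
  exit 0
fi
trap 'rmdir "$LOCK"' EXIT
# ...drive the reddit-agent MCP server here; the lock is released on exit.
```

Losing runners exit quietly instead of racing Chromium for the user-data folder, which is exactly the failure mode the rule above exists to prevent.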
What this is not
This pattern is for devs who want to leave a few agents running against sites they own or operate, in their own name. It is not a way to run a thousand fake accounts on someone else's platform. The whole point of persistent userDataDirs is that each one is a real, manually-signed-in human session. That puts a hard cap on scale and a lower bound on responsibility, which I think is the right shape for a personal agent fleet anyway.
It is also not a replacement for a desktop computer use agent if your task lives outside the browser. If you need to drive Figma, Final Cut, or some macOS native app, you want pixel-and-keystroke tools (Anthropic computer use, OpenAdapt, macos-use) rather than DOM tools. The Playwright MCP path is for sites that render in a web browser, which is most of them.
The smallest version you can ship today
Pick one site you want an agent on. Make a single mysite-agent.json with a userDataDir under your home directory. Make a single mysite-agent-mcp.json that wraps it. Open Chrome against that userDataDir once and log in. Register the MCP server with your LLM. Have it perform a single, low-stakes action. The whole thing is two files and one npx command. Adding a second agent is the same two files with a different name and a different userDataDir. Nothing else changes.
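Under that naming, the Playwright half is just the reddit config with the names swapped (a sketch; __HOME__ stands in for your home directory exactly as in the earlier examples):

```json
{
  "browser": {
    "userDataDir": "__HOME__/.claude/browser-profiles/mysite"
  },
  "outputMode": "file",
  "imageResponses": "omit",
  "outputDir": "__HOME__/.playwright-mcp/mysite-agent"
}
```

The wrapper half mirrors reddit-agent-mcp.json with mysite-agent as the server name and this file as the --config argument.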
That is the entire pattern. If you want to see it run end to end on three platforms at once, with the lock layer that prevents two cron jobs from grabbing the same Chrome profile in the same minute, the next deepest piece is the orchestrator writeup linked below.
Want this fleet wired into your own product?
If you want a working multi-agent computer use setup running against your own targets in a week, book a call and we will scope it.
Going deeper on the same stack
Social media agent orchestrator: real architecture
37 launchd jobs, 5 browser profiles, and the two-tier lock that prevents two Claude agents from colliding on the same Chrome user-data folder.
Open source social autoposter
What ships in the npm package, what is in config.json, and how the skill turns into a fleet of real browser agents that actually post.
Claude skills for social-autoposter
How the skill folder maps to launchd jobs, MCP browser agents, and the helper Python scripts that wire the data layer to the LLM.
Frequently asked questions
Why three configs instead of one shared browser?
Cookies. The instant two platforms share the same Chrome userDataDir, the same Google or Cloudflare session shows up across both. That triggers anti-abuse heuristics on every site that does cross-domain fingerprinting. Three separate userDataDirs is the cheapest way to keep three real human sessions believably independent. It is also the only way you can run all three in the same minute, because Chromium will not let two processes attach to the same user-data folder at once.
Why @playwright/mcp@latest rather than computer use API or a custom wrapper?
@playwright/mcp@latest is open source, ships from npm, and exposes the live page as MCP tools (snapshot, click, type, evaluate, navigate). That gives you DOM-precise actions without paying per token to look at pixels. The Anthropic computer use API is great for general desktop control, but for a dev who already knows what site they are driving, DOM accessor + selector reference is faster, cheaper, and more deterministic.
How does the LLM know which agent to call?
Each agent is registered with a unique MCP server name (reddit-agent, twitter-agent, linkedin-agent). Tool calls are namespaced per server. When the LLM wants to post on Reddit, it calls reddit-agent.browser_navigate. When it wants to post on LinkedIn, it calls linkedin-agent.browser_navigate. Same Playwright tools, three independent browsers.
What happens to logins after a few weeks?
They expire on the same schedule a normal browser would. Reddit sessions tend to last weeks, X sessions tend to drop sooner if the account triggers a security check, LinkedIn behaves like a normal LinkedIn session in Chrome. The mitigation is the same as for a human user: open Chrome against the userDataDir, sign in again, close. The fleet inherits the new session on the next run.
Can I run more than three platforms with the same pattern?
Yes, the pattern is N JSON files for N platforms. The only ceiling is your machine. Each persistent Chromium uses around 200-400 MB resident, and Playwright spawns one process tree per agent. Five or six platforms on a developer Mac is fine. If you go past that, run them on separate cron ticks rather than in parallel.
Does this work for sites with serious anti-automation?
Better than headless. Persistent userDataDirs keep a long-lived browser fingerprint, real cookies, real local storage, real history, all the things sites use to decide a client is normal. It still loses against bespoke detection if you behave non-humanly (random clicks, no mouse movement, super-human typing). Throttle actions, jitter timings, and treat the agent like a slow polite human and you are fine for most platforms.
Where do the configs live in this stack?
On disk under ~/.claude/browser-agent-configs/. There are six files for three platforms: reddit-agent.json, twitter-agent.json, linkedin-agent.json (the Playwright configs), plus reddit-agent-mcp.json, twitter-agent-mcp.json, linkedin-agent-mcp.json (the MCP wrappers). Both halves of each pair are required: the first tells Playwright where to put Chrome, the second tells your LLM how to talk to that Chrome.
How is this different from just using Playwright in a Python script?
Two things. First, MCP gives you a uniform tool surface that any LLM can call, instead of a Playwright API that only your runner can call. Second, the userDataDir convention turns a one-shot script into a living agent. A normal Playwright script boots a fresh context, runs, exits, and the next run starts blank. This pattern keeps the same Chromium profile alive across hundreds of runs.