Guide · Orchestration internals

A social media agent orchestrator is really a lock manager with good taste in what runs when.

Every article about "social media agent orchestrators" talks about one meta-agent delegating to worker agents. That framing is fine until you run it on a real machine. The moment two Claude sessions launch Chrome against the same ~/.claude/browser-profiles/reddit directory, one of them corrupts the user-data-dir and exits. S4L fixes that with roughly 70 lines of shell.

S4L · 11 min read

  • 4.9 from operators running 3+ products
  • 37 launchd plists in production
  • Two-tier lock stack, self-healing
  • No Redis, no lock server, ~70 lines of shell

The SERP version is missing the interesting part

Read the top five results for this keyword and you get the same summary. Worker agents write, schedule, monitor. A meta-agent delegates. Productivity goes up 20x. Buffer, Sprout, Optimizely, MindStudio, LangChain: nearly identical framing. None of them name a single real failure mode of running more than one agent at once.

Here is the one that eats a weekend: two agent loops start within the same minute, both decide the Reddit profile is the right place to be, and both try to launch Chrome against the same persistent userDataDir. The second one fails with SingletonLock or corrupts the profile. If it corrupts it, you lose cookies, the next scheduled run logs in as nobody, and Reddit rate-limits the login attempts. Now your orchestrator is ghosting the inbox.

This is an OS-level problem, not an LLM-level problem. You do not solve it with a better prompt. You solve it with a lock.

What the orchestrator actually looks like

Four different Reddit pipelines, all scheduled by launchd, all eventually needing to drive the same Chrome profile. The reddit-browser lock is the chokepoint every one of them has to pass through before it can talk to the profile, the playwright-mcp stdio server, or the database.

Every Reddit pipeline funnels through the reddit-browser lock

engage.sh ─────────────┐
link-edit-reddit.sh ───┤                        ┌─▶ Reddit profile
dm-outreach-reddit.sh ─┼─▶ reddit-browser lock ─┼─▶ reddit-agent MCP
scan-reddit-replies.sh ┘                        └─▶ posts / replies DB

Four numbers that describe the orchestrator

37 launchd plists coordinating
5 browser profiles guarded
~70 lines of shell for the lock
3h stale-lock threshold

The lock stack, in full

This is skill/lock.sh. Every pipeline sources it. Every pipeline calls acquire_lock twice at the top. Nothing else is needed.

skill/lock.sh
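The article does not reproduce the file itself, so here is a minimal sketch reconstructed from the behavior described in this section (mkdir lock, pid file, 10s polling, 3600s default timeout, 10,800s staleness, stacked trap cleanup). The lock-dir path and helper names are assumptions; only `acquire_lock` and `_SA_LOCK_DIRS` are named in the article.

```shell
#!/usr/bin/env bash
# Sketch of a two-tier mkdir lock, reconstructed from the description above.
# Not the verbatim skill/lock.sh; paths and helper names are assumptions.

_SA_LOCK_DIRS=()   # stack of lock dirs held by this run, in acquisition order

_sa_release_all() {
  local d
  for d in "${_SA_LOCK_DIRS[@]}"; do
    rm -rf "$d"
  done
  _SA_LOCK_DIRS=()
}
# One trap releases the whole stack, even on SIGINT/SIGTERM/SIGHUP.
trap _sa_release_all EXIT INT TERM HUP

# acquire_lock NAME [TIMEOUT_SECONDS]
# Spin on mkdir until the lock is ours, the holder is stale, or we time out.
acquire_lock() {
  local name=$1 timeout=${2:-3600} waited=0
  local dir="/tmp/sa-${name}.lock"

  while true; do
    if mkdir "$dir" 2>/dev/null; then
      echo $$ > "$dir/pid"          # record the holder for staleness checks
      _SA_LOCK_DIRS+=("$dir")
      return 0
    fi

    # Staleness: missing pid file, dead pid, or lock dir older than 3 hours.
    local pid now mtime
    pid=$(cat "$dir/pid" 2>/dev/null)
    now=$(date +%s)
    mtime=$(stat -f %m "$dir" 2>/dev/null || stat -c %Y "$dir" 2>/dev/null || echo "$now")
    if [ -z "$pid" ] || ! kill -0 "$pid" 2>/dev/null || [ $((now - mtime)) -gt 10800 ]; then
      rm -rf "$dir"                 # wipe the stale lock and retry immediately
      continue
    fi

    # Skip, do not queue: give up this tick and let launchd fire the next one.
    if [ "$waited" -ge "$timeout" ]; then
      exit 0
    fi
    sleep 10
    waited=$((waited + 10))
  done
}
```

Note the `stat -f %m || stat -c %Y` fallback: the BSD and GNU spellings of "directory mtime" differ, and this sketch needs to run on both stock macOS and stock Linux.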

The call site, verbatim

This is how skill/link-edit-reddit.sh opens. Two lines. Order is not negotiable.

skill/link-edit-reddit.sh
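The snippet itself is not reproduced in the article. Assuming the `acquire_lock` interface described in this section, the opening plausibly looks like this; the source path and lock names follow the article's descriptions, but the exact file is a reconstruction.

```shell
#!/usr/bin/env bash
# Reconstructed opening of a pipeline script; only the call order is from
# the article, the paths are assumptions.
source "$(dirname "$0")/lock.sh"

acquire_lock reddit-browser     # tier 1: shared browser-profile lock, always first
acquire_lock link-edit-reddit   # tier 2: this pipeline's own lock
```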

Two tiers, one rule

Every lock in the orchestrator falls into one of two categories. The rule is that the shared-resource tier is acquired first.

Platform-browser locks

One lock per browser profile. reddit-browser, twitter-browser, linkedin-browser, moltbook-browser, github. Shared across every pipeline that touches that platform.

Pipeline locks

One lock per launchd job. link-edit-reddit, dm-outreach-reddit, engage-twitter, audit-linkedin, and so on. Prevents a pipeline from overlapping with its own previous run.

Stacking rule

Browser-profile lock FIRST, pipeline lock SECOND. _SA_LOCK_DIRS tracks them in order. A single EXIT trap releases the whole stack, even on SIGINT or SIGTERM.

Staleness detection

If /tmp/.../pid is missing, or kill -0 $pid fails, or the lock dir is older than 10,800 seconds, the contender wipes it and retries. No crashed run can wedge the whole fleet.

Skip, do not queue

If the lock is still held after the timeout (default 3600s), the contender exits 0. launchd will retry on the next scheduled tick. You never end up with a backlog of half-finished Chrome tabs.

No flock, no Redis

The whole thing is mkdir plus a pid file. It works on stock macOS and stock Linux: no extra daemon, no Redlock edge cases, no lock server paging you at 2 AM.

Every lock in the fleet

These are the actual acquire_lock call sites, dedup-sorted out of skill/*.sh. Five platform-browser locks at the top, pipeline locks below. Every pipeline picks one from each tier, in that order.

grep -h acquire_lock skill/*.sh | sort -u

A collision, played out in real time

launchd does not know about the lock. Two plists fire at the same clock minute and both pipelines start. Here is the log trail from a real collision in skill/logs/, rewritten for readability.

09:00 collision resolved

The same story as a sequence diagram

Two pipelines, one browser profile

launchd → engage.sh:               fire at 09:00
engage.sh → /tmp/*.lock:           mkdir reddit-browser.lock → OK, write pid 47211
launchd → link-edit-reddit.sh:     fire at 09:00
link-edit-reddit.sh → /tmp/*.lock: mkdir reddit-browser.lock → EEXIST, holder alive
link-edit-reddit.sh:               sleep 10s, retry
engage.sh → Claude + MCP:          launch Chrome, edit comment → done
engage.sh → /tmp/*.lock:           trap EXIT: rm -rf lock dir
link-edit-reddit.sh → /tmp/*.lock: mkdir OK, proceeds

What you need to copy, if you are building your own

The whole pattern is eight bullets. Everything else, including launchd plists, per-platform MCP configs, and the pipelines themselves, is plumbing around these.

Orchestrator-safety checklist

  • A single run is started by launchd, not by a long-lived daemon
  • Each run sources skill/lock.sh and calls acquire_lock at the top
  • The first acquire_lock call is always the browser-profile lock
  • The second acquire_lock call is the pipeline-specific lock
  • A held lock is a directory containing a pid file written with echo $$
  • A contending run polls every 10 seconds, up to its timeout, then exits 0
  • A stale lock (dead pid, missing pid file, or 3h+ old) is wiped and retried
  • Every lock held by this run is released by a single EXIT/INT/TERM/HUP trap

S4L vs. what you usually get

Most agent platforms handle concurrency by scaling horizontally and hoping the blast radius stays small. That works until the workers share state. Browser profiles share state.

| Feature | Typical agent framework | S4L orchestrator |
| --- | --- | --- |
| Multiple agents running concurrently | Usually yes, often without a safety net | Yes, as long as they touch different browser profiles |
| Two agents hit the same Reddit session | Both launch Chrome against the same user-data-dir; one crashes | One holds the reddit-browser lock; the other sleeps 10s and retries |
| Crash recovery | Manual lock-file cleanup, or wait for the daemon to restart | PID-based staleness + 3h mtime fallback; next run auto-cleans |
| Backlog behavior | Queue runs, backlog compounds, Chrome tabs pile up | Skip the tick, let launchd fire again later |
| Infrastructure required | Redis, ZooKeeper, job queue, scheduler daemon | mkdir + pid file + launchd, zero external services |
| Lines to read to fully understand it | Thousands across runner, broker, worker pool | About 70 (skill/lock.sh) |

Frequently asked questions

Why do you need two levels of locks instead of one?

Two different pipelines can share a browser profile but not a task. On Reddit, engage.sh (replying to inbound comments), link-edit-reddit.sh (editing our top-performing comments to add a link), dm-outreach-reddit.sh (sending DMs), and scan-reddit-replies.sh (discovering new replies) all log in as the same account. They must take turns on the reddit-browser profile. But each pipeline also has to prevent overlap with its own previous run. The browser-profile lock handles cross-pipeline contention, the pipeline lock handles self-contention.

Why acquire the platform-browser lock BEFORE the pipeline lock?

If you acquire pipeline first and browser second, two different pipelines can each grab their pipeline lock, then both block on the same browser lock. Neither can exit, neither can release. You get a cross-pipeline deadlock. Acquiring the shared resource first means contention is resolved before anyone commits to a sub-lock. It is the same reason database transactions should lock parent rows before child rows.

What happens if a run crashes while holding a lock?

Three fallbacks catch it. First, the shell trap on EXIT/INT/TERM/HUP removes every lock directory in _SA_LOCK_DIRS. Second, the next contender checks the pid file and runs kill -0; a dead pid means the lock is removed. Third, any lock directory older than 10,800 seconds (3 hours) is wiped regardless. Together these mean the orchestrator self-heals, no cron job will wedge the whole fleet.

Why not just use flock?

flock is not installed by default on macOS, and the orchestrator has to run on both macOS and Linux. mkdir is an atomic syscall on every POSIX filesystem, so it works as a spin-lock primitive without any dependencies. The whole implementation is ~70 lines in skill/lock.sh. No extra package, no external service, no per-platform shim.

Why mkdir and not touch or a pid file alone?

mkdir fails atomically if the directory exists, which is exactly the primitive a lock needs. touch would succeed for every contender and race on the pid write. A pid file alone lets two contenders both write their pid and both think they own the lock. mkdir plus write-pid-inside gives you atomicity from the filesystem and a recoverable holder identity.
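As a quick illustration (not from the article), you can watch that atomicity directly: spawn twenty background jobs that all race to mkdir the same path, and exactly one records a win.

```shell
#!/usr/bin/env bash
# mkdir either creates the directory or fails with EEXIST, atomically,
# so exactly one of twenty concurrent contenders can win the lock.
tmp=$(mktemp -d)
for i in $(seq 1 20); do
  ( mkdir "$tmp/race.lock" 2>/dev/null && echo "$i" >> "$tmp/winners" ) &
done
wait
winners=$(( $(wc -l < "$tmp/winners") ))
echo "winners: $winners"   # winners: 1
rm -rf "$tmp"
```

Run the same experiment with `touch` instead of `mkdir` and every contender "succeeds", which is exactly the race described above.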

What is the contention cost in practice?

Contenders sleep 10 seconds between retries. The default timeout is 3600 seconds (1 hour), so a contender will poll up to 360 times. In practice launchd stagger plus a median run length under 15 minutes means most contenders wait zero seconds. When two jobs collide, the later one typically waits one or two polls, then proceeds.

Can I use this pattern outside social media automation?

Yes. The pattern is 'shared resource first, pipeline second, stacked trap-cleaned locks, mkdir plus pid staleness detection.' It applies anywhere you have multiple cron jobs or agents that share a stateful resource: a browser profile, a serial port, an OAuth refresh token, a non-reentrant CLI. The 70 lines in skill/lock.sh are deliberately generic so you can source them as-is.

Want an orchestrator that already has this wired up?

S4L is open about how it runs. The lock stack, the launchd plists, and the per-platform MCP configs are the product. Point it at your projects, let it post, reply, and link-edit on its own cadence.

See S4L