Guide · Orchestration internals

A social media agent orchestrator is really a lock manager with good taste in what runs when.

Every article about "social media agent orchestrators" talks about one meta-agent delegating to worker agents. That framing is fine until you run it on a real machine. The moment two Claude sessions launch Chrome against the same ~/.claude/browser-profiles/reddit directory, one of them corrupts the user-data-dir and exits. S4L fixes that with roughly 70 lines of shell.

S4L · 11 min read

  • 4.9 from operators running 3+ products
  • 37 launchd plists in production
  • Two-tier lock stack, self-healing
  • No Redis, no lock server, ~70 lines of shell

The SERP version is missing the interesting part

Read the top five results for this keyword and you get the same summary. Worker agents write, schedule, monitor. A meta-agent delegates. Productivity goes up 20x. Buffer, Sprout, Optimizely, MindStudio, LangChain: nearly identical framing. None of them name a single real failure mode of running more than one agent at once.

Here is the one that eats a weekend: two agent loops start within the same minute, both decide the Reddit profile is the right place to be, and both try to launch Chrome against the same persistent userDataDir. The second one fails with SingletonLock or corrupts the profile. If it corrupts it, you lose cookies, the next scheduled run logs in as nobody, and Reddit rate-limits the login attempts. Now your orchestrator is ghosting the inbox.

This is an OS-level problem, not an LLM-level problem. You do not solve it with a better prompt. You solve it with a lock.

What the orchestrator actually looks like

Four different Reddit pipelines, all scheduled by launchd, all eventually needing to drive the same Chrome profile. The reddit-browser lock is the chokepoint every one of them has to pass through before it can talk to the profile, the playwright-mcp stdio server, or the database.

Every Reddit pipeline funnels through the reddit-browser lock

engage.sh ─────────────┐
link-edit-reddit.sh ───┤                        ┌─▶ Reddit profile
dm-outreach-reddit.sh ─┼─▶ reddit-browser lock ─┼─▶ reddit-agent MCP
scan-reddit-replies.sh ┘                        └─▶ posts / replies DB

Four numbers that describe the orchestrator

37 launchd plists coordinating
5 browser profiles guarded
~70 lines of shell for the lock
3h stale-lock threshold

The lock stack, in full

This is skill/lock.sh. Every pipeline sources it. Every pipeline calls acquire_lock twice at the top. Nothing else is needed.

skill/lock.sh
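The article does not reproduce the file itself, so here is a minimal sketch reconstructed from the behavior described in this section (mkdir lock, pid file, 10s polling, 3600s default timeout, 10,800s staleness, stacked trap cleanup). The lock-dir path and helper names are assumptions; only `acquire_lock` and `_SA_LOCK_DIRS` are named in the article.

```shell
#!/usr/bin/env bash
# Sketch of a two-tier mkdir lock, reconstructed from the description above.
# Not the verbatim skill/lock.sh; paths and helper names are assumptions.

_SA_LOCK_DIRS=()   # stack of lock dirs held by this run, in acquisition order

_sa_release_all() {
  local d
  for d in "${_SA_LOCK_DIRS[@]}"; do
    rm -rf "$d"
  done
  _SA_LOCK_DIRS=()
}
# One trap releases the whole stack, even on SIGINT/SIGTERM/SIGHUP.
trap _sa_release_all EXIT INT TERM HUP

# acquire_lock NAME [TIMEOUT_SECONDS]
# Spin on mkdir until the lock is ours, the holder is stale, or we time out.
acquire_lock() {
  local name=$1 timeout=${2:-3600} waited=0
  local dir="/tmp/sa-${name}.lock"

  while true; do
    if mkdir "$dir" 2>/dev/null; then
      echo $$ > "$dir/pid"          # record the holder for staleness checks
      _SA_LOCK_DIRS+=("$dir")
      return 0
    fi

    # Staleness: missing pid file, dead pid, or lock dir older than 3 hours.
    local pid now mtime
    pid=$(cat "$dir/pid" 2>/dev/null)
    now=$(date +%s)
    mtime=$(stat -f %m "$dir" 2>/dev/null || stat -c %Y "$dir" 2>/dev/null || echo "$now")
    if [ -z "$pid" ] || ! kill -0 "$pid" 2>/dev/null || [ $((now - mtime)) -gt 10800 ]; then
      rm -rf "$dir"                 # wipe the stale lock and retry immediately
      continue
    fi

    # Skip, do not queue: give up this tick and let launchd fire the next one.
    if [ "$waited" -ge "$timeout" ]; then
      exit 0
    fi
    sleep 10
    waited=$((waited + 10))
  done
}
```

Note the `stat -f %m || stat -c %Y` fallback: the BSD and GNU spellings of "directory mtime" differ, and this sketch needs to run on both stock macOS and stock Linux.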

The call site, verbatim

This is how skill/link-edit-reddit.sh opens. Two lines. Order is not negotiable.

skill/link-edit-reddit.sh
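The snippet itself is not reproduced in the article. Assuming the `acquire_lock` interface described in this section, the opening plausibly looks like this; the source path and lock names follow the article's descriptions, but the exact file is a reconstruction.

```shell
#!/usr/bin/env bash
# Reconstructed opening of a pipeline script; only the call order is from
# the article, the paths are assumptions.
source "$(dirname "$0")/lock.sh"

acquire_lock reddit-browser     # tier 1: shared browser-profile lock, always first
acquire_lock link-edit-reddit   # tier 2: this pipeline's own lock
```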

Two tiers, one rule

Every lock in the orchestrator falls into one of two categories. The rule is that the shared-resource tier is acquired first.

Platform-browser locks

One lock per browser profile. reddit-browser, twitter-browser, linkedin-browser, moltbook-browser, github. Shared across every pipeline that touches that platform.

Pipeline locks

One lock per launchd job. link-edit-reddit, dm-outreach-reddit, engage-twitter, audit-linkedin, and so on. Prevents a pipeline from overlapping with its own previous run.

Stacking rule

Browser-profile lock FIRST, pipeline lock SECOND. _SA_LOCK_DIRS tracks them in order. A single EXIT trap releases the whole stack, even on SIGINT or SIGTERM.

Staleness detection

If /tmp/.../pid is missing, or kill -0 $pid fails, or the lock dir is older than 10,800 seconds, the contender wipes it and retries. No crashed run can wedge the whole fleet.

Skip, do not queue

If the lock is still held after the timeout (default 3600s), the contender exits 0. launchd will retry on the next scheduled tick. You never end up with a backlog of half-finished Chrome tabs.

No flock, no Redis

The whole thing is mkdir plus a pid file. It works on stock macOS and stock Linux: no extra daemon, no Redlock edge cases, no lock server paging you at 2 AM.

Every lock in the fleet

These are the actual acquire_lock call sites, dedup-sorted out of skill/*.sh. Five platform-browser locks at the top, pipeline locks below. Every pipeline picks one from each tier, in that order.

grep -h acquire_lock skill/*.sh | sort -u

A collision, played out in real time

launchd does not know about the lock. Two plists fire at the same clock minute and both pipelines start. Here is the log trail from a real collision in skill/logs/, rewritten for readability.

09:00 collision resolved

The same story as a sequence diagram

Two pipelines, one browser profile

launchd → engage.sh:               fire at 09:00
engage.sh → /tmp/*.lock:           mkdir reddit-browser.lock → OK, write pid 47211
launchd → link-edit-reddit.sh:     fire at 09:00
link-edit-reddit.sh → /tmp/*.lock: mkdir reddit-browser.lock → EEXIST, holder alive
link-edit-reddit.sh:               sleep 10s, retry
engage.sh → Claude + MCP:          launch Chrome, edit comment → done
engage.sh → /tmp/*.lock:           trap EXIT: rm -rf lock dir
link-edit-reddit.sh → /tmp/*.lock: mkdir OK, proceeds

What you need to copy, if you are building your own

The whole pattern is eight bullets. Everything else, including launchd plists, per-platform MCP configs, and the pipelines themselves, is plumbing around these.

Orchestrator-safety checklist

  • A single run is started by launchd, not by a long-lived daemon
  • Each run sources skill/lock.sh and calls acquire_lock at the top
  • The first acquire_lock call is always the browser-profile lock
  • The second acquire_lock call is the pipeline-specific lock
  • A held lock is a directory containing a pid file written with echo $$
  • A contending run polls every 10 seconds, up to its timeout, then exits 0
  • A stale lock (dead pid, missing pid file, or 3h+ old) is wiped and retried
  • Every lock held by this run is released by a single EXIT/INT/TERM/HUP trap

S4L vs. what you usually get

Most agent platforms handle concurrency by scaling horizontally and hoping the blast radius stays small. That works until the workers share state. Browser profiles share state.

| Feature | Typical agent framework | S4L orchestrator |
| --- | --- | --- |
| Multiple agents running concurrently | Usually yes, often without a safety net | Yes, as long as they touch different browser profiles |
| Two agents hit the same Reddit session | Both launch Chrome against the same user-data-dir; one crashes | One holds the reddit-browser lock; the other sleeps 10s and retries |
| Crash recovery | Manual lock-file cleanup, or wait for the daemon to restart | PID-based staleness + 3h mtime fallback; next run auto-cleans |
| Backlog behavior | Queue runs, backlog compounds, Chrome tabs pile up | Skip the tick, let launchd fire again later |
| Infrastructure required | Redis, ZooKeeper, job queue, scheduler daemon | mkdir + pid file + launchd, zero external services |
| Lines to read to fully understand it | Thousands across runner, broker, worker pool | About 70 (skill/lock.sh) |

Frequently asked questions

Why do you need two levels of locks instead of one?

Two different pipelines can share a browser profile but not a task. On Reddit, engage.sh (replying to inbound comments), link-edit-reddit.sh (editing our top-performing comments to add a link), dm-outreach-reddit.sh (sending DMs), and scan-reddit-replies.sh (discovering new replies) all log in as the same account. They must take turns on the reddit-browser profile. But each pipeline also has to prevent overlap with its own previous run. The browser-profile lock handles cross-pipeline contention, the pipeline lock handles self-contention.

Why acquire the platform-browser lock BEFORE the pipeline lock?

If you acquire pipeline first and browser second, two different pipelines can each grab their pipeline lock, then both block on the same browser lock. Neither can exit, neither can release. You get a cross-pipeline deadlock. Acquiring the shared resource first means contention is resolved before anyone commits to a sub-lock. It is the same reason database transactions should lock parent rows before child rows.

What happens if a run crashes while holding a lock?

Three fallbacks catch it. First, the shell trap on EXIT/INT/TERM/HUP removes every lock directory in _SA_LOCK_DIRS. Second, the next contender checks the pid file and runs kill -0; a dead pid means the lock is removed. Third, any lock directory older than 10,800 seconds (3 hours) is wiped regardless. Together these mean the orchestrator self-heals, no cron job will wedge the whole fleet.

Why not just use flock?

flock is not installed by default on macOS, and the orchestrator has to run on both macOS and Linux. mkdir is an atomic syscall on every POSIX filesystem, so it works as a spin-lock primitive without any dependencies. The whole implementation is ~70 lines in skill/lock.sh. No extra package, no external service, no per-platform shim.

Why mkdir and not touch or a pid file alone?

mkdir fails atomically if the directory exists, which is exactly the primitive a lock needs. touch would succeed for every contender and race on the pid write. A pid file alone lets two contenders both write their pid and both think they own the lock. mkdir plus write-pid-inside gives you atomicity from the filesystem and a recoverable holder identity.
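As a quick illustration (not from the article), you can watch that atomicity directly: spawn twenty background jobs that all race to mkdir the same path, and exactly one records a win.

```shell
#!/usr/bin/env bash
# mkdir either creates the directory or fails with EEXIST, atomically,
# so exactly one of twenty concurrent contenders can win the lock.
tmp=$(mktemp -d)
for i in $(seq 1 20); do
  ( mkdir "$tmp/race.lock" 2>/dev/null && echo "$i" >> "$tmp/winners" ) &
done
wait
winners=$(( $(wc -l < "$tmp/winners") ))
echo "winners: $winners"   # winners: 1
rm -rf "$tmp"
```

Run the same experiment with `touch` instead of `mkdir` and every contender "succeeds", which is exactly the race described above.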

What is the contention cost in practice?

Contenders sleep 10 seconds between retries. The default timeout is 3600 seconds (1 hour), so a contender will poll up to 360 times. In practice launchd stagger plus a median run length under 15 minutes means most contenders wait zero seconds. When two jobs collide, the later one typically waits one or two polls, then proceeds.

Can I use this pattern outside social media automation?

Yes. The pattern is 'shared resource first, pipeline second, stacked trap-cleaned locks, mkdir plus pid staleness detection.' It applies anywhere you have multiple cron jobs or agents that share a stateful resource: a browser profile, a serial port, an OAuth refresh token, a non-reentrant CLI. The 70 lines in skill/lock.sh are deliberately generic so you can source them as-is.

Want an orchestrator that already has this wired up?

S4L is open about how it runs. The lock stack, the launchd plists, and the per-platform MCP configs are the product. Point it at your projects, let it post, reply, and link-edit on its own cadence.

See S4L