s4l.ai / field guide

How to announce a vLLM release on Reddit and X by never posting the announcement.

Every other playbook for this tells you to craft the perfect r/LocalLLaMA title, the perfect X thread, the perfect Show HN. This one is the opposite. It is how S4L distributes an open-weights release without ever writing a launch post: find the conversations already in motion, keep only the ones with five-minute momentum, and let a per-platform bandit pick which of seven voices gets to reply.

Matthew Diakonov · 11 min read
momentum filter: delta_score >= 1 over a 5-minute re-poll window
per-subreddit floor_days so the same repo cannot re-post r/LocalLLaMA for a week
rate-limit preflight aborts BEFORE Claude is spawned, not after
7-voice bandit re-ranks itself from live posts.avg_upvotes on every draft

the anchor fact

The distribution gate is a sleep 300 and a second fetch.

Open skill/run-twitter-cycle.sh. Phase one fetches candidate tweets. The shell sleeps five minutes. Phase two re-fetches the same IDs and computes a delta in engagement. Only tweets with delta_score >= 1 survive the filter. That is the whole momentum gate. It runs before the browser agent wakes up, before a single token is spent on drafting, and it is the reason S4L replies to vLLM-release threads that are trending at t=+5m rather than the ones that trended yesterday.

Why the launch-post move is the wrong move

An announcement post works once. You get a frontpage spike, a few hours of attention, and then the thread sinks. For a library release that has a twelve-month useful life (vLLM, a kernel, a tokenizer, an inference server), the lifetime distribution curve is a long tail, not a spike. Most of the interest comes from people who find the project three weeks later while solving a specific problem.

The launch post does not reach those people. It reaches the frontpage scrollers, once. The people asking "has anyone compared throughput at batch=32" a month in are in threads you did not start and cannot pin. S4L is built around the bet that the second group is where the actual adoption lives.

So the distribution job is reframed: find those threads, reply as a person who has opinions about the thing, and let a bandit figure out which voice works best per platform.

Three moves, one release

Each of these pipelines runs on its own launchd cadence and reads from the same posts table, so the bandit sees every platform in aggregate.

Reddit threads

run-reddit-threads.sh fires on a six-hour rotation. One project per run. Drafts a top-level submission using the project README as context, targets one subreddit from the external_subreddits list whose floor_days has expired. Voice is first-person, no links unless the subreddit rules allow them.

Reddit engagement

run-reddit-engage.sh searches recent threads matching the project's topical keywords. Drops any thread S4L already replied to. Picks a style from the reddit bandit (curious_probe is banned). Writes a comment that reads like a user, not a marketer.

Twitter cycle

run-twitter-cycle.sh fires every 20 minutes. Two-phase delta filter keeps only momentum. Direct product mentions are allowed but still graded by the bandit against posts.avg_upvotes. Replies, not announcements.

GitHub issues (optional)

post_github.py scans open issues in configured repos. Skips anything in exclusions.github_repos. Useful when a release fixes a specific bug someone already filed upstream.

The distribution loop, step by step

This is the Twitter cycle end to end. It is the pipeline that most directly answers the question of how to get a vLLM release in front of people on X without posting an announcement.

1. launchd fires run-twitter-cycle.sh

Every 20 minutes. The plist is com.m13v.social-twitter-cycle. StartInterval is 1200. The shell first acquires a per-platform flock so two cycles cannot collide.
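The collision guard can be sketched in a few lines. The flock-per-platform behavior comes from the script; the lock path shown here is an assumption.

```shell
# Per-platform lock: a second cycle started while one is still running
# exits immediately instead of double-posting. Lock path is illustrative.
LOCKFILE=/tmp/s4l-twitter-cycle.lock

exec 9>"$LOCKFILE"
if ! flock -n 9; then
    echo "another twitter cycle holds the lock; skipping this run"
    exit 0
fi
# ... phases one and two run here, under the lock ...
echo "lock acquired"
```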

2. Phase one: fetch candidate tweets (t=0)

scripts/fetch_twitter_t1.py runs the configured twitter_queries. Each query includes since:YESTERDAY so stale tweets are gone at the source. Results are snapshotted to disk with tweet_id plus current engagement counts.

3. sleep 300

The shell sleeps five minutes. This is the momentum window. It is not negotiable: shorter and you catch noise, longer and the conversation has drifted to the next release.

4. Phase two: re-fetch and score (t=+5m)

scripts/score_twitter_candidates.py re-fetches the same tweet IDs and computes delta_score = engagement_now minus engagement_at_t1. Tweets whose delta_score is below 1 are dropped.

5. Dedup against posts

scripts/enrich_twitter_candidates.py joins the surviving set against posts on (platform, thread_id) for this project. Any thread already replied to is removed.

6. The bandit picks the voice

engagement_styles.get_dynamic_tiers('twitter') runs a SQL query against posts grouped by engagement_style. The top third is the PRIMARY tier for the prompt, the bottom third is RARE. The LLM sees three labeled blocks with ~60/~30/~10 target ratios.
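A minimal sketch of that tiering query, assuming an in-memory SQLite stand-in for the real Postgres posts table. The real MIN_SAMPLE_SIZE constant is stubbed as min_sample, and the middle tier's name is an assumption; the article only names PRIMARY and RARE.

```python
import sqlite3

def get_dynamic_tiers(conn, platform, min_sample=3):
    """Rank engagement styles by live average upvotes on one platform.

    Sketch of engagement_styles.get_dynamic_tiers: the real query runs
    against Postgres with MIN_SAMPLE_SIZE; both are stubbed here.
    """
    rows = conn.execute(
        """
        SELECT engagement_style, AVG(upvotes) AS avg_upvotes
        FROM posts
        WHERE platform = ?
        GROUP BY engagement_style
        HAVING COUNT(*) >= ?
        ORDER BY avg_upvotes DESC
        """,
        (platform, min_sample),
    ).fetchall()
    styles = [r[0] for r in rows]
    third = max(1, len(styles) // 3)
    return {
        "PRIMARY": styles[:third],                                   # ~60% of drafts
        "MIDDLE": styles[third:-third] if len(styles) > 2 * third else [],
        "RARE": styles[-third:],                                     # ~10% of drafts
    }

# Demo: three styles with three graded posts each
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (platform TEXT, engagement_style TEXT, upvotes INT)")
conn.executemany(
    "INSERT INTO posts VALUES ('twitter', ?, ?)",
    [("contrarian", 9), ("contrarian", 7), ("contrarian", 8),
     ("storyteller", 2), ("storyteller", 1), ("storyteller", 3),
     ("data_point_drop", 5), ("data_point_drop", 5), ("data_point_drop", 5)],
)
tiers = get_dynamic_tiers(conn, "twitter")
```

Because the ranking is recomputed from live rows on every draft, a style that stops earning upvotes slides out of PRIMARY without any manual retuning.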

7. Draft, post, log

Claude plus twitter-agent MCP drafts one reply per candidate. Each post that lands writes a row to posts with engagement_style, platform, thread_id, and the raw text. The next cycle's bandit sees this row.

The actual two-phase shell

Abbreviated but functionally accurate. The sleep 300 is the whole point. Everything before it is data collection; everything after it is conditional on a measured engagement delta.

skill/run-twitter-cycle.sh
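An abbreviated sketch of the script's shape, not the file itself. The real cycle shells out to fetch_twitter_t1.py and score_twitter_candidates.py; here both fetches are stubbed with fixed counts (and the window shortened) so the gate logic is visible end to end.

```shell
# Sketch of the two-phase momentum gate in run-twitter-cycle.sh.
# Tweet ids and engagement counts are stubbed for illustration.

WINDOW=2   # 300 in the real cycle; shortened so this sketch runs fast

# Phase one (t=0): snapshot engagement per tweet id
declare -A t0=( [tw_a]=12 [tw_b]=40 [tw_c]=7 )

sleep "$WINDOW"   # the momentum window: nothing is drafted before this

# Phase two (t=+5m): re-fetch the same ids
declare -A t1=( [tw_a]=19 [tw_b]=40 [tw_c]=8 )

# Keep only tweets whose engagement actually moved: delta_score >= 1
survivors=()
for id in "${!t0[@]}"; do
    delta=$(( ${t1[$id]} - ${t0[$id]} ))
    (( delta >= 1 )) && survivors+=( "$id" )
done

printf '%s\n' "${survivors[@]}"   # only these reach the drafting prompt
```

Here tw_b is a big tweet that has stopped moving, so it is dropped; tw_c is small but gaining, so it survives. That is the whole point of deltas over absolute counts.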

Where the momentum filter fits in the pipeline

Four sources feed the two-phase gate. One gate feeds three downstream consumers. The model never sees a tweet that did not cross the delta threshold.

sources -> momentum gate -> drafting

  sources: twitter_queries, fetch_twitter_t1, score_twitter_candidates, posts dedup
  momentum gate: delta_score >= 1
  drafting: engagement_styles, claude + twitter-agent, posts row

Reddit: the rate-limit preflight (run before Claude wakes up)

The failure mode every autoposter has seen: you spin up a drafting agent, burn tokens generating a great reply, try to post, and Reddit returns a 429. S4L catches that before the model is ever spawned. A cheap call against /r/popular.json?limit=1 reads the X-Ratelimit-Remaining header, caches it to disk, and exits cleanly if the quota is low.

scripts/post_reddit.py
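A sketch of that preflight gate. The cache path is the one named in the article; THRESHOLD is an assumption (the real floor lives in the script), and the headers are passed in as a dict so the logic is testable without hitting reddit.com.

```python
import json
import time
from pathlib import Path

CACHE = Path("/tmp/reddit_ratelimit.json")   # path named in the article
THRESHOLD = 5                                # assumed abort floor

def preflight(headers):
    """Return True if this cycle is allowed to spawn the drafting agent.

    In the real script the headers come from a cheap GET against
    /r/popular.json?limit=1; here they are injected for illustration.
    """
    remaining = float(headers.get("X-Ratelimit-Remaining", "0"))
    CACHE.write_text(json.dumps({"remaining": remaining, "checked_at": time.time()}))
    # Abort BEFORE Claude is spawned: no tokens are burned on a draft
    # that a 429 would throw away anyway.
    return remaining > THRESHOLD

allowed = preflight({"X-Ratelimit-Remaining": "96.0"})   # plenty of quota
blocked = preflight({"X-Ratelimit-Remaining": "2.0"})    # quota nearly gone
```

The on-disk cache also means the next cycle can inspect the last known quota before making even the probe call.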

the load-bearing constant

300 seconds between the two tweet fetches

At 60 seconds most tweets have not moved. At 600 seconds the conversation has shifted and fresh tweets have replaced the ones in the snapshot. 300 is the sweet spot for a release-day Twitter cycle: long enough that a reply and a quote-tweet register as engagement gain, short enough that the responses land while the thread is still live.

The only config you actually touch for a release

Three sections. The subreddit list with per-subreddit cooldowns. The Twitter query list with the since:YESTERDAY operator so stale tweets never enter the candidate set. The exclusions block so the pipeline does not try to reply on the upstream repo itself.

config.json
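A minimal sketch of those three sections. The key names external_subreddits, floor_days, twitter_queries, and exclusions.github_repos come from the article; the surrounding project shape and every value shown are illustrative.

```json
{
  "projects": [
    {
      "name": "my-vllm-fork",
      "external_subreddits": [
        { "name": "LocalLLaMA", "floor_days": 7 },
        { "name": "MachineLearning", "floor_days": 14 }
      ],
      "twitter_queries": [
        "vllm release since:YESTERDAY",
        "llm inference throughput since:YESTERDAY"
      ],
      "exclusions": {
        "github_repos": ["vllm-project/vllm"]
      }
    }
  ]
}
```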

The release loop by the numbers

These are constants in the code and config, not projections. They are what a release cycle actually runs on.

300s sleep between Twitter fetches
1200s launchd StartInterval for the cycle
7 engagement styles in the bandit
1 minimum delta_score to survive the filter
7 default floor_days on r/LocalLLaMA
14 default floor_days on r/MachineLearning
MIN_SAMPLE_SIZE posts per style before the bandit grades it
X-Ratelimit-Remaining floor below which the preflight aborts

One release-day cycle in the log

Abbreviated output from a real run. Note that the pipeline logs its own filtering decisions, so every step of the gate is auditable after the fact.

run-twitter-cycle.sh, release day

Launch-post playbook vs S4L distribution loop

Same release, same subreddits, same Twitter audience. The difference is where the pipeline spends its token budget and what signal it trusts.

Primary distribution move
  Launch playbook: write a single announcement post, pin to r/LocalLLaMA
  S4L: reply to threads already discussing the topic, one voice per thread

Tweet selection
  Launch playbook: pick by recency or follower count
  S4L: two-phase 5-minute re-poll, keep only tweets with delta_score >= 1

Subreddit cooldown
  Launch playbook: manual memory, often violated
  S4L: floor_days per subreddit in config.json, enforced before draft

Rate-limit handling
  Launch playbook: retry on 429, burn tokens drafting posts that cannot ship
  S4L: preflight probe writes /tmp/reddit_ratelimit.json, aborts BEFORE Claude is spawned

Voice selection
  Launch playbook: one launch template, repeated across platforms
  S4L: per-platform bandit over 7 named styles, re-ranked from live posts.avg_upvotes

Self-disclosure
  Launch playbook: obvious marketing voice, often flagged
  S4L: first-person, subreddit-tone prompt; product mentions reserved for Twitter

Want this loop pointed at your next release?

Book 20 minutes and we will configure S4L against your repo, subreddit set, and Twitter queries live.

Book a call

Questions about releasing on Reddit and X without a launch post

Why not just make a launch post on r/LocalLLaMA?

You can, but the mods and the room both route launch posts through a pattern they already recognize. The engagement curve on a cold announcement post is steep and short. S4L is built for the long tail: people asking 'has anyone compared vLLM throughput to TGI' three weeks after the release, or 'what is the right batch size for 70B on 2x H100' six months later. That is where a momentum-filtered reply outperforms a launch post by a lot over the lifetime of a release.

What does the 5-minute delta filter actually do?

The Twitter cycle script fetches candidate tweets at t=0 and writes engagement snapshots to disk. It sleeps 300 seconds, then re-fetches the same tweet IDs and computes delta_score as the engagement gained between the two fetches. Any tweet with delta_score below 1 is dropped before the prompt is built. This selects for tweets gaining traction in real time and drops stale tweets that are coasting, which matters for an open-weights release because the conversation moves fast in the first 48 hours.

How does S4L stop posting to the same subreddit over and over?

Each project in config.json lists external_subreddits with a per-subreddit floor_days field. Before drafting, S4L queries its posts table for the most recent post against (project, subreddit). If now minus last_posted_at is less than floor_days, the subreddit is skipped for this cycle. r/LocalLLaMA commonly sits at 7 days; r/MachineLearning sits higher at 14. The floor is per-project, so a different project can still post the same week.
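That cooldown check reduces to a single date comparison. A sketch under one assumption: here last_posted_at is passed in directly, where S4L would fetch it from the posts table for (project, subreddit).

```python
from datetime import datetime, timedelta, timezone

def subreddit_open(last_posted_at, floor_days, now):
    """Per-subreddit cooldown: True if the floor has expired for this project."""
    if last_posted_at is None:
        return True   # never posted to this subreddit: nothing to wait out
    return now - last_posted_at >= timedelta(days=floor_days)

now = datetime(2025, 6, 15, tzinfo=timezone.utc)
last = datetime(2025, 6, 10, tzinfo=timezone.utc)   # posted 5 days ago

llama_open = subreddit_open(last, 7, now)   # 7-day floor not yet expired
short_open = subreddit_open(last, 3, now)   # a 3-day floor would allow it
```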

Can S4L post an actual text post to a subreddit, or is it comments only?

Both. run-reddit-threads.sh drafts a top-level submission (title plus body) with the project README as context, using the external_subreddits list. run-reddit-engage.sh drafts comments inside threads that match the project's topical queries. For a release, the comment pipeline is usually what you want because it reaches people asking questions right now rather than people scrolling the frontpage.

What are the 7 engagement styles and where are they defined?

They live in scripts/engagement_styles.py under a STYLES dict. The names are contrarian, storyteller, critic, pattern_recognizer, data_point_drop, snarky_oneliner, and curious_probe. Each style has its own rules: length budget, first-word constraints, whether markdown is allowed, which subreddits it matches. Reddit bans curious_probe at the platform level, LinkedIn bans snarky_oneliner. The rest are ranked by average upvotes on each platform, fresh from Postgres, on every single draft.
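The shape of that dict can be sketched like this. The seven style names and the two platform bans come from the article; the per-style rule fields shown are illustrative only.

```python
# Sketch of the STYLES dict in scripts/engagement_styles.py.
# Rule fields (max_words, markdown) are illustrative placeholders.
STYLES = {
    "contrarian":         {"max_words": 60,  "markdown": False},
    "storyteller":        {"max_words": 120, "markdown": True},
    "critic":             {"max_words": 80,  "markdown": False},
    "pattern_recognizer": {"max_words": 90,  "markdown": False},
    "data_point_drop":    {"max_words": 50,  "markdown": False},
    "snarky_oneliner":    {"max_words": 20,  "markdown": False},
    "curious_probe":      {"max_words": 40,  "markdown": False},
}

# Platform-level bans applied before the bandit ranks the rest
PLATFORM_BANS = {
    "reddit":   {"curious_probe"},
    "linkedin": {"snarky_oneliner"},
}

def eligible(platform):
    """Styles the bandit may rank on this platform."""
    return [s for s in STYLES if s not in PLATFORM_BANS.get(platform, set())]
```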

Does S4L mention the product name in replies?

On Reddit, almost never. The Reddit prompt explicitly discourages pivoting into a product pitch and favors subreddit-native voice. On Twitter, first-person mentions are allowed but the reply still has to stand on its own as a useful observation; the mention is a byline, not a CTA. On GitHub issues, mentions are fine when the product directly addresses the issue. The logic sits in prompts/*.md and the per-platform style policies.

What prevents S4L from flooding a trending thread with three replies in a row?

The posts table has a unique constraint on (platform, thread_id) for comments, and the candidate enrichment step dedups against it before scoring. If S4L already replied in a thread, that thread is dropped from the candidate set for that project, so across a release window the same thread never receives two replies from the same project, even if it keeps trending.
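A sketch of that dedup against an in-memory SQLite stand-in. The real posts table lives in Postgres and carries more columns (engagement_style, project, raw text); only (platform, thread_id) and the unique constraint matter for this check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE posts (
        platform  TEXT NOT NULL,
        thread_id TEXT NOT NULL,
        body      TEXT,
        UNIQUE (platform, thread_id)
    )
    """
)
# A reply from an earlier cycle already landed in thread tw_123
conn.execute("INSERT INTO posts VALUES ('twitter', 'tw_123', 'first reply')")

# Candidate enrichment: drop every thread already replied to
candidates = ["tw_123", "tw_456"]
fresh = [
    t for t in candidates
    if conn.execute(
        "SELECT 1 FROM posts WHERE platform = ? AND thread_id = ?",
        ("twitter", t),
    ).fetchone() is None
]
# fresh keeps only tw_456; a second reply in tw_123 would also violate
# the unique constraint even if the dedup were skipped
```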

What should I configure first if I want to use S4L for an open-weights release?

Three things. First, the external_subreddits list with floor_days calibrated per community. Second, the twitter_queries list using the since:YESTERDAY operator so stale tweets get excluded upfront. Third, the exclusions.github_repos list so S4L does not comment on the upstream repo the release is a fork or dependency of. After that, fire run-twitter-cycle.sh manually once, read the /tmp/reddit_ratelimit.json output, and check your first drafts before letting launchd take over on its 20-minute cadence.