S4L picks a comment style the same way a multi-armed bandit picks a slot machine.
S4L is a social autoposter. That is the headline. The part nobody else covers is the picker. Every time S4L drafts a comment, a 20-line Python function queries its Postgres posts table, averages upvotes per engagement style on that platform, and hands the LLM a fresh dominant / secondary / rare split. There is no static list of "good" styles. The list regenerates on every draft.
The uncopyable bit
The learner is _fetch_style_stats(platform), a single grouped SELECT.
Every other moving part of S4L is scaffolding around this one query. It returns an n and an avg_up per style on one platform. That is the entire learning signal. The rest of scripts/engagement_styles.py just sorts, tiers, and formats it into a prompt block.
The one SQL query that does the grading
Filters are important here. We only trust posts that are still active (not removed or deleted), that actually have content (30+ chars so tiny throwaways do not drag averages), and where upvotes came back as a real integer. The result is a small dict keyed by style name.
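The FAQ below spells out the exact query. As a runnable stand-in, here is a sketch of `_fetch_style_stats` with sqlite3 playing the role of Postgres (so `?` placeholders instead of psycopg's `%s`); the table shape and filters follow the article, the sample rows are invented for illustration.

```python
import sqlite3

# Stand-in for _fetch_style_stats: same grouped SELECT shape as described
# (active-only, 30-char content floor, real upvote counts), sqlite3 for Postgres.
def fetch_style_stats(conn, platform):
    rows = conn.execute(
        """
        SELECT engagement_style, COUNT(*) AS n, AVG(upvotes) AS avg_up
        FROM posts
        WHERE platform = ?
          AND status = 'active'
          AND LENGTH(our_content) >= 30
          AND upvotes IS NOT NULL
        GROUP BY engagement_style
        """,
        (platform,),
    ).fetchall()
    return {style: {"n": n, "avg_up": avg_up} for style, n, avg_up in rows}

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts (platform TEXT, status TEXT, engagement_style TEXT,"
    " our_content TEXT, upvotes INTEGER)"
)
conn.executemany(
    "INSERT INTO posts VALUES (?, ?, ?, ?, ?)",
    [
        ("reddit", "active", "contrarian", "x" * 40, 12),
        ("reddit", "active", "contrarian", "x" * 40, 8),
        ("reddit", "active", "storyteller", "short", 50),  # under 30 chars: excluded
        ("reddit", "removed", "critic", "x" * 40, 99),     # not active: excluded
    ],
)
stats = fetch_style_stats(conn, "reddit")
print(stats)  # {'contrarian': {'n': 2, 'avg_up': 10.0}}
```

Note how the 30-char floor and the status filter silently drop two rows: that is the "only trust real posts" rule doing its job before any averaging happens.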
The loop, step by step
There is no training job, no offline scheduler, no separate model. Every new post both consumes the last policy and writes the row that will update the next one. The whole loop lives in the posting pipeline.
1. A platform pipeline fires
launchd kicks run-reddit-threads.sh / run-twitter-cycle.sh / run-linkedin.sh / etc. The shell acquires a per-platform lock and calls claude -p with the platform's browser MCP attached.
2. The agent calls get_styles_prompt(platform)
Inside that call, get_dynamic_tiers(platform) runs. That function is a thin orchestrator: grab stats, split by trust floor, sort the trusted ones, bucket into thirds.
3. _fetch_style_stats(platform) hits Postgres
Single grouped SELECT against the posts table, filtered to active rows with real content and real upvote counts. Returns {style_name: {n, avg_up}}.
4. Styles below n=5 are shunted into 'secondary'
This is the exploration budget. A style with 2 bad samples does not get demoted; it gets another 3 shots at convincing the bandit before it can be graded.
5. Trusted styles get tiered by avg_upvotes
Sorted descending. Top third -> dominant. Bottom third -> rare. Middle -> secondary (next to the explorers). If only 1-2 styles are trusted, all of them go into dominant and rare stays empty.
6. The LLM sees PRIMARY / SECONDARY / RARE
The prompt header names the target ratios (~60% / ~30% / ~10%). The LLM picks one style and drafts a comment to that style's rules (length, first-word constraints, markdown allowed or banned).
7. The post goes out, the row gets written
INSERT INTO posts(..., engagement_style, ...) VALUES (..., 'contrarian', ...). Every row carries its style. Next draft's get_dynamic_tiers call will see that row in the AVG.
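Steps 3 through 5 can be sketched as one small function. This is a hedged reconstruction of `get_dynamic_tiers` from the description above, not the real code: `stats` is the dict the grouped SELECT returns, `candidates` is the platform's style list with the never list already stripped.

```python
MIN_SAMPLE_SIZE = 5  # trust floor named in the article

# Sketch of get_dynamic_tiers: split by trust floor, sort trusted styles
# by average upvotes, bucket into thirds, park explorers in secondary.
def get_dynamic_tiers(stats, candidates):
    explorers = [s for s in candidates
                 if stats.get(s, {}).get("n", 0) < MIN_SAMPLE_SIZE]
    trusted = sorted((s for s in candidates if s not in explorers),
                     key=lambda s: stats[s]["avg_up"], reverse=True)
    if len(trusted) <= 2:
        # Too few graded styles to carve out a rare tier: all go dominant.
        return trusted, explorers, []
    third = max(1, len(trusted) // 3)
    dominant, rare = trusted[:third], trusted[-third:]
    middle = trusted[third:-third]
    return dominant, middle + explorers, rare

stats = {"contrarian": {"n": 9, "avg_up": 14.0},
         "critic": {"n": 7, "avg_up": 6.0},
         "storyteller": {"n": 6, "avg_up": 2.0},
         "data_point_drop": {"n": 2, "avg_up": 0.0}}  # under the floor
dom, sec, rare = get_dynamic_tiers(stats, list(stats))
print(dom, sec, rare)  # ['contrarian'] ['critic', 'data_point_drop'] ['storyteller']
```

Note that `data_point_drop` lands in secondary despite a 0.0 average: with only 2 samples it is still on its exploration budget.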
Cold start vs warmed-up
Compare the two states below: same platform, same candidate set, wildly different prompts. This is the bandit visibly learning.
reddit pipeline: same code, different state
No style has n>=5 samples, so nothing is trusted. get_dynamic_tiers returns (dominant=[], secondary=all_candidates_minus_never, rare=[]). The prompt only has a SECONDARY block. Every non-banned style has a roughly equal chance of being picked. The bandit is pure exploration, by design.
- trusted = []
- secondary = [critic, storyteller, pattern_recognizer, contrarian, data_point_drop, snarky_oneliner]
- reddit bans curious_probe via PLATFORM_POLICY, stripped before tiering
- prompt header says 'use SECONDARY ~30%' even though it's 100%
- expected: every style gets sampled until min_sample_size crossed
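The cold-start branch in the bullets above is small enough to sketch directly. Function and variable names here are illustrative, not the real code; the six candidate names are the ones listed for reddit.

```python
MIN_SAMPLE_SIZE = 5
CANDIDATES = ["critic", "storyteller", "pattern_recognizer",
              "contrarian", "data_point_drop", "snarky_oneliner"]

def cold_start_tiers(stats):
    trusted = [s for s in CANDIDATES
               if stats.get(s, {}).get("n", 0) >= MIN_SAMPLE_SIZE]
    if not trusted:
        # Nothing graded yet: pure exploration, everything lands in secondary.
        return [], list(CANDIDATES), []
    raise NotImplementedError("warmed-up path omitted from this sketch")

dominant, secondary, rare = cold_start_tiers({})  # empty posts table
print(dominant, rare)  # [] [] -> the prompt only gets a SECONDARY block
```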
The 20-line function on top of that query
Thirds. Trust floor. Explore fallback. No scikit-learn, no vowpal wabbit, no model file. The whole policy fits on one screen.
Why this constant matters
MIN_SAMPLE_SIZE=5 is the single most load-bearing number in the learner. With it, the bandit protects itself from two bad posts tanking a style forever. Without it, the first run sets the winners and S4L locks in. Styles that have not reached 5 samples on a platform are tagged "explore" and keep getting sampled regardless of how their early avg_upvotes looks.
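A toy illustration of that protection, with an invented helper name; below the floor a style is immune to demotion, above it the average counts.

```python
MIN_SAMPLE_SIZE = 5

# Below the floor a style stays in explore, whatever its early average looks like.
def grading_state(n, avg_up):
    if n < MIN_SAMPLE_SIZE:
        return "explore"  # immune to demotion, still in the rotation
    return "graded"       # average is now trusted enough to tier on

print(grading_state(2, 0.0))  # explore: two bad posts cannot tank the style
print(grading_state(6, 0.0))  # graded: enough evidence to demote it
```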
How the query becomes a prompt
The grouped SELECT feeds the tiering function, which emits the three labeled prompt blocks the drafting LLM actually reads.
posts table -> get_dynamic_tiers -> labeled prompt blocks
What the LLM actually sees
This block is generated fresh per draft by the get_styles_prompt function. Notice the usage percentages. They are not suggestions; they are how the bandit communicates its policy.
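A hypothetical rendering of that block; the exact wording is an assumption, only the tier labels and the 60/30/10 targets come from the article.

```python
# Sketch of what a get_styles_prompt-style formatter might emit per draft.
def styles_prompt(dominant, secondary, rare):
    def block(label, ratio, styles):
        names = ", ".join(styles) if styles else "(none yet)"
        return f"{label} (use ~{ratio}% of drafts): {names}"
    return "\n".join([block("PRIMARY", 60, dominant),
                      block("SECONDARY", 30, secondary),
                      block("RARE", 10, rare)])

print(styles_prompt(["contrarian"], ["critic", "data_point_drop"], ["storyteller"]))
```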
One comment, one full round trip
Every comment S4L posts closes the loop once. The stats-update pass a few hours later refreshes posts.upvotes, so the next draft's SELECT reflects real reception, not the value at insert time.
draft -> post -> log -> stats -> next draft
What a live run looks like in the log
Abbreviated output from one twitter-cycle pass. The bandit reports its current tiers before the LLM sees the prompt, so every run is self-auditing.
The learner by the numbers
These are constants in the code, not marketing numbers. You can grep them in scripts/engagement_styles.py.
- ~60%: prompt target for the PRIMARY tier
- ~30%: prompt target for SECONDARY + explore
- ~10%: prompt target for the RARE tier
- 5: MIN_SAMPLE_SIZE, samples before a style's average is trusted
- 30 chars: min our_content length before a post counts toward avg
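The same numbers as a Python fragment. MIN_SAMPLE_SIZE is the name the article uses; the other names here are placeholders, not the identifiers you would grep for.

```python
# Placeholder names, article values.
MIN_SAMPLE_SIZE = 5      # samples before a style's average is trusted
PRIMARY_TARGET = 0.60    # prompt target for the PRIMARY tier
SECONDARY_TARGET = 0.30  # prompt target for SECONDARY + explore
RARE_TARGET = 0.10       # prompt target for the RARE tier
MIN_CONTENT_CHARS = 30   # min our_content length before a post counts
```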
Static taxonomy vs S4L's live bandit
Most writeups about AI autoposters describe a fixed list of 'what to post on which platform'. S4L treats that as a starting state that the DB then rewrites.
| Feature | Typical autoposter | S4L |
|---|---|---|
| Style tiers | Hard-coded per platform in a config file | Recomputed on every draft from live posts.avg_upvotes, per platform |
| Cold start | Falls back to hard-coded 'recommended' styles | All candidate styles forced into 'secondary' so every one gets sampled |
| Trust floor | None, a single unlucky post can demote a style | MIN_SAMPLE_SIZE=5, styles with n<5 stay in 'explore' regardless of avg |
| Banned styles | Often enforced only in the prompt ('avoid X') | Stripped from the candidate set before the prompt is built |
| Pick ratio | Single pick, no exploration budget | 60% dominant / 30% secondary / 10% rare baked into the prompt |
| Per-post audit | Style rarely logged, A/B unprovable | engagement_style column is NOT NULL on the posts table, every row tagged |
Point S4L at your product
Same 7-style taxonomy, same bandit, your handle. You hand it a content angle and a list of subreddits; it finds threads, picks a style against your own posts table, and logs every row for the next run to learn from.
See pricing →

S4L: common questions
What is S4L?
S4L is a social autoposter. It finds threads on Reddit, X/Twitter, LinkedIn, GitHub issues, and Moltbook, drafts a comment under one of 7 named engagement styles, posts it via a browser MCP or an API, and logs every post to a Neon Postgres table. It runs as a Claude Code skill plus a set of macOS launchd jobs that fire the comment, engagement, and stats pipelines on a 6-hour cadence.
Where exactly does S4L learn which style wins?
In scripts/engagement_styles.py. The function _fetch_style_stats(platform) runs a single SQL query: SELECT engagement_style, COUNT(*), AVG(upvotes) FROM posts WHERE platform=%s AND status='active' AND LENGTH(our_content)>=30 GROUP BY engagement_style. The function get_dynamic_tiers(platform) consumes those rows, sorts trusted styles (n>=5) by avg_upvotes, and returns a (dominant, secondary, rare) tuple. That tuple is then formatted into the prompt block every pipeline reads before drafting.
Why does MIN_SAMPLE_SIZE=5 matter?
Without a sample floor, a single unlucky post could kick a style into 'rare' and kill its pick rate permanently. With 5 samples required before a style's avg_upvotes is trusted, the bandit is protected from early-run noise. Every style with n<5 is placed into secondary/explore even if its avg_upvotes looks low, so S4L keeps testing it. This is the difference between a learning loop and a prematurely-locked policy.
How is the 60/30/10 split enforced?
It is not a code-level quota; it is prompted. The get_styles_prompt function emits three labeled blocks: PRIMARY (~60% of the time), SECONDARY (~30%), RARE (~10%). The LLM sees the ratios in the header and picks accordingly. Each pick then gets logged as engagement_style on its row, so the next run's SQL query reflects the realized distribution. The loop closes without any extra scheduler.
What is in the 'never' list, and why is it not in the bandit?
Each platform has a static never list in PLATFORM_POLICY. LinkedIn bans snarky_oneliner, GitHub bans snarky_oneliner, Reddit bans curious_probe. These exist because tone and brand safety are not performance judgments. Even if the DB showed snarky_oneliner getting high reactions on LinkedIn, S4L will not ship it. The never list is applied before get_dynamic_tiers even sees the candidate set, so those styles never enter the prompt at all.
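A sketch of that pre-tiering strip. The three bans are the ones named above and the seven style names match the article's taxonomy; the PLATFORM_POLICY shape and the helper name are assumptions.

```python
# Assumed shape for the per-platform never lists described above.
PLATFORM_POLICY = {
    "reddit":   {"never": {"curious_probe"}},
    "linkedin": {"never": {"snarky_oneliner"}},
    "github":   {"never": {"snarky_oneliner"}},
}

ALL_STYLES = ["critic", "storyteller", "pattern_recognizer", "contrarian",
              "data_point_drop", "snarky_oneliner", "curious_probe"]

def candidate_styles(platform):
    # Banned styles never reach the tiering step, so no amount of upvotes
    # can resurrect them on that platform.
    banned = PLATFORM_POLICY.get(platform, {}).get("never", set())
    return [s for s in ALL_STYLES if s not in banned]

print(candidate_styles("reddit"))    # curious_probe stripped
print(candidate_styles("linkedin"))  # snarky_oneliner stripped
```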
What happens on the first run when posts is empty?
The function returns (dominant=[], secondary=all_candidates, rare=[]). The prompt then only has a SECONDARY block (minus the never list). Every style gets a roughly equal chance. After about 35 posts on a given platform (7 styles x 5 samples), enough styles cross the trust floor that the bandit can start producing a dominant tier. Before that, S4L is deliberately exploring.
Is this different from the page at /t/site-s4l-ai that describes the 7 styles?
Yes. That page describes the STYLES dict and the per-platform never list as a static taxonomy. This page describes the loop on top of that taxonomy: the SQL query that grades style performance, the get_dynamic_tiers function that turns rows into tiers, the MIN_SAMPLE_SIZE floor, and the 60/30/10 prompt split. Taxonomy is the vocabulary; this is the learning.
How often does the bandit re-rank?
Every time get_styles_prompt is called, which means every time S4L drafts a comment or a reply. There is no batch job. The SQL query runs live, and stats are updated roughly every 6 hours by the stats pipeline (com.m13v.social-stats launchd plist). So by the time you see a comment land, its style was picked against stats that are at most a few hours stale.