S4L picks a comment style the same way a multi-armed bandit picks a slot machine.
S4L is a social autoposter. That is the headline. The part nobody else covers is the picker. Every time S4L drafts a comment, a 20-line Python function queries its Postgres posts table, averages upvotes per engagement style on that platform, and hands the LLM a fresh dominant / secondary / rare split. There is no static list of "good" styles. The list regenerates on every draft.
The uncopyable bit
The learner is _fetch_style_stats(platform), a single grouped SELECT.
Every other moving part of S4L is scaffolding around this one query. It returns an n and an avg_up per style on one platform. That is the entire learning signal. The rest of scripts/engagement_styles.py just sorts, tiers, and formats it into a prompt block.
The one SQL query that does the grading
Filters are important here. We only trust posts that are still active (not removed or deleted), that actually have content (30+ chars so tiny throwaways do not drag averages), and where upvotes came back as a real integer. The result is a small dict keyed by style name.
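The FAQ below spells out the exact query. As a runnable stand-in, here is a sketch of `_fetch_style_stats` with sqlite3 playing the role of Postgres (so `?` placeholders instead of psycopg's `%s`); the table shape and filters follow the article, the sample rows are invented for illustration.

```python
import sqlite3

# Stand-in for _fetch_style_stats: same grouped SELECT shape as described
# (active-only, 30-char content floor, real upvote counts), sqlite3 for Postgres.
def fetch_style_stats(conn, platform):
    rows = conn.execute(
        """
        SELECT engagement_style, COUNT(*) AS n, AVG(upvotes) AS avg_up
        FROM posts
        WHERE platform = ?
          AND status = 'active'
          AND LENGTH(our_content) >= 30
          AND upvotes IS NOT NULL
        GROUP BY engagement_style
        """,
        (platform,),
    ).fetchall()
    return {style: {"n": n, "avg_up": avg_up} for style, n, avg_up in rows}

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts (platform TEXT, status TEXT, engagement_style TEXT,"
    " our_content TEXT, upvotes INTEGER)"
)
conn.executemany(
    "INSERT INTO posts VALUES (?, ?, ?, ?, ?)",
    [
        ("reddit", "active", "contrarian", "x" * 40, 12),
        ("reddit", "active", "contrarian", "x" * 40, 8),
        ("reddit", "active", "storyteller", "short", 50),  # under 30 chars: excluded
        ("reddit", "removed", "critic", "x" * 40, 99),     # not active: excluded
    ],
)
stats = fetch_style_stats(conn, "reddit")
print(stats)  # {'contrarian': {'n': 2, 'avg_up': 10.0}}
```

Note how the 30-char floor and the status filter silently drop two rows: that is the "only trust real posts" rule doing its job before any averaging happens.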
The loop, step by step
There is no training job, no offline scheduler, no separate model. Every new post both consumes the last policy and writes the row that will update the next one. The whole loop lives in the posting pipeline.
1. A platform pipeline fires
launchd kicks run-reddit-threads.sh / run-twitter-cycle.sh / run-linkedin.sh / etc. The shell acquires a per-platform lock and calls claude -p with the platform's browser MCP attached.
2. The agent calls get_styles_prompt(platform)
Inside that call, get_dynamic_tiers(platform) runs. That function is a thin orchestrator: grab stats, split by trust floor, sort the trusted ones, bucket into thirds.
3. _fetch_style_stats(platform) hits Postgres
Single grouped SELECT against the posts table, filtered to active rows with real content and real upvote counts. Returns {style_name: {n, avg_up}}.
4. Styles below n=5 are shunted into 'secondary'
This is the exploration budget. A style with 2 bad samples does not get demoted; it gets another 3 shots at convincing the bandit before it can be graded.
5. Trusted styles get tiered by avg_upvotes
Sorted descending. Top third -> dominant. Bottom third -> rare. Middle -> secondary (next to the explorers). If only 1-2 styles are trusted, all of them go into dominant and rare stays empty.
6. The LLM sees PRIMARY / SECONDARY / RARE
The prompt header names the target ratios (~60% / ~30% / ~10%). The LLM picks one style and drafts a comment to that style's rules (length, first-word constraints, markdown allowed or banned).
7. The post goes out, the row gets written
INSERT INTO posts(..., engagement_style, ...) VALUES (..., 'contrarian', ...). Every row carries its style. Next draft's get_dynamic_tiers call will see that row in the AVG.
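Steps 3 through 5 can be sketched as one small function. This is a hedged reconstruction of `get_dynamic_tiers` from the description above, not the real code: `stats` is the dict the grouped SELECT returns, `candidates` is the platform's style list with the never list already stripped.

```python
MIN_SAMPLE_SIZE = 5  # trust floor named in the article

# Sketch of get_dynamic_tiers: split by trust floor, sort trusted styles
# by average upvotes, bucket into thirds, park explorers in secondary.
def get_dynamic_tiers(stats, candidates):
    explorers = [s for s in candidates
                 if stats.get(s, {}).get("n", 0) < MIN_SAMPLE_SIZE]
    trusted = sorted((s for s in candidates if s not in explorers),
                     key=lambda s: stats[s]["avg_up"], reverse=True)
    if len(trusted) <= 2:
        # Too few graded styles to carve out a rare tier: all go dominant.
        return trusted, explorers, []
    third = max(1, len(trusted) // 3)
    dominant, rare = trusted[:third], trusted[-third:]
    middle = trusted[third:-third]
    return dominant, middle + explorers, rare

stats = {"contrarian": {"n": 9, "avg_up": 14.0},
         "critic": {"n": 7, "avg_up": 6.0},
         "storyteller": {"n": 6, "avg_up": 2.0},
         "data_point_drop": {"n": 2, "avg_up": 0.0}}  # under the floor
dom, sec, rare = get_dynamic_tiers(stats, list(stats))
print(dom, sec, rare)  # ['contrarian'] ['critic', 'data_point_drop'] ['storyteller']
```

Note that `data_point_drop` lands in secondary despite a 0.0 average: with only 2 samples it is still on its exploration budget.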
Cold start vs warmed-up
Compare the two states below: same platform, same candidate set, wildly different prompts. This is the bandit visibly learning.
reddit pipeline: same code, different state
No style has n>=5 samples, so nothing is trusted. get_dynamic_tiers returns (dominant=[], secondary=all_candidates_minus_never, rare=[]). The prompt only has a SECONDARY block. Every non-banned style has a roughly equal chance of being picked. The bandit is pure exploration, by design.
- trusted = []
- secondary = [critic, storyteller, pattern_recognizer, contrarian, data_point_drop, snarky_oneliner]
- reddit bans curious_probe via PLATFORM_POLICY, stripped before tiering
- prompt header says 'use SECONDARY ~30%' even though it's 100%
- expected: every style gets sampled until min_sample_size crossed
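The cold-start branch in the bullets above is small enough to sketch directly. Function and variable names here are illustrative, not the real code; the six candidate names are the ones listed for reddit.

```python
MIN_SAMPLE_SIZE = 5
CANDIDATES = ["critic", "storyteller", "pattern_recognizer",
              "contrarian", "data_point_drop", "snarky_oneliner"]

def cold_start_tiers(stats):
    trusted = [s for s in CANDIDATES
               if stats.get(s, {}).get("n", 0) >= MIN_SAMPLE_SIZE]
    if not trusted:
        # Nothing graded yet: pure exploration, everything lands in secondary.
        return [], list(CANDIDATES), []
    raise NotImplementedError("warmed-up path omitted from this sketch")

dominant, secondary, rare = cold_start_tiers({})  # empty posts table
print(dominant, rare)  # [] [] -> the prompt only gets a SECONDARY block
```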
The 20-line function on top of that query
Thirds. Trust floor. Explore fallback. No scikit-learn, no vowpal wabbit, no model file. The whole policy fits on one screen.
Why this constant matters
MIN_SAMPLE_SIZE=5 is the single most load-bearing number in the learner. With it, the bandit protects itself from two bad posts tanking a style forever. Without it, the first run sets the winners and S4L locks in. Styles that have not reached 5 samples on a platform are tagged "explore" and keep getting sampled regardless of how their early avg_upvotes looks.
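A toy illustration of that protection, with an invented helper name; below the floor a style is immune to demotion, above it the average counts.

```python
MIN_SAMPLE_SIZE = 5

# Below the floor a style stays in explore, whatever its early average looks like.
def grading_state(n, avg_up):
    if n < MIN_SAMPLE_SIZE:
        return "explore"  # immune to demotion, still in the rotation
    return "graded"       # average is now trusted enough to tier on

print(grading_state(2, 0.0))  # explore: two bad posts cannot tank the style
print(grading_state(6, 0.0))  # graded: enough evidence to demote it
```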
How the query becomes a prompt
The grouped SELECT feeds the tiering function, which emits the three labeled prompt blocks the drafting LLM actually reads.
posts table -> get_dynamic_tiers -> labeled prompt blocks
What the LLM actually sees
This block is generated fresh per draft by the get_styles_prompt function. Notice the usage percentages. They are not suggestions; they are how the bandit communicates its policy.
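A hypothetical rendering of that block; the exact wording is an assumption, only the tier labels and the 60/30/10 targets come from the article.

```python
# Sketch of what a get_styles_prompt-style formatter might emit per draft.
def styles_prompt(dominant, secondary, rare):
    def block(label, ratio, styles):
        names = ", ".join(styles) if styles else "(none yet)"
        return f"{label} (use ~{ratio}% of drafts): {names}"
    return "\n".join([block("PRIMARY", 60, dominant),
                      block("SECONDARY", 30, secondary),
                      block("RARE", 10, rare)])

print(styles_prompt(["contrarian"], ["critic", "data_point_drop"], ["storyteller"]))
```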
One comment, one full round trip
Every comment S4L posts closes the loop once. The stats-update pass a few hours later refreshes posts.upvotes, so the next draft's SELECT reflects real reception, not the value at insert time.
draft -> post -> log -> stats -> next draft
What a live run looks like in the log
Abbreviated output from one twitter-cycle pass. The bandit reports its current tiers before the LLM sees the prompt, so every run is self-auditing.
The learner by the numbers
These are constants in the code, not marketing numbers. You can grep them in scripts/engagement_styles.py.
- ~60%: prompt target for the PRIMARY tier
- ~30%: prompt target for SECONDARY + explore
- ~10%: prompt target for the RARE tier
- 5: MIN_SAMPLE_SIZE, samples before a style's average is trusted
- 30 chars: min our_content length before a post counts toward avg
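The same numbers as a Python fragment. MIN_SAMPLE_SIZE is the name the article uses; the other names here are placeholders, not the identifiers you would grep for.

```python
# Placeholder names, article values.
MIN_SAMPLE_SIZE = 5      # samples before a style's average is trusted
PRIMARY_TARGET = 0.60    # prompt target for the PRIMARY tier
SECONDARY_TARGET = 0.30  # prompt target for SECONDARY + explore
RARE_TARGET = 0.10       # prompt target for the RARE tier
MIN_CONTENT_CHARS = 30   # min our_content length before a post counts
```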
Static taxonomy vs S4L's live bandit
Most writeups about AI autoposters describe a fixed list of 'what to post on which platform'. S4L treats that as a starting state that the DB then rewrites.
| Feature | Typical autoposter | S4L |
|---|---|---|
| Style tiers | Hard-coded per platform in a config file | Recomputed on every draft from live posts.avg_upvotes, per platform |
| Cold start | Falls back to hard-coded 'recommended' styles | All candidate styles forced into 'secondary' so every one gets sampled |
| Trust floor | None, a single unlucky post can demote a style | MIN_SAMPLE_SIZE=5, styles with n<5 stay in 'explore' regardless of avg |
| Banned styles | Often enforced only in the prompt ('avoid X') | Stripped from the candidate set before the prompt is built |
| Pick ratio | Single pick, no exploration budget | 60% dominant / 30% secondary / 10% rare baked into the prompt |
| Per-post audit | Style rarely logged, A/B unprovable | engagement_style column is NOT NULL on the posts table, every row tagged |
Point S4L at your product
Same 7-style taxonomy, same bandit, your handle. You hand it a content angle and a list of subreddits; it finds threads, picks a style against your own posts table, and logs every row for the next run to learn from.
See pricing →

S4L: common questions
What is S4L?
S4L is a social autoposter. It finds threads on Reddit, X/Twitter, LinkedIn, GitHub issues, and Moltbook, drafts a comment under one of 7 named engagement styles, posts it via a browser MCP or an API, and logs every post to a Neon Postgres table. It runs as a Claude Code skill plus a set of macOS launchd jobs that fire the comment, engagement, and stats pipelines on a 6-hour cadence.
Where exactly does S4L learn which style wins?
In scripts/engagement_styles.py. The function _fetch_style_stats(platform) runs a single SQL query: SELECT engagement_style, COUNT(*), AVG(upvotes) FROM posts WHERE platform=%s AND status='active' AND LENGTH(our_content)>=30 GROUP BY engagement_style. The function get_dynamic_tiers(platform) consumes those rows, sorts trusted styles (n>=5) by avg_upvotes, and returns a (dominant, secondary, rare) tuple. That tuple is then formatted into the prompt block every pipeline reads before drafting.
Why does MIN_SAMPLE_SIZE=5 matter?
Without a sample floor, a single unlucky post could kick a style into 'rare' and kill its pick rate permanently. With 5 samples required before a style's avg_upvotes is trusted, the bandit is protected from early-run noise. Every style with n<5 is placed into secondary/explore even if its avg_upvotes looks low, so S4L keeps testing it. This is the difference between a learning loop and a prematurely-locked policy.
How is the 60/30/10 split enforced?
It is not a code-level quota; it is prompted. The get_styles_prompt function emits three labeled blocks: PRIMARY (~60% of the time), SECONDARY (~30%), RARE (~10%). The LLM sees the ratios in the header and picks accordingly. Each pick then gets logged as engagement_style on its row, so the next run's SQL query reflects the realized distribution. The loop closes without any extra scheduler.
What is in the 'never' list, and why is it not in the bandit?
Each platform has a static never list in PLATFORM_POLICY. LinkedIn bans snarky_oneliner, GitHub bans snarky_oneliner, Reddit bans curious_probe. These exist because tone and brand safety are not performance judgments. Even if the DB showed snarky_oneliner getting high reactions on LinkedIn, S4L will not ship it. The never list is applied before get_dynamic_tiers even sees the candidate set, so those styles never enter the prompt at all.
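A sketch of that pre-tiering strip. The three bans are the ones named above and the seven style names match the article's taxonomy; the PLATFORM_POLICY shape and the helper name are assumptions.

```python
# Assumed shape for the per-platform never lists described above.
PLATFORM_POLICY = {
    "reddit":   {"never": {"curious_probe"}},
    "linkedin": {"never": {"snarky_oneliner"}},
    "github":   {"never": {"snarky_oneliner"}},
}

ALL_STYLES = ["critic", "storyteller", "pattern_recognizer", "contrarian",
              "data_point_drop", "snarky_oneliner", "curious_probe"]

def candidate_styles(platform):
    # Banned styles never reach the tiering step, so no amount of upvotes
    # can resurrect them on that platform.
    banned = PLATFORM_POLICY.get(platform, {}).get("never", set())
    return [s for s in ALL_STYLES if s not in banned]

print(candidate_styles("reddit"))    # curious_probe stripped
print(candidate_styles("linkedin"))  # snarky_oneliner stripped
```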
What happens on the first run when posts is empty?
The function returns (dominant=[], secondary=all_candidates, rare=[]). The prompt then only has a SECONDARY block (minus the never list). Every style gets a roughly equal chance. After about 35 posts on a given platform (7 styles x 5 samples), enough styles cross the trust floor that the bandit can start producing a dominant tier. Before that, S4L is deliberately exploring.
Is this different from the page at /t/site-s4l-ai that describes the 7 styles?
Yes. That page describes the STYLES dict and the per-platform never list as a static taxonomy. This page describes the loop on top of that taxonomy: the SQL query that grades style performance, the get_dynamic_tiers function that turns rows into tiers, the MIN_SAMPLE_SIZE floor, and the 60/30/10 prompt split. Taxonomy is the vocabulary; this is the learning.
How often does the bandit re-rank?
Every time get_styles_prompt is called, which means every time S4L drafts a comment or a reply. There is no batch job. The SQL query runs live, and stats are updated roughly every 6 hours by the stats pipeline (com.m13v.social-stats launchd plist). So by the time you see a comment land, its style was picked against stats that are at most a few hours stale.