Social media automation software that learns which comment style wins
The listicles for this keyword all measure automation by the same grid: post calendar, AI caption, unified inbox, analytics rollup. S4L is measured by a question none of them ask. When the loop is about to write a comment, which of seven engagement styles should it speak in, on this specific platform, given how past comments performed? The answer lives in a single column called posts.engagement_style and a SQL query that rewrites the next prompt.
Most "automation software" automates the calendar, not the voice
The top pages for this keyword rotate through the same cast. They differ on seat pricing and which inbox view they ship. Every single one treats the body of the reply as a single prompt or a single template, and none of them measures whether the voice that prompt produces is actually what this platform rewards.
S4L sits in a different category. It does not hand you a content calendar. It hands you a generator that picks one of seven styles for each reply, writes that choice back to a Postgres row, and re-ranks the styles before the next comment is written.
The seven styles, exactly as shipped
Each style has a description, an example line, a note that narrows when it fires, and a best-in map per platform. The taxonomy lives in the STYLES dict at the top of scripts/engagement_styles.py. It is not tuned by the model; it is tuned by whoever ships the repo.
critic
Point out what is missing, flawed, or naive. Reframe the problem. 'The part that breaks down is...' Note: never just nitpick, offer a non-obvious insight.
storyteller
Pure first-person narrative with specific details (numbers, dates, names). Lead with failure or surprise. 'we tracked this for six months and found...' Note: never pivot to a product pitch.
pattern_recognizer
Name the pattern or phenomenon. Authority through pattern recognition, not credentials. 'This is called X. I have seen this play out dozens of times across Y.'
curious_probe
One specific follow-up question about the most interesting detail. Include 'curious because...' context. Banned on Reddit per PLATFORM_POLICY.
contrarian
Take a clear opposing position backed by experience. 'Everyone recommends X. I have done X for Y years and it is wrong.' Must have credible evidence. Empty hot takes get destroyed.
data_point_drop
Share one specific, believable metric. Let the number do the talking. '$12k in a month (not a lot of money).' No links. Numbers must be believable, not impressive.
snarky_oneliner
Short, sharp, emotionally resonant observation (one sentence max). Validates a shared frustration. Banned on LinkedIn and GitHub. Never in small or serious subs.
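A sketch of that dict's shape. The key names (description/example/note/best_in) are assumptions about the shipped dict; the text values are taken from this page.

```python
# Sketch of the STYLES taxonomy described above. Field names are
# assumptions about the shipped dict; values come from this page.
STYLES = {
    "critic": {
        "description": "Point out what is missing, flawed, or naive. Reframe the problem.",
        "example": "The part that breaks down is...",
        "note": "Never just nitpick; offer a non-obvious insight.",
    },
    "data_point_drop": {
        "description": "Share one specific, believable metric. Let the number do the talking.",
        "example": "$12k in a month (not a lot of money).",
        "note": "No links. Numbers must be believable, not impressive.",
        "best_in": {"reddit": "r/Entrepreneur, r/SaaS", "linkedin": "results, case studies"},
    },
}
```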
The anchor fact: one SQL query ranks the styles
Before every prompt, the generator calls _fetch_style_stats(platform). That function runs the aggregation below against the Neon Postgres instance backing the repo. N is the number of posts logged with that style on that platform. avg_up is their average final upvote count.
```sql
SELECT engagement_style,
       COUNT(*) AS n,
       AVG(COALESCE(upvotes, 0))::float AS avg_up
FROM posts
WHERE status = 'active'
  AND engagement_style IS NOT NULL
  AND our_content IS NOT NULL
  AND LENGTH(our_content) >= 30
  AND upvotes IS NOT NULL
  AND platform = %s
GROUP BY engagement_style
```
The 30-character filter drops drafts and placeholder rows so they cannot poison the ranking. The per-platform filter means each platform is graded against its own history. A style that lost on Reddit can still be PRIMARY on LinkedIn.
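Assuming _fetch_style_stats is roughly a thin wrapper around that query, here is a runnable demo of the aggregation, adapted to in-memory SQLite so it stands alone (the shipped version targets Neon Postgres, with %s placeholders and a ::float cast):

```python
import sqlite3

# Demo of the per-platform aggregation behind _fetch_style_stats,
# adapted to SQLite: ? placeholders, no ::float cast.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE posts (
    platform TEXT, our_content TEXT, engagement_style TEXT,
    upvotes INTEGER, status TEXT)""")
rows = [
    ("reddit",   "x" * 40, "storyteller", 12, "active"),
    ("reddit",   "x" * 40, "storyteller",  4, "active"),
    ("reddit",   "x" * 40, "critic",      30, "active"),
    ("reddit",   "draft",  "critic",      99, "active"),  # under 30 chars: filtered out
    ("linkedin", "x" * 40, "critic",       2, "active"),  # other platform: filtered out
]
conn.executemany("INSERT INTO posts VALUES (?,?,?,?,?)", rows)

def fetch_style_stats(conn, platform):
    cur = conn.execute("""
        SELECT engagement_style, COUNT(*) AS n,
               AVG(COALESCE(upvotes, 0)) AS avg_up
        FROM posts
        WHERE status = 'active'
          AND engagement_style IS NOT NULL
          AND our_content IS NOT NULL
          AND LENGTH(our_content) >= 30
          AND upvotes IS NOT NULL
          AND platform = ?
        GROUP BY engagement_style""", (platform,))
    return {style: {"n": n, "avg_up": avg} for style, n, avg in cur}

stats = fetch_style_stats(conn, "reddit")
print(stats)  # draft and linkedin rows excluded; storyteller averages 8.0
```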
The feedback loop, end to end
Every reply flows through the same pipeline. The generator picks a style, the row lands in the posts table, the upvote count fills in later, and the next prompt reads the aggregation before choosing again.
[Diagram: style feedback pipeline]
The tier split, not hand-wavy
Trusted styles get sorted by avg upvotes, then sliced into thirds. The top third is PRIMARY, the bottom third is RARE, and everything else (plus any untrusted styles with N under 5) becomes SECONDARY. Two edge cases are handled explicitly: with only one or two trusted styles, all of them go to PRIMARY and RARE stays empty; with no trusted styles at all (cold start), everything lands in SECONDARY.
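The split can be sketched as follows. This is an illustration of the rules stated on this page, not the shipped get_dynamic_tiers:

```python
MIN_SAMPLE_SIZE = 5  # mirrors the constant in engagement_styles.py

def dynamic_tiers(stats):
    """Sketch of the tier split: thirds by avg upvotes, untrusted to SECONDARY."""
    trusted = {s: v for s, v in stats.items() if v["n"] >= MIN_SAMPLE_SIZE}
    untrusted = [s for s in stats if s not in trusted]
    ranked = sorted(trusted, key=lambda s: trusted[s]["avg_up"], reverse=True)
    if len(ranked) <= 2:
        # One or two trusted styles go straight to PRIMARY; cold start
        # (no trusted styles) leaves everything in SECONDARY.
        return {"PRIMARY": ranked, "SECONDARY": untrusted, "RARE": []}
    third = len(ranked) // 3
    return {
        "PRIMARY": ranked[:third],
        "SECONDARY": ranked[third:-third] + untrusted,
        "RARE": ranked[-third:],
    }
```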
The numbers that steer the generator
Every constant below is a hard-coded value in engagement_styles.py. Changing any of them changes the voice distribution across the whole pipeline.

- MIN_SAMPLE_SIZE = 5: logged posts required before a style's average is trusted
- Tier targets: PRIMARY ~60%, SECONDARY ~30%, RARE ~10% of replies
- 30-character minimum on our_content before a row counts toward the ranking
Tone policy overrides the data
Per-platform rules are not tuned by performance. They are hard coded because LinkedIn is not Reddit and no amount of past upvote data will change that. A style with the best avg_up on LinkedIn still gets dropped if it is in the never list.
| Platform | never (hard filter) | note (tone hint) |
|---|---|---|
| reddit | curious_probe | short wins, start with 'I' or 'my' |
| twitter | (none) | brevity wins, direct mentions OK, 1-2 sentences |
| linkedin | snarky_oneliner | professional, softer critic, 2-4 sentences |
| github | snarky_oneliner | technical and specific, 400-600 chars |
| moltbook | (none) | agent voice 'my human', conversational |
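The hard filter is simple to sketch. The dict literal below is a reconstruction from this page, not the shipped PLATFORM_POLICY:

```python
# Reconstruction of the per-platform tone policy described above.
PLATFORM_POLICY = {
    "reddit":   {"never": {"curious_probe"},   "note": "short wins, start with 'I' or 'my'"},
    "twitter":  {"never": set(),               "note": "brevity wins, direct mentions OK, 1-2 sentences"},
    "linkedin": {"never": {"snarky_oneliner"}, "note": "professional, softer critic, 2-4 sentences"},
    "github":   {"never": {"snarky_oneliner"}, "note": "technical and specific, 400-600 chars"},
    "moltbook": {"never": set(),               "note": "agent voice 'my human', conversational"},
}

def allowed_styles(styles, platform):
    """Drop banned styles regardless of how well they scored."""
    banned = PLATFORM_POLICY.get(platform, {}).get("never", set())
    return [s for s in styles if s not in banned]

print(allowed_styles(["critic", "snarky_oneliner"], "linkedin"))  # ['critic']
```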
What this replaces
The typical setup for "social media automation software" is one system prompt and one voice. Every reply sounds similar because the generator has nothing to choose between. S4L trades that for a generator that is handed a tiered menu every single time.
One system prompt, one voice. Every reply has the same cadence. Upvote data flows into an analytics panel a human reads once a week. The generator never sees it.
- Single prompt, no voice variation
- Upvotes are a dashboard, not a prompt input
- No tone policy per platform
- No cold-start exploration
One cycle, step by step
The loop is six stages. The one that most tools skip is stage four: the SQL that turns yesterday's upvotes into today's prompt.
Generator picks a style
Before any text is written, the prompt shows a tiered style menu (PRIMARY / SECONDARY / RARE) with a target percentage. The model picks one and writes the reply.
Write the row
After the reply posts, reply_db.py inserts a row into the posts table with platform, our_content, engagement_style, upvotes (initially null), and status='active'.
Engagement updater fills upvotes
A separate scan loop re-polls each of our posts on its platform and UPDATEs the upvotes column. The longer a row sits, the closer its upvotes track final performance.
Next prompt runs the SQL
When the next post or reply is about to be drafted, get_styles_prompt(platform) calls _fetch_style_stats, aggregates by engagement_style, and re-splits the tiers.
Never-rules override everything
Right before the prompt is written, PLATFORM_POLICY.never filters the list. High-performing snarky_oneliner still gets dropped on LinkedIn. The data can suggest, it cannot overrule tone.
The loop keeps shifting weight
As more rows land with real upvote counts, the top-third shifts. A style that was RARE two weeks ago can drift into PRIMARY when the new posts outperform. Nothing is pinned.
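Stages two and three of the cycle above can be sketched against a local table. SQLite stands in for the Neon Postgres instance here, and the id column is an assumption about the schema:

```python
import sqlite3

# Lifecycle of one row: insert with upvotes NULL, update later.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE posts (
    id INTEGER PRIMARY KEY, platform TEXT, our_content TEXT,
    engagement_style TEXT, upvotes INTEGER, status TEXT)""")

# Stage 2: after the reply posts, the row lands with upvotes still NULL.
cur = conn.execute(
    "INSERT INTO posts (platform, our_content, engagement_style, upvotes, status) "
    "VALUES (?, ?, NULL, 'active')".replace("VALUES (?, ?,", "VALUES (?, ?, ?,"),
    ("reddit", "we tracked this for six months and found...", "storyteller"))
post_id = cur.lastrowid

# Stage 3: the scan loop later fills in the observed upvote count.
conn.execute("UPDATE posts SET upvotes = ? WHERE id = ?", (17, post_id))

row = conn.execute(
    "SELECT engagement_style, upvotes FROM posts WHERE id = ?",
    (post_id,)).fetchone()
print(row)  # ('storyteller', 17)
```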
What the prompt actually receives
After a recent run, get_styles_prompt("reddit") returned a tier block that embeds verbatim into the system prompt before the generator is asked to write a reply. The tier counts and the "use these X% of the time" language are what the model sees.
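The verbatim output shifts with live stats, so as a stand-in, here is an invented block that uses the tier labels and percentage language described on this page. The style placements are made up:

```python
# Illustrative only: the real block comes from get_styles_prompt and
# reflects live per-platform stats. Style placements below are invented.
tier_block = """ENGAGEMENT STYLES for reddit:
PRIMARY (top performers by avg upvotes, use these ~60% of the time):
  - storyteller
SECONDARY (mid performers or untested, use these roughly 30% of the time):
  - critic
  - data_point_drop
RARE (use these ~10% of the time):
  - contrarian
"""
print(tier_block)
```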
Why this matters more than another calendar
A calendar answers "when do I post." The harder question, and the one that decides whether a reply gets read or ignored, is "in what voice." Five reasons this is the missing piece.
Why a style feedback loop beats a single prompt
- A single GPT system prompt is a single voice. Repeated a hundred times, it is a tell.
- Different threads reward different voices. 'r/ExperiencedDevs' rewards pattern_recognizer. 'r/Meditation' rewards storyteller.
- The tool that decides what to write should also remember what worked when it wrote before.
- A closed loop means the voice composition drifts toward whichever style is currently winning on that specific platform.
- Tone policy still overrides the data. Performance cannot turn LinkedIn into Reddit.
S4L vs a calendar-first automation tool
Honest feature-by-feature comparison. A team that already runs a scheduler does not have to rip it out; it can add this loop next to it for the comment side of the workflow.
| Feature | Calendar-first scheduler | S4L (engagement_styles.py) |
|---|---|---|
| How the comment voice is chosen | One system prompt. Every reply sounds the same. | 7 styles ranked by posts.engagement_style × avg upvotes |
| Does the tool remember what worked? | No. Engagement is an analytics dashboard, not a prompt input. | Yes. posts.engagement_style column, rewritten into every prompt. |
| Per-platform tone policy | One tone setting global to the workspace. | PLATFORM_POLICY hard-coded never-lists (e.g. no snark on LinkedIn). |
| Cold-start behavior | No notion of exploration vs exploitation. | All non-banned styles go to SECONDARY until 5 samples exist. |
| What column decides the next voice | No equivalent column. | posts.engagement_style (indexed, N=count, avg_up=float) |
| Tier percentages written into the prompt | Not represented | PRIMARY ~60%, SECONDARY ~30%, RARE ~10% |
| Can you grep the rule for why a voice was picked? | You cannot. | Yes. engagement_styles.py, get_dynamic_tiers, plus the posts row. |
The tool that writes the comment also remembers, per platform, which kind of comment has been working.
It is a small design choice. It is also the one every top-ten listicle on this keyword skips entirely.
See posts.engagement_style on a live account
30 minutes on Cal. Screen share of the posts table, a live run of get_styles_prompt, and the actual tier block the generator is receiving right now.
Book a call →

Questions people ask before the first call
What actually makes this different from Hootsuite, Buffer, or SocialBee?
Those tools schedule posts on a calendar and, at best, let an AI draft a caption from a single prompt. S4L tags every reply it sends with one of seven style labels (critic, storyteller, pattern_recognizer, curious_probe, contrarian, data_point_drop, snarky_oneliner) and writes that label into posts.engagement_style. Every new prompt queries the posts table, aggregates avg upvotes per style per platform, and rebuilds the style tier block. None of the top listicles describe a feedback loop on the voice of the comment.
Where is the SQL that ranks styles?
scripts/engagement_styles.py, function _fetch_style_stats. The query is SELECT engagement_style, COUNT(*) AS n, AVG(COALESCE(upvotes,0))::float AS avg_up FROM posts WHERE status='active' AND engagement_style IS NOT NULL AND our_content IS NOT NULL AND LENGTH(our_content) >= 30 AND upvotes IS NOT NULL AND platform = %s GROUP BY engagement_style. It runs on a Neon Postgres instance. The 30-character filter drops draft and placeholder rows from the ranking.
Why 5 samples before a style is trusted?
MIN_SAMPLE_SIZE is set to 5 in engagement_styles.py. Below that, one outlier tweet with 400 upvotes can catapult a style into the PRIMARY tier and starve the others. Styles with fewer than 5 logged posts land in the SECONDARY pool, which the prompt describes as 'mid performers or untested, use these roughly 30% of the time,' so they still get explored without steering the whole batch.
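The arithmetic behind that threshold is easy to see. The numbers below are invented for illustration:

```python
# Why 5 samples: one viral outlier dominates a tiny average.
small = [400]                  # one lucky post
steady = [12, 9, 15, 11, 13]   # five ordinary posts

avg_small = sum(small) / len(small)
avg_steady = sum(steady) / len(steady)
print(avg_small)   # 400.0 -- would top every ranking on one data point
print(avg_steady)  # 12.0  -- a trustworthy signal
```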
What are the per-platform style bans?
PLATFORM_POLICY hard-codes these: curious_probe is banned on Reddit (reddit comment culture eats weak questions), snarky_oneliner is banned on LinkedIn and GitHub (wrong register for both audiences). These overrides fire regardless of how well a style performed in the aggregation. Tone policy beats data.
How does the tier split work in practice?
get_dynamic_tiers sorts trusted styles by avg_up descending, then slices into thirds. The top third becomes PRIMARY (labeled in the prompt as 'top performers by avg upvotes, use these ~60% of the time'), the middle plus all untrusted styles becomes SECONDARY (~30%), and the bottom third becomes RARE (~10%). If there are only one or two trusted styles, both go to PRIMARY and the RARE tier is empty. Cold start (no trusted styles yet) puts everything in SECONDARY.
Does this work for anything besides Reddit?
Yes. The PLATFORM_POLICY dict has entries for reddit, twitter, linkedin, github, and moltbook, each with its own never-list and tone note. The STYLES dict also marks which subreddits or topic buckets fit each style best. For example, data_point_drop is best in r/Entrepreneur and r/SaaS on Reddit, but on LinkedIn it maps to 'results, case studies' instead.
Does it ever ignore the data?
Three cases. First, per-platform bans in PLATFORM_POLICY.never always override the ranking. Second, if a style has N < 5 samples it is forced into SECONDARY even if its noisy avg_up is high. Third, a recommendation style is only ever available in reply contexts, governed by a separate Tier 1/2/3 link-use rule in the surrounding prompt rather than by the upvote feedback loop.
What is a 'style' in text, roughly?
A style is a voice pattern plus a note. storyteller means 'pure first-person narrative with specific details, lead with failure not success, never pivot to a product pitch.' contrarian means 'take a clear opposing position backed by experience, must have credible evidence, empty hot takes get destroyed.' snarky_oneliner means 'short, sharp, emotionally resonant observation, one sentence max.' The generator picks one per reply; the row in posts.engagement_style remembers which.