The best social media automation tools on the RecurPost listicles all ship the same thing. S4L ships the one piece they don't.

Every alternative on a RecurPost roundup is a scheduler. They queue content you wrote, rotate it on a cadence, and call that automation. None of them read their own engagement back before writing the next post. S4L does, and the whole thing lives in one file: scripts/top_performers.py.

Matthew Diakonov
9 min read
4.8 from operators running S4L on live accounts

  • Feedback report generated from Neon Postgres before every reply draft
  • Comments weighted 3x more than upvotes in the scoring query
  • Every bottom performer annotated with a specific FAILURE REASON label

What RecurPost-style tools automate, in one sentence

You sit down once, fill a library with posts you want circulating, assign them to queue categories, pick a cadence per category, and the tool rotates them into the calendar forever. The best ones add AI captions, an RSS feed importer, and bulk scheduling from a CSV. The unit of work is the post you wrote. The job of the tool is the rotation.

That mental model is fine when the content is yours and evergreen. It breaks down the moment the content is a reply to a stranger. Nobody can prewrite a library of replies to threads that don't exist yet. And nothing in the RecurPost category scores the replies it has already written to decide what to do differently on the next one.

The piece S4L ships that none of them do

The piece is one Python file, scripts/top_performers.py, and the spine of it is a SQL expression defined at lines 30 to 35.

scripts/top_performers.py

Comments are tripled because a reply takes effort. Upvotes get counted at face value except on Reddit, where the OP's own self-upvote gets subtracted so a post sitting at score=1 with zero engagement correctly reads as zero. Views are nowhere in the formula, because viral-by-algorithm is not a pattern worth copying. Every downstream query (get_top_posts, get_bottom_posts, get_style_performance) reuses this expression as a single source of truth.
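The expression itself can be sketched from the article's own description. This is a reconstruction, not the verbatim lines 30-35 of scripts/top_performers.py, and the column names are assumptions; the pure-Python mirror is just for sanity-checking the weighting.

```python
# Reconstruction of the composite score described above; column names assumed.
SCORE_SQL = """
COALESCE(comments_count, 0) * 3
+ CASE
    WHEN platform = 'reddit' THEN GREATEST(0, COALESCE(upvotes, 0) - 1)
    ELSE COALESCE(upvotes, 0)
  END
"""

def composite_score(comments, upvotes, platform):
    """Pure-Python mirror of the SQL above, handy for testing the weighting."""
    c = comments or 0
    u = upvotes or 0
    # Reddit API scores include the OP's own self-upvote; strip it.
    adjusted = max(0, u - 1) if platform == "reddit" else u
    return c * 3 + adjusted
```

A Reddit post at upvotes=1 with zero comments scores 0 under this mirror, which is the behavior the article calls out.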

Floors: reddit=10 · twitter=5 · linkedin=3 · moltbook=3 · github=3 — comments x 3 — upvotes -1 (Reddit) — views excluded — SCORE_SQL = 0 -> bottom — has_anti_pattern filter

What makes this feedback loop uncopyable

Composite score, not raw upvotes

Comments weight 3x because a reply is real discussion, an upvote is a click. Views excluded entirely; viral-by-algorithm is not a pattern worth copying.

Reddit upvotes -1

GREATEST(0, upvotes - 1) subtracts the OP's own self-upvote so a post sitting at score=1 with zero replies correctly reads as zero engagement.

Per-platform floors

reddit=10, twitter=5, linkedin=3, moltbook=3, github=3. LinkedIn reactions are scarce, Reddit upvotes inflate, so the same composite gets graded differently.
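As a sketch, the floors amount to a small lookup (the article puts the real dict at lines 39-46; the variable name and gate function here are assumptions):

```python
# Per-platform minimum composite score for a post to count as a top performer.
# Values are from the article; the dict/function names are assumptions.
PLATFORM_MIN_SCORE = {
    "reddit": 10,
    "twitter": 5,
    "linkedin": 3,
    "moltbook": 3,
    "github": 3,
}

def qualifies_as_top(score, platform):
    # A post only becomes a teachable example if it clears its platform's floor.
    return score >= PLATFORM_MIN_SCORE.get(platform, 5)
```

A LinkedIn comment with 3 reactions and 1 reply (composite 6) clears its floor; the same raw numbers on Reddit do not.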

FAILURE REASON on every bottom post

annotate_failure() tags the exact pattern that killed each post: 'contains URL/link', 'curious_probe on Reddit', 'mentions own project', 'too many questions'. The LLM reads the tag and avoids the pattern.

3,000 posts of history

The protected comment block at lines 54-59 cites the sample size: 'data-driven improvements based on analysis of 3,000+ posts'. The revert count, also documented, is 2.

Feedback report pipes into the draft

top_performers.py is called by every platform's run-*.sh before the browser MCP even spins up. The output becomes the system prompt for the Claude subprocess that writes the next comment.

Where the report ends up (hint: the LLM's system prompt)

The scoring query is only useful because it feeds a specific process. Every per-platform cron calls top_performers.py before a browser even opens, then glues the output into the context of a child Claude subprocess that writes the reply.

Where SCORE_SQL ends up on every cron tick

posts table + engagement_style stats + project weights -> top_performers.py -> Reddit reply / Twitter reply / LinkedIn reply

What happens on every single cron tick

1. launchd fires the per-platform cron

Example: com.m13v.social-twitter-cycle wakes up every six hours. It runs skill/run-twitter-cycle.sh.

2. run-*.sh builds context

The shell wrapper sources .env, acquires a per-platform lock, picks a project with the weight-deficit algorithm in pick_project.py, then calls scripts/top_performers.py --platform twitter --project <picked>.

3. top_performers.py runs SCORE_SQL

The composite score query pulls the top 5 posts above the platform floor and the bottom 5 posts sitting at zero engagement. annotate_failure() stamps each bottom post with the likely failure pattern.

4. The report becomes the system prompt

run-*.sh spawns a child `claude -p --strict-mcp-config --mcp-config twitter-agent-mcp.json` with the report and the style tier report glued into the instructions. No scheduler in the RecurPost category does this step.

5. Claude drafts, posts, logs back to Postgres

The comment is written knowing which patterns earned comments and which ones got stamped with a FAILURE REASON last cycle. Twelve hours later, scan_*.py reads engagement back into the same posts table and the loop closes.
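Steps 2 through 4 can be rendered in Python for illustration. The real wrapper is shell, and the prompt wording below is an assumption; only the claude flags come from the article.

```python
import subprocess
import sys

def build_cycle_command(platform, project):
    """Illustrative sketch of the run-*.sh glue; not the real wrapper."""
    # Step 3: generate the feedback report before any browser starts.
    report = subprocess.run(
        [sys.executable, "scripts/top_performers.py",
         "--platform", platform, "--project", project],
        capture_output=True, text=True,
    ).stdout
    # Step 4: glue the report into the child Claude invocation.
    prompt = (f"Feedback report from SCORE_SQL:\n{report}\n"
              f"Draft the next {platform} reply.")
    return ["claude", "-p", prompt,
            "--strict-mcp-config",
            "--mcp-config", f"{platform}-agent-mcp.json"]
```

The point of the sketch is the ordering: the report is generated and baked into the prompt before the MCP browser config is even referenced.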

The bottom-post annotator (the part nobody copies because it's opinionated)

Most scoring systems stop at ranking the winners. This one spends as much code on the losers. Every bottom post runs through annotate_failure which tags the specific pattern that likely killed it, so the LLM reads a do-not-copy set with reasons, not just text.

scripts/top_performers.py
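A minimal sketch of the annotator, reconstructed from the rules the article lists (the real function at lines 96-124 has more rules, including phone-pitch and macOS-adjacency phrases; the ordering and exact strings here are assumptions):

```python
import re

# Product names the article cites; matching one marks a post as self-promo.
PRODUCT_NAMES = ["fazm", "assrt", "pieline", "cyrano",
                 "terminator", "mk0r", "s4l", "vipassana.cool"]

def annotate_failure(text, style=None, platform=None):
    """Tag a bottom post with the pattern that likely killed it. Subset
    of the rules described in the article; labels match its examples."""
    low = text.lower()
    if any(name in low for name in PRODUCT_NAMES):
        return "mentions own project"
    if re.search(r"https?://|www\.", low):
        return "contains URL/link"
    if text.count("?") >= 3:
        return "too many questions"
    if style == "curious" and "?" in text and platform == "reddit":
        return "curious_probe on Reddit"
    if len(text) < 100:
        return "too short"
    return "likely wrong subreddit or off-topic"
```

Each bottom post carries exactly one label, so the LLM's do-not-copy set reads as pattern-plus-reason rather than a pile of bad text.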

The failure-gate you might miss on first read

A subtle but load-bearing detail lives in get_bottom_posts. The failure threshold is SCORE_SQL = 0, not upvotes < 1. The older, naive filter would have missed every Reddit post sitting at upvotes=1 with no comments, because the OP self-upvote clears that check. Switching to the composite closes the loop.

scripts/top_performers.py
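The gate can be sketched as a query that filters on the composite expression itself (table and column names are assumptions; the composite is reconstructed from the article's description):

```python
# Reconstruction of the composite expression; column names assumed.
SCORE_SQL = (
    "COALESCE(comments_count, 0) * 3 + CASE WHEN platform = 'reddit' "
    "THEN GREATEST(0, COALESCE(upvotes, 0) - 1) ELSE COALESCE(upvotes, 0) END"
)

# The load-bearing detail: the WHERE clause tests the composite, not upvotes.
BOTTOM_POSTS_QUERY = f"""
SELECT content, platform, upvotes, comments_count
FROM posts
WHERE ({SCORE_SQL}) = 0  -- composite zero, not 'upvotes < 1'
ORDER BY posted_at DESC
LIMIT 5
"""
```

With `upvotes < 1`, a Reddit post at upvotes=1 and no comments would slip past the filter; with the composite, it lands in the bottom set where it belongs.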

What the report actually looks like by the time it hits Claude

Here is a redacted sample of the report that top_performers.py --platform reddit --project S4L produces right before a reply draft. Style stats come from the live posts table, the top posts are ranked by SCORE_SQL, and the bottom posts are stamped with their FAILURE REASON.

feedback_report.txt
3x — Comments weight over upvotes
10 — Reddit min-score floor
3 — LinkedIn min-score floor
3,000+ — Posts the heuristics were tuned against
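Since the sample itself is redacted here, this sketch shows the shape such a report could take when assembled (section headers, field names, and layout are all assumptions, not the script's real output format):

```python
def format_report(top_posts, bottom_posts, style_stats):
    """Illustrative report assembly: style stats, then top performers,
    then bottom performers stamped with their FAILURE REASON."""
    lines = ["=== STYLE PERFORMANCE ==="]
    for style, avg in style_stats:
        lines.append(f"{style}: avg composite {avg:.1f}")
    lines.append("=== TOP PERFORMERS (copy these patterns) ===")
    for p in top_posts:
        lines.append(f"[{p['platform']} score={p['score']}] {p['text'][:120]}")
    lines.append("=== BOTTOM PERFORMERS (do NOT copy) ===")
    for p in bottom_posts:
        lines.append(f"[FAILURE REASON: {p['reason']}] {p['text'][:120]}")
    return "\n".join(lines)
```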

S4L vs a RecurPost-style scheduler on the parts that matter

What does 'best post' mean?
  RecurPost-style scheduler: Whatever you manually marked to recycle in an evergreen library.
  S4L: Whatever ranked highest on SCORE_SQL last time top_performers.py ran.

Score formula
  RecurPost-style scheduler: None. Posts rotate on a fixed schedule regardless of performance.
  S4L: comments x 3 + upvotes, with Reddit upvotes -1 to strip the OP self-upvote.

Platform-aware thresholds
  RecurPost-style scheduler: Same rules for Twitter, LinkedIn, Reddit. The library does not care.
  S4L: reddit=10, twitter=5, linkedin=3. Different reaction floors per platform.

Bottom performers
  RecurPost-style scheduler: Marked 'inactive' in the library and forgotten. No reason recorded.
  S4L: Fed to the LLM with an annotate_failure label so the next draft steers around the same failure mode.

Self-promotion filter
  RecurPost-style scheduler: None. The scheduler will keep recycling a promo-heavy caption forever.
  S4L: has_anti_pattern() strips top posts containing product names or URLs before they become examples.

Revert protection
  RecurPost-style scheduler: Not applicable.
  S4L: Pre-commit hook plus a DO NOT REMOVE comment block because two agents already tried to simplify it.
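The self-promotion filter row is worth a sketch of its own. This is a guess at what a function like has_anti_pattern() could look like, built only from the behavior the article describes; the real implementation may differ.

```python
import re

# Product names the article cites as triggering the self-promo filter.
PRODUCT_NAMES = ["fazm", "assrt", "pieline", "cyrano",
                 "terminator", "mk0r", "s4l", "vipassana.cool"]

def has_anti_pattern(text):
    """True when a post should be excluded from the 'good example' set,
    even if it scored well: product mentions and links are self-promotion."""
    low = text.lower()
    has_link = bool(re.search(r"https?://|www\.", low))
    return has_link or any(name in low for name in PRODUCT_NAMES)
```

The asymmetry is the point: a viral post that names a product gets dropped from the example set, while a quiet post that earned two genuine replies stays in.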

The revert history is in the source file

One odd thing you notice reading scripts/top_performers.py is that the composite-score block is wrapped in a six-line do-not-touch comment. The reason is simple: both the SCORE_SQL expression and the annotate_failure rules have already been reverted by two different agents who decided they looked over-engineered. The current file carries the revert count, the tuning sample size, and a pointer to the pre-commit hook that now blocks further simplification.

scripts/top_performers.py
2x — Reverted by other agents twice already. Protected by pre-commit hook.
(scripts/top_performers.py line 58)

Try it yourself, with a real posts table

If you already have social-autoposter installed and at least a week of posts in Neon, run the script against your own data. The top-five table by composite score tends to look nothing like the top-five table by raw upvotes.

A real run on a populated posts table

What a feedback loop buys you over a scheduler

What shifts when the tool reads its own output

  • The style distribution drifts toward what actually earns replies, not what you personally like writing.
  • Low-floor platforms (LinkedIn, GitHub) stop starving the feedback loop because their min-score threshold is set lower.
  • A post that mentions a product name or a link gets filtered out of the 'good example' set even if it went viral, so the next draft does not copy the bad habit.
  • The bottom quartile becomes a labeled training set, not a spreadsheet of shame.
  • Style tiers re-rank on every run, so a style that cools off gets demoted without you editing a config file.
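The last bullet, the per-style re-ranking, can be sketched as one more query over the same composite. The article names get_style_performance but does not show it, so this SQL is an assumption built on the reconstructed expression:

```python
# Reconstruction of the composite expression; column names assumed.
SCORE_SQL = (
    "COALESCE(comments_count, 0) * 3 + CASE WHEN platform = 'reddit' "
    "THEN GREATEST(0, COALESCE(upvotes, 0) - 1) ELSE COALESCE(upvotes, 0) END"
)

# Hypothetical per-style ranking: every run re-grades each engagement_style
# by its average composite, so a cooling style demotes itself automatically.
STYLE_PERFORMANCE_QUERY = f"""
SELECT engagement_style,
       COUNT(*) AS n,
       AVG({SCORE_SQL}) AS avg_score
FROM posts
WHERE platform = %(platform)s
GROUP BY engagement_style
ORDER BY avg_score DESC
"""
```

Because the ranking is recomputed from the posts table on every cron tick, no config file ever has to be edited to demote a style.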

If you keep scoring by upvotes, here is what you miss

The simpler filter ORDER BY upvotes DESC correlates well with dopamine and poorly with discussion. Over 3,000 posts, the style that won on raw upvotes was not the style that won on comments, and comments are the signal that maps to actual people choosing to engage back. If your reply pipeline tunes on upvotes alone, you drift toward emotionally resonant one-liners and away from the longer, specific posts that drag people into conversation. Chase the wrong metric long enough and the tool quietly becomes a karma farm for you and a notification machine for the people who see it.

Want to see your own posts table graded by SCORE_SQL?

Hop on a call. I'll run top_performers.py against your Neon instance and walk through what the report is telling you to change in the next draft.

Book a call

Frequently asked questions

What exactly is the scoring formula S4L uses?

SCORE_SQL in scripts/top_performers.py (lines 30-35) evaluates to COALESCE(comments_count,0) * 3 + (for Reddit) GREATEST(0, upvotes - 1) or (for every other platform) COALESCE(upvotes,0). Comments are weighted three times higher than upvotes because a reply takes real effort and is the strongest signal that a post earned attention, while an upvote is one click. Reddit gets upvotes minus 1 because the Reddit API returns a post score that includes the OP's own self-upvote, so a comment that drew zero engagement still reads as score=1 unless you subtract it. Views are deliberately excluded from the formula; a comment that goes viral because of an algorithm bump is not a pattern worth imitating.

Why are the per-platform min-score thresholds different?

PLATFORM_MIN_SCORE at lines 39-46 sets Reddit=10, Twitter/X=5, LinkedIn=3, Moltbook=3, GitHub=3. LinkedIn reactions are rarer than Reddit upvotes, so holding LinkedIn to the Reddit bar would mean almost no LinkedIn post ever qualifies as a 'top performer' and the feedback loop starves. Tuning the floor per platform is what lets a LinkedIn comment that pulled 3 reactions and 1 reply still count as a teachable top performer while a Reddit comment at the same raw number gets correctly filtered as noise.

How is this actually different from RecurPost or a RecurPost alternative?

RecurPost and every tool on its listicles schedules content you wrote. The unit is the post in the library, and the product's job is to rotate it onto the calendar. S4L's unit is the reply to somebody else's thread. Before writing each new reply it runs SCORE_SQL over the last month of posts in a Neon Postgres table and feeds the top performers plus the annotated failures to Claude as system context. A scheduler does not need to read its own engagement data because its job is not to decide what to say next; S4L's job is exactly that.

What does the FAILURE REASON annotator actually tag?

annotate_failure() at lines 96-124 applies seven rules to every bottom-quartile post: mentions of product names in PRODUCT_NAMES (fazm, assrt, pieline, cyrano, terminator, mk0r, s4l, vipassana.cool), presence of http:// or https:// or www., phone-call pitch phrases like 'missed call' or 'answering service', macOS-app adjacency phrases like 'accessibility api' or 'mcp server', having three or more question marks, any comment tagged 'curious' with a question mark (the curious_probe style has a negative Reddit average), and raw length under 100 characters. If none match, the reason is stamped as 'likely wrong subreddit or off-topic'. The LLM then reads these labels and the corresponding text as a do-not-copy set.

Why is the function block marked 'DO NOT REMOVE OR SIMPLIFY'?

Lines 54-59 carry a six-line comment block warning agents not to touch the SCORE_SQL logic or the annotate_failure rules, with the note 'reverted by other agents twice already. Protected by pre-commit hook. See CLAUDE.md.' Multiple agents, including earlier Claude sessions, have tried to simplify the composite score back to raw upvotes or rewrite annotate_failure to be less opinionated. The pre-commit hook now blocks the diff when those specific lines change. The warning comment is the in-band explanation for the hook.

Does the feedback loop actually reach the draft, or does it just sit in a log?

It reaches the draft. Each per-platform shell wrapper (skill/run-reddit-search.sh, skill/run-twitter-cycle.sh, skill/run-linkedin.sh) calls top_performers.py and interpolates the result into the prompt that launches a child `claude -p --strict-mcp-config --mcp-config <platform>-agent-mcp.json` process. The child sees the feedback report, the style tier report, and the llms.txt of the target product in its initial context. The Playwright MCP browser does not even start until after that prompt is assembled.

Can I use this with a tool from a RecurPost alternatives list like Buffer, Later, or Sprout Social?

Only as two separate layers. Those tools still solve scheduling: queue an evergreen post, recycle it on a cadence. S4L lives one layer above: for conversational replies, where 'what to say' is the hard part, not 'when to post'. The Postgres posts table and the feedback report do not need Buffer or RecurPost to run; they just need a script that posts and a script that scans engagement 12 hours later and writes it back. The two categories do not compete so much as occupy different rungs of the automation stack.