Social media marketing automation tools that read their own report card before every draft
Every listicle for "social media marketing automation tools" ranks the same dozen schedulers by feature checkboxes. The interesting automation is not on the calendar. It is in the loop that runs a SQL query over your own past posts, strips out the wins that happened to mention your own product, attaches failure reasons to the losers, and pastes the whole annotated scoreboard into the next claude -p prompt before the model drafts the next reply.
The SERP describes broadcasting tools. This is a learning tool.
Search "social media marketing automation tools" and the first page is Sprout Social, Hootsuite, Buffer, Sendible, SocialBee, Eclincher, Sprinklr, Gumloop, Jotform, taap.bio, Appy Pie. All different authors, all the same category: a calendar grid plus an AI caption button plus a mentions inbox. Those tools answer "when should I post what". They do not answer "what should the next reply learn from the last fifty replies". The second question is what this page is about.
The anchor fact: a composite score you can read in 9 lines
Everything on this page points to one SQL expression, and that expression lives near the top of scripts/top_performers.py. Three design choices are baked into it: the 3x weighting on comments, the -1 correction on Reddit, and the explicit exclusion of views. Each one is a call about what "learning from your own history" should actually mean.
From the comment on line 26
"Views deliberately excluded. Viral-by-algorithm is not a pattern worth imitating."
That one line is the design decision the scheduler tools do not make. Views are the signal their dashboards emphasize, because views are what their reporting features are built around. Views tell you the algorithm liked a post. They do not tell you a person did. This is why the composite ignores them.
What the composite actually counts
Comments times three, upvotes plain (minus one on Reddit), and nothing else. Each choice has a reason documented right above it in the source.
comments * 3
Real discussion is the strongest imitation signal. A post that sparked 4 replies teaches Claude more than one with 40 passive upvotes. That is why comments are weighted 3x.
upvotes, minus 1 on Reddit
The Reddit OP's self-upvote inflates the score returned by the API. GREATEST(0, upvotes - 1) compensates so the rank is not polluted by a post's own author.
views deliberately excluded
Viral-by-algorithm is not a pattern worth copying. A tweet that gets 400k views from one reshare teaches the next draft nothing about what lands. Views do not enter SCORE_SQL.
per-platform floor, not one threshold
PLATFORM_MIN_SCORE = reddit: 10, twitter: 5, linkedin: 3, moltbook: 3, github: 3. Reactions have different scales per platform, so the floor is per-platform.
fallback to general on empty project
If the filter (project and platform) returns zero rows, the helper returns None so the caller can fall back to general posts. The report never comes back blank.
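The choices above can be sketched in a few lines. This is a hedged reconstruction from the facts on this page, not the verbatim source; the real SCORE_SQL and PLATFORM_MIN_SCORE live in scripts/top_performers.py and may differ in detail.

```python
# Composite score: comments weighted 3x, self-upvote correction on Reddit,
# views never enter the expression.
SCORE_SQL = """
    COALESCE(comments, 0) * 3
    + CASE WHEN platform = 'reddit'
           THEN GREATEST(0, COALESCE(upvotes, 0) - 1)
           ELSE COALESCE(upvotes, 0) END
"""

# Per-platform floors: reactions scale differently on each platform.
PLATFORM_MIN_SCORE = {
    "reddit": 10, "twitter": 5, "x": 5,
    "linkedin": 3, "moltbook": 3, "github": 3,
}

def min_score_for(platform):
    # Unknown or missing platform falls back to a default of 5.
    return PLATFORM_MIN_SCORE.get(platform, 5)
```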
Per-platform floors, not a single threshold
Reactions do not scale the same on every platform. 10 upvotes on Reddit is a working post; 10 reactions on LinkedIn is an excellent post; 10 likes on Twitter is a normal Tuesday. So the score threshold that qualifies a post as a "top" is a per-platform number, not one shared constant.
| Platform | Min score | Note |
|---|---|---|
| Reddit | 10 | comments*3 + upvotes - 1 |
| Twitter / X | 5 | comments*3 + likes |
| LinkedIn | 3 | reactions rarer |
| GitHub | 3 | engagement sparse |
Why winners cannot contain your own product names
A post that happened to mention "fazm" and went well is a trap. The model will learn "mentioning fazm is a winning move," when in fact the win came from everything around the brand mention. So before the top posts reach the LLM, every one of them is run through has_anti_pattern(). If the post contains any of 9 product names or any URL shape, it is dropped from the report. The function fetches limit * 3 rows so there is headroom to lose a few to the filter and still have a clean top-15.
What the filter actually blocks
- A top post containing the word 'fazm' is filtered before the LLM sees it
- A top post containing 'assrt', 'cyrano', 'terminator', 'pieline', 'mk0r', 's4l', 'vipassana.cool', or 'vipassana-cool' is filtered too
- A top post containing http://, https://, or www. is filtered
- The query fetches limit * 3 rows so enough clean winners remain after stripping
- Rationale is on line 63 of the file: 'Product names that indicate self-promotion (teaching Claude bad habits)'
- The pre-commit hook protects this block: lines 55-58 say 'DO NOT REMOVE OR SIMPLIFY. These have been reverted by other agents twice already.'
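The filter plus the over-fetch can be sketched as below. This is a minimal reconstruction from the bullets above, with the content column simplified to index 0 of each row; the real has_anti_pattern() and get_top_posts() in scripts/top_performers.py may differ.

```python
# The 9 product names and URL shapes named in the article.
PRODUCT_NAMES = [
    "fazm", "assrt", "pieline", "cyrano", "terminator",
    "mk0r", "s4l", "vipassana.cool", "vipassana-cool",
]
URL_MARKERS = ["http://", "https://", "www."]

def has_anti_pattern(content):
    # Lowercased substring check; any hit disqualifies the post.
    text = (content or "").lower()
    return any(tok in text for tok in PRODUCT_NAMES + URL_MARKERS)

def top_clean(rows, limit=15):
    # rows are score-ordered (content, score) pairs; fetch limit * 3
    # upstream so a few self-promotional winners can be dropped
    # without shrinking the report below limit.
    clean = [r for r in rows if not has_anti_pattern(r[0])]
    return clean[:limit]
```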
Every loser shows up with a reason attached
The bottom 10 posts are not just listed with a score of zero. They arrive with a "FAILURE" suffix generated by annotate_failure(row), a rule engine with 8 branches. It looks for product names, URLs, phone-capture pitches, mcp-adjacent pitches, excessive question marks, curious_probe-style wording, and short-but-not-punchy content. Everything unmatched falls through to a catch-all: "likely wrong subreddit or off-topic".
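The rule engine can be sketched as a chain of checks joined with ' | '. Rule wording follows the article; the exact matching logic and helper signature are assumptions, not the verbatim annotate_failure().

```python
def annotate_failure(content, platform="reddit"):
    # Hedged sketch of the 8-branch failure annotator described above.
    text = (content or "").lower()
    reasons = []
    for name in ("fazm", "assrt", "pieline", "cyrano", "terminator",
                 "mk0r", "s4l", "vipassana.cool", "vipassana-cool"):
        if name in text:
            reasons.append(f"mentions {name}")
    if any(u in text for u in ("http://", "https://", "www.")):
        reasons.append("contains URL/link")
    if "phone" in text or "call capture" in text:
        reasons.append("product-adjacent pitch (phone/call capture)")
    if any(t in text for t in ("mcp-server", "desktop-agent",
                               "accessibility-api")):
        reasons.append("product-adjacent (mentions own project)")
    if text.count("?") >= 3:
        reasons.append("too many questions (reads as interrogation)")
    if platform == "reddit" and "curious" in text and "?" in text:
        reasons.append("curious_probe style (negative avg on Reddit)")
    if len(text) < 100:
        reasons.append("too short without being punchy")
    if not reasons:  # catch-all when nothing else matched
        reasons.append("likely wrong subreddit or off-topic")
    return " | ".join(reasons)
```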
A real report, as the LLM sees it
When you run python3 scripts/top_performers.py --platform reddit the output is three blocks of markdown: style performance, top 15, bottom 10 with reasons. The shell wrapper (engage.sh line 75) captures stdout into a variable and pastes it into the Claude prompt at line 150. Shape below is representative, numbers redacted.
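One possible shape, with every number a placeholder; the headings here are illustrative, not verbatim output.

```markdown
## STYLE PERFORMANCE (reddit)
| style | posts | avg_cm | avg_up |
|---|---|---|---|
| storyteller | N | N.N | N.N |
| snarky_oneliner | N | N.N | N.N |

## TOP 15 POSTS (score = comments*3 + upvotes, -1 on reddit)
1. [score N] "…post text…"

## BOTTOM 10 POSTS (with reasons)
1. [score 0] "…post text…" | FAILURE: likely wrong subreddit or off-topic
```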
Where the report plugs into the prompt
One line captures the report. One line pastes it into the heredoc that becomes the model's input. That is the whole wiring. The Claude subprocess sees the scoreboard before it reads the actual pending reply.
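Stripped to its shape, the wiring looks like this. A runnable sketch: the script path is relative here rather than $REPO_DIR, the claude -p call is replaced by an echo, and the fallback fires whenever the script is absent.

```shell
# Line 1: capture the report, with a fallback so the prompt never goes blank.
TOP_REPORT=$(python3 scripts/top_performers.py --platform reddit 2>/dev/null \
  || echo '(top performers report unavailable)')

# Line 2: paste it into the heredoc that becomes the model's input.
PROMPT=$(cat <<EOF
## FEEDBACK FROM PAST PERFORMANCE (use this to write better replies):
$TOP_REPORT
EOF
)

# In engage.sh this goes to claude -p; echoed here for inspection.
echo "$PROMPT"
```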
Four inputs fan in, three blocks fan out
The report is assembled from four inputs: the posts table, the per-platform floor, the product-name blocklist, and the failure-reason rules. Three blocks come out, and all three land in the prompt.
top_performers.py, one cycle
The whole cycle, top to bottom
Seven steps. Nothing else happens in between. The last step writes new rows that feed step 2 of the next run, which is where the loop actually closes.
launchd fires engage.sh on a 6h cadence
com.m13v.social-engage.plist invokes skill/engage.sh. This is how the whole cycle begins. There is no human pressing a button and no 'schedule' in the Buffer sense.
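A 6h cadence in launchd terms is StartInterval 21600. A minimal plist sketch, with the repo path as a placeholder; the real com.m13v.social-engage.plist may carry additional keys.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>com.m13v.social-engage</string>
  <key>ProgramArguments</key>
  <array>
    <string>/bin/bash</string>
    <string>/path/to/repo/skill/engage.sh</string>
  </array>
  <!-- 6h cadence: fire every 21600 seconds -->
  <key>StartInterval</key>
  <integer>21600</integer>
</dict>
</plist>
```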
engage.sh shells out to top_performers.py
Line 75 of engage.sh: TOP_REPORT=$(python3 $REPO_DIR/scripts/top_performers.py --platform reddit 2>/dev/null || echo '(top performers report unavailable)'). The query runs over the Neon Postgres posts table.
top_performers.py assembles 4 blocks
Style performance (avg_cm desc, avg_up desc per engagement_style), project/platform summary, top 15 posts by composite SCORE_SQL (with has_anti_pattern stripping), and bottom 10 posts with annotate_failure(row) reasons. Returns as stdout markdown.
anti-pattern stripping runs before cut-off
get_top_posts fetches limit * 3 rows, then clean = [r for r in rows if not has_anti_pattern(r[5])]. A winning post containing 'fazm' or 'https://' never reaches the LLM, so Claude cannot learn to imitate its own product-mentioning wins.
annotate_failure attaches a reason per loser
For every bottom post, the helper builds a ' | '-joined list from 8 rules: mentions product name, contains URL/link, phone-capture pitch, mcp-adjacent pitch, 3+ question marks, curious_probe style on Reddit, under 100 chars, or fallback to 'likely wrong subreddit or off-topic'.
engage.sh injects TOP_REPORT into the heredoc
Line 150 of engage.sh, inside the prompt heredoc: ## FEEDBACK FROM PAST PERFORMANCE (use this to write better replies): $TOP_REPORT. The full annotated report becomes a first-class block of the drafting prompt.
claude -p drafts the next reply while reading the report
The spawned Claude subprocess sees its own scoreboard inline. It reads which styles drove comments, which posts hit, which posts failed and why, then drafts. The output goes to the browser profile for posting, and the new post row feeds the next run's report.
S4L vs the scheduler listicle, row by row
A calendar-first tool and a feedback-loop tool are not feature-for-feature competitors. They solve different halves of "social media marketing automation." This table is what changes when you trade the first for the second.
| Feature | Generic SaaS scheduler | S4L |
|---|---|---|
| How the tool improves over time | Team reads a dashboard on Monday and adjusts the content calendar by hand | top_performers.py runs before every engage.sh cycle and rewrites the prompt the model reads next |
| Ranking signal | Impressions, reach, engagement rate (platform APIs, whatever the dashboard shows) | Composite SCORE_SQL = comments * 3 + (upvotes - 1 on reddit). Views are excluded on purpose |
| What the model sees before drafting | A tone preset ('professional', 'casual') and a content library the operator filled in | An annotated report: style performance, top 15 wins with their exact text, bottom 10 losses with attached failure reasons |
| How self-promotion bias is handled | Product / brand guidelines in a doc the operator wrote once | has_anti_pattern() filters any top post containing one of 9 product names or a URL before the LLM sees it |
| Failure post-mortem | Quarterly review meeting, maybe a shared Notion | annotate_failure(row) runs on every bottom post every single cycle and attaches a specific reason string |
| Per-platform trust threshold | Shared across all platforms, usually at the settings level | PLATFORM_MIN_SCORE = reddit: 10, twitter: 5, linkedin: 3, github: 3, moltbook: 3 (per-platform) |
| Protection against regression | Version history on the content calendar | Lines 55-58 of top_performers.py: 'DO NOT REMOVE OR SIMPLIFY. Reverted by other agents twice already. Protected by pre-commit hook.' |
| Where the loop closes | In a human's head, between Monday and the next content meeting | In Postgres. Every new post row is visible to the very next cycle's top_performers.py query |
Want the feedback loop wired into your own posts table?
30 minutes. I walk through the score formula, the anti-pattern filter, and how to inject the report into your own prompt.
Book a call →
Frequently asked questions
What makes S4L different from Buffer, Hootsuite, Sprout Social, SocialBee, or Sendible in 2026?
Those tools are calendar schedulers. The product is a grid of time slots you drag content into, plus an optional AI caption generator that uses a shared tone preset. S4L has no calendar. The core primitive is a Postgres posts table and a Python script (scripts/top_performers.py) that re-ranks its own past output against a composite SCORE_SQL on every engage.sh cycle. The output of that script becomes a verbatim block of the next drafting prompt. In other words, the automation the listicles describe is about scheduling; the automation here is about learning.
Why is the score 'comments * 3 + upvotes', not 'views + likes + engagement rate'?
Because comments are the only signal that proves a person stopped scrolling and wrote something in response. Impressions and views measure algorithmic distribution, not what the content taught anyone. The comment at line 26 of top_performers.py is explicit: 'Comments are the strongest imitation signal (real discussion), upvotes are second, views deliberately excluded (viral-by-algorithm != a pattern worth imitating).' The 3x weighting is there because comments are rarer than upvotes on every platform tested.
Why does Reddit get a -1 on upvotes?
Because the OP's own self-upvote inflates the score returned by the Reddit API. Without compensation, every post starts at 1 upvote instead of 0, which pollutes the ranking when a post has no genuine engagement. GREATEST(0, COALESCE(upvotes,0) - 1) on lines 32-33 removes that 1-vote floor on Reddit only. Twitter and LinkedIn do not have the same self-upvote problem, so their score is just COALESCE(upvotes, 0).
Why strip product names from the top posts?
Because a winning post that happened to mention 'fazm' or 'cyrano' would teach the next draft that mentioning those names is a winning pattern. It is not; those wins come from the surrounding context, not the brand mention. PRODUCT_NAMES on lines 62-65 lists the 9 tokens currently filtered: fazm, assrt, pieline, cyrano, terminator, mk0r, s4l, vipassana.cool, vipassana-cool. has_anti_pattern() on lines 83-93 runs a lower() substring check for each, plus a URL check for http://, https://, and www. The function returns True for any hit, and top posts that return True are dropped from the report.
How does the 'top 15 after filtering' still return 15?
get_top_posts on lines 205-247 fetches limit * 3 rows from Postgres, then runs the anti-pattern filter, then slices to limit. With limit=15 that is 45 fetched to get roughly 15 clean. If fewer than 15 survive (a very self-promotional account), the function returns whatever is left. The point is that the LLM never sees the self-promoting winners, even if that means the report is shorter.
What are the 8 failure reasons annotate_failure() attaches?
Lines 96-124: 'mentions <product name>', 'contains URL/link', 'product-adjacent pitch (phone/call capture)', 'product-adjacent (mentions own project)' for mcp-server/desktop-agent/accessibility-api matches, 'too many questions (reads as interrogation)' when content.count('?') >= 3, 'curious_probe style (negative avg on Reddit)' when 'curious' appears with a question mark, 'too short without being punchy' when len(content) < 100, and a fallback 'likely wrong subreddit or off-topic' when nothing else matched. Each bottom post gets its reasons joined with ' | '.
Why are there per-platform floors instead of one threshold?
Because the scale of reactions differs by platform. On Reddit, a post with 10 upvotes and a few comments is noteworthy. On LinkedIn, a post with 3 reactions and 1 comment is also noteworthy. PLATFORM_MIN_SCORE on lines 39-46 sets reddit at 10, twitter and x at 5, linkedin at 3, moltbook at 3, github at 3. A default of 5 covers anything unknown. If no platform is passed, min_score_for(None) returns 5.
Where exactly does the feedback report get injected into the prompt?
skill/engage.sh line 75 captures TOP_REPORT=$(python3 $REPO_DIR/scripts/top_performers.py --platform reddit 2>/dev/null || echo '(top performers report unavailable)'). Line 150 of the same file, inside the big heredoc that becomes the Claude prompt, literally includes '## FEEDBACK FROM PAST PERFORMANCE (use this to write better replies): $TOP_REPORT'. The whole report lands as a top-level block of the drafting context, ahead of the engagement styles block.
Does this actually change what the LLM writes?
It changes two things. First, the style ranking table shifts which archetypes Claude reaches for first. Storyteller with avg_cm=1.7 on Reddit will be picked more often than snarky_oneliner with avg_cm=0.4 because the prompt now shows that gap. Second, the bottom-post reasons teach Claude not to repeat specific failure patterns; after a few cycles where 'curious_probe style (negative avg on Reddit)' appears in bottom-10, the model stops drafting curious_probe on Reddit at all, which is why PLATFORM_POLICY also bans it at the hard level.
Is this separate from the PLATFORM_POLICY hard bans?
Yes. PLATFORM_POLICY in scripts/engagement_styles.py is tone policy, not data policy. It bans curious_probe on Reddit and snarky_oneliner on LinkedIn regardless of what the numbers say. top_performers.py is performance data, not policy. The two layers stack. A style has to clear both: not banned by PLATFORM_POLICY, and surfaced (or demoted) by the live score ranking. This separation is deliberate: policy is about what we want to be seen doing, performance is about what works.
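The two-layer check can be sketched as below. The ban pairs come from this page; the dict shape and helper name are assumptions, not the verbatim PLATFORM_POLICY from scripts/engagement_styles.py.

```python
# Tone policy: hard bans regardless of what the numbers say.
PLATFORM_POLICY = {
    "reddit": {"banned": {"curious_probe"}},
    "linkedin": {"banned": {"snarky_oneliner"}},
}

def style_allowed(style, platform):
    # Layer 1 only; layer 2 (the live score ranking) is data, not policy.
    return style not in PLATFORM_POLICY.get(platform, {}).get("banned", set())
```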
Why is this block under a do-not-simplify warning?
Lines 55-58 of top_performers.py: 'DO NOT REMOVE OR SIMPLIFY THE FUNCTIONS BELOW. These are data-driven improvements based on analysis of 3,000+ posts. They have been reverted by other agents twice already. Protected by pre-commit hook. See CLAUDE.md.' The block got refactored out by AI coding agents who saw 'unused PRODUCT_NAMES' in a static check and removed it. The warning plus the pre-commit hook exist so the block survives future refactors.
Can I use this pattern without running all of S4L?
The pattern is small. A SQL query with a composite score, a per-platform threshold dict, a substring-based anti-pattern filter, and a rule-based failure annotator. If you already log your own posts to a database with an upvotes or reactions field and you run your drafts through an LLM, you can copy this shape verbatim, change the column names, and inject the output into your prompt. The only S4L-specific piece is which engagement styles are tracked; you would pick your own.
How the feedback loop fits into the rest of the engagement pipeline.
Related S4L guides
Social media marketing automation where tone is a live bandit
The tone selector is a live multi-armed bandit over avg_upvotes. MIN_SAMPLE_SIZE=5. curious_probe banned on Reddit, snarky_oneliner banned on LinkedIn.
Best social media automation tools: one Claude session per reply
Every reply gets its own fresh claude -p subprocess with a 300s deadline. No batched 200-reply prompts, no shared context, no tonal drift.
Social media auto posting that waits 5 minutes before deciding
T0/T1 snapshots via fxtwitter, 300s sleep, delta_score formula. Only candidates with climbing engagement make it to the reply queue.