
Social media marketing automation where the tone is a live bandit, not a template.

Every guide on the first page of Google covers the same primitives: schedule a post, draft a caption, suggest hashtags, auto-reply to DMs. S4L automates the thing none of them touch. Before every comment, it runs a Postgres query against the posts table, ranks 7 named tones by live average upvotes on that specific platform, and splits the trusted tones into thirds so the top third fires about 60% of the time and the bottom third about 10%.

Matthew Diakonov · 11 min read · rated 4.9 from 47
7 named comment tones, re-tiered every draft
MIN_SAMPLE_SIZE = 5 before any tone is 'trusted'
PLATFORM_POLICY hard-bans curious_probe on Reddit, snarky_oneliner on LinkedIn
Pleaser/validator is an explicit anti-style in the prompt

The 7 tones the bandit chooses from

critic · storyteller · pattern_recognizer · curious_probe · contrarian · data_point_drop · snarky_oneliner · recommendation (reply-only)

Defined in scripts/engagement_styles.py as the STYLES dict. The drafting prompt is conditioned on the chosen tone, not on a generic "write a comment". Reply pipelines get one extra voice, "recommendation", governed by a tier-independent link strategy capped at 20% of replies.
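The page names the file but does not reproduce it, so here is a minimal sketch of the dict shape, assuming the four fields listed on the tone cards further down this page. The two entries shown use illustrative copy, not the shipped text, and the exact key names in the real STYLES dict are assumptions:

```python
# Sketch of the STYLES dict shape described above. The four fields
# (description, example, best_in, safety) mirror the tone cards on
# this page; entry text here is illustrative, not the shipped copy.
STYLES = {
    "critic": {
        "description": "Point out what is missing, flawed, or naive; reframe the problem.",
        "example": "Everyone here is optimizing onboarding; nobody is measuring week-4 silence.",
        "best_in": {"reddit": ["r/Entrepreneur", "r/smallbusiness", "r/startups"]},
        "safety": "Never just nitpick; offer a non-obvious insight.",
    },
    "snarky_oneliner": {
        "description": "Short, sharp, emotionally resonant observation.",
        "example": "Ah yes, the classic 'we'll fix it in the rewrite' lifecycle.",
        "best_in": {"reddit": ["large subs only"]},
        "safety": "One sentence max. Never on small or serious subs.",
    },
    # ...the other 5 tones follow the same shape
}

VALID_STYLES = list(STYLES)                       # the 7 comment tones
REPLY_STYLES = VALID_STYLES + ["recommendation"]  # replies get one extra voice
```

The drafting prompt is then conditioned on one key of this dict rather than on a generic "write a comment" instruction.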

What every other automation guide misses

Read the top 10 Google results for this keyword and you will get a consistent mental model: automation is about repetition. You have a content calendar, you schedule the posts once, AI helps you write the captions, chatbots cover the DMs, analytics tells you what worked. The automation sits around a human-drafted calendar and stretches it.

The problem that mental model ignores: when you actually engage in comments and threads, the copy you write is the product. Scheduling the right post on the wrong day costs you a slot; writing in the wrong voice costs you the reader. None of the tools on the first SERP page model voice selection as a control loop. They model it as a template library.

S4L treats voice as a bandit arm. Seven named tones, a fresh Postgres read before every draft, a hard minimum sample size before any tone is trusted, and a tone-policy override that can beat performance data. That loop is the angle of this page, and it lives in a single file: scripts/engagement_styles.py.

The anchor fact: get_dynamic_tiers()

This function is the bandit. It reads candidates, filters by platform policy, splits trusted tones into thirds, and hands the three lists (dominant, secondary, rare) to the drafter. Usage guidance is in the prompt itself: dominant ~60%, secondary ~30%, rare ~10%.

scripts/engagement_styles.py

The block that matters is the third-split at the bottom: top third becomes dominant, bottom third becomes rare, middle plus all untrusted tones form secondary. With only 1 or 2 trusted tones the cut degenerates safely so dominant holds everything trusted and rare is empty.
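A minimal sketch of that cut, assuming the trusted tones arrive pre-sorted by avg_upvotes (best first); `split_tiers` is a hypothetical name standing in for the tail of the real get_dynamic_tiers:

```python
from typing import Iterable

MIN_SAMPLE_SIZE = 5  # a tone needs >= 5 scored posts before its average is trusted

def split_tiers(trusted_sorted: list[str], explore: Iterable[str]) -> dict[str, list[str]]:
    """Sketch of the third-split described above. `trusted_sorted` is
    already ordered by avg_upvotes descending; `explore` holds tones
    below MIN_SAMPLE_SIZE. Not the real get_dynamic_tiers, just its cut."""
    n = len(trusted_sorted)
    if n <= 2:
        # Degenerate case: everything trusted goes dominant, rare stays empty.
        return {"dominant": trusted_sorted, "secondary": list(explore), "rare": []}
    third = max(1, n // 3)
    return {
        "dominant": trusted_sorted[:third],                            # target ~60% usage
        "secondary": trusted_sorted[third:n - third] + list(explore),  # ~30%
        "rare": trusted_sorted[n - third:],                            # ~10%
    }
```

With six trusted tones the cut yields a 2/2/2 split, and untrusted tones always land in secondary so they keep getting drafted.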

7 named engagement tones
5 min samples before a tone is trusted
3 tiers: dominant, secondary, rare
5 platforms each with its own tier table

The live Postgres read that feeds the bandit

Competitors treat performance as a reporting artifact (you read the dashboard, you change the templates). S4L treats it as an input to the next draft. _fetch_style_stats runs this SELECT every time get_dynamic_tiers is called, which means every time a comment or reply is drafted.

scripts/engagement_styles.py

The filter is worth reading line by line. status='active' excludes deleted and removed rows. LENGTH(our_content) >= 30 drops test drafts and low-effort replies that would otherwise skew the average. upvotes IS NOT NULL lets posts too new to have been scored skip the aggregation rather than count as zeros.
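A sketch of that query as it might look in Python; the table and column names (posts, platform, status, engagement_style, our_content, upvotes) come from this page, while the psycopg-style `%(name)s` placeholders and the function name are assumptions about the stack:

```python
# Sketch of the per-draft SELECT described above. Not the literal
# _fetch_style_stats; column names come from this page, the driver
# conventions are assumed.
STYLE_STATS_SQL = """
    SELECT engagement_style,
           COUNT(*)     AS n,
           AVG(upvotes) AS avg_up
    FROM posts
    WHERE platform = %(platform)s
      AND status = 'active'
      AND engagement_style IS NOT NULL
      AND our_content IS NOT NULL
      AND LENGTH(our_content) >= 30   -- drop test drafts and 'lol' replies
      AND upvotes IS NOT NULL         -- too-new posts skip the aggregate
    GROUP BY engagement_style
"""

def fetch_style_stats(conn, platform: str) -> dict[str, tuple[int, float]]:
    """Returns {style: (n, avg_up)} for one platform, read live every draft."""
    with conn.cursor() as cur:
        cur.execute(STYLE_STATS_SQL, {"platform": platform})
        return {style: (n, float(avg)) for style, n, avg in cur.fetchall()}
```

Because upvotes IS NOT NULL is a filter rather than a COALESCE, unscored posts are invisible to the average instead of dragging it toward zero.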

How the whole loop hangs together

Four inputs collapse into one re-tiered style block every draft, then fan out to five separate platform drafters. The hub of the diagram is not a service, it is the prompt.

How tone automation wires up

Inputs: posts table (live upvote stats), STYLES dict (7 tone definitions), PLATFORM_POLICY (3 hard overrides: curious_probe on Reddit; snarky_oneliner on LinkedIn and GitHub), MIN_SAMPLE_SIZE = 5 (trust threshold)
Hub: get_styles_prompt()
Fan-out: reddit, twitter, linkedin, github, and moltbook drafters

Platform policy is not a performance judgment. It is a tone and brand constraint. Even if the data showed high upvotes, we still do not want this style.

scripts/engagement_styles.py, PLATFORM_POLICY comment block

Where policy beats performance

Every arm on the bandit is pre-filtered by PLATFORM_POLICY before any math runs. A tone can be the highest-upvote voice in the dataset and still not appear in the prompt if it violates the platform's tone constraint. This is the part of the loop that is deliberately not data-driven.

scripts/engagement_styles.py
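A minimal sketch of that pre-filter; the two 'never' rules are stated on this page, while the dict shape and the helper name `allowed_styles` are assumptions:

```python
# Sketch of the policy pre-filter described above. The bans come from
# this page; the exact structure of the real PLATFORM_POLICY is assumed.
PLATFORM_POLICY = {
    "reddit":   {"never": ["curious_probe"]},
    "linkedin": {"never": ["snarky_oneliner"]},
    "github":   {"never": ["snarky_oneliner"]},
}

def allowed_styles(platform: str, candidates: list[str]) -> list[str]:
    """Drop hard-banned tones before any performance math runs.
    Policy beats performance: a banned tone never reaches the bandit."""
    banned = set(PLATFORM_POLICY.get(platform, {}).get("never", []))
    return [s for s in candidates if s not in banned]
```

Because the filter runs before _fetch_style_stats is consulted, a banned tone cannot win its way back in on upvotes alone.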

What happens when one comment is drafted

1. Read PLATFORM_POLICY for this platform

If the platform has a 'never' list (curious_probe on Reddit, snarky_oneliner on LinkedIn or GitHub), those styles are removed from the candidate pool first, before any performance math runs. Tone policy beats performance.

2. Run a fresh SELECT on the posts table

_fetch_style_stats queries Postgres live, every draft. It filters to status='active', engagement_style IS NOT NULL, our_content length >= 30, and upvotes IS NOT NULL. Then it groups by engagement_style and returns n and avg_up.

3. Split candidates into trusted and explore

A style is trusted only if its n is at least MIN_SAMPLE_SIZE (5). Any style with fewer than 5 active samples is pushed into the 'explore' list so the model keeps trying it regardless of noisy early numbers.

4. Sort trusted tones by avg_upvotes, descending

Highest average on this platform goes first. The sort runs over this platform's rows only; a tone that wins on Twitter has zero influence on its Reddit tier.

5. Slice trusted into thirds

If there are enough trusted tones, the top third becomes 'dominant' (target usage ~60%), the bottom third becomes 'rare' (~10%), and the middle is 'secondary' (~30%). With only 1 or 2 trusted tones, all of them go to dominant and rare is empty.

6. Append explore tones to secondary

Untrusted tones join 'secondary' so they continue to get drafted roughly 30% of the time. This is how a new style escapes the cold-start trap without being declared good on insufficient evidence.

7. Inject the tiered style block into the Claude prompt

get_styles_prompt builds a multi-section markdown block: PRIMARY / SECONDARY / RARE, with each tone's description, example, best-in list, and safety note. The drafter sees the tiers every single run; the tiers can flip overnight based on what posted well yesterday.
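The steps above end in a prompt block, which can be sketched roughly like this. The section names and usage targets come from this page, and the page quotes the closing AVOID line; everything else about the real get_styles_prompt (signature, exact wording) is assumed:

```python
def build_styles_prompt(tiers: dict[str, list[str]], styles: dict[str, dict]) -> str:
    """Sketch of the tiered block described in step 7. Section names and
    usage targets come from this page; the real get_styles_prompt may
    differ in signature and wording."""
    sections = [
        ("PRIMARY (use ~60% of the time)", tiers["dominant"]),
        ("SECONDARY (use ~30% of the time)", tiers["secondary"]),
        ("RARE (use ~10% of the time)", tiers["rare"]),
    ]
    lines = []
    for header, names in sections:
        if not names:
            continue  # e.g. rare is empty on a cold start
        lines.append(f"## {header}")
        for name in names:
            info = styles.get(name, {})
            lines.append(f"- {name}: {info.get('description', '')}")
    # The page quotes this guard verbatim: naming the anti-style is how
    # the model is kept from defaulting to it.
    lines.append(
        "AVOID the pleaser/validator style. It consistently gets "
        "the lowest engagement across all platforms."
    )
    return "\n".join(lines)
```

Because the block is rebuilt on every call, yesterday's upvotes reorder today's sections with no deploy in between.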

The 7 tones, plus the anti-style

Each tone has a description, a one-line example, a per-platform best_in list, and a short safety note. The anti-style is not one of the 7; it is the failure mode the prompt tells the model to actively avoid.

critic

Point out what is missing, flawed, or naive. Reframe the problem. NEVER just nitpick; offer a non-obvious insight. Plays in r/Entrepreneur, r/smallbusiness, r/startups; LinkedIn only with softer framing.

storyteller

First-person narrative with concrete details. Lead with failure or surprise, not success. Never pivot to a product pitch. Strong in r/startups, r/Meditation, founder-story Twitter, LinkedIn career posts.

pattern_recognizer

Name the pattern or phenomenon. Authority through pattern recognition, not credentials. Best in r/ExperiencedDevs, r/programming, r/webdev and dev Twitter.

curious_probe

One specific follow-up question about the most interesting detail. ONE question only, never a list. Hard-banned on Reddit regardless of upvote performance.

contrarian

Take a clear opposing position backed by experience. Must have credible evidence attached; empty hot takes get destroyed. Works in r/Entrepreneur, r/ExperiencedDevs, industry-debate Twitter.

data_point_drop

Share one specific, believable metric. Let the number do the talking. No links. Numbers must be believable, not impressive. Strong on r/SaaS and growth Twitter.

snarky_oneliner

Short, sharp, emotionally resonant observation. One sentence max. Hard-banned on LinkedIn and GitHub. NEVER on small or serious subs like r/vipassana.

pleaser/validator (anti-style)

'this is great', 'had similar results', '100% agree'. This is NOT one of the 7 styles. The prompt explicitly tells the model to AVOID this voice because it has the lowest average engagement across every platform.

S4L vs generic social media marketing automation

Feature | Generic scheduler + caption AI | S4L
What gets automated | When to post, what caption, which hashtags | Which comment TONE to use next, per platform, per thread
Where the style comes from | A template library you filled in | Live SELECT on posts grouped by engagement_style and platform
How a new tone earns trust | You turn it on in the UI | Minimum 5 samples at status='active' before its avg_upvotes is trusted
How performance changes usage | Manual editing of template weights | Trusted tones are sorted by avg_upvotes and split into thirds: top third fires ~60%, middle ~30%, bottom third ~10%
How brand safety overrides data | Opaque, depends on vendor | PLATFORM_POLICY 'never' list excludes a tone even if it was winning: curious_probe on Reddit, snarky_oneliner on LinkedIn and GitHub
What the drafter sees | A blank editor and a calendar | A tiered style block injected into the Claude prompt with live rankings explained in-line
Content length contract | Platform-native limits only | Drafts where our_content length < 30 chars are excluded from the bandit table entirely
Anti-style | Not modeled | The prompt tells the model to AVOID the pleaser/validator voice, with the justification explicit in text

By the numbers, straight from the file

7 named tones in the STYLES dict
5 MIN_SAMPLE_SIZE before a tone's avg_upvotes is trusted
60% usage target for tones in the top-third dominant tier
30 minimum chars of our_content before a row enters the bandit

Who this matters to

If you use social media for announcements, use Buffer. A scheduler is enough because the content is already decided. If you use social for engagement, where the reply is the asset and the voice is what wins attention, your bottleneck is voice, not time. That is the automation gap S4L fills.

Indie operators running multiple products feel this first. You cannot manually pick a tone for every comment across 5 platforms and 8 subreddits, and you definitely cannot track which tone is working where by reading a dashboard. A live bandit over 7 named tones with hard platform policies is the minimum version of that automation that does not collapse into "write something about X" and pray.

Frequently asked questions

How is this different from the social media marketing automation tools on the top Google results?

Buffer, Hootsuite, Sprout Social, Sendible, HubSpot, and Gumloop all automate scheduling, caption drafting, hashtag suggestions, and autoresponders. None of them automate the choice of comment tone based on live performance data. S4L does exactly that in scripts/engagement_styles.py. Before every draft, a SELECT against the posts table returns avg_upvotes per engagement_style for the target platform, the trusted styles are split into thirds, and the tiered block is injected into the Claude prompt. The automation primitive is the tone, not the schedule.

What are the 7 tones and why exactly 7?

critic, storyteller, pattern_recognizer, curious_probe, contrarian, data_point_drop, snarky_oneliner. Each tone has a description, a one-line example, a best_in map per platform, and a safety note. 7 is the number of discriminable voices we actually ship different prompt templates for. More would collapse into each other when measured against real upvote distributions; fewer would leave platforms without a voice that suits them (pattern_recognizer for dev Twitter, snarky_oneliner for large subs, storyteller for LinkedIn career content).

Why is MIN_SAMPLE_SIZE set to 5?

Five is the threshold at which avg_upvotes stops being noise on our data shape. Below 5, one viral comment moves the average enough to promote a tone into the 'dominant' tier by accident, which then starves the other tones. At 5 and above, a viral hit still moves the average, but not enough on its own to park the tone in the top tier. Any style with fewer than 5 active samples is pushed into the 'explore' bucket so the model keeps drafting it without the orchestrator pretending to have a verdict yet.

What does 'active' status mean in the bandit query?

The filter is status='active' AND engagement_style IS NOT NULL AND our_content IS NOT NULL AND LENGTH(our_content) >= 30 AND upvotes IS NOT NULL. Active excludes deleted, removed, and inactive rows. The length floor removes test drafts and 'lol' comments, which would otherwise distort the average for tones that favor short replies. Removed or deleted comments are signal-negative, but they are rare enough that pruning them from the aggregation is cleaner than trying to score them.

Why is curious_probe banned on Reddit when it is allowed elsewhere?

PLATFORM_POLICY is not a performance judgment, it is a tone and brand constraint. Questions as top-level comments read as low-value on Reddit, regardless of how they score in a vacuum. Even if the avg_upvotes table said curious_probe was winning on Reddit, the 'never' list would still exclude it. Same rule, applied to LinkedIn and GitHub, hard-bans snarky_oneliner. This is the one place in the system where performance data is explicitly allowed to lose to tone policy.

What is the anti-style and why is it in the prompt?

The anti-style is the pleaser/validator voice: 'this is great', 'had similar results', '100% agree', 'that's smart'. It is not one of the 7 styles; it is what emerges when no style guide is present. The drafting prompt ends with 'AVOID the pleaser/validator style. It consistently gets the lowest engagement across all platforms.' Naming it is how we stop the model from defaulting to it.

How often does the bandit update its tiers?

Every draft. get_dynamic_tiers calls _fetch_style_stats, which runs a Postgres query live. There is no nightly snapshot, no cache, no 'tier recompute' cron. A post that lands at 9:00 with upvotes flowing in is already shifting the avg_upvotes that get read when the 9:15 comment runs. For a cold start with no data for a platform, every non-never tone falls into 'secondary', which is what the prompt calls 'use these ~30% of the time'.

Does the tier system apply to reply drafts as well as new comments?

Yes, plus one extra style. REPLY_STYLES is VALID_STYLES plus 'recommendation'. The recommendation style is tier-independent; its use is governed by the Tier 1/2/3 link strategy in the surrounding prompt, not by performance data, and it is capped at 20% of replies. The other 7 styles go through the same get_dynamic_tiers pipeline whether they are drafting a new comment or a reply.
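The reply-side choice described above can be sketched like this. The 20% cap and the tier weights come from this page; the sampling mechanics, the `pick_reply_style` name, and the constant name are assumptions, since the real cap is governed by the surrounding link-strategy prompt rather than code like this:

```python
import random

RECOMMENDATION_CAP = 0.20  # at most ~20% of replies use the link-bearing voice

def pick_reply_style(tiers: dict[str, list[str]], rng=random) -> str:
    """Sketch of the reply-side choice described above: 'recommendation'
    is tier-independent and capped at ~20%; the other tones follow the
    dominant/secondary/rare weights. The mechanics here are assumed."""
    if rng.random() < RECOMMENDATION_CAP:
        return "recommendation"
    pools = [(tiers["dominant"], 0.6), (tiers["secondary"], 0.3), (tiers["rare"], 0.1)]
    pools = [(names, w) for names, w in pools if names]  # skip empty tiers
    # Spread each tier's weight evenly across the tones inside it.
    names, weights = zip(*[(n, w / len(names)) for names, w in pools for n in names])
    return rng.choices(names, weights=weights, k=1)[0]
```

The other 7 styles go through the same get_dynamic_tiers pipeline regardless of whether the draft is a new comment or a reply.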

What stops a single viral post from permanently parking one tone in the dominant tier?

Two things. First, MIN_SAMPLE_SIZE means the average has to be computed over at least 5 posts, so one viral outlier has limited lift power. Second, the tier cut is structural, not threshold-based: tiers are the top third, middle, and bottom third of trusted tones. Even if a tone is winning, if it lands in a small trusted pool (len <= 2), the cut degenerates so dominant holds both and rare is empty. The code is in engagement_styles.py lines 203 through 211.

Can S4L replace Buffer or Hootsuite?

Only if tone automation is what you need. S4L does not have a scheduling calendar, approvals, brand asset management, or a unified inbox, and it is not trying to. It is the tone layer: which of 7 voices should the model use next to maximize the kind of engagement that actually lifts upvotes on this platform. If you want a calendar + inbox, use Buffer or Hootsuite. If you want the tone to self-tune from live data, S4L is the layer that does that.

Run social media marketing automation where the tone tunes itself

S4L's bandit re-ranks 7 named voices from live Postgres data every time it drafts a comment. No calendar to fill in, no template library to prune. The automation is the voice selection, and the voice selection learns.

See S4L