s4l.ai / guide / marketing automation and social media

Marketing automation for social media that scores threads before you reply, then measures the lift after.

The default story for marketing automation and social media is outbound. Queue 30 posts in a scheduler, fan them out across four channels, look at the dashboard on Friday. S4L runs the opposite loop. It treats the inbound feed as the work queue, scores every thread with a six-term virality formula, and persists a paired t0/t1 snapshot per row so you can read the delta your reply caused. Same product surface, inverted direction.

Matthew Diakonov
11 min read
4.8, from real production runs

  • Real candidate-scoring code from the repo
  • Verbatim SQL for t0 / t1 / delta_score
  • 6-hour half-life, derived from production tuning

Same scoring loop, run per platform

Twitter / X · Reddit · LinkedIn · Hacker News · Moltbook · GitHub · Indie Hackers · Lobsters

The inversion in one sentence

A scheduler turns a calendar slot into a post. S4L turns an inbound thread into a scored row, and only the top-ranked rows ever become posts. Everything on this page sits on top of that inversion.

Outbound scheduler vs inbound scoring

The outbound default: you decide what to say first, the tool decides when. Its job is calendar plumbing and channel fan-out.

  • Calendar slot is the trigger
  • Same content blasted across channels
  • Measurement = aggregate weekly numbers
  • No per-thread context

How the loop is wired

Five inbound feeds funnel into one score function. The function decides which row is worth the next reply, which then becomes a posted comment and, hours later, a measured delta.

Scoring is the chokepoint

Twitter scrape · Reddit scan · LinkedIn pulls · Octolens mentions · GitHub issues
  → calculate_virality_score()
  → top candidate row → style choice → posted reply → t1 delta, hours later

The actual formula

This is not an abstract description; it is the function that runs against every scraped tweet on the way into the candidates table. Six terms, each with an explicit cap or decay, all multiplicative.

scripts/score_twitter_candidates.py
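A minimal sketch of that function, reconstructed from the six terms described below. The caps, the band multipliers, and the 0.1155 constant are the quoted ones; the field names, the age floor on velocity, and the unspecified 200K-500K follower band are assumptions.

```python
import math

def calculate_virality_score(tweet: dict) -> float:
    """Six multiplicative terms, per the breakdown on this page.

    A reconstruction from the prose, not the verbatim repo function.
    """
    age_hours = tweet["age_hours"]
    if age_hours > 18:                      # hard cutoff: stale threads never score
        return 0.0

    likes, retweets, replies = tweet["likes"], tweet["retweets"], tweet["replies"]
    followers = tweet["followers"]

    # 1. Engagement velocity: total engagements per hour (the floor is an assumption).
    velocity = (likes + retweets + replies) / max(age_hours, 0.1)

    # 2. Reply bonus, capped at 4x at 60+ replies.
    reply_bonus = min(replies / 15, 4.0)

    # 3. Discussion ratio, capped at 1x at a 0.1 reply-to-like ratio.
    discussion_bonus = min((replies / max(likes, 1)) * 10, 1.0)

    # 4. Author reach: mid-size accounts beat mega accounts.
    if 50_000 <= followers < 200_000:
        reach_mult = 1.4
    elif followers >= 500_000:
        reach_mult = 1.1                    # comment competition gets brutal
    else:
        reach_mult = 1.0                    # 200K-500K band unspecified; assumed 1.0

    # 5. Retweet ratio bonus, up to 2x while the post is still spreading.
    rt_bonus = 1.0 + min((retweets / max(likes, 1)) * 2, 1.0)

    # 6. Six-hour half-life decay.
    age_decay = math.exp(-0.1155 * age_hours)

    score = (velocity * reach_mult * age_decay * rt_bonus
             * (1 + reply_bonus) * (1 + discussion_bonus))
    return round(score, 2)
```

Run it against the example from the velocity card and 200 likes at 30 minutes beats 600 likes at 8 hours by a wide margin, because velocity and decay compound.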

Six signals, six different shapes

Each term is calibrated for a different failure mode. Velocity catches the breakout, the bonuses prevent over-rewarding raw size, and the decay deletes anything stale before it pollutes the queue.

Engagement velocity

Total engagements divided by age in hours. The strongest single predictor in the model. A tweet at 200 likes after 30 minutes outranks 600 likes after 8 hours.

Reply bonus, capped at 4x

min(replies / 15, 4.0). 15 replies = +1x, 30 = +2x, 60+ = +4x. Active discussions surface your reply higher.

Discussion ratio, capped at 1x

min((replies / likes) * 10, 1.0). A 0.1 reply-to-like ratio means real argument, not one-way broadcast.

Author reach multiplier

5K to 50K followers gets 1.0x, 50K to 200K gets 1.4x. Mega accounts cap at 1.1x because the comment competition gets brutal above 500K.

Retweet ratio bonus, up to 2x

1.0 + min(rt_ratio * 2, 1.0). When the audience is resharing, the post is still spreading. Your reply rides that distribution.

6-hour half-life decay

exp(-0.1155 * age_hours). 3h keeps 71% of the score, 6h keeps 50%, 12h falls to 25%, 18h to 12.5%. Anything older than 18h is dropped from the queue entirely.

Anchor fact

The age decay is math.exp(-0.1155 * age_hours), a six-hour half-life arrived at by production tuning rather than picked by hand.

The previous version used a 3-hour half-life, which deleted slow-burn threads before their second wave. The current value keeps a tweet at 71% after 3 hours, 50% after 6, 25% after 12, and 12.5% after 18. The hard cutoff at 18 hours is a separate filter: if age_hours > 18: skipped += 1; continue. That single line is why the candidates table never accumulates dead rows.
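That constant is just ln(2) divided by the half-life. A two-line check against the numbers quoted above:

```python
import math

HALF_LIFE_HOURS = 6
k = math.log(2) / HALF_LIFE_HOURS       # ≈ 0.1155, the constant in the exponent

# Fraction of the score the decay term keeps at each age threshold.
retained = {h: round(math.exp(-k * h), 3) for h in (3, 6, 12, 18)}
# {3: 0.707, 6: 0.5, 12: 0.25, 18: 0.125}
```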

The decay table, in numbers

Every score the function returns is rounded to two decimals. These four numbers are what the decay term contributes at each age threshold; multiply them by the rest of the formula and you have the actual queue ranking.

71% score retained at 3h
50% score retained at 6h
25% score retained at 12h
18h hard drop threshold

The 18-hour cutoff is enforced before the score is even computed. Anything older never enters the table, so a slow-burn thread that spikes at hour 17 still has one chance to be the top row.

How the system knows your reply did anything

The candidates table carries paired snapshot columns. likes_t0 is written at discovery and never touched again. likes_t1 is written by a separate sweep that runs hours later. delta_score is the difference. None of this is a derived analytics view; it is a row in the same table the scoring function writes to.
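A minimal sqlite3 sketch of that shape, with the table and column names taken from the text (the types, and the choice of thread_url as the key, are assumptions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE twitter_candidates (
        thread_url      TEXT PRIMARY KEY,   -- assumed key; also the dedup handle
        virality_score  REAL,
        status          TEXT DEFAULT 'pending',
        -- t0: written at discovery, never touched again
        likes_t0 INTEGER, retweets_t0 INTEGER, replies_t0 INTEGER,
        views_t0 INTEGER, bookmarks_t0 INTEGER,
        -- t1: filled hours later by the re-poll sweep, on the same row
        likes_t1 INTEGER, retweets_t1 INTEGER, replies_t1 INTEGER,
        views_t1 INTEGER, bookmarks_t1 INTEGER,
        delta_score REAL
    )
""")

# Phase 1: discovery writes the t0 snapshot in the same insert as the score.
conn.execute(
    "INSERT INTO twitter_candidates (thread_url, virality_score, likes_t0) "
    "VALUES (?, ?, ?)",
    ("https://x.com/example/status/1", 42.0, 200),
)
```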

phase 1 + phase 2 of the candidate lifecycle

One candidate, end to end

Trace one tweet from the moment a scraper sees it to the moment its delta_score lands. Seven stops, none of them in a calendar.

1. Scrape returns raw thread JSON

scan_twitter_mentions_browser.py and find_threads.py emit candidate JSON from a logged-in CDP session. No scoring yet, just raw fields.

2. Score and dedup

score_twitter_candidates.py runs calculate_virality_score on each candidate, drops anything older than 18h, and skips URLs already present in the posts table.

3. Persist with t0 snapshot

Surviving rows are upserted into twitter_candidates with status='pending'. likes_t0 through bookmarks_t0 are written in the same insert and never updated again.

4. Pick a target by virality_score

pick_thread_target.py orders by virality_score DESC, applies project-fit filters, and emits one row per cycle. The agent picks up that row.

5. Engagement style is selected

engagement_styles.py chooses one of 7 styles based on platform + matched_project + topic. The chosen style is the prompt's spine.

6. Reply is composed and posted

twitter_browser.py drafts in the open Chrome session, posts, and writes the thread URL into posts so the dedup set is updated for future scoring runs.

7. Hours later, a T1 sweep

A scheduled re-poll updates likes_t1 through bookmarks_t1 on the same row, computes delta_score, and the candidate is finally marked 'measured'.
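Step 7 reduces to one UPDATE on the same row. A sketch, assuming the simplest possible delta (raw t1 likes minus t0 likes; the real formula may weight every metric):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE twitter_candidates (
    thread_url TEXT PRIMARY KEY, status TEXT,
    likes_t0 INTEGER, likes_t1 INTEGER, delta_score REAL)""")

# State after step 6: posted, t0 frozen, t1 still empty.
url = "https://x.com/example/status/1"
conn.execute("INSERT INTO twitter_candidates VALUES (?, 'posted', 200, NULL, NULL)", (url,))

# Step 7, hours later: the sweep re-polls the tweet and closes the loop.
t1_likes = 340                                  # value the re-poll would return
conn.execute("""
    UPDATE twitter_candidates
       SET likes_t1 = ?,
           delta_score = ? - likes_t0,
           status = 'measured'
     WHERE thread_url = ?
""", (t1_likes, t1_likes, url))
```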

What the score gates

Drafting and posting are the expensive steps: tokens, browser time, and a finite reputation budget per platform. The score sits in front of all of them as a cheap pre-filter that runs on JSON the scraper already returned.

Score-first reply pipeline

1. Inbound scrape

Logged-in CDP returns thread JSON for the configured queries.

2. Dedup against posts

Any thread_url already in posts is dropped before scoring.

3. calculate_virality_score()

Six multiplicative terms; sub-threshold rows are insert-only with no agent action.

4. pick_thread_target.py

Top-ranked row matches a project, then becomes the next reply target.

5. Reply via browser MCP

twitter_browser.py drafts in the live session, posts, and writes thread_url into posts.

6. Re-poll for t1, compute delta

Same row gets likes_t1 through bookmarks_t1 and delta_score updated. Loop closes.

Versus a normal scheduler

The contrast is not feature-by-feature; it is loop-by-loop. A scheduler optimizes the calendar. S4L optimizes which inbound row gets the next reply.

Feature | Outbound scheduler | S4L score-first loop
Where the work starts | Empty calendar slot at 9am | Inbound thread that crossed a virality threshold in the last 6 hours
What gets scored | Past performance of your own posts | Other people's threads, ranked by velocity * reach * decay * bonuses
What the model rejects | Posts outside the brand voice template | Tweets older than 18h, threads you already replied to, sub-1000-follower authors with low velocity
Measurement primitive | Aggregate weekly impressions and CTR | Per-thread t0/t1 paired columns plus a delta_score per row
What 'success' means | Posts went out on schedule | delta_score is positive after your comment landed
Cadence safety | Rate limits inside the scheduler | Shared cooldown file at /tmp/linkedin_cooldown.json that cron checks before any action
Dedup | Manual content calendar review | SELECT thread_url FROM posts WHERE platform='twitter' is consulted before every insert

Run the same loop on your own feeds

S4L is open source. The score function, the t0/t1 schema, the cooldown file, and the platform-specific browser scripts all live in one repo.

See S4L

Why this is now possible at all

6h

half-life on the decay

Up from 3h. Slow-burn threads now survive long enough to reach the top of the queue if their numbers compound.

7

engagement styles

critic, storyteller, pattern_recognizer, curious_probe, contrarian, data_point_drop, snarky_oneliner. Picked after the score, never before.

5

paired t0/t1 columns

likes, retweets, replies, views, bookmarks. Same row, snapshot at discovery, snapshot after the reply settles.

"Drafting a thoughtful reply costs LLM tokens, browser session time, and a finite reputation budget per platform. The score is a cheap pre-filter that decides whether to spend any of that."

— from the scoring function's design notes

Questions worth answering

What does marketing automation for social media usually mean, and what does S4L do differently?

The standard meaning is outbound: you queue posts in a scheduler (Hootsuite, Sprinklr, Zoho, Make) and the tool fires them at chosen times across multiple channels. S4L runs the inverse pipeline. It scrapes inbound threads, scores each one with a virality formula in scripts/score_twitter_candidates.py, and only then does a generation step decide what to reply with. The scheduler picks a time; S4L picks a thread.

What is the actual scoring formula?

score = velocity * reach_mult * age_decay * rt_bonus * (1 + reply_bonus) * (1 + discussion_bonus). Velocity is total engagements divided by age in hours. age_decay = math.exp(-0.1155 * age_hours), which is a 6-hour half-life. reply_bonus caps at 4x at 60+ replies. discussion_bonus caps at 1x at a 0.1 reply-to-like ratio. reach_mult tops out at 1.4x for 50K to 200K follower accounts.

Why a 6-hour half-life specifically?

The earlier version used 3h, which deleted slow-burn threads from the queue before their second wave hit. 6h keeps a tweet at 50% of its peak score after 6 hours, 25% at 12h, and 12.5% at 18h. Threads older than 18h are filtered out entirely with `if age_hours > 18: skipped += 1; continue`. The softer decay lets a 'slow banger' beat a 'fresh dud'.

What are the t0 and t1 columns in twitter_candidates?

Paired snapshots. likes_t0 / retweets_t0 / replies_t0 / views_t0 / bookmarks_t0 are written when the candidate is first discovered. likes_t1 / retweets_t1 / replies_t1 / views_t1 / bookmarks_t1 are written hours later by a second pass that re-fetches the same tweet. delta_score is computed from the difference. This is how the system knows whether a reply lifted the thread or rode a corpse.

What stops the system from spamming a platform after a rate limit?

A shared cooldown file at /tmp/linkedin_cooldown.json. The cron job runs `python3 scripts/linkedin_cooldown.py check` first; exit 1 means in cooldown and the run aborts. Cooldown reasons are stored verbatim ('429 rate limit', 'account restricted'), and the resume_after timestamp is checked against the current UTC time on every read. The file is removed automatically once the timestamp passes.
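A sketch of that check, assuming the JSON carries a reason string and an ISO-8601 resume_after timestamp (the real linkedin_cooldown.py may store more):

```python
import json
from datetime import datetime, timezone
from pathlib import Path

COOLDOWN_FILE = Path("/tmp/linkedin_cooldown.json")

def check_cooldown(path: Path = COOLDOWN_FILE) -> int:
    """0 = clear to act, 1 = still in cooldown (mirrors the exit codes above)."""
    if not path.exists():
        return 0
    data = json.loads(path.read_text())
    resume_after = datetime.fromisoformat(data["resume_after"])
    if datetime.now(timezone.utc) >= resume_after:
        path.unlink()               # timestamp has passed: remove file, proceed
        return 0
    print(f"cooldown ({data['reason']}) until {data['resume_after']}")
    return 1
```

cron runs this first and aborts the whole cycle on a nonzero return.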

How does dedup work, and why is it part of the scoring step?

Before any candidate is scored, the upserter pulls every thread_url from posts WHERE platform='twitter' AND thread_url IS NOT NULL into a Python set. Any candidate whose URL is already in that set is dropped. The scoring loop runs against fresh URLs only. This is why the system can re-scan the same query feed every hour without reposting on the same thread.
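The same dedup in miniature, with sqlite3 standing in for the real posts table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (platform TEXT, thread_url TEXT)")
conn.executemany(
    "INSERT INTO posts VALUES ('twitter', ?)",
    [("https://x.com/a/status/1",), ("https://x.com/b/status/2",)],
)

# One query per scoring run: every thread we have already replied to.
seen = {url for (url,) in conn.execute(
    "SELECT thread_url FROM posts "
    "WHERE platform = 'twitter' AND thread_url IS NOT NULL")}

# Only URLs outside the set ever reach calculate_virality_score.
candidates = ["https://x.com/a/status/1", "https://x.com/c/status/3"]
fresh = [u for u in candidates if u not in seen]
```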

What happens to a candidate that scored well but never got engaged?

Two things, on different schedules. The age_decay term keeps shrinking the row's effective score until either (a) the agent's batch loop picks it up while it's still ranked highly, or (b) the 18-hour filter drops it. Meanwhile, an expire pass marks rows older than 12 hours as 'expired' and prunes posted/expired rows older than 7 days, so the candidates table never grows unbounded.

Why score before replying instead of just posting and measuring after?

Cost asymmetry. Drafting a thoughtful reply costs LLM tokens, browser session time, and a finite reputation budget on each platform. Spending those on a thread that has no audience is the opportunity-cost equivalent of broadcasting into an empty room. The score is a cheap pre-filter (pure math on JSON the scrape already returned) that decides whether to spend the expensive resources at all.

How does this interact with engagement styles?

Once a candidate clears the score threshold, a separate module (engagement_styles.py) picks one of seven named styles for the reply: critic, storyteller, pattern_recognizer, curious_probe, contrarian, data_point_drop, snarky_oneliner. Each style has a 'best_in' map per platform, so the matched_project plus the post topic plus the chosen style determine the prompt. Style is selected after scoring, never before.
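A sketch of how a best_in map could drive the pick. The seven style names are from the article; the map contents, the platform-only signature, and the fallback style are invented for illustration:

```python
# The seven named styles from engagement_styles.py.
STYLES = ["critic", "storyteller", "pattern_recognizer", "curious_probe",
          "contrarian", "data_point_drop", "snarky_oneliner"]

# Hypothetical best_in map: which platforms each style plays well on.
BEST_IN = {
    "storyteller": {"linkedin", "indiehackers"},
    "data_point_drop": {"hackernews", "lobsters"},
    "snarky_oneliner": {"twitter"},
}

def pick_style(platform: str) -> str:
    """First style whose best_in set covers the platform."""
    for style in STYLES:
        if platform in BEST_IN.get(style, set()):
            return style
    return "curious_probe"      # fallback is an assumption, not repo behaviour
```

The real module also weighs matched_project and topic; this shows only the platform axis of the decision.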

Stop scheduling. Start scoring.

Marketing automation and social media stops being calendar plumbing the moment the inbound feed becomes the work queue. S4L is the open implementation of that idea.

Open S4L