Social media marketing on Reddit is a measurement problem, not a posting problem.
Every guide on the first page of Google says the same three things: pick the right subs, be genuinely helpful, drop a link only when invited. Fine. The part nobody describes is what happens after you post. How do you know the comment worked? S4L answers that with a scanner that walks your inbox 100 items per page, matches every reply-to-you back to the exact comment you wrote that earned it, and tags each reply with the engagement style you used. That edge is what turns posting on Reddit into marketing.
The numbers behind the loop
These are the constants hard-coded into scripts/scan_reddit_replies.py. They are small numbers, but each one is a deliberate choice and each one protects against a different failure mode.
100 per page is the API's hard cap. 10 pages is enough to cover about a week of a high-volume account without ever touching the Reddit inbox retention ceiling. 48 hours is short enough that the dashboard still feels live; older items get written as status='skipped' with reason backfill_old so you can see them without replying to a comment from last month. 50 consecutive already-known items is the dedup brake: on a slow Reddit day, the loop stops paginating fast instead of walking the whole inbox for nothing.
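A minimal sketch of how these constants might be declared. The names are assumptions for illustration, not the script's actual identifiers:

```python
from datetime import timedelta

# Hypothetical names; the real constants live in scripts/scan_reddit_replies.py.
PAGE_SIZE = 100                       # Reddit's hard cap on inbox items per page
MAX_PAGES = 10                        # roughly a week of a high-volume account
PENDING_CUTOFF = timedelta(hours=48)  # newer -> pending, older -> backfill_old
DEDUP_STREAK_LIMIT = 50               # consecutive known items before early exit

MAX_ITEMS = PAGE_SIZE * MAX_PAGES     # worst-case items walked per scan
```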
How each inbox item becomes a row
The path a single reply-to-you takes, from Reddit's inbox JSON to a labeled row in your database, is seven steps long. None of them are glamorous. All of them have to be right for the measurement to be honest.
Inbox scan pipeline
Load cookies, not an API key
scan_reddit_replies.py reads reddit-cookies.json from the logged-in profile so old.reddit.com serves the logged-in inbox JSON, not the public-API proxy. Cookies are refreshed by bootstrap_reddit_cookies.py before each run.
Paginate /message/inbox/.json
Up to 10 pages, 100 items each, with a 1.5-second pause between pages. A non-JSON response immediately raises SessionInvalidError so the scan does not silently write garbage.
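The pagination-plus-content-type-gate shape can be sketched like this. It assumes an `opener` that already carries the session cookies; the function and field names follow Reddit's listing format, but the helper itself is an illustration, not the script's code:

```python
import json
import time

class SessionInvalidError(RuntimeError):
    """Raised when Reddit serves HTML (a login page) instead of inbox JSON."""

def fetch_inbox_pages(opener, max_pages=10, pause=1.5):
    # Walk the inbox listing via the 'after' cursor, refusing to parse
    # anything that is not JSON so stale cookies fail loudly.
    after = None
    for _ in range(max_pages):
        url = "https://old.reddit.com/message/inbox/.json?limit=100"
        if after:
            url += f"&after={after}"
        resp = opener.open(url)
        if "application/json" not in resp.headers.get("Content-Type", ""):
            raise SessionInvalidError("non-JSON response; cookies likely stale")
        page = json.loads(resp.read())
        yield from page["data"]["children"]
        after = page["data"].get("after")
        if not after:
            break
        time.sleep(pause)
```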
Resolve the thread
Each inbox item has a context URL like `/r/sub/comments/<thread_id>/.../<reply_id>`. A regex pulls the thread_id and the DB looks up the matching row in the `posts` table. If there is no match the reply is counted as `unmatched` and skipped.
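The extraction step reduces to one regex over the context URL. The pattern below is a plausible sketch, not the script's exact expression:

```python
import re

# Hypothetical pattern: pull the thread id and reply id out of an inbox
# item's context URL, e.g. /r/sub/comments/1abc2d/some_title/mno3pq/
CONTEXT_RE = re.compile(r"/comments/(?P<thread_id>\w+)/[^/]+/(?P<reply_id>\w+)")

def resolve_ids(context_url):
    # Returns (thread_id, reply_id), or None when the URL has no comment path,
    # which the scanner would count as unmatched.
    m = CONTEXT_RE.search(context_url)
    if not m:
        return None
    return m.group("thread_id"), m.group("reply_id")
```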
Decide pending vs backfill_old
Items newer than 48 hours become `status='pending'` for the engage step. Older ones get `status='skipped', skip_reason='backfill_old'` so the dashboard still shows them without firing a reply.
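The cutoff decision is a single timestamp comparison. A sketch, with illustrative names:

```python
from datetime import datetime, timedelta, timezone

CUTOFF = timedelta(hours=48)

def classify(created_utc, now=None):
    # Items inside the cutoff go to the engage step; older ones are recorded
    # for the dashboard but never trigger a reply.
    now = now or datetime.now(timezone.utc)
    created = datetime.fromtimestamp(created_utc, timezone.utc)
    if now - created <= CUTOFF:
        return {"status": "pending"}
    return {"status": "skipped", "skip_reason": "backfill_old"}
```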
Early-exit on dedup streak
After 50 consecutive already-known items the loop stops paginating. That caps the blast radius when Reddit is slow and keeps repeated runs cheap.
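The brake is a streak counter that any fresh item resets. A minimal sketch of that early-exit logic:

```python
def scan_with_brake(item_ids, known_ids, limit=50):
    # Stop after `limit` consecutive already-known items; a fresh item
    # resets the streak. Illustrative, not the script's actual loop.
    streak = 0
    new = []
    for item_id in item_ids:
        if item_id in known_ids:
            streak += 1
            if streak >= limit:
                break
        else:
            streak = 0
            new.append(item_id)
    return new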
Walk /user/<account>/comments.json for the parent edge
In parallel, fetch_own_replies pages up to 20 times (2000 items max) until it crosses a 30-day horizon, building a map of parent_comment_id -> our_reply_id. That catches comments you posted manually in a browser too.
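Building the parent edge from the comment-history feed might look like the sketch below. The field names follow Reddit's listing shape; the helper itself is an assumption:

```python
def build_parent_map(comment_pages, horizon_ts, max_pages=20):
    # Walk /user/<account>/comments.json pages until a comment older than
    # the horizon appears, mapping parent_comment_id -> our_reply_id.
    # This picks up replies posted manually in a browser too.
    parent_map = {}
    for page in comment_pages[:max_pages]:
        for child in page["data"]["children"]:
            c = child["data"]
            if c["created_utc"] < horizon_ts:
                return parent_map
            parent_map[c["parent_id"]] = c["id"]
    return parent_map
```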
Fire engage_reddit.py with --limit
The scan runs engage_reddit.py as a subprocess against the newly pending rows. Each reply runs in its own isolated Claude session so archetype rotation uses only the last three replies as context, not your whole history.
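A hedged sketch of the hand-off; the flag shape is an assumption based on the `--limit` mentioned above:

```python
import subprocess
import sys

def engage_cmd(limit):
    # Hypothetical command line; the real flags live in engage_reddit.py.
    return [sys.executable, "scripts/engage_reddit.py", "--limit", str(limit)]

def engage_pending(limit):
    # Run the engage step as its own process so each batch is isolated
    # from the scanner's state.
    return subprocess.run(engage_cmd(limit), capture_output=True, text=True)
```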
The sources of truth
The inbox tells you someone replied to you. It does not tell you which of your comments they replied to, or how you opened that comment. Two different Reddit endpoints have to be fused for the picture to close.
Two feeds, one attribution row
What happens inside the script
The inbox fetch is a single authenticated request. The important thing is the content-type check. If Reddit serves you an HTML login page instead of JSON the scanner refuses to write anything and raises SessionInvalidError so the operator knows to re-bootstrap cookies. Silent failure here would poison every downstream metric.
Resolving the parent comment
Reddit's inbox payload does not include the id of the comment the reply is attached to in a friendly field. It ships a context URL and nothing else. Pulling the thread id out of that URL and matching it against the posts table is how the edge gets drawn.
An inbox item whose thread is not in your posts table gets counted as unmatched and skipped. That is almost always a sign the post came from an account you are no longer tracking, and the skip count is itself a signal worth watching.
Why UTM links alone are not enough
The industry-standard way to measure Reddit marketing is to drop a UTM-tagged URL and watch analytics. That works for the subset of replies where a link is warranted. The inbox scan covers the rest. Both together give the honest picture.
| Feature | Link-click only | S4L attribution loop |
|---|---|---|
| What counts as success | Upvotes and link clicks | Parent-to-reply edges in the database, labeled by the engagement style that earned them |
| Where the data lives | Campaign dashboard you trust the vendor to keep | Postgres `replies` table on your own infrastructure, writable by your own scripts |
| Reply-back detection | None, or alerts with no parent context | Inbox scan maps each reply's `context` URL back to the post and the specific comment you made |
| Comments you made outside the tool | Invisible | 30-day history walk picks up manual replies and back-fills them as `status='replied'` |
| Deduplication across runs | Re-scans every time, double-writes, or drops data | Stops paginating after 50 consecutive already-known items; insert is idempotent on comment_id |
| Reply style signal | Not tracked | engagement_style saved per reply (critic, storyteller, contrarian, data_point_drop, ...) |
“The insert is idempotent on comment_id, which means the scan can run every 5 minutes without double-writing.”
scripts/reply_insert.py
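The idempotency property can be demonstrated with a conflict-ignoring insert. The sketch below uses SQLite for self-containment; the real table is Postgres, where `ON CONFLICT ... DO NOTHING` has the same shape. Table and column names are assumptions modeled on the ones mentioned in this article:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE replies (
        comment_id TEXT PRIMARY KEY,
        engagement_style TEXT,
        status TEXT
    )
""")

def insert_reply(comment_id, style, status="pending"):
    # Returns 1 if a new row was written, 0 if comment_id was already known,
    # so repeated scans never double-write.
    cur = db.execute(
        "INSERT INTO replies (comment_id, engagement_style, status) "
        "VALUES (?, ?, ?) ON CONFLICT(comment_id) DO NOTHING",
        (comment_id, style, status),
    )
    return cur.rowcount
```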
The eight engagement styles, stored per reply
Every reply that goes out has an engagement_style tag written alongside it. The draft step chooses a style, the reply prompt includes the last three replies so the new one does not clone them, and the feedback loop lets you see which styles actually earn replies back.
critic
Pushes back on the OP or top comment with a specific counterexample.
storyteller
Opens with a one-line personal anecdote tied to the thread.
pattern_recognizer
Names a pattern across several examples the thread hasn't connected.
curious_probe
Asks one sharp clarifying question instead of asserting.
contrarian
Disagrees with the thread consensus, clearly and briefly.
data_point_drop
Adds a concrete number from first-hand use, no source citation needed.
snarky_oneliner
One dry sentence. Used sparingly.
recommendation
Tier 3 only: thread explicitly asks for a tool, config has a match, link goes in.
What a scan looks like in the log
A single run on a mid-activity account ends with a handful of counters: pages fetched, new pending rows, backfill_old skips, unmatched threads, and the dedup streak that stopped the scan. Together they tell you exactly how the loop spent its budget.
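Those counters reduce to a tally over the statuses described above. A minimal sketch, with counter names that are assumptions:

```python
from collections import Counter

def summarize(rows):
    # Tally how a scan spent its budget, keyed by the status each
    # inbox item was written with.
    c = Counter(row["status"] for row in rows)
    return {
        "new_pending": c["pending"],
        "backfill_old": c["skipped"],
        "unmatched_thread": c["unmatched"],
    }
```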
Before the loop vs after
What you can answer after the loop is wired up
- Which specific comment of mine earned a reply-back this week?
- Which engagement style had the best reply-back rate in the last 30 days?
- How many manual browser replies are already baked into my database?
- Did a scan miss anything because cookies went stale?
- Are my tier 1 (no link) replies earning more conversation than tier 3 (link)?
- Are any subreddits generating replies on threads I never posted in? (unmatched_thread counter)
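Each of these questions is one query over the replies table. Here is a hedged sketch of the style-versus-reply-back query, run against SQLite for self-containment; it assumes an `engagement_style` column and a `got_reply_back` flag, which may differ from the real Postgres schema:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE replies (
        comment_id TEXT PRIMARY KEY,
        engagement_style TEXT,
        got_reply_back INTEGER DEFAULT 0
    )
""")
# Illustrative rows, not real campaign data.
db.executemany(
    "INSERT INTO replies VALUES (?, ?, ?)",
    [("c1", "critic", 1), ("c2", "critic", 0), ("c3", "storyteller", 1)],
)

# Reply-back rate per engagement style, best first.
rates = db.execute("""
    SELECT engagement_style,
           ROUND(AVG(got_reply_back), 2) AS reply_back_rate
    FROM replies
    GROUP BY engagement_style
    ORDER BY reply_back_rate DESC
""").fetchall()
```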
A note on running this yourself
S4L is an end-to-end system. The inbox scanner is one script in a larger pipeline that also handles discovery, drafting, posting, and the CDP browser that submits replies. You can study the file at scripts/scan_reddit_replies.py and reimplement the loop in whatever language you prefer. What matters is the shape: paginated inbox fetch with a content-type gate, thread-id regex, posts-table lookup, idempotent insert, per-reply engagement_style column. If a tool you are evaluating does not have those five pieces, it cannot tell you whether your Reddit marketing is working.
The endpoints this loop touches
- old.reddit.com/message/inbox/.json — the paginated inbox feed (100 items per page)
- old.reddit.com/user/<account>/comments.json — the 30-day comment-history walk
- each item's context URL — the source of the thread_id and reply_id for attribution
Want the attribution loop on your accounts?
30 minutes with the team that wrote scan_reddit_replies.py. We'll look at your posting cadence and show you what the loop would report after a week.
Book a call →
Frequently asked questions
Is Reddit marketing allowed? The sidebar rules in most subs say no promotion.
Reddit's sitewide policy allows participation from people with a commercial interest so long as the comment is substantively useful on its own and the account is not mostly promotional. S4L's default behavior is tier 1, no link. A project link only goes in when the thread explicitly asks for one (tier 3). The engagement loop records which tier was used per reply, so the ratio is auditable.
What is an engagement style and why is it stored?
An engagement style is the rhetorical shape of a reply: critic, storyteller, pattern_recognizer, curious_probe, contrarian, data_point_drop, snarky_oneliner, or recommendation. engage_reddit.py writes the style to the `replies` row so you can graph reply-backs and upvotes per style and rotate away from the ones that underperform. Most Reddit marketing tooling treats every reply the same after posting. S4L treats each one as a labeled data point.
Why old.reddit.com and not the Reddit API?
old.reddit.com returns JSON for logged-in endpoints when the session cookie is sent, which means the same pipeline works for reading the inbox, fetching a thread, and (via a separate CDP browser) posting a reply. The OAuth API requires app-registered tokens, which bind any reply to your app's identifier rather than the account's natural use. Cookies plus CDP behave the way a human user behaves; that is the point.
What happens if the inbox response comes back as HTML instead of JSON?
fetch_inbox checks the Content-Type header. If it is not application/json the function raises SessionInvalidError immediately. That is the signal that reddit-cookies.json has expired and bootstrap_reddit_cookies.py needs to re-login before the next scan. The scan does not write placeholder rows in this case.
Does the scan re-ingest the same reply every 5 minutes?
No. The insert function is idempotent on comment_id. On each page the scanner counts how many inserts produced zero new rows; after 50 consecutive known items it stops paginating. That keeps the DB write volume proportional to new activity, not to scan frequency.
How is this different from UTM tagging my reddit.com links and reading GA?
UTM tagging only captures the clicks on the link you dropped in tier 3 replies. It tells you nothing about tier 1 (no link) replies, which are the majority, and nothing about replies that earned a reply-back without a click. The inbox scan is the only way to see conversation attribution. Both are useful. One without the other is incomplete.
What is the 'backfill_old' status for?
When you first wire up the scanner the inbox is already full of old items you do not want to reply to. Every item older than 48 hours is written with status='skipped', skip_reason='backfill_old'. They appear in the dashboard for historical context but never trigger engage_reddit.py. After the first run, new items are always within 48 hours, so the cutoff becomes a no-op.