Social media marketing on Reddit is a measurement problem, not a posting problem.
Every guide on the first page of Google says the same three things: pick the right subs, be genuinely helpful, drop a link only when invited. Fine. The part nobody describes is what happens after you post. How do you know the comment worked? S4L answers that with a scanner that walks your inbox 100 items per page, matches every reply-to-you back to the exact comment you wrote that earned it, and tags each reply with the engagement style you used. That edge is what turns posting on Reddit into marketing.
The numbers behind the loop
These are the constants hard-coded into scripts/scan_reddit_replies.py. They are small numbers, but each one is a deliberate choice and each one protects against a different failure mode.
100 per page is the API's hard cap. 10 pages is enough to cover about a week of a high-volume account without ever touching the Reddit inbox retention ceiling. 48 hours is short enough that the dashboard still feels live; older items get written as status='skipped' with reason backfill_old so you can see them without replying to a comment from last month. 50 consecutive already-known items is the dedup brake: on a slow Reddit day, the loop stops paginating fast instead of walking the whole inbox for nothing.
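A minimal sketch of how these constants might be declared. The names are assumptions for illustration, not the script's actual identifiers:

```python
from datetime import timedelta

# Hypothetical names; the real constants live in scripts/scan_reddit_replies.py.
PAGE_SIZE = 100                       # Reddit's hard cap on inbox items per page
MAX_PAGES = 10                        # roughly a week of a high-volume account
PENDING_CUTOFF = timedelta(hours=48)  # newer -> pending, older -> backfill_old
DEDUP_STREAK_LIMIT = 50               # consecutive known items before early exit

MAX_ITEMS = PAGE_SIZE * MAX_PAGES     # worst-case items walked per scan
```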
How each inbox item becomes a row
The path a single reply-to-you takes, from Reddit's inbox JSON to a labeled row in your database, is seven steps long. None of them are glamorous. All of them have to be right for the measurement to be honest.
Inbox scan pipeline
Load cookies, not an API key
scan_reddit_replies.py reads reddit-cookies.json from the logged-in profile so old.reddit.com serves the logged-in inbox JSON, not the public-API proxy. Cookies are refreshed by bootstrap_reddit_cookies.py before each run.
Paginate /message/inbox/.json
Up to 10 pages, 100 items each, with a 1.5-second pause between pages. A non-JSON response immediately raises SessionInvalidError so the scan does not silently write garbage.
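The pagination-plus-content-type-gate shape can be sketched like this. It assumes an `opener` that already carries the session cookies; the function and field names follow Reddit's listing format, but the helper itself is an illustration, not the script's code:

```python
import json
import time

class SessionInvalidError(RuntimeError):
    """Raised when Reddit serves HTML (a login page) instead of inbox JSON."""

def fetch_inbox_pages(opener, max_pages=10, pause=1.5):
    # Walk the inbox listing via the 'after' cursor, refusing to parse
    # anything that is not JSON so stale cookies fail loudly.
    after = None
    for _ in range(max_pages):
        url = "https://old.reddit.com/message/inbox/.json?limit=100"
        if after:
            url += f"&after={after}"
        resp = opener.open(url)
        if "application/json" not in resp.headers.get("Content-Type", ""):
            raise SessionInvalidError("non-JSON response; cookies likely stale")
        page = json.loads(resp.read())
        yield from page["data"]["children"]
        after = page["data"].get("after")
        if not after:
            break
        time.sleep(pause)
```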
Resolve the thread
Each inbox item has a context URL like `/r/sub/comments/<thread_id>/.../<reply_id>`. A regex pulls the thread_id and the DB looks up the matching row in the `posts` table. If there is no match the reply is counted as `unmatched` and skipped.
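The extraction step reduces to one regex over the context URL. The pattern below is a plausible sketch, not the script's exact expression:

```python
import re

# Hypothetical pattern: pull the thread id and reply id out of an inbox
# item's context URL, e.g. /r/sub/comments/1abc2d/some_title/mno3pq/
CONTEXT_RE = re.compile(r"/comments/(?P<thread_id>\w+)/[^/]+/(?P<reply_id>\w+)")

def resolve_ids(context_url):
    # Returns (thread_id, reply_id), or None when the URL has no comment path,
    # which the scanner would count as unmatched.
    m = CONTEXT_RE.search(context_url)
    if not m:
        return None
    return m.group("thread_id"), m.group("reply_id")
```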
Decide pending vs backfill_old
Items newer than 48 hours become `status='pending'` for the engage step. Older ones get `status='skipped', skip_reason='backfill_old'` so the dashboard still shows them without firing a reply.
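The cutoff decision is a single timestamp comparison. A sketch, with illustrative names:

```python
from datetime import datetime, timedelta, timezone

CUTOFF = timedelta(hours=48)

def classify(created_utc, now=None):
    # Items inside the cutoff go to the engage step; older ones are recorded
    # for the dashboard but never trigger a reply.
    now = now or datetime.now(timezone.utc)
    created = datetime.fromtimestamp(created_utc, timezone.utc)
    if now - created <= CUTOFF:
        return {"status": "pending"}
    return {"status": "skipped", "skip_reason": "backfill_old"}
```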
Early-exit on dedup streak
After 50 consecutive already-known items the loop stops paginating. That caps the blast radius when Reddit is slow and keeps repeated runs cheap.
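The brake is a streak counter that any fresh item resets. A minimal sketch of that early-exit logic:

```python
def scan_with_brake(item_ids, known_ids, limit=50):
    # Stop after `limit` consecutive already-known items; a fresh item
    # resets the streak. Illustrative, not the script's actual loop.
    streak = 0
    new = []
    for item_id in item_ids:
        if item_id in known_ids:
            streak += 1
            if streak >= limit:
                break
        else:
            streak = 0
            new.append(item_id)
    return new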
Walk /user/<account>/comments.json for the parent edge
In parallel, fetch_own_replies pages up to 20 times (2000 items max) until it crosses a 30-day horizon, building a map of parent_comment_id -> our_reply_id. That catches comments you posted manually in a browser too.
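Building the parent edge from the comment-history feed might look like the sketch below. The field names follow Reddit's listing shape; the helper itself is an assumption:

```python
def build_parent_map(comment_pages, horizon_ts, max_pages=20):
    # Walk /user/<account>/comments.json pages until a comment older than
    # the horizon appears, mapping parent_comment_id -> our_reply_id.
    # This picks up replies posted manually in a browser too.
    parent_map = {}
    for page in comment_pages[:max_pages]:
        for child in page["data"]["children"]:
            c = child["data"]
            if c["created_utc"] < horizon_ts:
                return parent_map
            parent_map[c["parent_id"]] = c["id"]
    return parent_map
```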
Fire engage_reddit.py with --limit
The scan runs engage_reddit.py as a subprocess against the newly pending rows. Each reply runs in its own isolated Claude session so archetype rotation uses only the last three replies as context, not your whole history.
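A hedged sketch of the hand-off; the flag shape is an assumption based on the `--limit` mentioned above:

```python
import subprocess
import sys

def engage_cmd(limit):
    # Hypothetical command line; the real flags live in engage_reddit.py.
    return [sys.executable, "scripts/engage_reddit.py", "--limit", str(limit)]

def engage_pending(limit):
    # Run the engage step as its own process so each batch is isolated
    # from the scanner's state.
    return subprocess.run(engage_cmd(limit), capture_output=True, text=True)
```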
The sources of truth
The inbox tells you someone replied to you. It does not tell you which of your comments they replied to, or how you opened that comment. Two different Reddit endpoints have to be fused for the picture to close.
Two feeds, one attribution row
What happens inside the script
The inbox fetch is a single authenticated request. The important thing is the content-type check. If Reddit serves you an HTML login page instead of JSON the scanner refuses to write anything and raises SessionInvalidError so the operator knows to re-bootstrap cookies. Silent failure here would poison every downstream metric.
Resolving the parent comment
Reddit's inbox payload does not include the id of the comment the reply is attached to in a friendly field. It ships a context URL and nothing else. Pulling the thread id out of that URL and matching it against the posts table is how the edge gets drawn.
An inbox item whose thread is not in your posts table gets counted as unmatched and skipped. That is almost always a sign the post came from an account you are no longer tracking, and the skip count is itself a signal worth watching.
Why UTM links alone are not enough
The industry-standard way to measure Reddit marketing is to drop a UTM-tagged URL and watch analytics. That works for the subset of replies where a link is warranted. The inbox scan covers the rest. Both together give the honest picture.
| Feature | Link-click only | S4L attribution loop |
|---|---|---|
| What counts as success | Upvotes and link clicks | Parent-to-reply edges in the database, labeled by the engagement style that earned them |
| Where the data lives | Campaign dashboard you trust the vendor to keep | Postgres `replies` table on your own infrastructure, writable by your own scripts |
| Reply-back detection | None, or alerts with no parent context | Inbox scan maps each reply's `context` URL back to the post and the specific comment you made |
| Comments you made outside the tool | Invisible | 30-day history walk picks up manual replies and back-fills them as `status='replied'` |
| Deduplication across runs | Re-scans every time, double-writes, or drops data | Stops paginating after 50 consecutive already-known items; insert is idempotent on comment_id |
| Reply style signal | Not tracked | engagement_style saved per reply (critic, storyteller, contrarian, data_point_drop, ...) |
“The insert is idempotent on comment_id, which means the scan can run every 5 minutes without double-writing.”
scripts/reply_insert.py
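The idempotency property can be demonstrated with a conflict-ignoring insert. The sketch below uses SQLite for self-containment; the real table is Postgres, where `ON CONFLICT ... DO NOTHING` has the same shape. Table and column names are assumptions modeled on the ones mentioned in this article:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE replies (
        comment_id TEXT PRIMARY KEY,
        engagement_style TEXT,
        status TEXT
    )
""")

def insert_reply(comment_id, style, status="pending"):
    # Returns 1 if a new row was written, 0 if comment_id was already known,
    # so repeated scans never double-write.
    cur = db.execute(
        "INSERT INTO replies (comment_id, engagement_style, status) "
        "VALUES (?, ?, ?) ON CONFLICT(comment_id) DO NOTHING",
        (comment_id, style, status),
    )
    return cur.rowcount
```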
The eight engagement styles, stored per reply
Every reply that goes out has an engagement_style tag written alongside it. The draft step chooses a style, the reply prompt includes the last three replies so the new one does not clone them, and the feedback loop lets you see which styles actually earn replies back.
critic
Pushes back on the OP or top comment with a specific counterexample.
storyteller
Opens with a one-line personal anecdote tied to the thread.
pattern_recognizer
Names a pattern across several examples the thread hasn't connected.
curious_probe
Asks one sharp clarifying question instead of asserting.
contrarian
Disagrees with the thread consensus, clearly and briefly.
data_point_drop
Adds a concrete number from first-hand use, no source citation needed.
snarky_oneliner
One dry sentence. Used sparingly.
recommendation
Tier 3 only: thread explicitly asks for a tool, config has a match, link goes in.
What a scan looks like in the log
A single run on a mid-activity account ends with a handful of counters: pages fetched, new pending rows, backfill_old skips, unmatched threads, and the dedup streak that stopped the scan. Together they tell you exactly how the loop spent its budget.
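Those counters reduce to a tally over the statuses described above. A minimal sketch, with counter names that are assumptions:

```python
from collections import Counter

def summarize(rows):
    # Tally how a scan spent its budget, keyed by the status each
    # inbox item was written with.
    c = Counter(row["status"] for row in rows)
    return {
        "new_pending": c["pending"],
        "backfill_old": c["skipped"],
        "unmatched_thread": c["unmatched"],
    }
```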
Before the loop vs after
What you can answer after the loop is wired up
- Which specific comment of mine earned a reply-back this week?
- Which engagement style had the best reply-back rate in the last 30 days?
- How many manual browser replies are already baked into my database?
- Did a scan miss anything because cookies went stale?
- Are my tier 1 (no link) replies earning more conversation than tier 3 (link)?
- Are any subreddits generating replies on threads I never posted in? (unmatched_thread counter)
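Each of these questions is one query over the replies table. Here is a hedged sketch of the style-versus-reply-back query, run against SQLite for self-containment; it assumes an `engagement_style` column and a `got_reply_back` flag, which may differ from the real Postgres schema:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE replies (
        comment_id TEXT PRIMARY KEY,
        engagement_style TEXT,
        got_reply_back INTEGER DEFAULT 0
    )
""")
# Illustrative rows, not real campaign data.
db.executemany(
    "INSERT INTO replies VALUES (?, ?, ?)",
    [("c1", "critic", 1), ("c2", "critic", 0), ("c3", "storyteller", 1)],
)

# Reply-back rate per engagement style, best first.
rates = db.execute("""
    SELECT engagement_style,
           ROUND(AVG(got_reply_back), 2) AS reply_back_rate
    FROM replies
    GROUP BY engagement_style
    ORDER BY reply_back_rate DESC
""").fetchall()
```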
A note on running this yourself
S4L is an end-to-end system. The inbox scanner is one script in a larger pipeline that also handles discovery, drafting, posting, and the CDP browser that submits replies. You can study the file at scripts/scan_reddit_replies.py and reimplement the loop in whatever language you prefer. What matters is the shape: paginated inbox fetch with a content-type gate, thread-id regex, posts-table lookup, idempotent insert, per-reply engagement_style column. If a tool you are evaluating does not have those five pieces, it cannot tell you whether your Reddit marketing is working.
The endpoints this loop touches
- old.reddit.com/message/inbox/.json — the paginated inbox feed (100 items per page)
- old.reddit.com/user/<account>/comments.json — the 30-day comment-history walk
- each item's context URL — the source of the thread_id and reply_id for attribution
Want the attribution loop on your accounts?
30 minutes with the team that wrote scan_reddit_replies.py. We'll look at your posting cadence and show you what the loop would report after a week.
Book a call →
Frequently asked questions
Is Reddit marketing allowed? The sidebar rules in most subs say no promotion.
Reddit's sitewide policy allows participation from people with a commercial interest so long as the comment is substantively useful on its own and the account is not mostly promotional. S4L's default behavior is tier 1, no link. A project link only goes in when the thread explicitly asks for one (tier 3). The engagement loop records which tier was used per reply, so the ratio is auditable.
What is an engagement style and why is it stored?
An engagement style is the rhetorical shape of a reply: critic, storyteller, pattern_recognizer, curious_probe, contrarian, data_point_drop, snarky_oneliner, or recommendation. engage_reddit.py writes the style to the `replies` row so you can graph reply-backs and upvotes per style and rotate away from the ones that underperform. Most Reddit marketing tooling treats every reply the same after posting. S4L treats each one as a labeled data point.
Why old.reddit.com and not the Reddit API?
old.reddit.com returns JSON for logged-in endpoints when the session cookie is sent, which means the same pipeline works for reading the inbox, fetching a thread, and (via a separate CDP browser) posting a reply. The OAuth API requires app-registered tokens, which bind any reply to your app's identifier rather than the account's natural use. Cookies plus CDP behave the way a human user behaves; that is the point.
What happens if the inbox response comes back as HTML instead of JSON?
fetch_inbox checks the Content-Type header. If it is not application/json the function raises SessionInvalidError immediately. That is the signal that reddit-cookies.json has expired and bootstrap_reddit_cookies.py needs to re-login before the next scan. The scan does not write placeholder rows in this case.
Does the scan re-ingest the same reply every 5 minutes?
No. The insert function is idempotent on comment_id. On each page the scanner counts how many inserts produced zero new rows; after 50 consecutive known items it stops paginating. That keeps the DB write volume proportional to new activity, not to scan frequency.
How is this different from UTM tagging my reddit.com links and reading GA?
UTM tagging only captures the clicks on the link you dropped in tier 3 replies. It tells you nothing about tier 1 (no link) replies, which are the majority, and nothing about replies that earned a reply-back without a click. The inbox scan is the only way to see conversation attribution. Both are useful. One without the other is incomplete.
What is the 'backfill_old' status for?
When you first wire up the scanner the inbox is already full of old items you do not want to reply to. Every item older than 48 hours is written with status='skipped', skip_reason='backfill_old'. They appear in the dashboard for historical context but never trigger engage_reddit.py. After the first run, new items are always within 48 hours, so the cutoff becomes a no-op.