Claude Opus 4.7 for Reddit comment automation, in production

What it actually looks like to drive Reddit comment automation with Claude Opus 4.7 on a real account: one Claude session per comment (not batched), a strict-MCP config locked to a single logged-in browser profile, JSON-only output, and the orchestrator (not Claude) posting via CDP. Verified telemetry from 4,592 sessions since the 2026-04-21 cutover.

Matthew Diakonov · 11 min read
Direct answer (verified 2026-05-08)

Spawn one Claude Opus 4.7 session per Reddit comment. Each session gets its own UUID, a strict-MCP config that restricts it to a Reddit-only browser profile, and a prompt built from the parent comment plus a rotating archetype. Force a single JSON object as output. Have the orchestrator (not Claude) post the reply via CDP, then re-read the DOM to verify it landed before writing the row to the database.

Verified production cost on this exact pattern, pulled from the claude_sessions table on 2026-05-08: $4.73 average per Reddit comment across 4,592 sessions run between 2026-04-21 (the day we cut over from Opus 4.6) and 2026-05-08, averaging 46.6 seconds wall-clock per session. Reference implementation: github.com/m13v/social-autoposter (scripts/engage_reddit.py).

4,592 engage_reddit sessions on Opus 4.7
$4.73 avg cost per Reddit comment
46.6s avg session wall-clock

source: claude_sessions where model='claude-opus-4-7' and script='engage_reddit', pulled 2026-05-08

One session per comment, not batched

Almost every guide that ranks today for Reddit automation assumes a single long-running LLM session that writes ten or fifty replies in one go. That is the wrong shape. The session accumulates context, starts pattern-matching its own prior outputs, and the seventh reply visibly echoes the second. On top of that, Anthropic's Usage Policy classifier sometimes refuses one prompt, and once it has refused inside a session every subsequent reply inherits the refusal, each one costing $0.05 to $0.30 of fully wasted spend.

The pattern that actually works in production is: pull one pending row from the replies table, mint a fresh UUID, spawn a new claude -p invocation, hand it the prompt for that single comment, parse the JSON output, post via CDP, and exit the session. Repeat for the next pending row. The session UUID is logged to claude_sessions so we can investigate any specific reply later by tailing its transcript.
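
A minimal sketch of that loop, assuming hypothetical helper names for the database and CDP steps (the real implementation is scripts/engage_reddit.py):

# minimal sketch of the per-comment loop; fetch_pending_reply,
# build_prompt, post_via_cdp, verify_and_mark_replied and mark_skipped
# stand in for the repo's actual identifiers
import json
import uuid

def tick():
    while (row := fetch_pending_reply()) is not None:         # one pending row at a time
        session_id = str(uuid.uuid4())                        # fresh UUID per session
        output = spawn_claude(build_prompt(row), session_id)  # spawn detailed below
        decision = json.loads(output)                         # must be a single JSON object
        if decision["action"] == "reply":
            post_via_cdp(row, decision["text"])               # the orchestrator posts, not Claude
            verify_and_mark_replied(row, session_id)          # DOM re-read before the DB write
        else:
            mark_skipped(row, decision["reason"])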


Three details worth pointing at. First, ANTHROPIC_API_KEY is unset before the spawn so Claude uses the OAuth credential from the desktop app. That credential is paid via the existing Pro subscription, which makes the marginal cost on a per-comment basis materially different from a raw API account. Second, --output-format stream-json lets the orchestrator stream tool calls to stderr in real time, which matters when investigating slow runs. Third, no --model flag. Opus 4.7 is the default model in ~/.claude/settings.json, and the spawn inherits it. To downgrade for a single run we set CLAUDE_MODEL=sonnet in the environment; the wrapper expands it conditionally so the global default still wins everywhere else.
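
A sketch of what that spawn can look like, assuming the flag set described above (the exact argument handling in scripts/engage_reddit.py may differ):

# sketch of the claude -p spawn; flags mirror the description above
import json
import os
import subprocess

def spawn_claude(prompt: str, session_id: str) -> str:
    env = dict(os.environ)
    env.pop("ANTHROPIC_API_KEY", None)     # fall back to the desktop app's OAuth credential
    cmd = [
        "claude", "-p", prompt,
        "--session-id", session_id,        # logged to claude_sessions for later tailing
        "--output-format", "stream-json",  # tool calls stream in real time
        "--verbose",                       # stream-json in print mode requires it
        "--strict-mcp-config",
        "--mcp-config", os.path.expanduser(
            "~/.claude/browser-agent-configs/reddit-agent-mcp.json"),
    ]
    if os.environ.get("CLAUDE_MODEL"):     # conditional downgrade, e.g. CLAUDE_MODEL=sonnet
        cmd += ["--model", os.environ["CLAUDE_MODEL"]]
    # no --model otherwise: the Opus 4.7 default in ~/.claude/settings.json wins
    proc = subprocess.run(cmd, env=env, capture_output=True, text=True, timeout=300)
    # stream-json emits one JSON event per line; the final "result"
    # event carries the assistant's text
    for line in reversed(proc.stdout.splitlines()):
        event = json.loads(line)
        if event.get("type") == "result":
            return event.get("result", "")
    return ""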

What flows where

Every actor on the diagram below is a separate process. The split between "model decides" and "orchestrator executes" is the load-bearing design choice. If you let the model do the actual CDP-posting, two things go wrong: it sometimes posts the wrong reply when its tool call retries, and you lose the ability to verify the post landed before marking the row complete. With the model returning JSON and the orchestrator owning the post, every row in the replies table either has a verified post or a clean error reason, never a shrug.

One reply, end to end

  • launchd cron → engage_reddit.py: fire every 10 min
  • Neon Postgres → engage_reddit.py: next pending reply row (id=18472, parent text)
  • engage_reddit.py → Claude Opus 4.7: claude -p (new uuid)
  • Claude Opus 4.7 → engage_reddit.py: {"action":"reply","text":"..."}
  • engage_reddit.py → reddit-agent (CDP): post via CDP
  • reddit-agent (CDP) → engage_reddit.py: verified DOM
  • engage_reddit.py → Neon Postgres: row -> replied + log session

The prompt is short, deliberately

The prompt fed to each Opus 4.7 session is roughly 2,500 tokens of input and produces about 4,000 tokens of output, including the reasoning Claude does as it fetches the parent thread via Bash. The structure has been ground down to five blocks: the framing line; the parent reply data; the recent-archetypes block (so the model varies away from the last three); the top-performers feedback (so it can self-tune toward what converted last week); and the seven-archetype style menu.

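A sketch of how those five blocks could be assembled; the block labels, field names, and function signature are illustrative, not the repo's actual template (that lives in build_prompt() in scripts/engage_reddit.py):

# illustrative assembly of the five prompt blocks; wording and field
# names are assumptions, not the repo's actual template
def build_prompt(parent_json: str, recent: list[str],
                 top_performers: str, styles_menu: str) -> str:
    return (
        "You are drafting a reply on behalf of the user's Reddit account.\n\n"      # 1 framing
        f"PARENT COMMENT (JSON):\n{parent_json}\n\n"                                # 2 parent data
        f"RECENT ARCHETYPES (vary away from these):\n{', '.join(recent[-3:])}\n\n"  # 3 rotation
        f"TOP PERFORMERS (last 14 days):\n{top_performers}\n\n"                     # 4 feedback
        f"ENGAGEMENT STYLES (pick one of seven):\n{styles_menu}\n\n"                # 5 style menu
        'Return exactly one JSON object: {"action": "skip", "reason": "..."} or '
        '{"action": "reply", "text": "...", "project": "...", '
        '"engagement_style": "...", "new_style": "..."}. '
        "Your ENTIRE output must be ONLY the JSON object. No other text."
    )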

The seven archetypes are not abstract advice. They are concrete patterns the rotation forces the model to walk through:

  • critic
  • storyteller
  • pattern_recognizer
  • curious_probe
  • contrarian
  • data_point_drop
  • snarky_oneliner

Every reply landing in the replies table is tagged with the archetype the model said it used. The top-performers report block is then computed from the last 14 days of those tags joined against engagement signals, and fed back into the next prompt. The loop closes inside the prompt itself.
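
A sketch of that rollup, with assumed table and column names (the real schema sits behind the replies table in Neon):

# 14-day feedback rollup; column names (archetype, upvotes,
# replies_received) are assumptions about the replies schema
TOP_PERFORMERS_SQL = """
SELECT archetype,
       COUNT(*)                        AS uses,
       AVG(upvotes + replies_received) AS avg_engagement
FROM   replies
WHERE  status = 'replied'
  AND  posted_at > now() - interval '14 days'
GROUP  BY archetype
ORDER  BY avg_engagement DESC
"""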

What 4.7 broke that 4.6 used to forgive

Opus 4.7 went GA on April 16, 2026 and we cut the engage_reddit pipeline over to it on April 21. The first 48 hours were rough. Prompts that had been working for months on 4.6 started returning prose preambles around the JSON object instead of just the JSON. The orchestrator's parser expects a literal JSON object, so every preambled response failed downstream and the row went into a retry loop. The fix was not to switch models. It was to accept that Anthropic's release notes for 4.7 were honest: 4.7 follows literal instructions more strictly and stops compensating for vague prompts. The vague prompt was ours.

The contract had to be made literal

# pre-2026-04-16 prompt (Opus 4.6 era)
You are drafting a reply for our reddit account.
Pick the best archetype for this thread and write
1-3 sentences. Return the reply.

# Opus 4.6 was forgiving:
# - implicit JSON expectation worked
# - "best archetype" was inferred
# - vague tone instructions filled in
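
The 4.7-era tail of the prompt, reconstructed from the literal contract quoted in the FAQ below:

# post-2026-04-21 prompt tail (Opus 4.7 era), reconstructed from the
# literal strings quoted later on this page
Return exactly one JSON object:
{"action": "skip", "reason": "..."} or
{"action": "reply", "text": "...", "project": "...",
 "engagement_style": "...", "new_style": "..."}
Your ENTIRE output must be ONLY the JSON object above.
No other text, no explanations, no markdown.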
More lines, fewer parse failures.

The right-hand prompt is longer than the left, not shorter. The lesson from the 4.7 cutover was the opposite of what we assumed going in. We thought the smarter model would let us simplify. Instead it punished implicit expectations and rewarded a literal JSON contract spelled out in capital letters at the bottom of the prompt.

Strict-MCP locks each session to one browser profile

A Claude session that has free MCP access can drive any registered browser. That is fine for interactive use and a disaster for autonomous comment automation. The wrong profile on the wrong tab posts the wrong reply on the wrong account. The fix is a per-platform MCP config and the --strict-mcp-config flag.

The reddit-only config lives at ~/.claude/browser-agent-configs/reddit-agent-mcp.json. It declares exactly one MCP server, the reddit-agent backed by a persistent Chromium profile. The orchestrator acquires a file-based lock on /tmp/reddit-agent.lock before spawning Claude, so two engage_reddit ticks landing in the same minute (which happens, the cron is every ten minutes but a previous tick can run long) cannot fight over the same browser profile and corrupt cookies.
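
A sketch of that lock using the standard fcntl pattern (the repo's actual locking code may differ):

# file-based lock on /tmp/reddit-agent.lock; fcntl pattern assumed
import fcntl
import os

def acquire_reddit_agent_lock(path: str = "/tmp/reddit-agent.lock"):
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)  # non-blocking: fail fast
    except BlockingIOError:
        os.close(fd)
        return None   # a long-running previous tick still holds the profile
    return fd         # hold the fd for the life of the session; closing releases it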

Cost shape: cache reads carry the bill

The cost number that surprised us when we first looked at the telemetry: average input tokens per session was 12, but average cache_read tokens was 325,044. That ratio is what Claude Code looks like in practice. The system prompt and the tool definitions get loaded once, written into the 5-minute prompt cache, then referenced by every tool-use turn. Output tokens average 3,923. With Opus pricing at $15 input, $1.50 cache_read, $75 output per million tokens, the cost per comment is dominated by the output and the cached prompt re-reads, not the user prompt itself.
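
A back-of-envelope on those averages; the telemetry quoted here does not break out cache-write tokens, which bill at a higher rate than cache reads and presumably carry the remainder of the $4.73:

# component costs from the averages quoted above; cache-write tokens
# are not broken out in this telemetry, so these three components do
# not sum to the full $4.73 average
PRICE_PER_MTOK = {"input": 15.00, "cache_read": 1.50, "output": 75.00}
AVG_TOKENS = {"input": 12, "cache_read": 325_044, "output": 3_923}

for kind, tokens in AVG_TOKENS.items():
    print(f"{kind:>10}: ${tokens * PRICE_PER_MTOK[kind] / 1e6:.4f}")
# input:      $0.0002
# cache_read: $0.4876
# output:     $0.2942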

The tunable knob on this is whether the model needs to fetch the full thread context. The prompt instructs it to call python3 ~/social-autoposter/scripts/reddit_tools.py fetch via Bash before deciding, which adds one tool turn and therefore one cache_read pass. We tried prefetching the thread in Python and embedding the JSON in the prompt directly. That cut the average cost from $4.73 to about $3.40, but the variance on output quality went up because the model had to read a wall of comments rather than fetching the slice it actually needed. We reverted. $4.73 with cleaner replies beats $3.40 with worse ones when the downstream signal is account safety, not unit economics.

What changed between "manual reply" and "Opus 4.7 reply"

The same thread, the same account

Operator opens reddit.com/inbox, reads the parent comment, switches to Notion to look up the active campaign suffix, drafts a reply, copies it back, posts, then updates a Google Sheet with what they posted and which campaign was applied.

  • Operator reads inbox manually
  • Looks up campaign suffix in Notion
  • Drafts reply by feel, no archetype rotation
  • Posts via the actual reddit web UI
  • Updates a side spreadsheet for tracking
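
Against that, the Opus 4.7 loop on the same thread:

  • launchd fires engage_reddit.py every ten minutes
  • The orchestrator pulls the pending row and builds the five-block prompt
  • A fresh Opus 4.7 session drafts the reply under forced archetype rotation
  • The orchestrator appends the campaign suffix and posts via CDP
  • A DOM re-read verifies the post before the row and claude_sessions are updated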

What the model never gets to do

Three categories of action are locked away from the model. The campaign suffix that gets appended to outbound replies is looked up by the orchestrator in Python, not handed to the model. The model never sees the suffix, never knows there is an A/B test running, and cannot mention either thing on Reddit. The actual CDP page.click and page.type calls are made by the orchestrator after the JSON parse succeeds, not by Claude through an MCP tool. And the database write that flips a replies row from processing to replied only happens after a separate CDP-driven re-read of the page confirms the comment text actually rendered.

The pattern is the same one a careful build pipeline uses with an LLM: the model proposes; deterministic code disposes. With Reddit's anti-bot detection in the picture, that split is also a safety layer. A hallucinated tool call in a Claude session does not turn into an unauthorized post on a live account, because the model never had the keys to post.
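
A sketch of that orchestrator-owned post-and-verify step, using Playwright over CDP; the endpoint, selectors, and verification heuristic are assumptions, not the repo's actual values:

# orchestrator-side post-and-verify over CDP; selectors and the
# debugging endpoint are illustrative
from playwright.sync_api import sync_playwright

def post_and_verify(permalink: str, text: str) -> bool:
    with sync_playwright() as p:
        browser = p.chromium.connect_over_cdp("http://127.0.0.1:9222")
        page = browser.contexts[0].new_page()
        page.goto(permalink)
        page.click("button:has-text('Reply')")            # illustrative selector
        page.fill("div[contenteditable='true']", text)
        page.click("button:has-text('Comment')")
        page.wait_for_timeout(3000)
        page.reload()                                     # re-read the DOM
        landed = page.get_by_text(text[:60]).count() > 0  # did the comment render?
        browser.close()
        return landed   # only a verified post flips the row to 'replied'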

What you can verify in the open-source repo

Every claim on this page traces to a specific path in the repo at github.com/m13v/social-autoposter:

  • The per-comment session loop: scripts/engage_reddit.py (the run_claude and build_prompt functions).
  • The seven engagement archetypes: scripts/engagement_styles.py (the STYLES dict).
  • The strict-MCP scaffold: scripts/engage_reddit.py ensure_mcp_config().
  • Cost logging into claude_sessions: scripts/log_claude_session.py (Opus pricing table at line 104).
  • The launchd glue that fires the loop every ten minutes: skill/engage-reddit.sh.

Want this loop running on your accounts?

Twenty minutes on a call, we walk through your subreddit list, your account state, and what the per-comment session pattern would cost on your volume. No deck, no slides.

Implementation questions, answered with specifics

Why a separate Claude session per Reddit comment instead of one big batch?

Two practical reasons. First, context accumulation: a session that has already drafted 50 replies starts pattern-matching its own prior outputs and produces visibly homogeneous prose. The fresh session per comment forces real variance because Opus 4.7 cannot reuse the rhythm of a reply it never saw. Second, blast radius. If Anthropic's Usage Policy classifier refuses one prompt, every subsequent reply in the same session inherits the refusal and burns about $0.05 to $0.30 per parroted refusal. The orchestrator detects 'Claude Code is unable to respond' on the first hit and aborts the batch so the next launchd cycle picks up the rest after the prompt is reworded.

What does the prompt actually contain?

Five blocks, in order: a framing line that says you are drafting a reply on behalf of the user's account; the JSON of the parent comment plus the matched archetype data; a recent_replies block with the last three archetypes used so the new draft varies away from them; a top_performers report so the model can self-tune toward styles that converted in the last 14 days; and a content_rules / engagement_styles block with the seven concrete archetypes. Output is forced to a single JSON object, either {action: 'skip', reason} or {action: 'reply', text, project, engagement_style, new_style}. The orchestrator (not Claude) posts via CDP after parsing the JSON.

How much does Opus 4.7 cost per Reddit comment in production?

Across 4,592 engage_reddit sessions on claude-opus-4-7 from 2026-04-21 to 2026-05-08, average total cost was $4.73 per comment with average duration 46.6 seconds. The dominant cost driver is cache_read tokens (averaging 325,044 per session) because Claude Code loads the system prompt and tool definitions once and reads them from the 5-minute cache on subsequent turns. Output averages 3,923 tokens. The last 7 days dropped to $3.97 per comment as we tuned the prompt to terminate earlier when the model is going to skip.

Why did existing Opus 4.6 prompts break on 4.7?

Anthropic shipped 4.7 with stricter literal instruction following. Prompts that worked on 4.6 by relying on the model to compensate for vague phrasing produced shorter or visibly worse output on 4.7. The fix in engage_reddit.py was to make the JSON contract explicit ('Your ENTIRE output must be ONLY the JSON object above. No other text, no explanations, no markdown.') and to remove instructions that depended on Opus inferring intent. After the rework the JSON parse-success rate on 4.7 returned to parity with 4.6.

How is the Reddit browser profile isolated from other platforms?

Each platform has its own MCP config under ~/.claude/browser-agent-configs/. The reddit-agent-mcp.json file lists exactly one MCP server (the reddit-agent), and engage_reddit.py invokes Claude with --strict-mcp-config so the spawned session physically cannot drive Twitter, GitHub, or any other browser profile even if the prompt tried to talk it into doing so. There is also a /tmp lock acquired before the session starts, so two engage_reddit ticks cannot fight over the same profile and corrupt cookies.

Does the model decide whether to post, or does the orchestrator?

Both, with a clean split. The model decides reply-or-skip and drafts the reply text, returning a JSON object. The orchestrator does everything that has side effects: append a literal campaign suffix if a campaign matches, post via CDP, verify the comment landed by re-reading the DOM, write the row to the replies table, log token usage to claude_sessions, and bump campaign counters. This split exists because letting the model do the actual posting is what gets accounts banned. CDP from the orchestrator behaves like a logged-in human and is auditable end-to-end.

What happens when Anthropic's safety classifier refuses a prompt?

The orchestrator looks for the literal string 'Claude Code is unable to respond' combined with 'Usage Policy' or 'violate' in the response. On a hit, it logs the refusal cost (typically $0.05 to $0.30), aborts the batch, leaves remaining rows in pending state, and exits cleanly. The next launchd tick picks them up after a human edits the prompt. Without this short-circuit a single bad prompt phrase would fan out into hundreds of identical refusals across one cron window, each costing real money for zero output.
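
The check itself is small; a sketch built from the literal markers described above:

# refusal short-circuit, from the strings described above
def is_policy_refusal(response_text: str) -> bool:
    return "Claude Code is unable to respond" in response_text and (
        "Usage Policy" in response_text or "violate" in response_text
    )
# on a hit: log the refusal cost, leave remaining rows pending, exit
# cleanly; the next launchd tick retries after the prompt is reworded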

Why not use Opus 4.6 or Sonnet 4.6 to save money?

We briefly tried both. Sonnet 4.6 is cheaper per call (about $0.91 average across 49 reddit-pipeline runs we logged) but produces noticeably more generic openers, and the engagement_styles rotation degrades because Sonnet has weaker recall over the seven archetypes. The cost gap closes once you account for the post-hoc human review needed on Sonnet's output. Opus 4.7 is the global default in ~/.claude/settings.json and engage_reddit.py inherits it without specifying --model. To downgrade for one run, set CLAUDE_MODEL=sonnet in the environment; we have not flipped the default.
