
S4L does not make visuals. It reads the visual in the thread before it replies.

If you searched for "S4L visuals" expecting an image generator, here is the honest answer: S4L is a social autoposter, and the only visuals it touches are the ones already in the post it is about to answer. Before it drafts a single reply it grabs the image, video, GIF, or link-card of that tweet so the model reacts to what the post actually shows, instead of replying blind to a picture it cannot see.

Matthew Diakonov, Written with AI

Published June 16, 2026. 8 min read.

Direct answer (verified 2026-06-16)

S4L does not generate visuals. It reads them. Before drafting any reply, the Twitter cycle runs scripts/capture_thread_media.py, which captures the image / video / GIF / link-card of each target tweet, stores it as a {url, alt, type} array, and injects a MEDIA CONTEXT block into the drafting prompt so the model answers what the post visually shows.

Verifiable in the open-source repo: github.com/m13v/social-autoposter.

The problem: replying text-blind to a picture

A huge share of the tweets worth replying to are not really text. They are a screenshot of a chart, a demo video, a meme, or a link-card to a launch. If the drafting model only receives the tweet's caption, it has to guess at what the post is about, and the reply gives the game away: a generic "love this, so true" under an image the model never saw reads like a bot, because it was written like one.

S4L closes that gap deterministically. The prep step of the posting cycle forbids the model from opening the browser itself. Instead the shell pre-fetches the media of every candidate in one cheap pass, and the model only ever sees a tidy description it can react to. Here is the difference in the prompt the writer receives.

Same candidate, with and without the visual

# Phase 2b-prep prompt (no media capture)
Candidate 7211:
  text: "wild what happens when you actually look at the numbers"

# The tweet is a screenshot of a chart.
# The model cannot see it. It guesses:
#   "totally agree, numbers don't lie!"
# Generic. Obviously did not look at the image.


What actually happens, step by step

This is the Phase 2b-prep path in skill/run-twitter-cycle.sh (the thread-media feature, shipped 2026-06-03). Every step below is a real handoff, not a metaphor.

From candidate URLs to a prompt the model can see

Shell builds a TSV

One candidate_id<TAB>tweet_url line per tweet the cycle is about to draft against, written to a temp file ($MEDIA_URLS_FILE).

capture_thread_media.py runs once

scrape_many_thread_media(urls, scroll_count=1) visits every URL in a single browser pass. The Playwright import is lazy so a no-op run pays nothing.

Empty results get an access check

If a tweet returns no media, diagnose_tweet_access() confirms the page truly rendered. A block page or logged-out shell is treated as unreliable, not as 'no media'.

Media persists to the DB

A {url, alt, type} array is written to twitter_candidates.thread_media via the set_media action, so the record outlives the model run.

A MEDIA CONTEXT block is printed

_build_block() emits one section per candidate that has media or is a repost, with a header telling the model to react to what the tweet VISUALLY shows.

The shell injects $MEDIA_BLOCK

The block is pasted straight into the drafting prompt. If anything above failed, $MEDIA_BLOCK is empty and the cycle drafts against text as usual.
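
The handoffs above can be sketched in Python as a best-effort wrapper. This is a minimal illustration, not the repo's code: read_candidates, media_block_or_empty, and the capture and render callables are hypothetical names. Only the candidate_id<TAB>tweet_url layout and the "empty block on any failure" behaviour come from the description above.

```python
import csv
from typing import Callable, Dict, List, Tuple

MediaItem = Dict[str, str]          # assumed shape: {"url", "alt", "type"}
Candidate = Tuple[str, str]         # (candidate_id, tweet_url)


def read_candidates(tsv_path: str) -> List[Candidate]:
    """Parse candidate_id<TAB>tweet_url lines, skipping blank or malformed rows."""
    rows: List[Candidate] = []
    with open(tsv_path, newline="", encoding="utf-8") as fh:
        for row in csv.reader(fh, delimiter="\t"):
            if len(row) == 2 and row[0].strip() and row[1].strip():
                rows.append((row[0].strip(), row[1].strip()))
    return rows


def media_block_or_empty(
    tsv_path: str,
    capture: Callable[[List[str]], Dict[str, List[MediaItem]]],
    render: Callable[[List[Candidate], Dict[str, List[MediaItem]]], str],
) -> str:
    """Return the prompt block, or "" if anything at all goes wrong.

    `capture` stands in for the single browser pass and `render` for the
    block builder; both are injected so the sketch runs without a browser.
    """
    try:
        candidates = read_candidates(tsv_path)
        if not candidates:
            return ""  # nothing to capture, so no browser work is needed
        media_by_url = capture([url for _, url in candidates])
        return render(candidates, media_by_url)
    except Exception:
        # Best-effort: a failed capture must never block drafting.
        return ""
```

The broad except is deliberate in this sketch: the caller treats an empty string exactly like "no media captured" and drafts against text.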

The easy-to-miss detail: an empty array is not NULL

The captured media is a JSON array of {url, alt, type} objects, where type is one of image, video, gif, or card. The subtle part is how the column treats absence. The script is careful never to confuse "I looked and there was nothing" with "I never looked."

The rules guarding the thread_media column

[] (empty array) = captured successfully, this thread had no media
NULL = never captured, a later cycle is free to retry
If x.com served an empty app shell, the script leaves NULL, not []
diagnose_tweet_access() must report a visible page before [] is trusted
Access checks are capped at 3 per cycle by default, so a dead session never stalls the run
Missing alt-text renders as [no description]; the model infers from context

That one rule, persist [] only when the page genuinely rendered, is what keeps the column honest. Without it, a single flaky page load would stamp "no media" onto a tweet that actually had a chart, and the model would go on replying blind to it forever.
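
The rule can be written as one small decision function. This is a sketch under assumptions: value_to_persist is a hypothetical name, and the trusted status names (visible, visible_no_anchor) are the ones listed in the FAQ at the end of this page. Returning None stands for "leave the column NULL".

```python
from typing import List, Optional

# Statuses the page describes as proof that the tweet really rendered.
TRUSTED_STATUSES = {"visible", "visible_no_anchor"}


def value_to_persist(
    media: List[dict], access_status: Optional[str]
) -> Optional[List[dict]]:
    """Decide what to write to thread_media for one candidate.

    - non-empty media        -> persist it as captured
    - empty + trusted status -> persist [] ("looked, found nothing")
    - empty + anything else  -> None, i.e. leave NULL so a later cycle retries
    """
    if media:
        return media
    if access_status in TRUSTED_STATUSES:
        return []
    return None  # unchecked, logged out, or not hydrated: do not claim "no media"
```

Note that an unchecked page (access_status of None, for example when the per-cycle cap is exhausted) falls through to NULL as well, which is the conservative choice.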

One script, two outputs

The capture script writes to two places at once: it persists the structured media to the database, and it emits a human-readable block for the prompt. The database copy is for durability and audit; the prompt block is for the model that is about to reply.

candidate URLs -> capture_thread_media.py -> DB + prompt

What the model actually reads

This is the shape of the block that _build_block() emits. A candidate that is not listed had no media (or capture was skipped), so the model replies to its text as usual. Notice the repost handling: the model is told the content belongs to the original author, not whoever reposted it.

## MEDIA IN THESE THREADS
Some candidate threads contain images, videos, GIFs, link-cards, or are
reposts. This is part of the content you are replying to: react to what
the tweet VISUALLY shows, not just its text, and treat reposted content
as the original author's. A candidate NOT listed here had no media and is
not a repost (or capture was skipped); reply to its text as usual.
Descriptions marked [no description] mean the media had no alt-text, so
infer from the thread text and the media type.

Candidate 7211:
  - image: "dashboard showing 3 line charts trending up" (pbs.twimg.com/media/<id>)

Candidate 7218:
  - video: [no description] (video.twimg.com/<id>)

Candidate 7224:
  - REPOST: this is a repost surfaced by @some_account. The tweet text
    and any media below were written by the ORIGINAL author, not the
    reposter. Reply to the original author's content; do not address
    the reposter.
  - card: "Show HN: a tiny local-first vector store" (github.com/<user>/<repo>)

Generated at runtime by scripts/capture_thread_media.py during Phase 2b-prep.
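
A renderer that produces this layout might look like the following. It is an illustrative sketch, not the repo's _build_block(): render_item and render_block are hypothetical names, the input dictionaries (id, media, reposted_by) are an assumed shape, and the explanatory header paragraph and the repost wording are shortened.

```python
from typing import Any, Dict, List


def render_item(item: Dict[str, str]) -> str:
    """One bullet per media item; missing alt-text becomes [no description]."""
    alt = (item.get("alt") or "").strip()
    desc = f'"{alt}"' if alt else "[no description]"
    return f'  - {item["type"]}: {desc} ({item["url"]})'


def render_block(candidates: List[Dict[str, Any]]) -> str:
    """Emit one section per candidate that has media or is a repost."""
    sections: List[str] = []
    for cand in candidates:
        lines: List[str] = []
        if cand.get("reposted_by"):
            lines.append(
                f"  - REPOST: this is a repost surfaced by @{cand['reposted_by']}. "
                "Reply to the original author's content; do not address the reposter."
            )
        lines.extend(render_item(m) for m in cand.get("media") or [])
        if lines:  # candidates with nothing to say are simply not listed
            sections.append("\n".join([f"Candidate {cand['id']}:"] + lines))
    if not sections:
        return ""  # empty block: the cycle drafts against text as usual
    return "## MEDIA IN THESE THREADS\n\n" + "\n\n".join(sections)
```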

Why a self-hosted tool bothers with this

Most autoposter writeups stop at "find a thread, draft a comment, post it." They treat a tweet as a string. But the threads most worth engaging are exactly the ones carrying a chart, a screenshot, or a demo clip, and a reply that ignores the picture is the fastest way to look automated. Capturing the visual first is cheap insurance against the single most common tell.

It is also why the capture is deterministic shell work rather than something the model is trusted to do on its own. The prep prompt forbids the writer from opening the browser, so there is exactly one media fetch per cycle, the result is logged, and the same media is available for audit long after the reply went out. The model never decides whether to look; it always already has.


S4L visuals: common questions

Does S4L generate images or visuals?

No. S4L does not run an image model and does not attach generated artwork to posts. The 'visuals' it cares about flow the other direction: it reads the media that is already in the thread it is about to reply to. Before drafting a reply, scripts/capture_thread_media.py captures the image, video, GIF, or link-card of each candidate tweet so the reply-writer can react to what the post actually shows instead of replying text-blind.

Where in the code does S4L capture thread visuals?

scripts/capture_thread_media.py. The Twitter posting cycle (skill/run-twitter-cycle.sh, Phase 2b-prep, the 2026-06-03 thread-media feature) builds a TSV of candidate_id<TAB>tweet_url, runs the capture script in one browser pass via scrape_many_thread_media(urls), persists the result into the twitter_candidates.thread_media column, and prints a MEDIA CONTEXT block to stdout that the shell injects into the drafting prompt as $MEDIA_BLOCK.

What shape is the captured media stored in?

A JSON array of objects, each {url, alt, type}, where type is one of image, video, gif, or card. The alt field is the platform alt-text when present; when it is missing the prompt renders it as [no description] and tells the model to infer from the surrounding text and the media type. The array lives in the twitter_candidates.thread_media column so the record survives independent of the model run.
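
To make the shape concrete, here is a small normalizer that coerces scraped items into {url, alt, type} and serializes them. It is a sketch: normalize_media is a hypothetical helper, and dropping items with an unknown type or missing URL is an assumption of this example, not documented behaviour.

```python
import json
from typing import Any, Dict, List

ALLOWED_TYPES = {"image", "video", "gif", "card"}


def normalize_media(raw: List[Dict[str, Any]]) -> List[Dict[str, str]]:
    """Keep only well-formed items and reduce each to url / alt / type."""
    out: List[Dict[str, str]] = []
    for item in raw:
        url = str(item.get("url") or "").strip()
        kind = str(item.get("type") or "").strip().lower()
        if not url or kind not in ALLOWED_TYPES:
            continue  # assumption: malformed items are skipped, not stored
        alt = str(item.get("alt") or "").strip()  # "" renders as [no description]
        out.append({"url": url, "alt": alt, "type": kind})
    return out


def to_json(raw: List[Dict[str, Any]]) -> str:
    """Serialize the normalized array the way it could sit in a JSON column."""
    return json.dumps(normalize_media(raw))
```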

Why is an empty media array different from NULL?

An empty array [] means 'we captured this thread successfully and it had no media'. NULL means 'we never captured this thread'. The distinction matters because if x.com serves an empty app shell, a block page, or a logged-out view, the tweet did not actually render, so the script refuses to persist [] (which would falsely claim 'no media') and leaves the column NULL instead, letting a later cycle retry rather than poisoning the row.
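
The practical payoff is that "needs another attempt" becomes a plain IS NULL query. The sketch below uses an in-memory SQLite table as a stand-in; the real database engine, the column type, and the helper names (make_db, store_media, needs_capture) are assumptions made for illustration.

```python
import json
import sqlite3
from typing import List, Optional


def make_db() -> sqlite3.Connection:
    """In-memory stand-in for the candidates table (schema is assumed)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE twitter_candidates (id INTEGER PRIMARY KEY, thread_media TEXT)"
    )
    return conn


def store_media(conn: sqlite3.Connection, candidate_id: int,
                media: Optional[List[dict]]) -> None:
    """None is written as SQL NULL; a list (even an empty one) as JSON text."""
    payload = None if media is None else json.dumps(media)
    conn.execute(
        "UPDATE twitter_candidates SET thread_media = ? WHERE id = ?",
        (payload, candidate_id),
    )


def needs_capture(conn: sqlite3.Connection) -> List[int]:
    """Candidates a later cycle may retry: never captured, so still NULL."""
    rows = conn.execute(
        "SELECT id FROM twitter_candidates WHERE thread_media IS NULL ORDER BY id"
    )
    return [row[0] for row in rows]
```

A row holding '[]' is excluded from needs_capture, while a row left NULL after an unreliable page load is picked up again.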

How does S4L know the page really rendered before trusting an empty result?

When capture returns no media, the script runs diagnose_tweet_access() on the URL. Only statuses like visible or visible_no_anchor are trusted as a real empty result. Statuses such as app_not_hydrated, app_error, or logged_out mark the capture unreliable and the column stays NULL. The access check is capped per cycle by SAPS_TWITTER_EMPTY_MEDIA_ACCESS_CHECKS (default 3) with a wait of SAPS_TWITTER_EMPTY_MEDIA_ACCESS_WAIT_MS (default 4000 ms), so a broken session does not stall the whole run.
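
A capped check of this kind can be sketched as a small budget object. The two environment variable names and their defaults come from the answer above; env_int, AccessCheckBudget, check_empty_results, and the (url, wait_ms) signature of the injected diagnose callable are hypothetical and stand in for the real diagnose_tweet_access(), whose signature is not shown on this page.

```python
import os
from typing import Callable, Dict, List, Optional


def env_int(name: str, default: int) -> int:
    """Read an integer setting, falling back to the default on bad input."""
    try:
        return int(os.environ.get(name, default))
    except ValueError:
        return default


class AccessCheckBudget:
    """Counts down the access checks allowed in one cycle."""

    def __init__(self, limit: int) -> None:
        self.remaining = max(0, limit)

    def try_spend(self) -> bool:
        if self.remaining <= 0:
            return False
        self.remaining -= 1
        return True


def check_empty_results(
    urls: List[str],
    diagnose: Callable[[str, int], str],
    budget: AccessCheckBudget,
    wait_ms: int,
) -> Dict[str, Optional[str]]:
    """Map each empty-result URL to a status, or None once the cap is hit."""
    return {
        url: (diagnose(url, wait_ms) if budget.try_spend() else None)
        for url in urls
    }
```

In use, the budget would be built once per cycle, for example AccessCheckBudget(env_int("SAPS_TWITTER_EMPTY_MEDIA_ACCESS_CHECKS", 3)), and any URL that comes back as None is treated as unverified rather than as "no media".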

Does the visual capture ever block or slow down posting?

No. The whole step is best-effort. The browser import is lazy, so a short-circuit run never pays the Playwright cost, and if scraping throws, the script emits an empty block and exits 0. In the shell, any failure simply leaves $MEDIA_BLOCK empty and the cycle proceeds to draft against text as usual. Reading the visuals is an enhancement, never a gate.

What about reposts and quote content?

Repost provenance is detected at discovery time (the timeline is the only place X renders the '<account> reposted' banner) and stored on the candidate row. capture_thread_media.py reads that stored flag and, for a repost, emits a REPOST note in the MEDIA CONTEXT block telling the model the text and media belong to the original author, not the reposter, and to reply to the original content.

Is this the same as the bandit described on /t/s4l?

No, they are different layers. The page at /t/s4l explains how S4L picks which engagement style to write in, scored live from its posts table. This page is about the input the writer sees: the visuals of the thread being answered. Style selection decides the voice; media capture decides whether the model is even looking at the right thing.