feat(browser_execute): auto-attach Page.captureScreenshot results as image attachments #47
Merged
Alezander9 merged 1 commit into main on May 9, 2026
…image attachments

Every successful Page.captureScreenshot made during a browser_execute call is now collected from the CDP transport and surfaced as a FilePart on the tool result. The opencode runner appends those attachments to the next assistant turn as image parts, so the model sees the screenshot natively as vision input. No more decode-write-read dance from inside the snippet. This is the same channel that read.ts and webfetch.ts already use when they surface images; we're adding browser_execute as a third producer.

Mechanism (Level 1, zero upstream diff):

- cdp/session.ts: new generic onCallResult(fn) listener API, symmetric with the existing onEvent. Fires after every successful _call resolve. Keeps the Session agnostic of any one method's semantics.
- browser-execute.ts (Level 1): subscribes for the duration of each execute() call, filters to Page.captureScreenshot, and accumulates results into a per-call collector returned alongside output/result. When BCODE_SCREENSHOT_DIR is set, the same tap also writes each screenshot to disk (for eval-judge consumption; a second consumer of the same hook).
- tool/browser-execute.ts (Level 2): maps the collector to attachments[] on the ExecuteResult.

BROWSER.md and interaction-skills/screenshots.md are updated to tell the agent about the auto-attach behavior. Two new smoke tests (gated on BCODE_SMOKE_CHROME) verify the screenshot round-trip and the env-var disk dump.
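The three layers listed above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the Session internals (the pending-call map, the `send` callback, and the `handleResponse` hook the transport would call) are hypothetical, as are the `Screenshot` and `ImageAttachment` types and `toAttachments`; the FilePart-like shape `{ type: "file", mime, url }` with a base64 data URL is an assumption about opencode's attachment type. The names `onCallResult`, `onEvent`, `_call`, `execute()`, and `Page.captureScreenshot` come from the PR, and the CDP facts used (a base64 `data` field in the result, a `format` parameter of png/jpeg/webp defaulting to png) come from the DevTools Protocol.

```typescript
// Hypothetical, simplified stand-ins for cdp/session.ts and browser-execute.ts.
type CallResultListener = (method: string, params: unknown, result: unknown) => void;

interface PendingCall {
  method: string;
  params: unknown;
  resolve: (result: unknown) => void;
  reject: (err: Error) => void;
}

class Session {
  private nextId = 1;
  private pending = new Map<number, PendingCall>();
  private callResultListeners = new Set<CallResultListener>();
  private send: (message: string) => void;

  constructor(send: (message: string) => void) {
    this.send = send;
  }

  // Same shape as onEvent in this sketch: register a listener, get back an unsubscribe function.
  onCallResult(fn: CallResultListener): () => void {
    this.callResultListeners.add(fn);
    return () => {
      this.callResultListeners.delete(fn);
    };
  }

  _call(method: string, params: unknown = {}): Promise<unknown> {
    const id = this.nextId++;
    return new Promise((resolve, reject) => {
      this.pending.set(id, { method, params, resolve, reject });
      this.send(JSON.stringify({ id, method, params }));
    });
  }

  // Hypothetical hook: the transport calls this for each CDP response message.
  handleResponse(msg: { id: number; result?: unknown; error?: { message: string } }): void {
    const call = this.pending.get(msg.id);
    if (!call) return;
    this.pending.delete(msg.id);
    if (msg.error) {
      call.reject(new Error(msg.error.message)); // failed calls are not broadcast
      return;
    }
    call.resolve(msg.result);
    // Generic fan-out: the Session knows nothing about any particular method.
    for (const fn of this.callResultListeners) fn(call.method, call.params, msg.result);
  }
}

type ScreenshotFormat = "png" | "jpeg" | "webp";
interface Screenshot {
  data: string; // base64, as CDP returns it
  format: ScreenshotFormat;
}

const MIME: Record<ScreenshotFormat, string> = {
  png: "image/png",
  jpeg: "image/jpeg",
  webp: "image/webp",
};

// Level 1: tap the session only for the lifetime of one execute() call.
async function execute<T>(
  session: Session,
  snippet: () => Promise<T>,
): Promise<{ result: T; screenshots: Screenshot[] }> {
  const screenshots: Screenshot[] = [];
  const off = session.onCallResult((method, params, result) => {
    if (method !== "Page.captureScreenshot") return;
    const format = ((params ?? {}) as { format?: ScreenshotFormat }).format ?? "png"; // CDP default
    screenshots.push({ data: (result as { data: string }).data, format });
  });
  try {
    return { result: await snippet(), screenshots };
  } finally {
    off(); // unsubscribe even if the snippet throws
  }
}

// Level 2: assumed FilePart-like attachment shape carrying a data URL.
interface ImageAttachment {
  type: "file";
  mime: string;
  url: string;
}

function toAttachments(screenshots: Screenshot[]): ImageAttachment[] {
  return screenshots.map((s): ImageAttachment => ({
    type: "file",
    mime: MIME[s.format],
    url: `data:${MIME[s.format]};base64,${s.data}`,
  }));
}
```

In this sketch, returning an unsubscribe function from onCallResult lets execute() scope the tap with try/finally, so a screenshot taken after the call returns never leaks into its collector.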
Summary
Every successful Page.captureScreenshot made during a browser_execute call is now auto-attached as an image part on the next assistant turn. The model sees the screenshot natively as vision input: no decode → write → read dance from inside the snippet, no helper bound into snippet scope, no prompt change required.

Closes the screenshot-handling gap reported by a downstream agent: previously the base64 PNG came back in the output text and either got truncated (forcing a workaround through the read tool on a manually saved file) or arrived as raw bytes the model couldn't interpret.

Mechanism
Pure Level-1 addition under packages/bcode-browser/. Zero upstream diff.

- cdp/session.ts: new onCallResult(fn) listener API, symmetric with the existing onEvent. Fires after every successful _call resolve. Keeps the Session agnostic of any one method's semantics; Page.captureScreenshot is a consumer, not baked in.
- browser-execute.ts (Level 1): execute() subscribes for the duration of each call, filters to Page.captureScreenshot, and accumulates results into a per-call screenshots collector returned alongside output/result. When the BCODE_SCREENSHOT_DIR env var is set, the same tap also writes each screenshot to disk (best-effort, fire-and-forget) so eval harnesses can collect them for an LLM judge: a second consumer of the same hook.
- tool/browser-execute.ts (Level 2): maps the collector into the existing attachments[] field on ExecuteResult. This is the same channel read.ts and webfetch.ts already use when they surface images; we're adding browser_execute as a third producer.

Surface area
- cdp/session.ts: +28 lines (listener API + fire on _call resolve)
- browser-execute.ts Level 1: +61 lines (collector, env-var dump, threading; mostly mime/format helpers)
- tool/browser-execute.ts Level 2: +15 lines (attachments mapping + screenshot-count footer in tool output)
- skills/BROWSER.md: 4 lines amended (one screenshot example block)
- skills/interaction-skills/screenshots.md: 6 lines added (new "Auto-attached" callout)

Tests
Two new smoke tests in test/browser-execute.test.ts (gated on BCODE_SMOKE_CHROME=1, same as the existing tests):

- Page.captureScreenshot is collected into result.screenshots: verifies png + jpeg round-trip with correct mime tags.
- BCODE_SCREENSHOT_DIR dumps screenshots to disk: verifies that the env-var disk-dump path lands .png files.

Local run against headless Chrome 147 (Linux x64): 8/8 pass (4 pre-existing Chrome smokes + 2 new screenshot smokes + 2 unit tests). bun typecheck is clean across all packages.

Concurrency
Parallel execute() calls against the same Session would each subscribe and each see all screenshots produced during their lifetime. This is rare: it would require two in-flight tool calls under one sessionID, which opencode serializes within one assistant message. Documented as acceptable for v1.

Notes for the eval consumer
When BCODE_SCREENSHOT_DIR=<path> is set, every Page.captureScreenshot writes <sessionID>-<startedAt>-<seq>.<ext> to the directory. The disk dump fires unconditionally on success, independent of the attachments[] path, so it survives any future change to opencode's attachment handling.
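A minimal sketch of the best-effort dump described above. It assumes that startedAt is a millisecond timestamp, that seq counts screenshots within one execute() call, and that the extension is the CDP format name; `dumpFileName` and `dumpScreenshot` are hypothetical helpers, and creating the directory with mkdir is an assumption, not something the PR states.

```typescript
import { mkdir, writeFile } from "node:fs/promises";
import { join } from "node:path";

interface Screenshot {
  data: string; // base64 payload from Page.captureScreenshot
  format: "png" | "jpeg" | "webp";
}

// Builds <sessionID>-<startedAt>-<seq>.<ext>; using the format as <ext> is an assumption.
function dumpFileName(sessionID: string, startedAt: number, seq: number, format: Screenshot["format"]): string {
  return `${sessionID}-${startedAt}-${seq}.${format}`;
}

// Fire-and-forget: nothing is awaited by the caller and every error is swallowed,
// so a failed write never fails the tool call or blocks the attachments path.
function dumpScreenshot(sessionID: string, startedAt: number, seq: number, shot: Screenshot): void {
  const dir = process.env.BCODE_SCREENSHOT_DIR;
  if (!dir) return; // dump disabled unless the env var is set
  const file = join(dir, dumpFileName(sessionID, startedAt, seq, shot.format));
  void mkdir(dir, { recursive: true })
    .then(() => writeFile(file, Buffer.from(shot.data, "base64")))
    .catch(() => {
      // Best-effort only: ignore disk errors.
    });
}
```

Because the dump hangs off the same call-result tap rather than off attachments[], it keeps working even if the attachment mapping changes.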