Release Notes: Chrome Extension 0.1.84
Version 0.1.84 is the current public release line for the Chrome extension, runtime, storage, bridge, and overlay work.
It brings the browser-hosted agent path into one usable package: a user can authorize a tab, add their own OpenAI API key, upload actions.json.storage, start a gpt-realtime-2 voice session, and let the agent use the current site’s declared actions.
What Changed
- The Chrome extension can host a gpt-realtime-2 voice/text agent using the user’s OpenAI API key stored in Chrome extension storage.
- The hosted agent can use uploaded actions.json.storage through actions.site, so site maps and context can work without a local bridge.
- The Agent and Settings UI includes voice selection, VAD controls, transcript, memory, storage upload/download, bridge settings, and session status.
- The live voice session is owned by an extension offscreen document so closing or reinjecting the visible overlay does not intentionally stop the session.
- runtime.session.log returns transcript, tool catalog, tool-call, storage, screenshot, navigation, and lifecycle diagnostics under the current primitive result envelope.
- HTTPS pages can use an insecure local or Tailscale ws: bridge URL through the extension background service worker, avoiding page-level mixed-content WebSocket failures for the extension path.
- The runtime exposes claimed-tab tools: browser.claimed_tabs.list and browser.claimed_tabs.activate.
- Report overlays render in an isolated sandbox frame so agent-authored HTML and CSS are less likely to be corrupted by page styles.
- overlay.open and overlay.register_launcher support reusable storage-backed templates with separate JSON data files.
- Template and data files can come from different storage scopes, such as a public or shared template with private user data.
- Downloading a template/data overlay creates a standalone HTML bundle that can render outside the extension.
- Uploading a standalone overlay bundle imports the files into private storage, then reopens the overlay from private references.
- The bookmarklet/embed runtime shares portable overlay and primitive behavior where page policy allows it.
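To make the template/data split and the standalone download concrete, here is a minimal TypeScript sketch of one way such a bundle could work. It assumes the bundle inlines the JSON data file into the template inside a <script type="application/json"> element; the StorageRef shape, the scope names, the overlay-data element id, and the overlays/<name>/ paths are hypothetical and are not the extension's actual bundle format.

```typescript
// Hypothetical types: scope names and fields are illustrative assumptions.
type StorageScope = "public" | "shared" | "private";

interface StorageRef {
  scope: StorageScope;
  path: string;
}

interface ImportedOverlay {
  template: StorageRef;
  data: StorageRef;
  templateHtml: string;
  dataValue: unknown;
}

// Assumed marker for the inlined data file (not the extension's real format).
const DATA_OPEN = '<script type="application/json" id="overlay-data">';
const DATA_CLOSE = "</script>";

// Escape "<" so a "</script>" sequence inside the data cannot close the tag.
function toScriptSafeJson(value: unknown): string {
  return JSON.stringify(value).replace(/</g, "\\u003c");
}

// Download: inline the data file into the template to get one standalone HTML file.
function bundleOverlay(templateHtml: string, dataValue: unknown): string {
  const tag = DATA_OPEN + toScriptSafeJson(dataValue) + DATA_CLOSE;
  const bodyEnd = templateHtml.lastIndexOf("</body>");
  return bodyEnd >= 0
    ? templateHtml.slice(0, bodyEnd) + tag + templateHtml.slice(bodyEnd)
    : templateHtml + tag;
}

// Upload: split the bundle back into template and data, referenced from private storage.
function importBundle(bundleHtml: string, name: string): ImportedOverlay {
  const start = bundleHtml.indexOf(DATA_OPEN);
  if (start < 0) throw new Error("not a template/data overlay bundle");
  const end = bundleHtml.indexOf(DATA_CLOSE, start);
  if (end < 0) throw new Error("unterminated overlay data tag");
  const json = bundleHtml.slice(start + DATA_OPEN.length, end);
  return {
    template: { scope: "private", path: `overlays/${name}/template.html` },
    data: { scope: "private", path: `overlays/${name}/data.json` },
    templateHtml: bundleHtml.slice(0, start) + bundleHtml.slice(end + DATA_CLOSE.length),
    dataValue: JSON.parse(json), // "\u003c" escapes decode back to "<"
  };
}
```

Escaping every "<" as \u003c keeps a literal "</script>" inside user data from closing the tag early, and JSON.parse turns the escape back into "<" when the bundle is uploaded again. Pointing both references at the private scope on import mirrors the upload behavior described in these notes.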
Why It Matters
actions.json is now more than a schema draft. It has a working browser path where a user can try an agent on a real website, give it a reviewed action map, let it inspect the page, and receive visual artifacts back as overlays.
The important product split is:
- actions.json is the readable map of what a website can do.
- actions.json.storage is the user’s file workspace for site maps, context, observations, overlays, and shared memory.
- The Chrome extension is the most capable browser runtime for authoring and hosted-agent use.
- The bookmarklet/embed runtime is the portable page-JavaScript design path.
- The bridge is the optional adapter for external coding agents.
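The bridge change in this release, routing an insecure ws: URL through the background service worker when the page is HTTPS, reduces to a routing decision. The TypeScript sketch below models that decision only; chooseBridgeTransport, urlScheme, the Transport values, and the hasExtension flag are hypothetical, and the sketch deliberately simplifies by treating every ws: target from an https: page as mixed content.

```typescript
// Hypothetical routing model; names and the hasExtension flag are assumptions.
type Transport = "page" | "background";

// Lowercased scheme of an absolute URL, e.g. "https" or "ws".
function urlScheme(url: string): string {
  const colon = url.indexOf(":");
  if (colon <= 0) throw new Error(`not an absolute URL: ${url}`);
  return url.slice(0, colon).toLowerCase();
}

function chooseBridgeTransport(pageUrl: string, bridgeUrl: string, hasExtension: boolean): Transport {
  const page = urlScheme(pageUrl);
  const bridge = urlScheme(bridgeUrl);
  if (bridge !== "ws" && bridge !== "wss") {
    throw new Error(`unsupported bridge scheme: ${bridge}`);
  }
  // Simplification: treat any ws: target from an https: page as blocked mixed
  // content (browsers may exempt loopback hosts such as localhost).
  const mixedContent = page === "https" && bridge === "ws";
  if (!mixedContent) return "page";
  // The extension path relays the socket through the background service worker.
  if (hasExtension) return "background";
  throw new Error("ws: bridge is blocked as mixed content; use wss: or the extension");
}
```

A wss: bridge or a plain http: page can connect directly in this model; only the secure-page, insecure-socket combination needs the extension relay.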
What To Test
- Install the Chrome extension release.
- Open a website and choose Take control of this tab.
- Open the actions.json menu.
- In Settings, save an OpenAI API key.
- Optional but recommended: upload an actions.json.storage checkout.
- Choose a voice and VAD settings.
- Start the voice session from Agent.
- Ask what actions are available on the current site.
- Ask the agent to take a screenshot and summarize what it sees.
- Ask the agent to navigate or scroll using the available site actions.
- Ask the agent to create an overlay.
- Download the overlay. If it was template/data-backed, confirm the downloaded file opens as a standalone bundle.
- Upload that overlay bundle and confirm it reopens from private storage.
- Authorize a second tab and ask the agent to switch between authorized tabs.
- If anything fails, ask a coding agent to inspect runtime.session.log.
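When a coding agent inspects runtime.session.log, a useful first step is to find the earliest failure and count failures per category. The sketch below assumes a log shape for illustration only: the LogEntry fields, the snake_case kind names, the triageSessionLog function, and the { ok, entries } envelope are hypothetical stand-ins for the real primitive result envelope, whose shape these notes do not specify.

```typescript
// Hypothetical log shape: categories follow these release notes; field names
// and the { ok, entries } envelope are assumptions, not the real format.
type LogKind =
  | "transcript"
  | "tool_catalog"
  | "tool_call"
  | "storage"
  | "screenshot"
  | "navigation"
  | "lifecycle";

interface LogEntry {
  kind: LogKind;
  at: number; // assumed: milliseconds since session start
  name?: string; // assumed: tool or event name
  error?: string; // assumed: present only when the step failed
}

interface SessionLogResult {
  ok: boolean;
  entries: LogEntry[];
}

interface Triage {
  firstFailure: LogEntry | undefined;
  failuresByKind: Map<LogKind, number>;
}

// Find the earliest failure and count failures per category.
function triageSessionLog(log: SessionLogResult): Triage {
  const failuresByKind = new Map<LogKind, number>();
  let firstFailure: LogEntry | undefined;
  for (const entry of log.entries) {
    if (entry.error === undefined) continue;
    failuresByKind.set(entry.kind, (failuresByKind.get(entry.kind) ?? 0) + 1);
    if (firstFailure === undefined || entry.at < firstFailure.at) {
      firstFailure = entry;
    }
  }
  return { firstFailure, failuresByKind };
}
```

Comparing timestamps rather than trusting entry order keeps the sketch correct even if a log groups entries by category.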
Boundaries
- The hosted agent uses the user’s OpenAI key; the project does not provide a shared hosted key.
- Microphone permission is controlled by Chrome.
- The extension operates only on user-authorized tabs.
- Debugger-backed operations are for authoring and repair. Durable site maps should prefer reviewed actions.json actions and portable primitives.
- Bookmarklet behavior can be limited by the current page’s policy. A production first-party embed can ask the site owner for integration permissions that a bookmarklet does not have.
- The project is still pre-1.0; schema, storage, primitive, and bridge details may change.
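The claimed-tab boundary can be modeled as a registry that only ever lists or activates tabs the user has authorized. This TypeScript sketch is hypothetical: the ClaimedTabs class, its claim/release/activeTab methods, and the ClaimedTab fields are illustrative and are not the runtime's implementation of browser.claimed_tabs.list or browser.claimed_tabs.activate.

```typescript
// Hypothetical model of the authorized tab set; not the runtime's implementation.
interface ClaimedTab {
  tabId: number;
  url: string;
  title: string;
}

class ClaimedTabs {
  private readonly tabs = new Map<number, ClaimedTab>();
  private activeId: number | undefined = undefined;

  // Called only after the user chooses "Take control of this tab".
  claim(tab: ClaimedTab): void {
    this.tabs.set(tab.tabId, tab);
    if (this.activeId === undefined) this.activeId = tab.tabId;
  }

  // Forget a tab; if it was active, fall back to another claimed tab (if any).
  release(tabId: number): void {
    this.tabs.delete(tabId);
    if (this.activeId === tabId) {
      this.activeId = this.tabs.size > 0 ? Array.from(this.tabs.keys())[0] : undefined;
    }
  }

  // Stands in for browser.claimed_tabs.list: only authorized tabs are visible.
  list(): ClaimedTab[] {
    return Array.from(this.tabs.values());
  }

  // Stands in for browser.claimed_tabs.activate: refuse unauthorized tabs.
  activate(tabId: number): ClaimedTab {
    const tab = this.tabs.get(tabId);
    if (tab === undefined) throw new Error(`tab ${tabId} is not authorized`);
    this.activeId = tabId;
    return tab;
  }

  activeTab(): ClaimedTab | undefined {
    return this.activeId === undefined ? undefined : this.tabs.get(this.activeId);
  }
}
```

Keeping authorization in one registry means a switch request for an unclaimed tab fails loudly instead of silently widening the agent's reach.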