Release Notes: Chrome Extension 0.1.84

Version 0.1.84 is the current public release line for the Chrome extension, runtime, storage, bridge, and overlay work.

It brings the browser-hosted agent path into one usable package: a user can authorize a tab, add their own OpenAI API key, upload actions.json.storage, start a gpt-realtime-2 voice session, and let the agent use the current site’s declared actions.

What Changed

  • The Chrome extension can host a gpt-realtime-2 voice/text agent using the user’s OpenAI API key stored in Chrome extension storage.
  • The hosted agent can use uploaded actions.json.storage through actions.site, so site maps and context can work without a local bridge.
  • The Agent and Settings UI includes voice selection, VAD controls, transcript, memory, storage upload/download, bridge settings, and session status.
  • The live voice session is owned by an extension offscreen document, so closing or reinjecting the visible overlay is not meant to stop the session.
  • runtime.session.log returns transcript, tool catalog, tool-call, storage, screenshot, navigation, and lifecycle diagnostics under the current primitive result envelope.
  • On HTTPS pages, the extension can reach an insecure local or Tailscale ws: bridge URL by routing the connection through its background service worker. This avoids the page-level mixed-content WebSocket failures that a direct connection from the page would hit.
  • The runtime exposes claimed-tab tools: browser.claimed_tabs.list and browser.claimed_tabs.activate.
  • Report overlays render in an isolated sandbox frame so agent-authored HTML and CSS are less likely to be corrupted by page styles.
  • overlay.open and overlay.register_launcher support reusable storage-backed templates with separate JSON data files.
  • Template and data files can come from different storage scopes, such as a public or shared template with private user data.
  • Downloading a template/data overlay creates a standalone HTML bundle that can render outside the extension.
  • Uploading a standalone overlay bundle imports the files into private storage, then reopens the overlay from private references.
  • The bookmarklet/embed runtime shares portable overlay and primitive behavior with the extension where page policy allows it.
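The template/data bullets above describe a round trip:

  • On download, a template and a separate JSON data file are combined into one standalone HTML file.
  • On upload, that file is split back into private storage.

The TypeScript below is a minimal sketch of that round trip, built on explicit assumptions. The following are all hypothetical and are not the extension’s actual format or API:

  • the bundle layout (data inlined as a <script type="application/json" id="overlay-data"> element)
  • the StorageRef shape
  • the overlays/ paths
  • the names bundleOverlay and importBundle

```typescript
// Hypothetical sketch: not the extension's real bundle format or API.
type StorageScope = "public" | "shared" | "private";

// A reference to a file in one storage scope (e.g. a public template, private data).
interface StorageRef {
  scope: StorageScope;
  path: string;
}

interface OverlayFiles {
  template: { ref: StorageRef; html: string };
  data: { ref: StorageRef; json: unknown };
}

const DATA_OPEN = '<script type="application/json" id="overlay-data">';
const DATA_CLOSE = "</script>";

// Escape "<" so data such as "</script>" cannot close the inlined element early.
function escapeForScript(json: string): string {
  return json.replace(/</g, "\\u003c");
}

// Download: inline the data file into the template so it renders without extension storage.
function bundleOverlay(files: OverlayFiles): string {
  const dataTag = DATA_OPEN + escapeForScript(JSON.stringify(files.data.json)) + DATA_CLOSE;
  const html = files.template.html;
  // Insert just before </body> when present, otherwise append.
  const idx = html.lastIndexOf("</body>");
  return idx >= 0 ? html.slice(0, idx) + dataTag + html.slice(idx) : html + dataTag;
}

// Upload: split a bundle back into template and data, both re-homed in private storage.
function importBundle(bundle: string, name: string): OverlayFiles {
  const start = bundle.indexOf(DATA_OPEN);
  const end = start >= 0 ? bundle.indexOf(DATA_CLOSE, start + DATA_OPEN.length) : -1;
  if (start < 0 || end < 0) throw new Error("not an overlay bundle");
  const json: unknown = JSON.parse(bundle.slice(start + DATA_OPEN.length, end));
  const html = bundle.slice(0, start) + bundle.slice(end + DATA_CLOSE.length);
  return {
    template: { ref: { scope: "private", path: `overlays/${name}.html` }, html },
    data: { ref: { scope: "private", path: `overlays/${name}.json` }, json },
  };
}
```

Escaping "<" as \u003c is still valid JSON, so the data parses back unchanged. The template can start in a public scope while the reimported copy always lands in private storage, mirroring the upload behavior described above.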

Why It Matters

actions.json is now more than a schema draft. It has a working browser path where a user can try an agent on a real website, give it a reviewed action map, let it inspect the page, and receive visual artifacts back as overlays.

The important product split is:

  • actions.json is the readable map of what a website can do.
  • actions.json.storage is the user’s file workspace for site maps, context, observations, overlays, and shared memory.
  • The Chrome extension is the most capable browser runtime for authoring and hosted-agent use.
  • The bookmarklet/embed runtime is the portable page-JavaScript design path.
  • The bridge is the optional adapter for external coding agents.
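The bridge bullet, together with the mixed-content note under What Changed, implies a routing decision about where a bridge WebSocket is opened. The function below is a hypothetical sketch of that rule. The names chooseBridgeTransport and BridgeTransport are illustrative, and the extension’s real logic may differ.

```typescript
// Hypothetical routing rule for where a bridge WebSocket is opened; illustrative only.
type BridgeTransport = "page" | "background";

function chooseBridgeTransport(pageUrl: string, bridgeUrl: string): BridgeTransport {
  const page = new URL(pageUrl);
  const bridge = new URL(bridgeUrl);
  if (bridge.protocol !== "ws:" && bridge.protocol !== "wss:") {
    throw new Error(`unsupported bridge protocol: ${bridge.protocol}`);
  }
  // Browsers generally block ws: from an https: page as mixed content,
  // so the extension path relays that case through its background service worker.
  if (page.protocol === "https:" && bridge.protocol === "ws:") {
    return "background";
  }
  // wss: bridges, or pages that are not https:, can connect directly.
  return "page";
}
```

Routing only the https: page plus ws: bridge case keeps ordinary connections direct. A more conservative design could send every bridge connection through the service worker.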

What To Test

  1. Install the Chrome extension release.
  2. Open a website and choose “Take control of this tab”.
  3. Open the actions.json menu.
  4. In Settings, save an OpenAI API key.
  5. Optional but recommended: upload an actions.json.storage checkout.
  6. Choose a voice and VAD settings.
  7. Start the voice session from Agent.
  8. Ask what actions are available on the current site.
  9. Ask the agent to take a screenshot and summarize what it sees.
  10. Ask the agent to navigate or scroll using the available site actions.
  11. Ask the agent to create an overlay.
  12. Download the overlay. If it was template/data-backed, confirm the downloaded file opens as a standalone bundle.
  13. Upload that overlay bundle and confirm it reopens from private storage.
  14. Authorize a second tab and ask the agent to switch between authorized tabs.
  15. If anything fails, ask a coding agent to inspect runtime.session.log.
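When step 15 comes up, a useful first question is which tool calls failed. The sketch below assumes a hypothetical shape for the runtime.session.log result: an envelope with ok, result, and error fields that wraps a list of entries tagged by kind. These field names and the helper failedToolCalls are illustrative only, so check the actual primitive result envelope before relying on them.

```typescript
// Hypothetical shapes: the real primitive result envelope and log entry fields may differ.
type LogKind =
  | "transcript"
  | "tool_catalog"
  | "tool_call"
  | "storage"
  | "screenshot"
  | "navigation"
  | "lifecycle";

interface LogEntry {
  kind: LogKind;
  at: string; // ISO 8601 timestamp
  name?: string; // tool name, for tool_call entries
  ok?: boolean; // outcome, for tool_call entries
  detail?: string;
}

interface Envelope<T> {
  ok: boolean;
  result?: T;
  error?: string;
}

// Return the failed tool calls from a session log, oldest first.
function failedToolCalls(log: Envelope<{ entries: LogEntry[] }>): LogEntry[] {
  if (!log.ok || !log.result) {
    throw new Error(`session log unavailable: ${log.error ?? "unknown error"}`);
  }
  return log.result.entries
    .filter((e) => e.kind === "tool_call" && e.ok === false)
    .sort((a, b) => Date.parse(a.at) - Date.parse(b.at));
}
```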

Boundaries

  • The hosted agent uses the user’s OpenAI key; the project does not provide a shared hosted key.
  • Microphone permission is controlled by Chrome.
  • The extension operates only on user-authorized tabs.
  • Debugger-backed operations are for authoring and repair. Durable site maps should prefer reviewed actions.json actions and portable primitives.
  • Bookmarklet behavior can be limited by the current page’s policy. A production first-party embed can ask the site owner for integration permissions that a bookmarklet does not have.
  • The project is still pre-1.0; schema, storage, primitive, and bridge details may change.
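The tab boundary and the claimed-tab tools (browser.claimed_tabs.list and browser.claimed_tabs.activate) amount to a small registry. Only tabs the user explicitly claimed are listed, and any attempt to activate another tab is refused. The class below is a hypothetical in-memory model of that rule. ClaimedTabs and its methods are illustrative names, not the extension’s implementation, and a real version would sit on top of Chrome’s tab APIs.

```typescript
// Hypothetical in-memory model of the claimed-tab boundary; not the extension's code.
interface ClaimedTab {
  tabId: number;
  url: string;
  claimedAt: number; // ms since epoch
}

class ClaimedTabs {
  private tabs = new Map<number, ClaimedTab>();
  private active: number | null = null;

  // Record a tab only after the user explicitly authorizes it.
  claim(tabId: number, url: string, now: number = Date.now()): void {
    this.tabs.set(tabId, { tabId, url, claimedAt: now });
    if (this.active === null) this.active = tabId;
  }

  // Forget a tab; if it was active, no tab is active afterwards.
  release(tabId: number): void {
    this.tabs.delete(tabId);
    if (this.active === tabId) this.active = null;
  }

  // Counterpart of browser.claimed_tabs.list: authorized tabs only, in claim order.
  list(): ClaimedTab[] {
    return Array.from(this.tabs.values()).sort((a, b) => a.claimedAt - b.claimedAt);
  }

  // Counterpart of browser.claimed_tabs.activate: refuses tabs that were never claimed.
  activate(tabId: number): ClaimedTab {
    const tab = this.tabs.get(tabId);
    if (!tab) throw new Error(`tab ${tabId} is not authorized`);
    this.active = tabId;
    return tab;
  }

  activeTab(): number | null {
    return this.active;
  }
}
```

Keeping the check inside activate means a tool call cannot reach an unauthorized tab even if the agent guesses a valid tab ID.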