Release Notes: Chrome Extension 0.1.74

Version 0.1.74 is the first release in which the Chrome extension is useful as a hosted browser agent on its own, not only as a bridge-connected runtime.

It adds a bring-your-own-OpenAI-key gpt-realtime-2 agent, storage-backed site actions, runtime diagnostics, and a more durable browser session model.

Who Should Try It

Try this release if you want to:

  • run a voice/text agent inside Chrome with your own OpenAI API key;
  • upload an actions.json.storage checkout and let the agent use site-specific maps;
  • test current-site actions through actions.site;
  • inspect screenshots, navigation, tool calls, and transcript logs;
  • author or validate actions.json maps on real websites.

New Capabilities

Hosted Realtime Agent

The extension can host a gpt-realtime-2 session directly. The user supplies an OpenAI API key in the Settings tab. The Agent tab provides voice controls and a conversation transcript.

The session is owned by an extension offscreen document rather than by the page, so it is more likely to survive teardown of the page overlay than a page-owned audio session would be.

Storage-Backed Site Actions

The extension can upload an actions.json.storage checkout into browser local state. Once uploaded, the hosted agent can use actions.site behind the scenes to list and run actions for the current website.

This means the hosted agent can use previously authored site maps without requiring a local coding-agent bridge.
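
As a rough illustration of how an uploaded checkout might be resolved to the current website, the sketch below keys site maps by hostname. The SiteAction and StorageCheckout shapes and the actionsForUrl helper are assumptions made for this example; these notes do not describe the real storage layout or matching rules.

```typescript
// Hypothetical shapes: these notes do not describe the real storage layout.
interface SiteAction {
  name: string;
  description: string;
}

interface StorageCheckout {
  // Assumed: one list of authored actions per hostname.
  sites: Record<string, SiteAction[]>;
}

// Return the actions stored for the page's hostname, or an empty list.
function actionsForUrl(checkout: StorageCheckout, pageUrl: string): SiteAction[] {
  const host = new URL(pageUrl).hostname.replace(/^www\./, "");
  return checkout.sites[host] ?? [];
}

const checkout: StorageCheckout = {
  sites: {
    "example.com": [{ name: "search", description: "Search the catalog" }],
  },
};

const found = actionsForUrl(checkout, "https://www.example.com/products?page=2");
const missing = actionsForUrl(checkout, "https://unmapped.test/");
console.log(found.map((a) => a.name), missing.length);
```

A site with no uploaded map yields an empty list here, which would leave the agent with only the direct primitives.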

Stable Tool Surface

The extension exposes a small stable tool catalog instead of one global tool per website. Site-specific capabilities are discovered through actions.site.

Direct primitives include screenshots, scrolling, pointer actions, DOM/section inspection, locator geometry, storage import/listing, overlays, and session logs.
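
The design choice can be pictured as a fixed catalog in which actions.site is the single entry point for per-site capabilities. In the sketch below, the op and name arguments, the ToolResult type, and the siteActions table are assumptions for illustration, not the extension's documented schema; only the tool names actions.site and runtime.session.log come from these notes.

```typescript
type ToolResult = { ok: true; data: unknown } | { ok: false; error: string };
type ToolArgs = Record<string, unknown>;
type ToolHandler = (args: ToolArgs) => ToolResult;

// Assumed per-site data; in the extension this would come from uploaded storage.
const siteActions: Record<string, string[]> = {
  "example.com": ["search", "add_to_cart"],
};

// Build the catalog for a page. The set of tool names never depends on the site.
function makeCatalog(currentHost: string): Map<string, ToolHandler> {
  const catalog = new Map<string, ToolHandler>();
  catalog.set("actions.site", (args) => {
    const available = siteActions[currentHost] ?? [];
    const op = args["op"];
    const actionName = args["name"];
    if (op === "list") return { ok: true, data: available };
    if (op === "run" && typeof actionName === "string") {
      return available.includes(actionName)
        ? { ok: true, data: `ran ${actionName}` }
        : { ok: false, error: `unknown action: ${actionName}` };
    }
    return { ok: false, error: "expected op 'list', or op 'run' with a name" };
  });
  // Placeholder handler; a real one would return recorded session events.
  catalog.set("runtime.session.log", () => ({ ok: true, data: [] }));
  return catalog;
}

function callTool(catalog: Map<string, ToolHandler>, tool: string, args: ToolArgs = {}): ToolResult {
  const handler = catalog.get(tool);
  return handler ? handler(args) : { ok: false, error: `no such tool: ${tool}` };
}

const shop = makeCatalog("example.com");
const blank = makeCatalog("unmapped.test");
console.log(Array.from(shop.keys()), Array.from(blank.keys())); // same tool names for both sites
console.log(callTool(shop, "actions.site", { op: "list" }));
```

Because the catalog has the same entries on every site, the agent's tool list stays constant, and only the result of a list call changes from site to site.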

Session Diagnostics

runtime.session.log returns transcript, tool catalog, tool call, storage, navigation, screenshot, warning, and error events. Use it when a session behaves incorrectly or when you need evidence of what a tool call actually did.
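
One way to triage such a log is to pair each warning or error with the tool call that came before it. The sketch below assumes a simple event shape: the event kinds follow the list above, but the field names (kind, timestampMs, detail) and the failuresWithContext helper are hypothetical, since these notes do not specify the log format.

```typescript
// Event kinds follow the list in these notes; the field names are assumptions.
type SessionEventKind =
  | "transcript" | "tool_catalog" | "tool_call" | "storage"
  | "navigation" | "screenshot" | "warning" | "error";

interface SessionEvent {
  kind: SessionEventKind;
  timestampMs: number; // assumed: milliseconds since the session started
  detail: string;
}

interface FailureReport {
  failure: SessionEvent;
  lastToolCall: SessionEvent | null; // most recent tool call before the failure
}

// Pair each warning or error with the tool call that most recently preceded it.
function failuresWithContext(events: SessionEvent[]): FailureReport[] {
  const reports: FailureReport[] = [];
  let lastToolCall: SessionEvent | null = null;
  for (const event of events) {
    if (event.kind === "tool_call") {
      lastToolCall = event;
    } else if (event.kind === "warning" || event.kind === "error") {
      reports.push({ failure: event, lastToolCall });
    }
  }
  return reports;
}

const sampleLog: SessionEvent[] = [
  { kind: "transcript", timestampMs: 0, detail: "user: what page am I on?" },
  { kind: "warning", timestampMs: 40, detail: "microphone level low" },
  { kind: "tool_call", timestampMs: 120, detail: "actions.site list" },
  { kind: "navigation", timestampMs: 300, detail: "https://example.com/cart" },
  { kind: "error", timestampMs: 450, detail: "overlay reinjection failed" },
];

const reports = failuresWithContext(sampleLog);
for (const r of reports) {
  console.log(r.failure.kind, r.failure.detail, "after", r.lastToolCall?.detail ?? "no tool call");
}
```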

The extension popup can show voice session state and stop a live hosted session. This matters because the Realtime session is owned by an offscreen document and may still be active after a page overlay closes.

The extension stores menu state in extension storage and can recreate the actions.json menu after navigation. Navigation may still interrupt the visible overlay, but the intended behavior is that the voice session and stored state continue across page changes.
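
The persistence model can be sketched as follows: the overlay belongs to the page and is lost on navigation, while the menu state lives outside the page and is used to rebuild the overlay. Everything here is a simplified model under stated assumptions. FakeExtensionStorage is an in-memory stand-in for extension storage, and the MenuState shape, the menuState key, and the function names are hypothetical.

```typescript
interface MenuState {
  open: boolean;
  activeTab: "Agent" | "Settings";
}

// In-memory stand-in for extension storage so the sketch runs outside Chrome.
class FakeExtensionStorage {
  private data = new Map<string, string>();
  set(key: string, value: MenuState): void {
    this.data.set(key, JSON.stringify(value));
  }
  get(key: string): MenuState | null {
    const raw = this.data.get(key);
    return raw === undefined ? null : (JSON.parse(raw) as MenuState);
  }
}

// Models one loaded page. The overlay belongs to the page, so it is lost on navigation.
class PageModel {
  overlay: MenuState | null = null;
  url: string;
  constructor(url: string) {
    this.url = url;
  }
}

const MENU_KEY = "menuState"; // hypothetical storage key

function openMenu(store: FakeExtensionStorage, page: PageModel, state: MenuState): void {
  page.overlay = state;
  store.set(MENU_KEY, state); // persist outside the page
}

// Called after each navigation: recreate the overlay if the stored state says it was open.
function reinjectMenu(store: FakeExtensionStorage, page: PageModel): void {
  const saved = store.get(MENU_KEY);
  if (saved !== null && saved.open) page.overlay = saved;
}

const store = new FakeExtensionStorage();
const firstPage = new PageModel("https://example.com/");
openMenu(store, firstPage, { open: true, activeTab: "Settings" });

const secondPage = new PageModel("https://example.com/cart"); // navigation: fresh page, no overlay
const overlayBeforeReinject = secondPage.overlay;
reinjectMenu(store, secondPage);
console.log(overlayBeforeReinject, secondPage.overlay);
```

The gap between the new page loading and reinjectMenu running corresponds to the brief interruption of the visible overlay described above.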

Changed Behavior

  • The local bridge is no longer required for the hosted extension agent.
  • The extension UI now uses Agent and Settings as top-level tabs.
  • Storage actions use Upload and Download language.
  • browser.run_javascript can be removed from site-facing actions when a site blocks page JavaScript evaluation or should not permit it.
  • Debugger-backed JavaScript remains an extension authoring fallback through debug.run_javascript.
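
The browser.run_javascript change can be pictured as a per-site filter over the tools offered to the agent. This is a minimal sketch under assumptions: the SitePolicy shape and the siteFacingTools helper are hypothetical, and the tool list is illustrative only. These notes do not say how the decision is stored or applied.

```typescript
const PAGE_JS_TOOL = "browser.run_javascript";

// Hypothetical per-site policy; the notes do not say how this decision is stored.
interface SitePolicy {
  allowPageJavaScript: boolean;
}

// Return the tools offered for a site, dropping page JavaScript evaluation when disallowed.
function siteFacingTools(tools: readonly string[], policy: SitePolicy): string[] {
  return tools.filter((tool) => policy.allowPageJavaScript || tool !== PAGE_JS_TOOL);
}

// Illustrative tool list, not the extension's real catalog.
const allTools = ["actions.site", "browser.run_javascript", "runtime.session.log"];
const strictSite = siteFacingTools(allTools, { allowPageJavaScript: false });
const openSite = siteFacingTools(allTools, { allowPageJavaScript: true });
console.log(strictSite, openSite);
```

The filter leaves debug.run_javascript out of scope, since that tool is described as an authoring fallback rather than a site-facing action.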

Known Limits

  • The extension is still pre-1.0.
  • Microphone permission is controlled by Chrome and may need manual approval.
  • Some sites block bookmarklet transport through CSP or mixed-content policy.
  • Some navigations can still interrupt visible overlays while the extension reinjects them.
  • The hosted agent can read and use uploaded storage-backed maps, but full autonomous editing of storage is still a future capability.

Suggested Smoke Test

  1. Install the extension from the release package.
  2. Open a website and authorize the tab.
  3. Open the actions.json menu.
  4. Save an OpenAI API key in Settings.
  5. Upload an actions.json.storage checkout.
  6. Start a hosted voice session.
  7. Ask what page you are on.
  8. Ask what actions are available for the site.
  9. Ask for a screenshot or a visible-page summary.
  10. Navigate within the site and reopen the menu if needed.
  11. Pull runtime.session.log if anything fails.