If you’ve spent any time with AI assistants that can browse the web, you’ll have noticed they’re often clumsy. They take a screenshot, try to figure out what’s on the page, click something, wait, screenshot again. It works, but it’s slow, error-prone, and burns an enormous number of tokens on something as simple as filling in a form. The underlying problem is that AI agents read websites the way a person with bad eyesight reads a printed page - squinting at the pixels rather than reading the text.
WebMCP is Google and Microsoft’s attempt to fix that at the browser level.
WebMCP is a proposed web standard, incubated through the W3C’s Web Machine Learning Community Group and jointly authored by engineers at Google and Microsoft. It gives websites a way to expose structured, callable tools directly to AI agents through a new browser API: navigator.modelContext.
Instead of an AI agent navigating your website by simulating mouse clicks and reading screenshots, your website tells the agent what it can do. You define a set of functions - search for a product, book a flight, file a support ticket - with clear names, descriptions, and input schemas. The agent reads that menu, picks the right function, passes in the right arguments, and gets back a clean JSON response. No screenshots, no DOM scraping, no guessing.
Chrome shipped an early preview in Chrome 146 Canary, behind a flag named “WebMCP for testing”, with a stable release targeted for sometime in 2026.
Current approaches to web automation by AI agents fall into two camps, both unsatisfying.
Visual processing: The agent takes a screenshot of the page and uses a vision model to interpret it - find the search box, click it, type the query, find the button, click it. This works but is painfully inefficient. A single screenshot-based interaction can consume 2,000+ tokens. Chain a few of those together and you’ve burned through most of a context window just to book a restaurant.
DOM manipulation: More technical agents read the raw HTML of a page and try to infer what to interact with. This is more efficient than screenshots but brittle - website HTML is often deeply nested, inconsistently structured, and full of presentational noise. A minor layout change can break the agent’s ability to find the right element.
WebMCP cuts through both by letting the website author define the interface explicitly. The structured tool call the agent makes consumes roughly 20–100 tokens. Set against 2,000+ tokens for a screenshot-based interaction, that is roughly a 20x to 100x reduction per step.
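To put rough numbers on that comparison, the figures quoted above can be multiplied out over a multi-step task. This is only back-of-the-envelope arithmetic: the five-step task length is an assumption chosen for illustration, and real token costs vary by model and page.

```javascript
// Figures quoted in the article; real costs vary by model and page.
const SCREENSHOT_TOKENS = 2000;     // "2,000+" per screenshot-based interaction
const TOOL_CALL_TOKENS = [20, 100]; // "roughly 20-100" per structured tool call
const STEPS = 5;                    // hypothetical task length (an assumption)

// Total tokens for a task of `steps` interactions at `perStep` tokens each.
function tokenCost(perStep, steps) {
  return perStep * steps;
}

const screenshotTotal = tokenCost(SCREENSHOT_TOKENS, STEPS);
const toolCallTotals = TOOL_CALL_TOKENS.map((t) => tokenCost(t, STEPS));
const savingsFactors = toolCallTotals.map((t) => screenshotTotal / t);

console.log(screenshotTotal); // 10000
console.log(toolCallTotals);  // [ 100, 500 ]
console.log(savingsFactors);  // [ 100, 20 ]
```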
The flow has four steps:
Registration - When a page loads, the website registers its available tools using navigator.modelContext. This replaces any previously registered tools, so the agent always sees a current, accurate list of what the current page can do.
Discovery - When a user asks an AI agent to help with something, the browser queries the active site for its registered tools and surfaces them to the agent.
Invocation - The agent picks the right tool and calls it with structured parameters matching the tool’s defined schema.
Response - The function returns JSON-structured data that the agent processes to continue the task.
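The four steps can be sketched with a small in-memory stand-in for the browser's side of the exchange. This is a toy model, not the real API: `MockModelContext`, `listTools`, `callTool`, and the `addToBasket` tool are hypothetical names invented for this sketch, and only `registerTool` mirrors a method named in this article. How Chrome actually performs discovery and invocation may differ.

```javascript
// A toy stand-in for the browser's tool registry. NOT the real API:
// MockModelContext, listTools and callTool are hypothetical names.
class MockModelContext {
  constructor() {
    this.tools = new Map();
  }

  // 1. Registration: the page declares a tool.
  registerTool(tool) {
    this.tools.set(tool.name, tool);
  }

  // 2. Discovery: the agent sees names, descriptions and schemas,
  //    but not the implementation.
  listTools() {
    return [...this.tools.values()].map(({ name, description, inputSchema }) => ({
      name,
      description,
      inputSchema,
    }));
  }

  // 3. Invocation and 4. Response: run the tool, hand back its result.
  async callTool(name, args) {
    const tool = this.tools.get(name);
    if (!tool) throw new Error(`Unknown tool: ${name}`);
    return tool.execute(args);
  }
}

const modelContext = new MockModelContext();

modelContext.registerTool({
  name: 'addToBasket', // hypothetical example tool
  description: 'Add a product to the shopping basket',
  inputSchema: {
    type: 'object',
    properties: { productId: { type: 'string' }, quantity: { type: 'number' } },
    required: ['productId'],
  },
  execute: async ({ productId, quantity = 1 }) => ({ productId, quantity, ok: true }),
});

const discovered = modelContext.listTools();
const flowResult = modelContext.callTool('addToBasket', { productId: 'sku-123' });
flowResult.then((result) => console.log(discovered[0].name, result));
```

Keeping `execute` out of the discovery payload reflects the split described above: the agent chooses from descriptions and schemas, while the page keeps control of what actually runs.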
There are two ways for developers to register tools.
The simpler path requires no JavaScript. You add HTML attributes directly to existing forms:
```html
<form
  toolname="searchProducts"
  tooldescription="Search the product catalogue by keyword and filter by category"
  toolautosubmit="true"
  action="/search"
  method="GET"
>
  <input name="query" type="text" />
  <select name="category">...</select>
</form>
```
The toolname is the function identifier the agent will call. The tooldescription is a natural language explanation the agent uses to decide whether this tool is relevant. toolautosubmit tells the browser whether to submit automatically after the agent populates the fields. That’s it - no backend changes, no JavaScript, just a few extra attributes on markup you probably already have.
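Because the tool is still an ordinary form, an agent's call should end up as the same request a human submission would produce. The sketch below shows the URL a GET form with `action="/search"` requests for a given set of field values. The argument values are hypothetical, and the assumption that the browser submits the agent-populated form exactly as it would a user-populated one is an inference from the description above, not something confirmed here.

```javascript
// Builds the URL a GET form submission would request for the given field values.
// URLSearchParams applies the same form-urlencoded serialisation as a GET form.
function formSubmissionUrl(action, fields) {
  const params = new URLSearchParams(fields);
  return `${action}?${params.toString()}`;
}

// Hypothetical arguments an agent might supply for the searchProducts tool.
const searchUrl = formSubmissionUrl('/search', {
  query: 'hiking boots',
  category: 'footwear',
});

console.log(searchUrl); // /search?query=hiking+boots&category=footwear
```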
For more complex, dynamic interactions you use navigator.modelContext.registerTool() directly in JavaScript:
```javascript
navigator.modelContext.registerTool({
  // Named for what it does: the tool searches, it does not complete a booking.
  name: 'searchFlights',
  description: 'Search for available flights between two airports on a given date',
  inputSchema: {
    type: 'object',
    properties: {
      origin: { type: 'string', description: 'IATA airport code, e.g. LHR' },
      destination: { type: 'string', description: 'IATA airport code, e.g. JFK' },
      date: { type: 'string', description: 'Departure date in YYYY-MM-DD format' },
      passengers: { type: 'number', description: 'Number of passengers (defaults to 1)' }
    },
    required: ['origin', 'destination', 'date']
  },
  // Runs when the agent invokes the tool with arguments matching inputSchema.
  execute: async ({ origin, destination, date, passengers = 1 }) => {
    // flightSearch is the site's own search function, defined elsewhere.
    const results = await flightSearch({ origin, destination, date, passengers });
    return { flights: results };
  }
});
```
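Whether the browser validates an agent's arguments against `inputSchema` before calling `execute` is not covered here, so a cheap defensive check inside the handler is a reasonable precaution. The sketch below only checks the `required` list; `missingRequired` and `flightSchema` are hypothetical names, and the sample date is made up.

```javascript
// Returns the names of required properties that are absent from args.
function missingRequired(inputSchema, args) {
  return (inputSchema.required ?? []).filter((key) => args[key] === undefined);
}

// A trimmed copy of the flight tool's schema, for illustration only.
const flightSchema = {
  type: 'object',
  properties: {
    origin: { type: 'string' },
    destination: { type: 'string' },
    date: { type: 'string' },
    passengers: { type: 'number' },
  },
  required: ['origin', 'destination', 'date'],
};

const incomplete = missingRequired(flightSchema, { origin: 'LHR', destination: 'JFK' });
const complete = missingRequired(flightSchema, {
  origin: 'LHR',
  destination: 'JFK',
  date: '2026-03-01', // made-up sample date
});

console.log(incomplete); // [ 'date' ]
console.log(complete);   // []
```

Returning a structured error that lists the missing fields, rather than throwing, would give the agent something it can act on when retrying the call.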
The naming is confusing - WebMCP shares a name and conceptual lineage with Anthropic’s Model Context Protocol (MCP) but they’re different things operating at different layers.
| | WebMCP | Anthropic MCP |
|---|---|---|
| Where it runs | Client-side, in the browser | Server-side, as a backend service |
| Protocol | Browser API (navigator.modelContext) | JSON-RPC over HTTP or stdio |
| Who exposes tools | The website itself, via HTML/JS | A dedicated MCP server you operate |
| Deployment | No separate server needed | Requires running an MCP server process |
| Scope | Web pages and browser interactions | Any data source or service |
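The difference in layers shows up in what a single tool call looks like. In MCP, the client serialises a JSON-RPC 2.0 `tools/call` request and sends it to a separate server process. In WebMCP, the equivalent call is in effect a function invocation inside the page. The sketch below puts the two side by side; the `searchProducts` tool body is a placeholder, and the way the browser actually dispatches the WebMCP call is an assumption.

```javascript
// Anthropic MCP: a JSON-RPC 2.0 request sent to a separate MCP server.
const mcpRequest = {
  jsonrpc: '2.0',
  id: 1,
  method: 'tools/call',
  params: { name: 'searchProducts', arguments: { query: 'hiking boots' } },
};
const wireMessage = JSON.stringify(mcpRequest); // what crosses the process boundary

// WebMCP: the tool is a function living in the page itself.
const webTool = {
  name: 'searchProducts',
  // Placeholder body; a real tool would run the site's own search.
  execute: async ({ query }) => ({ query, results: [] }),
};

// Same arguments, no transport in between (dispatch by the browser is assumed).
const inPageCall = webTool.execute(mcpRequest.params.arguments);
inPageCall.then((result) => console.log(wireMessage, result));
```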
E-commerce: A shopping agent asks a retailer’s site to search for “waterproof hiking boots, size 10, under £150”. The site’s registered searchProducts tool returns a JSON list of matching items with structured data - price, stock level, image URL, product ID. The agent presents these to the user without ever screenshotting or scraping the product listing page.
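A tool behind that e-commerce exchange might look something like the sketch below. Everything in it is illustrative: the catalogue is made-up sample data, the filter parameter names (`waterproof`, `size`, `maxPriceGbp`) are hypothetical, and a real site would query its own backend rather than an in-memory array.

```javascript
// Made-up sample catalogue; a real site would query its own backend.
const catalogue = [
  { productId: 'b1', name: 'Trail Boot', waterproof: true, size: 10, priceGbp: 120, inStock: 4 },
  { productId: 'b2', name: 'Summit Boot', waterproof: true, size: 10, priceGbp: 180, inStock: 2 },
  { productId: 'b3', name: 'Canvas Shoe', waterproof: false, size: 10, priceGbp: 45, inStock: 9 },
];

// Tool definition in the shape used for registerTool; filters are optional.
const searchProductsTool = {
  name: 'searchProducts',
  description: 'Search the product catalogue with optional filters',
  inputSchema: {
    type: 'object',
    properties: {
      waterproof: { type: 'boolean', description: 'Only waterproof items' },
      size: { type: 'number', description: 'UK shoe size' },
      maxPriceGbp: { type: 'number', description: 'Maximum price in GBP' },
    },
  },
  // An omitted filter matches every product.
  execute: async ({ waterproof, size, maxPriceGbp } = {}) => ({
    items: catalogue.filter((p) =>
      (waterproof === undefined || p.waterproof === waterproof) &&
      (size === undefined || p.size === size) &&
      (maxPriceGbp === undefined || p.priceGbp <= maxPriceGbp)),
  }),
};

// "Waterproof hiking boots, size 10, under £150" as structured arguments.
const searchPromise = searchProductsTool.execute({ waterproof: true, size: 10, maxPriceGbp: 150 });
searchPromise.then(({ items }) => console.log(items.map((p) => p.productId))); // [ 'b1' ]
```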
Customer support: A user asks an agent to raise a support ticket. The site exposes a createTicket tool with fields for issue type, description, and attachments. The agent populates them directly and submits, auto-populating technical details (browser version, OS) from context.
Travel booking: A travel-booking agent searches flights, filters by price and stops, and initiates checkout - all through structured tool calls rather than simulating clicks through a multi-step booking flow.
Forms and onboarding: Any website with a sign-up, quote request, or contact form can expose it as a declarative tool. An agent helping a user sign up for a service can do so without manually tabbing through fields.
If this becomes a real standard - and the W3C involvement plus co-authorship from both Google and Microsoft suggests it’s being taken seriously - it opens a new design surface for the web.
Right now, websites are designed for human eyes and human hands. WebMCP suggests a future where websites also need to be designed for agents: clearly defined capabilities, consistent schemas, good natural-language descriptions of what each tool does. It’s less of a visual design problem and more of an API design problem applied to the frontend.
The declarative API is deliberately low-friction. If you have a search form, adding toolname and tooldescription attributes is a fifteen-minute change. The imperative API gives you full control for complex flows. There’s no new infrastructure to run, no authentication layer to integrate with (the browser handles the trust boundary), and no separate SDK to pull in.
Browser support is currently limited to Chrome Canary. Mozilla and Apple are involved in the W3C discussions, but neither Firefox nor Safari has shipped anything. Until there’s cross-browser support, you’d be building for a narrow slice of the agent ecosystem.
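Given that limited support, any registration code should be guarded by feature detection so the page behaves normally in browsers without the API. Here is a minimal sketch: `registerToolsIfSupported`, the `ping` tool, and `fakeNavigator` are hypothetical names used for illustration, and in a real page the first argument would be the global `navigator`.

```javascript
// Registers tools only where the API exists; returns how many were registered.
function registerToolsIfSupported(nav, tools) {
  const modelContext = nav?.modelContext;
  if (!modelContext || typeof modelContext.registerTool !== 'function') {
    return 0; // unsupported browser: the page keeps working for human visitors
  }
  for (const tool of tools) {
    modelContext.registerTool(tool);
  }
  return tools.length;
}

// Hypothetical tool used only to exercise the helper.
const tools = [{
  name: 'ping',
  description: 'Health check',
  inputSchema: { type: 'object', properties: {} },
  execute: async () => ({ ok: true }),
}];

// Stand-in for a supporting browser; records which tool names were registered.
const registeredNames = [];
const fakeNavigator = {
  modelContext: { registerTool: (tool) => registeredNames.push(tool.name) },
};

console.log(registerToolsIfSupported({}, tools));            // 0
console.log(registerToolsIfSupported(fakeNavigator, tools)); // 1
```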