Methodology · Agent-Ready Index v2

How we score 5,596 stores against the bar AI shoppers actually use

Eight dimensions. One hundred points. Four bands. One disqualifying gate. Published in full so you can replicate every signal we measure — and so the brands at the top earn it.

Live cohort snapshot

Total scored: 5,596
Live audited: 5,534 (98.9%)
Pending audit: 62 (re-running)
Blocking AI: 99 (robots.txt disallow)

Band distribution (live-audited only)

Agent-Ready (80–100): 0 stores, 0.0%
Largely agent-ready (65–79): 21 stores, 0.4%
Developing (45–64): 111 stores, 2.0%
Emerging (0–44): 5,402 stores, 97.6%

The eight dimensions

What we measure, and why each thing matters to an AI shopping agent

Weights total 100. Each dimension is the weighted sum of multiple sub-signals; the spec is public at scripts/agent-readiness-rubric/spec/rubric.v2.json and is the source of truth for both our offline pipeline (Python) and the live audit service (TypeScript), with parity tests pinning them.
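To make the arithmetic concrete, here is a minimal Python sketch of how the eight weights and four band cutoffs could combine into one score. It assumes each dimension first reduces to a fraction between 0 and 1 that its weight then scales. The dictionary keys and the `developing` band name are hypothetical. rubric.v2.json, not this snippet, remains the source of truth.

```python
# Hypothetical mirror of the published weights; the real keys live in rubric.v2.json.
WEIGHTS = {
    "product_jsonld": 18,
    "offer_clarity": 16,
    "catalog_feed_quality": 16,
    "freshness_inventory": 14,
    "rich_pdp_schema": 12,
    "image_media": 8,
    "agent_commerce_surface": 9,
    "agent_accessibility": 7,
}
assert sum(WEIGHTS.values()) == 100  # the page states the weights total 100

# (inclusive lower bound, band) from the band table, checked highest first.
BANDS = [(80, "agent_ready"), (65, "largely_ready"), (45, "developing"), (0, "emerging")]


def total_score(fractions: dict) -> float:
    """Scale each dimension's 0-1 fraction by its weight; missing dimensions score 0."""
    return sum(
        weight * min(max(fractions.get(dim, 0.0), 0.0), 1.0)
        for dim, weight in WEIGHTS.items()
    )


def band(score: float) -> str:
    """Map a 0-100 score to its band."""
    for floor, name in BANDS:
        if score >= floor:
            return name
    return "emerging"  # negative inputs fall through to the lowest band
```

Because the weights sum to 100, a store that maxes every dimension lands exactly on 100. A store at half marks everywhere lands on 50, inside `developing`.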

1 · Product JSON-LD · 18 pts · Live signal

Whether the PDP emits a schema.org Product node with name, sku/gtin, description and an image[] array.

How we measure: We fetch a PDP from your sitemap, parse every <script type="application/ld+json"> block (including @graph), find the Product node and check each required sub-field.
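The following sketch shows that parse using only the Python standard library. It flattens `@graph` arrays, takes the first node typed `Product`, and checks the four sub-fields named above. Two choices here are our assumptions about the check, not a copy of the production scorer. First, `sku` or any `gtin*` property satisfies "sku/gtin". Second, `image` must be a non-empty list.

```python
import json
from html.parser import HTMLParser
from typing import Iterator


class JsonLdCollector(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json"> block."""

    def __init__(self) -> None:
        super().__init__()
        self.blocks: list = []
        self._in_jsonld = False

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").strip().lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        if self._in_jsonld:
            self.blocks[-1] += data


def iter_nodes(value) -> Iterator[dict]:
    """Yield every top-level node, flattening lists and @graph arrays."""
    if isinstance(value, list):
        for item in value:
            yield from iter_nodes(item)
    elif isinstance(value, dict):
        yield value
        if "@graph" in value:
            yield from iter_nodes(value["@graph"])


def has_type(node: dict, name: str) -> bool:
    node_type = node.get("@type")
    return node_type == name or (isinstance(node_type, list) and name in node_type)


SKU_OR_GTIN_KEYS = ("sku", "gtin", "gtin8", "gtin12", "gtin13", "gtin14")


def product_fields(html: str) -> dict:
    """Check the first Product node's required sub-fields; {} if none is found."""
    parser = JsonLdCollector()
    parser.feed(html)
    parser.close()
    for raw in parser.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # a malformed block is skipped rather than failing the page
        for node in iter_nodes(data):
            if has_type(node, "Product"):
                image = node.get("image")
                return {
                    "name": bool(node.get("name")),
                    "sku_or_gtin": any(node.get(key) for key in SKU_OR_GTIN_KEYS),
                    "description": bool(node.get("description")),
                    "image_array": isinstance(image, list) and len(image) > 0,
                }
    return {}
```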

2 · Offer clarity · 16 pts · Live signal

Whether the Offer attached to the Product has price, priceCurrency, availability and priceValidUntil — and whether any sale window is still valid today.

How we measure: Read the Offer node from the same JSON-LD parse, validate priceValidUntil parses as a date in the future, cross-check price against your public catalog feed when available.
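The following sketch covers the field and date half of the Offer check. It assumes the Offer sits under the Product's `offers` property, either as an object or as a list. For a list, only the first entry is read. The sketch compares only the calendar date of `priceValidUntil` and leaves out the cross-check against the catalog feed.

```python
from datetime import date

REQUIRED_OFFER_FIELDS = ("price", "priceCurrency", "availability", "priceValidUntil")


def offer_checks(product: dict, today: date) -> dict:
    """Presence of each required Offer field, plus whether priceValidUntil is in the future."""
    offer = product.get("offers")
    if isinstance(offer, list):
        offer = offer[0] if offer else None  # assumption: only the first Offer is scored
    if not isinstance(offer, dict):
        offer = {}
    checks = {field: offer.get(field) not in (None, "") for field in REQUIRED_OFFER_FIELDS}
    valid_until_future = False
    raw = offer.get("priceValidUntil")
    if isinstance(raw, str):
        try:
            # Reads only the date part, so "2030-01-31" and "2030-01-31T00:00:00Z" both parse.
            valid_until_future = date.fromisoformat(raw[:10]) > today
        except ValueError:
            pass  # an unparseable date counts as not valid
    checks["valid_until_future"] = valid_until_future
    return checks
```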

3 · Catalog feed quality · 16 pts

Whether a structured product feed exists (/products.json on Shopify, /wp-json/wc/store/products on WooCommerce, etc.) and how much agent-relevant metadata it carries: GTIN, google_product_category, brand/vendor, taxonomy depth.

How we measure: GET the platform-canonical feed path; count GTIN coverage across products; check for google_product_category, brand/vendor presence, and ≥2-level categories.
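GTIN and metadata coverage reduce to per-field shares across the feed. This sketch assumes products have already been normalized into dicts with the hypothetical keys `gtin`, `google_product_category`, `brand` or `vendor`, and `category_path` (a list of category levels). It does not show how each platform's real feed maps into that shape.

```python
def feed_coverage(products: list) -> dict:
    """Share of products (0.0-1.0) carrying each agent-relevant field."""
    if not products:
        return {"gtin": 0.0, "google_product_category": 0.0, "brand": 0.0, "taxonomy_depth_2": 0.0}
    n = len(products)

    def share(predicate) -> float:
        return sum(1 for p in products if predicate(p)) / n

    return {
        "gtin": share(lambda p: bool(p.get("gtin"))),
        "google_product_category": share(lambda p: bool(p.get("google_product_category"))),
        "brand": share(lambda p: bool(p.get("brand") or p.get("vendor"))),
        # "≥2-level categories": at least two levels in the category path
        "taxonomy_depth_2": share(lambda p: len(p.get("category_path") or []) >= 2),
    }
```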

4 · Freshness and inventory · 14 pts

Whether the catalog and stock signals are current. Stale prices or stale inventory make agents downweight you.

How we measure: Last-Modified header on PDP, snapshot recency of our cached catalog, inventory_count > 0 ratio, whether the platform has disable_checkout set.
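Two of these signals are straightforward to sketch in Python: the age of the PDP's Last-Modified header and the in-stock ratio. The `inventory_count` field name comes from the description above. The page does not state the thresholds that turn these values into points, so the sketch stops at the raw values. `now` must be a timezone-aware datetime.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def last_modified_age_days(header: Optional[str], now: datetime) -> Optional[float]:
    """Age in days of an HTTP Last-Modified value, or None if missing or unparseable."""
    if not header:
        return None
    try:
        modified = parsedate_to_datetime(header)
    except (TypeError, ValueError):
        return None  # depending on the Python version, bad input raises either error
    if modified is None:
        return None
    if modified.tzinfo is None:
        modified = modified.replace(tzinfo=timezone.utc)  # treat naive dates as UTC
    return (now - modified).total_seconds() / 86400


def in_stock_ratio(variants: list) -> float:
    """Fraction of variants whose inventory_count is above zero."""
    if not variants:
        return 0.0
    in_stock = sum(1 for v in variants if (v.get("inventory_count") or 0) > 0)
    return in_stock / len(variants)
```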

5 · Rich PDP schema · 12 pts · Live signal

Four schema.org node types beyond Product — AggregateRating (unlocks ratings in answer cards), BreadcrumbList (category resolution), FAQPage (long-tail "does X work with Y" answers), MerchantReturnPolicy (agents weight returns risk into recommendations).

How we measure: Each of the four nodes contributes 3 of the 12 points. Found via the same @graph-aware JSON-LD parse used for dim 1.
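The following sketch implements the 4 × 3 point split. AggregateRating and MerchantReturnPolicy usually appear nested, under a Product's `aggregateRating` or an Offer's `hasMerchantReturnPolicy`. For that reason, this version walks every nested object for `@type` values instead of looking only at top-level nodes. The recursive walk is our choice for the example.

```python
RICH_NODE_TYPES = ("AggregateRating", "BreadcrumbList", "FAQPage", "MerchantReturnPolicy")
POINTS_PER_NODE = 3  # four node types x 3 points = 12


def collect_types(value, found=None) -> set:
    """Gather every @type string anywhere in a parsed JSON-LD value, nested or not."""
    if found is None:
        found = set()
    if isinstance(value, dict):
        node_type = value.get("@type")
        if isinstance(node_type, str):
            found.add(node_type)
        elif isinstance(node_type, list):
            found.update(t for t in node_type if isinstance(t, str))
        for child in value.values():
            collect_types(child, found)
    elif isinstance(value, list):
        for item in value:
            collect_types(item, found)
    return found


def rich_pdp_points(types_found: set) -> int:
    """3 points for each of the four rich node types present."""
    return POINTS_PER_NODE * sum(1 for t in RICH_NODE_TYPES if t in types_found)
```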

6 · Image and media · 8 pts · Live signal

≥3 product images, absolute https URLs, alt text, and a populated schema.org image[] array.

How we measure: Count <img> tags on the PDP; fraction with absolute https src; fraction with alt= attribute; length of schema image array on Product.
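The standard-library HTML parser can produce the image counts, as sketched below. It counts every `<img>` on the page, so logos and icons inflate the total unless you first scope the HTML to the product gallery. The `schema_images` argument stands for the Product's `image` array from the JSON-LD parse. Alt presence means the attribute exists, even if it is empty.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class ImgCollector(HTMLParser):
    """Record the attributes of every <img> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.images: list = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.images.append(dict(attrs))


def is_absolute_https(src) -> bool:
    parsed = urlparse(src or "")
    return parsed.scheme == "https" and bool(parsed.netloc)


def image_signals(html: str, schema_images: list) -> dict:
    """Raw image-and-media signals for one PDP."""
    parser = ImgCollector()
    parser.feed(html)
    parser.close()
    images = parser.images
    count = len(images)
    return {
        "img_count": count,
        "https_fraction": sum(is_absolute_https(i.get("src")) for i in images) / count if count else 0.0,
        "alt_fraction": sum("alt" in i for i in images) / count if count else 0.0,
        "schema_image_count": len(schema_images),
    }
```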

7 · Agent commerce surface · 9 pts · Live signal

Depth of your UCP profile (`/.well-known/ucp`). Basic presence is now table stakes, because Shopify defaults provide it. What matters: declared capabilities, whether you advertise an MCP service in `ucp.services`, signing keys for RFC 9421 request authentication, merchant-hosted vs platform-hosted, and the presence of an A2A `/.well-known/agent-card.json`.

How we measure: Fetch /.well-known/ucp, parse JSON, count capabilities, scan ucp.services for an mcp-typed endpoint, count signing_keys[], detect merchant-vs-xpay-hosted. Also probe /.well-known/agent-card.json and /.well-known/oauth-protected-resource. Penalised if any deprecated/fictitious well-known URI is served (we maintain a denylist).
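The following is a sketch of the depth signals. It reads a hypothetical JSON layout chosen for illustration: a top-level `ucp` object holding `capabilities` and `services`, a top-level `signing_keys` list, and services tagged with a `type` field. Check the profile your platform actually serves before relying on these paths. Hosting detection, the agent-card and OAuth probes, and the denylist are omitted.

```python
import json


def ucp_depth(raw: str) -> dict:
    """Depth signals from a /.well-known/ucp body (field layout is an assumption)."""
    empty = {"parsed": False, "capabilities": 0, "has_mcp": False, "signing_keys": 0}
    try:
        doc = json.loads(raw)
    except json.JSONDecodeError:
        return empty
    if not isinstance(doc, dict):
        return empty
    ucp = doc.get("ucp") if isinstance(doc.get("ucp"), dict) else {}
    capabilities = ucp.get("capabilities") or []
    services = ucp.get("services") or []
    keys = doc.get("signing_keys") or []
    # An MCP service is any service object whose "type" is "mcp" (case-insensitive).
    has_mcp = any(isinstance(s, dict) and str(s.get("type", "")).lower() == "mcp" for s in services)
    return {
        "parsed": True,
        "capabilities": len(capabilities) if isinstance(capabilities, list) else 0,
        "has_mcp": has_mcp,
        "signing_keys": len(keys) if isinstance(keys, list) else 0,
    }
```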

8 · Agent accessibility · 7 pts · Live signal

Whether shopping agents can actually reach you. robots.txt rules per tracked AI user-agent, meta robots consistency, no Cloudflare bot-fight challenge, no X-Robots-Tag: noai, no IPTC noai/noimageai on PDP images.

How we measure: Parse robots.txt per RFC 9309, check per-UA Disallow rules against a 12-agent allowlist (GPTBot, OAI-SearchBot, ChatGPT-User, PerplexityBot, ClaudeBot, Google-Extended, Applebot-Extended, Amazonbot, …). Probe meta tags + response headers.
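Python's `urllib.robotparser` can approximate the per-agent Disallow check. That module predates RFC 9309 and differs from it in details. For example, it applies the first matching rule in a group rather than the most specific one. Treat this sketch as an approximation, not our parser. Only the eight agents named above are listed, and the remaining entries of the 12-agent list are not filled in.

```python
from urllib.robotparser import RobotFileParser

# The eight agents named on this page; the rest of the 12-agent list is not reproduced.
TRACKED_AGENTS = (
    "GPTBot", "OAI-SearchBot", "ChatGPT-User", "PerplexityBot",
    "ClaudeBot", "Google-Extended", "Applebot-Extended", "Amazonbot",
)


def blocked_agents(robots_txt: str, path: str = "/") -> list:
    """Return the tracked agents that robots.txt forbids from fetching `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    parser.modified()  # record a fetch time so can_fetch() does not treat the file as unread
    return [agent for agent in TRACKED_AGENTS if not parser.can_fetch(agent, path)]
```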

The disqualifying gate

Blocking shopping agents caps the score

If a store’s robots.txt serves Disallow: / against any of GPTBot, OAI-SearchBot, PerplexityBot or ClaudeBot, the score is capped at 65 and dimension 8 (Agent accessibility) is forced to zero, regardless of how well the rest of the rubric scores. Because 65 is the bottom of the largely_ready band, a gated store can never reach agent_ready.

Rationale: a store that publicly opts out of agentic discovery shouldn’t appear on a leaderboard claiming to predict which brands AI shoppers will find. They’ve chosen to be invisible. 99 of 5,596 stores in this cohort currently hit the gate.
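The gate is a small post-processing step. This sketch assumes per-dimension points are already computed, keyed with hypothetical names. It also assumes `blocked` holds the agents whose robots.txt group serves a site-wide `Disallow: /`. The sketch zeroes dimension 8 first and then applies the cap of 65.

```python
GATE_AGENTS = frozenset({"GPTBot", "OAI-SearchBot", "PerplexityBot", "ClaudeBot"})
GATE_CAP = 65


def apply_gate(dimension_points: dict, blocked: set) -> tuple:
    """Return (final_score, gated). `blocked` = agents served a site-wide Disallow: /."""
    gated = bool(GATE_AGENTS & set(blocked))
    points = dict(dimension_points)
    if gated:
        points["agent_accessibility"] = 0  # dimension 8 is forced to zero
    total = sum(points.values())
    return (min(total, GATE_CAP) if gated else total), gated
```

Blocking an agent outside the four gate agents (Amazonbot, say) does not trigger the cap. It still costs points inside dimension 8.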

What this rubric won’t do

Honest limits, explicit gaps

UCP presence alone is not agent-ready

Every default Shopify store passes the basic UCP probe. The rubric reflects this — UCP only contributes meaningfully via depth signals (capabilities count, MCP advertisement, signing keys). Stores running default Shopify with no other agent-readable surface land in `emerging`, not `agent_ready`.

Pending-audit stores carry an offline-only baseline

Live audit coverage is currently 5,534 of 5,596 (98.9%). The remaining 62 carry an offline-only score derived from cached catalog signals; we mark them "Live audit pending" and refresh within 24h.

Headless-rendered schema is detected, but treated cautiously

When static HTML returns no JSON-LD, we re-probe via a headless render to catch JS-injected schema. Static-served schema scores higher than JS-injected — crawler reliability differs, and we publish the distinction so merchants moving to SSR see the bump.

LLM narrative is rubric-coupled, not free-form

The per-store narrative published in the index is grounded in the rubric output; it doesn't make claims the dimension scores can't support. Hand-eval calibration is the prerequisite for the LLM enrichment refresh.

Reproducibility

Build the score yourself

Spec is one JSON file

scripts/agent-readiness-rubric/spec/rubric.v2.json holds every weight, threshold, band cutoff, agent allowlist entry and gate trigger. The Python and TypeScript scorers both load it; parity tests pin the outputs.

Live audit endpoint is public

audit.xpay.sh/api/v2/audit?url=<your-store> returns the same RubricResult shape every score on this site is derived from, plus the per-check evidence we extracted.
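Calling the endpoint from Python needs only the standard library. Two details are assumptions: the `https` scheme, and percent-encoding the store URL into the `url` parameter. Whether the endpoint expects a full URL or a bare domain is not stated above. The sketch returns the decoded JSON without assuming any particular RubricResult field names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

AUDIT_BASE = "https://audit.xpay.sh/api/v2/audit"  # https scheme is an assumption


def audit_url(store_url: str) -> str:
    """Build the audit request URL with the store URL percent-encoded."""
    return f"{AUDIT_BASE}?{urlencode({'url': store_url})}"


def fetch_audit(store_url: str, timeout: float = 30.0) -> dict:
    """Fetch and decode the audit JSON for one store (performs a network call)."""
    with urlopen(audit_url(store_url), timeout=timeout) as resp:
        return json.load(resp)
```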

Hand-eval calibration is open

Spearman ρ between human scoring and the v2 rubric is the next checkpoint. The score distribution will be retuned as the rubric earns confidence.
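To replicate the calibration, note that Spearman's ρ is the Pearson correlation of the two rank vectors. This pure-Python sketch gives tied scores the average of their positions. Ties matter here because both rubric scores and hand scores tie often.

```python
def average_ranks(values: list) -> list:
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: list, y: list) -> float:
    """Pearson correlation of average ranks (tie-aware Spearman's rho)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples of size >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("rho is undefined when one sample is constant")
    return cov / (vx * vy) ** 0.5
```

The shortcut 1 - 6Σd²/(n(n² - 1)) gives the same value only when there are no ties. That is why the sketch correlates the ranks directly.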
