Agentic Commerce Platform — Architecture & Deep Dive


System Architecture Overview

6-layer pipeline from merchant data sources to AI agent consumers


Data Ingestion Layer

How merchant catalogues flow in, via 5 source connectors

Design principle

Merchants should be able to connect their store in under 5 minutes with zero technical knowledge. Every source connector meets them where they already are — their existing platform. No new tools to learn. The more friction-free the on-ramp, the faster the merchant count compounds.

Shopify Plugin

Priority channel — cleanest integration

Merchant installs from Shopify App Store — 2 clicks, OAuth
App requests: products, variants, pricing, inventory levels
Webhooks fire on every price change, product update, stock event
Shopify enforces rate limits — collector uses exponential backoff
Real-time sync: price changes reach platform in <30s
~50,000 active Shopify stores in Malaysia — direct distribution

Real-time push
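The backoff behaviour mentioned above could look roughly like the sketch below. It assumes a hypothetical `fetch` callable that raises `RateLimited` when the API answers HTTP 429; Shopify sends a `Retry-After` header on 429 responses, so the sketch prefers that hint and falls back to exponential backoff with jitter. The names, base delay and cap are illustrative, not part of the platform.

```python
import random
import time
from typing import Callable, Optional


class RateLimited(Exception):
    """Raised by the (hypothetical) fetch callable on an HTTP 429 response."""

    def __init__(self, retry_after: Optional[float] = None):
        super().__init__("rate limited")
        self.retry_after = retry_after  # seconds, from the Retry-After header if present


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Upper bound of the wait before retrying after failed attempt `attempt` (0-based)."""
    return min(cap, base * (2 ** attempt))


def call_with_backoff(fetch: Callable[[], dict], max_attempts: int = 5,
                      sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call fetch(), retrying on RateLimited until max_attempts is exhausted."""
    for attempt in range(max_attempts):
        try:
            return fetch()
        except RateLimited as err:
            if attempt == max_attempts - 1:
                raise  # give up and let the caller decide what to do
            if err.retry_after is not None:
                delay = err.retry_after  # trust the server's hint
            else:
                delay = random.uniform(0, backoff_delay(attempt))  # full jitter
            sleep(delay)
    raise RuntimeError("max_attempts must be at least 1")
```

Passing `sleep` as a parameter keeps the function easy to test without real waiting.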

Lazada / Shopee

Seller API — authenticated pull

Apply as ISV (Independent Software Vendor) on both platforms
Seller authenticates — you get OAuth access to their store
Pull: product catalogue, pricing, stock, promotions
Lazada Open Platform & Shopee Open Platform both have stable APIs
Rate limit: typically 10k calls/day per seller app
Scheduled pull every 4h for price + inventory delta

Scheduled pull

WooCommerce

WordPress + REST API plugin

WordPress plugin: merchant installs it and generates REST API credentials (consumer key and consumer secret)
WooCommerce REST API v3: products, variations, prices
Merchant pastes the key and secret into your onboarding form
You pull on a schedule or subscribe to WooCommerce webhooks
Large SEA presence — many small independent stores on WP

API key auth

Google Merchant Feed

Many merchants already have this — zero extra work

Merchant pastes their Google Merchant Center feed URL
Standard XML feed (RSS 2.0 or Atom 1.0): product, price, availability, link
Many Shopify/WooCommerce merchants already generate this for Google Shopping ads
Pull daily — price freshness within 24h acceptable for this source
No API key needed — just a public feed URL

Zero-friction
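As a rough illustration of the daily pull, the sketch below parses an RSS 2.0 product feed with the standard library. Google's feed attributes live in the `g:` namespace (`http://base.google.com/ns/1.0`); the sample feed, the output field names and the RSS-only handling are assumptions made for the example (an Atom feed would use `entry` elements instead of `item`).

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

G_NS = {"g": "http://base.google.com/ns/1.0"}

# Made-up sample feed, for illustration only.
SAMPLE_FEED = """<rss version="2.0" xmlns:g="http://base.google.com/ns/1.0">
  <channel>
    <title>Example store</title>
    <item>
      <g:id>SKU-1</g:id>
      <title>Nike Air Max 270 Men UK10 Black</title>
      <link>https://example.com/p/sku-1</link>
      <g:price>599.00 MYR</g:price>
      <g:availability>in_stock</g:availability>
    </item>
  </channel>
</rss>"""


def parse_feed(xml_text):
    """Turn each <item> of an RSS product feed into a flat record."""
    root = ET.fromstring(xml_text)
    records = []
    for item in root.iter("item"):
        raw_price = item.findtext("g:price", default="", namespaces=G_NS)
        amount, _, currency = raw_price.strip().partition(" ")  # "599.00 MYR"
        availability = item.findtext("g:availability", default="", namespaces=G_NS)
        records.append({
            "id": item.findtext("g:id", default="", namespaces=G_NS),
            "name": item.findtext("title", default=""),
            "url": item.findtext("link", default=""),
            "price": Decimal(amount) if amount else None,
            "currency": currency or None,
            # Accept both "in_stock" and the older "in stock" spelling.
            "in_stock": availability.strip().replace(" ", "_") == "in_stock",
        })
    return records


print(parse_feed(SAMPLE_FEED))
```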

CSV / Manual Upload

Fallback for any platform — including Lazada CSV export

Merchant exports catalogue from any platform as CSV / Excel
Uploads to your platform dashboard
AI parser reads any column layout — infers name, price, SKU, category
Not real-time but works for MVP before API integrations complete
Good for: boutique merchants, small Lazada sellers, offline retailers going online

MVP fallback
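Column inference could start from a cheap keyword pass before any LLM is involved. The sketch below is that fallback only, not the AI parser itself: the hint lists (including the Malay words "harga" and "kategori") are hypothetical, and headers that match nothing would be handed to the LLM in a fuller pipeline.

```python
import csv
import io

# Hypothetical keyword hints per target field; extend as real uploads arrive.
FIELD_HINTS = {
    "name": ("name", "title"),
    "price": ("price", "harga"),
    "sku": ("sku", "item code"),
    "category": ("category", "kategori"),
}


def infer_columns(headers):
    """Map target field -> source header using substring matches on the header."""
    mapping = {}
    for header in headers:
        key = header.strip().lower()
        for field, hints in FIELD_HINTS.items():
            if field not in mapping and any(hint in key for hint in hints):
                mapping[field] = header
                break  # one target field per source column
    return mapping


def parse_catalogue(csv_text):
    """Read a CSV export and return rows keyed by the inferred field names."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = infer_columns(reader.fieldnames or [])
    return [
        {field: (row.get(col) or "").strip() for field, col in columns.items()}
        for row in reader
    ]


sample = "Product Title,Harga (RM),SKU,Kategori\nAir Max 270,599.00,AM270-BLK-10,Footwear\n"
print(parse_catalogue(sample))
```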


AI Normaliser — The Hard Problem

Why raw merchant data is useless without normalisation, and how to solve it

The core problem

Merchant A lists "Nike Air Max 270 Men UK10 Black". Merchant B lists "Nike running shoe men sz10". Merchant C lists "AM270-BLK-10". These are the same product. Without normalisation, cross-merchant price comparison is impossible — you're just comparing noise.

1. Raw ingestion

Collector receives the raw product record: name, price, description, category (if any), images. It is stored as-is in a staging table.

2. LLM entity extraction

The LLM reads the raw product name and description and extracts brand, product_line, model, variant (colour, size, material), and category_l1/l2. It outputs structured JSON.

3. Embedding generation

Normalised product name + extracted attributes → text-embedding-3-small (or a local model, which may use a different dimension). The 1536-dim vector is stored in pgvector.

4. Entity matching (cosine similarity)

The new product embedding is queried against the existing catalogue. Cosine similarity of 0.92 or higher → same product entity, linked to the existing entity ID. Below 0.75 → new entity created. Scores in between are handled by the confidence bands in step 5.

5. Confidence scoring

0.92–1.0 = auto-merge. 0.75–0.92 = human review queue. <0.75 = new entity. Edge cases are flagged for merchant confirmation.

6. Price record written

Merchant + entity ID + price + timestamp are written to the Price History DB. The entity now has N merchant listings attached to it.

Entity Matching Example

// Input: three raw listings
A: "Nike Air Max 270 Men UK10 Black"
B: "Nike running shoe men sz10"
C: "AM270-BLK-10"

// After LLM extraction
A: { brand:"Nike", line:"Air Max 270",
    size:"UK10", colour:"Black" }

// Embedding similarity
cosine_sim(A, B) = 0.94 → same entity
cosine_sim(A, C) = 0.88 → review queue

// Result: A and B auto-merge into entity_id: nike-am270-uk10-black;
// C joins the same entity once confirmed in the review queue
// Price comparison then works across all 3 merchants
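The matching decision can be sketched in a few lines. The thresholds are the ones quoted in this section; treating the boundaries as inclusive (0.92 merges, 0.75 goes to review) is an assumption, and in production the similarity would come from a pgvector query rather than a Python loop.

```python
import math

AUTO_MERGE = 0.92   # at or above: same entity
NEW_ENTITY = 0.75   # below: create a new entity


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / norm


def decide(similarity):
    """Map a similarity score to the action the normaliser takes."""
    if similarity >= AUTO_MERGE:
        return "auto_merge"
    if similarity >= NEW_ENTITY:
        return "review_queue"
    return "new_entity"


# Scores from the example above
print(decide(0.94), decide(0.88))
```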

Tech Stack

LLM extraction: GPT-4o-mini (~RM0.002/product) or local Qwen2.5 on ROG
Embeddings: text-embedding-3-small (1536 dim, RM0.0001/1k tokens)
Vector storage: pgvector extension on Supabase PostgreSQL
Similarity search: HNSW index — <10ms for 1M vectors
Review queue: simple Supabase table + admin UI flag
At scale: migrate to Pinecone or Weaviate for >10M products

Cost estimate at scale

100 merchants × avg 500 products = 50,000 products to normalise
LLM extraction: 50k × RM0.002 = RM100 one-time
Embeddings: 50k × RM0.0001 = RM5 one-time (assuming at most ~1k tokens per product)
Ongoing: only new/changed products re-processed
Total normalisation cost at MVP scale: <RM200
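The estimate can be reproduced with a short calculation. The per-unit costs are the figures quoted in this section; the 1,000 tokens per product is an assumed upper bound, so the embedding line is likely an overestimate.

```python
from decimal import Decimal

MERCHANTS = 100
AVG_PRODUCTS = 500
LLM_COST_PER_PRODUCT = Decimal("0.002")       # RM, GPT-4o-mini estimate
EMBED_COST_PER_1K_TOKENS = Decimal("0.0001")  # RM
TOKENS_PER_PRODUCT = 1000                     # assumption: upper bound

products = MERCHANTS * AVG_PRODUCTS
llm_total = products * LLM_COST_PER_PRODUCT
embed_total = products * EMBED_COST_PER_1K_TOKENS * TOKENS_PER_PRODUCT / 1000
total = llm_total + embed_total

print(f"{products} products: LLM RM{llm_total}, embeddings RM{embed_total}, total RM{total}")
```

The sum stays well under the RM200 ceiling, which leaves headroom for retries and re-processing.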


Storage Layer

Four specialised data stores — each optimised for its access pattern

Product Catalogue

Master entity registry — one row per unique product

products table
CREATE TABLE products (
  id           uuid PRIMARY KEY,
  entity_id    text NOT NULL UNIQUE,  -- normalised entity key; UNIQUE so other tables can reference it
  brand        text,
  product_line text,
  model        text,
  category_l1  text,   -- e.g. "Footwear"
  category_l2  text,   -- e.g. "Running Shoes"
  attributes   jsonb,  -- colour, size, material
  image_url    text,
  created_at   timestamptz
);

PostgreSQL · Supabase

Price History DB

Time-series price per merchant per product

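price_history table
-- create products and merchants first
CREATE TABLE price_history (
  id           uuid NOT NULL,
  entity_id    text REFERENCES products (entity_id),  -- products.entity_id must be UNIQUE
  merchant_id  uuid REFERENCES merchants (id),
  price        numeric,
  sale_price   numeric,  -- if on promo
  currency     text DEFAULT 'MYR',
  in_stock     boolean,
  url          text,     -- checkout link
  recorded_at  timestamptz NOT NULL,
  PRIMARY KEY (id, recorded_at)  -- hypertable keys must include the partition column
);
-- hypertable partitioned on recorded_at
SELECT create_hypertable('price_history', 'recorded_at');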

TimescaleDB hypertable

Merchant Registry

Auth, consent, tier, and integration config

merchants table
CREATE TABLE merchants (
  id           uuid PRIMARY KEY,
  name         text,
  platform     text,  -- shopify/lazada/etc
  tier         text,  -- free/basic/pro
  consent_at   timestamptz,
  api_key_hash text,  -- never store raw keys
  webhook_url  text,
  data_sharing text,  -- public/private/anon
  active       boolean
);

Supabase Auth + RLS

Vector Index

Enables natural language product search

product_embeddings table
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE product_embeddings (
  id          uuid PRIMARY KEY,
  entity_id   text REFERENCES products (entity_id),
  embedding   vector(1536),
  text_repr   text,  -- what was embedded
  model       text   -- embedding model version
);

-- HNSW index for fast ANN search
CREATE INDEX ON product_embeddings
  USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64);

pgvector HNSW index


Intelligence Layer

Four AI-powered engines that turn raw price data into actionable intelligence

Price Engine

Market positioning intelligence for merchants

Benchmarks: for each product entity, compute market_min, market_max, market_p50, market_p25
Positioning score: merchant's price vs market median — "You're 12% above median"
Anomaly detection: price drops >30% flagged — possible error, promo, or distressed stock
Brand protection: FMCG brands get notified when resellers price below MAP (Minimum Advertised Price)
Runs as a scheduled job every 6h: benchmarks can lag live prices by up to 6h, even where the underlying price data arrives in near real-time

SQL aggregation + threshold rules
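The benchmark, positioning and anomaly rules above reduce to a few small functions. This sketch works on an in-memory list of current prices for one entity; the production version would presumably be SQL aggregation, and the choice of the "inclusive" quantile method is an assumption.

```python
from statistics import quantiles


def benchmark(prices):
    """Market stats for one product entity (needs at least two prices)."""
    ordered = sorted(prices)
    p25, p50, _p75 = quantiles(ordered, n=4, method="inclusive")
    return {
        "market_min": ordered[0],
        "market_max": ordered[-1],
        "market_p25": p25,
        "market_p50": p50,
    }


def positioning_pct(price, market_p50):
    """Percent above (+) or below (-) the market median, to one decimal."""
    return round((price - market_p50) / market_p50 * 100, 1)


def is_anomalous_drop(previous, current, threshold=0.30):
    """True when the price fell by more than `threshold` (30% by default)."""
    return previous > 0 and (previous - current) / previous > threshold


stats = benchmark([140, 100, 120, 110, 130])
print(stats, positioning_pct(134.4, stats["market_p50"]))
```

A merchant priced at RM134.40 against a RM120 median would see the "12% above median" message from the positioning rule.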

Demand Forecaster

Velocity signals before consumers see the trend

Velocity score: how many merchants restocked or increased stock for a SKU in last 7d
Price momentum: price trend over 30d — rising = demand outpacing supply
Out-of-stock correlation: if 3+ merchants go OOS for same SKU → demand spike signal
Seasonal pattern detection: time-series ML on price history
Output: weekly "trending products" report for merchants in that category

TimescaleDB + Prophet/statsmodels
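Two of the signals above are simple enough to sketch without any ML library. The input shapes are assumptions: `snapshots` is a list of `(entity_id, merchant_id, in_stock)` tuples holding each merchant's latest stock state, and `prices` is a chronological list covering the 30-day window. Seasonal detection with Prophet or statsmodels is out of scope here.

```python
from collections import defaultdict


def oos_spikes(snapshots, min_merchants=3):
    """Entities that at least `min_merchants` distinct merchants have out of stock."""
    oos = defaultdict(set)
    for entity_id, merchant_id, in_stock in snapshots:
        if not in_stock:
            oos[entity_id].add(merchant_id)
    return sorted(e for e, merchants in oos.items() if len(merchants) >= min_merchants)


def price_momentum(prices):
    """Relative change between the first and last price in the window."""
    if len(prices) < 2 or prices[0] == 0:
        return 0.0
    return (prices[-1] - prices[0]) / prices[0]


snapshots = [
    ("sku-a", "m1", False), ("sku-a", "m2", False), ("sku-a", "m3", False),
    ("sku-b", "m1", False), ("sku-b", "m2", True),
]
print(oos_spikes(snapshots), price_momentum([100.0, 105.0, 110.0]))
```

A positive momentum value combined with an out-of-stock spike on the same entity is the kind of pairing the weekly trending report could surface.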

Alert Engine

Real-time competitive intelligence pushed to merchants

Competitor drop alert: competitor drops price on product you also sell → push via webhook or email
New competitor alert: new merchant joins the platform and lists competing SKUs
OOS opportunity: top competitor goes out of stock → your window to capture demand
MAP violation alert: for FMCG brands — reseller priced below agreed floor
Alert rules are merchant-configured — choose which triggers to receive

Event-driven · Supabase realtime

NL Search AI

The bridge between human language and structured product data

Query parsing: LLM extracts intent from natural language — brand, category, price constraint, urgency
Hybrid search: vector similarity (semantic) + keyword (exact) + SQL filter (price range, in_stock)
Re-ranking: results re-ranked by price, availability, delivery speed
MCP tool: this engine is wrapped as the search_products MCP tool — callable by any AI agent
Response: structured JSON with product list, prices, checkout URLs

RAG · pgvector · LLM re-rank
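A toy version of the hybrid search step is sketched below: hard filters first, then a weighted blend of vector similarity and keyword overlap. The catalogue, the two-dimensional embeddings and the `alpha` weight of 0.7 are invented for illustration; query parsing and the final re-ranking by price or delivery speed are left out.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def keyword_score(terms, name):
    """Fraction of query terms that appear as whole words in the product name."""
    if not terms:
        return 0.0
    words = set(name.lower().split())
    return sum(term.lower() in words for term in terms) / len(terms)


def hybrid_search(query_vec, terms, catalogue, max_price=None,
                  in_stock_only=True, limit=5, alpha=0.7):
    scored = []
    for item in catalogue:
        # Hard filters, the equivalent of the SQL WHERE clause.
        if in_stock_only and not item["in_stock"]:
            continue
        if max_price is not None and item["price"] > max_price:
            continue
        score = (alpha * cosine(query_vec, item["embedding"])
                 + (1 - alpha) * keyword_score(terms, item["name"]))
        scored.append((score, item["name"]))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:limit]]


CATALOGUE = [  # hypothetical entries with toy 2-d embeddings
    {"name": "Nike Air Max 270", "price": 599.0, "in_stock": True, "embedding": [1.0, 0.0]},
    {"name": "Adidas Runfalcon", "price": 189.0, "in_stock": True, "embedding": [0.8, 0.6]},
    {"name": "Asics Gel Contend", "price": 179.0, "in_stock": False, "embedding": [0.6, 0.8]},
]
print(hybrid_search([0.8, 0.6], ["adidas"], CATALOGUE, max_price=200))
```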

Why this data is valuable to banks and FMCG

A bank underwriting an SME loan wants to know: is this merchant's pricing competitive? Are they losing market share? Is their category growing or declining? The Intelligence Layer produces exactly these signals, and as far as we know no bank in Malaysia currently has access to cross-platform, cross-merchant price intelligence. That's a data licensing deal at RM50k–500k/year per buyer.


MCP Server — The Moat

Model Context Protocol — the standard that makes you natively queryable by every AI agent

Why MCP is the strategic decision

MCP (Model Context Protocol) is the emerging open standard for AI agents to talk to external tools. Anthropic released it in late 2024, OpenAI has adopted it, and other major providers such as Google have announced support. It's the USB-C of AI integrations: build once, work everywhere. A merchant indexed in your MCP server is reachable from any MCP-capable agent, including ChatGPT, Claude, Gemini, and Jarvis, with no custom integration per platform. The goal is to become the single source of truth for product search in SEA.

How MCP works

1. AI agent needs to shop

User says: "Buy me the cheapest running shoes under RM200, size 10, deliver by Friday". The agent decides it needs to search for products.

2. Agent queries MCP server

The agent calls your search_products MCP tool with structured parameters such as query, max_price, in_stock, and location. Constraints without a dedicated field, such as size or delivery deadline, travel in the natural-language query.

3. MCP server queries NL Search AI

Parameters are passed to the Intelligence Layer, which applies vector search and SQL filters. The top 5 matching products are returned with full details.

4. Agent receives structured results

Returns: product name, merchant name, price, stock status, checkout URL, estimated delivery. The agent picks the best match.

5. Phase 2: Agent transacts

The agent calls the purchase_product MCP tool, passing the checkout URL and the user's payment token. The transaction completes through your platform and you earn GMV commission.

MCP Tool Definitions

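// Tool 1: Product search
{
  "name": "search_products",
  "description": "Search merchant catalogue by natural language query",
  "inputSchema": {
    "type": "object",
    "properties": {
      "query": { "type": "string", "description": "Natural language query" },
      "max_price": { "type": "number", "description": "Price ceiling in RM" },
      "in_stock": { "type": "boolean" },
      "location": { "type": "string", "description": "e.g. KL/MY/SG" },
      "limit": { "type": "integer", "default": 5 }
    },
    "required": ["query"]
  }
}

// Tool 2: Price comparison
{
  "name": "compare_prices",
  "description": "Get all merchant prices for a product entity",
  "inputSchema": {
    "type": "object",
    "properties": {
      "entity_id": { "type": "string" }
    },
    "required": ["entity_id"]
  }
}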
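On the server side, a handler for search_products might look like the sketch below. It is deliberately SDK-free: `handle_search_products` and the injected `search_fn` are hypothetical names, defaulting `in_stock` to true is an assumption, and the return value mirrors the MCP tool-result shape (a `content` list of text items plus an `isError` flag).

```python
import json

DEFAULT_LIMIT = 5  # matches the default in the tool definition


def text_result(text, is_error=False):
    """Wrap a string in the MCP tool-result shape."""
    return {"isError": is_error, "content": [{"type": "text", "text": text}]}


def handle_search_products(arguments, search_fn):
    """Validate tool arguments, run the search, and return JSON for the agent."""
    query = arguments.get("query")
    if not isinstance(query, str) or not query.strip():
        return text_result("query must be a non-empty string", is_error=True)

    results = search_fn(
        query=query,
        max_price=arguments.get("max_price"),
        in_stock=arguments.get("in_stock", True),  # assumption: hide OOS by default
        location=arguments.get("location"),
        limit=int(arguments.get("limit", DEFAULT_LIMIT)),
    )
    return text_result(json.dumps({"products": results}))
```

Injecting `search_fn` keeps the handler independent of the Intelligence Layer, so it can be exercised with a stub before the real search exists.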

MCP Server Stack

Protocol: MCP SDK (TypeScript / Python both available)
Transport: Streamable HTTP (which superseded the earlier HTTP+SSE transport in the MCP spec), works over standard HTTPS
Auth: API key per AI platform / per user
Rate limiting: per-key, tiered — free / pro / enterprise
Hosting: Cloudflare Workers, global edge, targeting <50ms latency in most regions
Build time: ~1 week once REST API is stable

1 week to build · Infinite leverage

AI Platforms That Use MCP

Claude (Anthropic) — native MCP client, desktop + API
ChatGPT — OpenAI adopting MCP for Operator
Cursor, Windsurf — developer tools, early adopters
Any Hermes agent (this fleet) — MCP tools load automatically
Future: most AI assistants are likely to be MCP-compatible within 12–18 months


API Layer

Three interfaces serving three different consumer types

REST API

Standard HTTPS · OpenAPI 3.0

GET /products — search catalogue
GET /products/:id/prices — all merchant prices
GET /categories/:id/benchmark — market stats
POST /merchants/connect — onboarding
GET /merchants/me/dashboard — SaaS data
Auth: JWT (merchant) + API key (data buyers)
Versioned: /v1/, /v2/. Breaking changes ship only under a new version prefix, never in minor updates

Supabase Edge Functions

MCP Server

AI agent protocol · the future interface

Tools: search_products, compare_prices
Phase 2: purchase_product, track_order
Transport: Streamable HTTP (successor to HTTP+SSE), works over any HTTPS endpoint
Consumption: per-query billing for high-volume agents
Response format: structured JSON the agent can reason about
No agent needs to scrape Lazada — they just call your MCP tool

Cloudflare Workers · Global edge

Webhook Engine

Push-based · event-driven

Merchant registers a webhook URL at onboarding
Events pushed: price_alert, competitor_oos, new_competitor, map_violation
Payload: entity_id, event_type, delta, timestamp
Retry logic: exponential backoff, 3 attempts, then dead-letter queue
Merchant SaaS dashboard also shows all events in-app
Data buyers receive daily batch via webhook or SFTP

Supabase Realtime · Edge queue
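The retry policy above (exponential backoff, 3 attempts, then dead-letter queue) can be sketched as follows. The `send` callable is a hypothetical stand-in for the HTTP POST and is assumed to return a status code; the dead-letter queue is a plain list here, where the real engine would use a durable queue.

```python
import json
import time


def deliver(event, send, dead_letters, max_attempts=3, base_delay=1.0,
            sleep=time.sleep):
    """Try to deliver one webhook event; park it in dead_letters on failure."""
    body = json.dumps(event)
    for attempt in range(max_attempts):
        try:
            if 200 <= send(body) < 300:
                return True
        except OSError:
            pass  # a network failure counts as a failed attempt
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # waits 1s, then 2s
    dead_letters.append(event)
    return False


event = {
    "entity_id": "nike-am270-uk10-black",
    "event_type": "price_alert",
    "delta": -20.0,
    "timestamp": "2025-01-01T00:00:00Z",
}
```

The payload fields match the ones listed above; the values are placeholders.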


Revenue Model

Three phases — each unlocking a new revenue stream as the data pool grows

Phase 1 · Month 1–12

Merchant SaaS

RM99–999/mo

Price intelligence dashboard for merchants. Competitive benchmarks, alert engine, demand signals. 50 merchants at avg RM300/mo = RM15k MRR by month 6.

Immediate revenue · No dependencies

Phase 2 · Month 6–18

API + Data Licensing

RM50k–5M/deal

MCP API per-query billing for AI agents (RM0.01–0.05/query). Data licensing to banks, FMCG, PE funds — aggregated market intelligence. One bank deal = 12 months of SaaS revenue.

High margin · Non-dilutive

Phase 3 · Month 18–36

GMV Commission

0.5–2% of GMV

Transaction layer — AI agents buy through your platform, you earn a cut of every purchase. At RM10M monthly AI-driven GMV, 1% commission = RM100k/mo recurring.

Scale play · Platform moat

Stream	Who Pays	Unit Economics	Month 6 Est.	Month 24 Est.
Merchant SaaS	Merchants (direct)	RM99–999/mo per merchant	RM15k MRR	RM120k MRR
API per-query	AI platforms / developers	RM0.01–0.05 / query	–	RM20k MRR
Data licensing	Banks, FMCG, PE	RM50k–5M / annual contract	–	RM500k ARR
Consumer affiliate	Merchants (CPC)	2–8% per click-through purchase	–	RM30k MRR
GMV commission	Merchants (transaction)	0.5–2% of AI-driven GMV	–	RM80k MRR


Build Roadmap

10 weeks to a working MVP. 12 months to a fundable business.

Weeks 1–4 Foundation

Data Ingestion + Storage

Shopify app — OAuth, webhook registration, product pull
Supabase schema: products, price_history, merchants, product_embeddings
CSV upload parser — infers columns with LLM assist
Basic merchant onboarding flow (Next.js or Lovable)

Weeks 5–6 Intelligence

AI Normaliser + Price Engine

LLM extraction pipeline (GPT-4o-mini batch job)
pgvector embeddings + HNSW index
Entity matching logic — merge, review queue, new entity
Price Engine: benchmark queries, market_p50, anomaly flags

Weeks 7–9 Product

REST API + Merchant Dashboard

Supabase Edge Functions: /products, /prices, /benchmark, /dashboard
OpenAPI spec — documented, testable
Merchant SaaS dashboard: price positioning, competitor alerts, trend charts
Webhook engine: price_alert push to merchant URLs

Week 10 The Moat

MCP Server

MCP SDK (TypeScript) — wrap search_products, compare_prices
Deploy to Cloudflare Workers — global edge, <50ms
Test with Claude desktop — confirm AI agent can shop from your index
API key auth — tiered rate limits ready for commercial use

Month 3–4 Go to market

First 50 Merchants

Shopify App Store listing — organic discovery from SEA merchants
Direct outreach to 200 Malaysian Shopify merchants
Founding member pricing: RM99/mo locked for first 50 (normally RM299)
Pitch: "Your storefront for the agentic internet"

Month 4–6 Revenue

Add Lazada / Shopee + First Data Deal

Apply for Lazada + Shopee ISV status — 4–8 week approval process
Expand catalogue coverage — cross-platform price data now possible
Approach 1 bank or 1 FMCG brand with data intelligence demo
RM15k+ MRR target by month 6

Month 6–12 Scale

Consumer App + Demand Forecaster

Launch consumer-facing price comparison — SEO + organic
Demand Forecaster live — weekly trending product reports
MCP API published publicly — developers build on top
200+ merchants, RM50k+ MRR, first data license signed

Month 12–24 Endgame

Transaction Layer + Raise or Sell

purchase_product MCP tool — AI agents transact through platform
GMV commission live — new revenue stream
SEA's most comprehensive cross-platform product price dataset
Strategic interest from: Lazada, Shopee, local banks, regional PE

The 18-month window

No well-funded player has built a merchant-fed, MCP-native commerce index for Southeast Asia yet. OpenAI Operator and Perplexity Shopping are US-centric. The regional incumbents (Lazada, Shopee) have their own data but no cross-platform aggregation and no MCP layer. You have roughly 18 months before a funded competitor notices this gap. The moat compounds fast once you have 200+ merchants — switching to a new platform means re-integrating all their sources. Move now.
