Architecture Deep Dive · v0.1 · SEA-first

Agentic Commerce Platform

Merchant-fed price intelligence network with an MCP-native API — the infrastructure layer that makes merchant catalogues discoverable by AI shopping agents. Built for Southeast Asia. Merchants who join get indexed. Merchants who don't become invisible to AI shoppers.

Shopify Plugin · Lazada / Shopee API · AI Normaliser · pgvector RAG · MCP Server · REST API · Supabase · TimescaleDB

System Architecture Overview

6-layer pipeline from merchant data sources to AI agent consumers

Data sources
  • Shopify Plugin: OAuth app, App Store, real-time sync
  • Lazada / Shopee: Seller API, authenticated pull
  • WooCommerce: REST API plugin for WordPress stores
  • Google Merchant: feed import
  • CSV / Manual: upload from any platform
Ingestion
  • Data Collector: webhook handler, rate limit + queue (Edge Function)
  • AI Normaliser: maps messy SKUs → clean entities (LLM + embeddings)
Core platform: storage
  • Product Catalogue: normalised SKUs (PostgreSQL · Supabase)
  • Price History DB: time-series pricing (TimescaleDB / PG)
  • Merchant Registry: auth + consent (Supabase Auth)
  • Vector Index: semantic product search (pgvector / Pinecone)
Core platform: intelligence
  • Price Engine: benchmarks + anomaly detection, market positioning
  • Demand Forecaster: velocity + trend (time-series ML)
  • Alert Engine: competitor price drops (rule + ML triggers)
  • NL Search AI: "Find cheapest X" (LLM + pgvector RAG)
API layer
  • REST API: merchant + consumer (OpenAPI 3.0)
  • MCP Server: AI agent protocol (Model Context Protocol)
  • Webhook Engine: push price alerts (event-driven)
Consumers
  • Merchant SaaS: price intelligence, RM99–999 / mo
  • AI Agents: ChatGPT · Claude, agentic shopping
  • Data Buyers: banks · FMCG · PE, data licensing
  • Consumer App: price comparison (Phase 2)

Data Ingestion Layer

How merchant catalogues flow in — from 5 different source platforms

Design principle

Merchants should be able to connect their store in under 5 minutes with zero technical knowledge. Every source connector meets them where they already are — their existing platform. No new tools to learn. The more friction-free the on-ramp, the faster the merchant count compounds.

Shopify Plugin

Priority channel — cleanest integration

  • Merchant installs from Shopify App Store — 2 clicks, OAuth
  • App requests: products, variants, pricing, inventory levels
  • Webhooks fire on every price change, product update, stock event
  • Shopify enforces rate limits — collector uses exponential backoff
  • Real-time sync: price changes reach platform in <30s
  • ~50,000 active Shopify stores in Malaysia — direct distribution
Real-time push
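The collector acts on whatever reaches its webhook endpoint, so it should verify each payload's signature before queuing it. Shopify signs each webhook with an HMAC-SHA256 of the raw request body, keyed with the app's client secret, and sends the base64 digest in the X-Shopify-Hmac-Sha256 header. The minimal sketch below is in Python for brevity; the collector itself runs as an Edge Function. verify_shopify_webhook is a hypothetical name.

```python
import base64
import hashlib
import hmac


def verify_shopify_webhook(raw_body: bytes, hmac_header: str, app_secret: str) -> bool:
    """Return True if the X-Shopify-Hmac-Sha256 header matches the raw request body."""
    # HMAC-SHA256 over the exact bytes received, keyed with the app's client secret.
    digest = hmac.new(app_secret.encode("utf-8"), raw_body, hashlib.sha256).digest()
    expected = base64.b64encode(digest).decode("ascii")
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, hmac_header)
```

Reject the request when this returns False. Compute the digest over the raw bytes before any JSON parsing, because re-serialised JSON will not match the signature.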

Lazada / Shopee

Seller API — authenticated pull

  • Apply as ISV (Independent Software Vendor) on both platforms
  • Seller authenticates — you get OAuth access to their store
  • Pull: product catalogue, pricing, stock, promotions
  • Lazada Open Platform & Shopee Open Platform both have stable APIs
  • Rate limit: typically 10k calls/day per seller app
  • Scheduled pull every 4h for price + inventory delta
Scheduled pull
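Each 4-hourly pull returns a full snapshot, and most rows are unchanged, so only the delta needs writing to the Price History DB. The minimal sketch below assumes each pull has already been fetched and keyed by the seller's SKU. Listing and compute_delta are hypothetical names, and the Lazada / Shopee API calls are not shown.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Listing:
    price: Decimal  # current selling price (MYR)
    stock: int      # units available


def compute_delta(previous: dict[str, Listing], current: dict[str, Listing]) -> dict[str, list[str]]:
    """Compare two catalogue snapshots keyed by seller SKU."""
    delta: dict[str, list[str]] = {"added": [], "removed": [], "price_changed": [], "stock_changed": []}
    for sku, now in current.items():
        before = previous.get(sku)
        if before is None:
            delta["added"].append(sku)
            continue
        if now.price != before.price:
            delta["price_changed"].append(sku)
        if now.stock != before.stock:
            delta["stock_changed"].append(sku)
    # SKUs that disappeared were delisted since the last pull.
    delta["removed"] = [sku for sku in previous if sku not in current]
    return delta
```

Only SKUs in added, price_changed or stock_changed produce new price_history rows, which keeps writes and API budget proportional to actual change.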

WooCommerce

WordPress + REST API plugin

  • WordPress plugin: merchant installs it and generates a WooCommerce REST API key pair (consumer key + consumer secret)
  • WooCommerce REST API v3 (/wp-json/wc/v3): products, variations, prices
  • Merchant pastes the key pair into your onboarding form
  • You pull on schedule or use WooCommerce webhooks
  • Large SEA presence — many small independent stores on WP
API key auth
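WooCommerce's REST API accepts the consumer key and secret as HTTP Basic Auth over HTTPS. It pages results with per_page (maximum 100) and page. The minimal sketch below only builds the request, using Python's standard library. build_products_request is a hypothetical helper.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request


def build_products_request(store_url: str, consumer_key: str, consumer_secret: str,
                           page: int, per_page: int = 100) -> Request:
    """Build one page of GET /wp-json/wc/v3/products with HTTP Basic Auth."""
    if not store_url.startswith("https://"):
        # Basic Auth sends the key pair on every request, so insist on TLS.
        raise ValueError("WooCommerce Basic Auth must be used over HTTPS")
    query = urlencode({"per_page": per_page, "page": page})
    url = f"{store_url.rstrip('/')}/wp-json/wc/v3/products?{query}"
    token = base64.b64encode(f"{consumer_key}:{consumer_secret}".encode()).decode("ascii")
    return Request(url, headers={"Authorization": f"Basic {token}"})
```

Fetch pages 1, 2, 3 and onward until you pass the page count in the X-WP-TotalPages response header.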

Google Merchant Feed

Many merchants already have this — zero extra work

  • Merchant pastes their Google Merchant Center feed URL
  • Standard XML feed (RSS 2.0 or Atom): product, price, availability, link
  • Many Shopify/WooCommerce merchants already generate this for Google Shopping ads
  • Pull daily — price freshness within 24h acceptable for this source
  • No API key needed — just a public feed URL
Zero-friction
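Google Merchant Center XML feeds put product attributes in the g: namespace (http://base.google.com/ns/1.0). Price is written as an amount plus a currency code, such as "499.00 MYR". The minimal parsing sketch below assumes an RSS 2.0 feed that has already been downloaded. Atom feeds use entry elements instead of item and are not handled here. parse_merchant_feed is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

G = "{http://base.google.com/ns/1.0}"  # Google Merchant Center attribute namespace


def parse_merchant_feed(xml_text: str) -> list[dict]:
    """Extract core attributes from an RSS 2.0 Google Merchant Center feed."""
    root = ET.fromstring(xml_text)
    products = []
    for item in root.iter("item"):
        price_text = item.findtext(f"{G}price")
        if not price_text:
            continue  # price is required for comparison; skip incomplete items
        amount, currency = price_text.split()  # e.g. "499.00 MYR"
        availability = item.findtext(f"{G}availability", default="")
        products.append({
            "id": item.findtext(f"{G}id"),
            "title": item.findtext(f"{G}title") or item.findtext("title"),
            "price": Decimal(amount),
            "currency": currency,
            # Accept both "in_stock" and the space-separated "in stock" spelling.
            "in_stock": availability.strip().replace(" ", "_") == "in_stock",
            "link": item.findtext(f"{G}link") or item.findtext("link"),
        })
    return products
```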

CSV / Manual Upload

Fallback for any platform — including Lazada CSV export

  • Merchant exports catalogue from any platform as CSV / Excel
  • Uploads to your platform dashboard
  • AI parser reads any column layout — infers name, price, SKU, category
  • Not real-time but works for MVP before API integrations complete
  • Good for: boutique merchants, small Lazada sellers, offline retailers going online
MVP fallback
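Many exports use familiar header names, so not every upload needs an LLM call. The sketch below swaps in a plain header-alias lookup as a cheap first pass. It reports the fields it could not resolve, and those would go to the AI parser described above. The alias table and function names are hypothetical.

```python
import csv
import io

# Hypothetical alias table; extend it as you see new export formats.
ALIASES = {
    "name": {"name", "title", "product name", "product title", "item name"},
    "price": {"price", "regular price", "selling price", "unit price"},
    "sku": {"sku", "seller sku", "item sku", "product code", "variant sku"},
    "category": {"category", "categories", "product type", "type"},
}


def infer_columns(header: list[str]) -> dict[str, str]:
    """Map each canonical field to the first matching column header (case-insensitive)."""
    mapping: dict[str, str] = {}
    for column in header:
        key = column.strip().lower()
        for field, names in ALIASES.items():
            if field not in mapping and key in names:
                mapping[field] = column
    return mapping


def read_catalogue(csv_text: str) -> tuple[list[dict], list[str]]:
    """Return rows keyed by canonical field, plus fields left for the LLM to resolve."""
    reader = csv.DictReader(io.StringIO(csv_text))
    mapping = infer_columns(reader.fieldnames or [])
    unresolved = [field for field in ALIASES if field not in mapping]
    rows = [{field: row[column] for field, column in mapping.items()} for row in reader]
    return rows, unresolved
```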

AI Normaliser — The Hard Problem

Why raw merchant data is useless without normalisation, and how to solve it

The core problem

Merchant A lists "Nike Air Max 270 Men UK10 Black". Merchant B lists "Nike running shoe men sz10". Merchant C lists "AM270-BLK-10". These are the same product. Without normalisation, cross-merchant price comparison is impossible — you're just comparing noise.

1

Raw ingestion

Collector receives raw product record. Name, price, description, category (if any), images. Stored as-is in staging table.

2

LLM entity extraction

LLM reads raw product name + description. Extracts: brand, product_line, model, variant (colour, size, material), category_l1/l2. Outputs structured JSON.

3

Embedding generation

Normalised product name + extracted attributes → text-embedding-3-small (or local model). 1536-dim vector stored in pgvector.

4

Entity matching (cosine similarity)

New product embedding queried against existing catalogue. If cosine similarity ≥0.92 → same product entity, linked to the existing entity ID. If <0.75 → new entity created. Anything in between goes to the review queue (step 5).

5

Confidence scoring

0.92–1.0 = auto-merge. 0.75–0.92 = human review queue. <0.75 = new entity. Edge cases flagged for merchant confirmation.

6

Price record written

Merchant + entity ID + price + timestamp written to Price History DB. Entity now has N merchant listings attached to it.
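Step 2's LLM output is not guaranteed to be valid JSON or to contain every field, so validate it before embedding. The minimal sketch below assumes brand and category_l1 are the minimum for a usable record. That is an assumption: product_line and model stay optional so vague listings like "Nike running shoe men sz10" still pass. ExtractedProduct and parse_extraction are hypothetical names.

```python
import json
from dataclasses import dataclass, field
from typing import Optional

REQUIRED = ("brand", "category_l1")  # assumed minimum for a usable entity


@dataclass
class ExtractedProduct:
    brand: str
    category_l1: str
    product_line: Optional[str] = None
    model: Optional[str] = None
    category_l2: Optional[str] = None
    variant: dict = field(default_factory=dict)  # colour, size, material


def parse_extraction(llm_output: str) -> ExtractedProduct:
    """Validate the LLM's JSON before it reaches the embedding step."""
    data = json.loads(llm_output)  # malformed JSON raises json.JSONDecodeError
    missing = [k for k in REQUIRED if not data.get(k)]
    if missing:
        # Caller routes the raw listing to the review queue instead of guessing.
        raise ValueError(f"extraction missing fields: {missing}")
    return ExtractedProduct(
        brand=str(data["brand"]).strip(),
        category_l1=str(data["category_l1"]).strip(),
        product_line=data.get("product_line"),
        model=data.get("model"),
        category_l2=data.get("category_l2"),
        variant={k: str(v) for k, v in (data.get("variant") or {}).items()},
    )
```

json.JSONDecodeError is a subclass of ValueError, so malformed output and missing fields can share one review-queue path.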

Entity Matching Example

// Input: three raw listings
A: "Nike Air Max 270 Men UK10 Black"
B: "Nike running shoe men sz10"
C: "AM270-BLK-10"

// After LLM extraction
A: { brand: "Nike", line: "Air Max 270", size: "UK10", colour: "Black" }

// Embedding similarity
cosine_sim(A, B) = 0.94 → same entity (auto-merge)
cosine_sim(A, C) = 0.88 → review queue

// Result: once a reviewer confirms C, all 3 map to entity_id: nike-am270-uk10-black
// Price comparison now works across all 3 merchants
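The thresholds from steps 4 and 5 can be sketched as follows. In production the nearest neighbours would come from the HNSW index; pgvector's <=> operator returns cosine distance, which is 1 minus similarity. This in-memory version only illustrates the routing. Its function names and toy 2-dimensional vectors are hypothetical.

```python
import math
from typing import Optional

AUTO_MERGE = 0.92  # at or above: same entity
NEW_ENTITY = 0.75  # below: new entity; in between: human review


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def route_match(new_vec: list[float],
                candidates: dict[str, list[float]]) -> tuple[str, Optional[str]]:
    """Find the closest existing entity and apply the merge / review / new thresholds."""
    if not candidates:
        return "new_entity", None
    scored = {eid: cosine_similarity(new_vec, vec) for eid, vec in candidates.items()}
    best_id = max(scored, key=scored.get)
    score = scored[best_id]
    if score >= AUTO_MERGE:
        return "auto_merge", best_id
    if score >= NEW_ENTITY:
        return "review_queue", best_id  # reviewer confirms or rejects the link
    return "new_entity", None
```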

Tech Stack

  • LLM extraction: GPT-4o-mini (~RM0.002/product) or local Qwen2.5 on ROG
  • Embeddings: text-embedding-3-small (1536 dim, RM0.0001/1k tokens)
  • Vector storage: pgvector extension on Supabase PostgreSQL
  • Similarity search: HNSW index — <10ms for 1M vectors
  • Review queue: simple Supabase table + admin UI flag
  • At scale: migrate to Pinecone or Weaviate for >10M products

Cost estimate at scale

  • 100 merchants × avg 500 products = 50,000 products to normalise
  • LLM extraction: 50k × RM0.002 = RM100 one-time
  • Embeddings: 50k × RM0.0001 = RM5 one-time (assuming ≤1k tokens per product)
  • Ongoing: only new/changed products re-processed
  • Total normalisation cost at MVP scale: <RM200
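The estimate above can be reproduced with Decimal arithmetic, which avoids float rounding on money. The RM5 embedding figure implicitly assumes about 1k tokens per product. The sketch therefore exposes tokens_per_product as an assumption you can replace with your own average.

```python
from decimal import Decimal

# Figures from the estimate above.
MERCHANTS = 100
PRODUCTS_PER_MERCHANT = 500
LLM_RM_PER_PRODUCT = Decimal("0.002")       # GPT-4o-mini extraction
EMBED_RM_PER_1K_TOKENS = Decimal("0.0001")  # text-embedding-3-small


def normalisation_cost(tokens_per_product: int = 1000) -> dict:
    """One-time cost (RM) to normalise the whole MVP catalogue."""
    products = MERCHANTS * PRODUCTS_PER_MERCHANT
    llm = products * LLM_RM_PER_PRODUCT
    embeddings = products * Decimal(tokens_per_product) / 1000 * EMBED_RM_PER_1K_TOKENS
    return {"products": products, "llm": llm, "embeddings": embeddings, "total": llm + embeddings}
```

With the default of 1k tokens, this gives RM100 for extraction, RM5 for embeddings and RM105 in total. At 200 tokens per product, embeddings drop to RM1.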

Storage Layer

Four specialised data stores — each optimised for its access pattern

Product Catalogue

Master entity registry — one row per unique product

products table
  id            uuid PK
  entity_id     text UNIQUE   -- normalised entity key; UNIQUE so other tables can reference it
  brand         text
  product_line  text
  model         text
  category_l1   text          -- e.g. "Footwear"
  category_l2   text          -- e.g. "Running Shoes"
  attributes    jsonb         -- colour, size, material
  image_url     text
  created_at    timestamptz
PostgreSQL · Supabase

Price History DB

Time-series price per merchant per product

price_history table
  id            uuid
  entity_id     text FK→products
  merchant_id   uuid FK→merchants
  price         numeric
  sale_price    numeric       -- if on promo
  currency      text default 'MYR'
  in_stock      boolean
  url           text          -- checkout link
  recorded_at   timestamptz
  PRIMARY KEY (id, recorded_at) -- TimescaleDB requires the time column in unique keys
  -- hypertable partitioned on recorded_at
TimescaleDB hypertable

Merchant Registry

Auth, consent, tier, and integration config

merchants table
  id            uuid PK
  name          text
  platform      text          -- shopify/lazada/etc
  tier          text          -- free/basic/pro
  consent_at    timestamptz
  api_key_hash  text          -- never store raw keys
  webhook_url   text
  data_sharing  text          -- public/private/anon
  active        boolean
Supabase Auth + RLS
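Under the "never store raw keys" rule, a key can be shown to the merchant once, with only its hash kept in api_key_hash. The minimal sketch below uses Python's standard library. issue_api_key, verify_api_key and the ak_live prefix are hypothetical.

```python
import hashlib
import hmac
import secrets


def issue_api_key(prefix: str = "ak_live") -> tuple[str, str]:
    """Return (raw_key, key_hash). Show raw_key once; store only key_hash."""
    raw_key = f"{prefix}_{secrets.token_urlsafe(32)}"
    return raw_key, hashlib.sha256(raw_key.encode()).hexdigest()


def verify_api_key(presented: str, stored_hash: str) -> bool:
    """Hash the presented key and compare in constant time."""
    candidate = hashlib.sha256(presented.encode()).hexdigest()
    return hmac.compare_digest(candidate, stored_hash)
```

An unsalted SHA-256 is reasonable here because these keys are long random strings, not human passwords. A deterministic hash also lets you look a key up directly by its hash.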

Vector Index

Enables natural language product search

product_embeddings table
  id            uuid PK
  entity_id     text FK→products
  embedding     vector(1536)
  text_repr     text          -- what was embedded
  model         text          -- embedding model version

-- HNSW index for fast ANN search
CREATE INDEX ON product_embeddings
  USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64);
pgvector HNSW index

Intelligence Layer

Four AI-powered engines that turn raw price data into actionable intelligence

Price Engine

Market positioning intelligence for merchants

  • Benchmarks: for each product entity, compute market_min, market_max, market_p50, market_p25
  • Positioning score: merchant's price vs market median — "You're 12% above median"
  • Anomaly detection: price drops >30% flagged — possible error, promo, or distressed stock
  • Brand protection: FMCG brands get notified when resellers price below MAP (Minimum Advertised Price)
  • Runs as a scheduled job every 6h, so benchmarks can lag the near-real-time price feed by up to 6h
SQL aggregation + threshold rules
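The benchmark, positioning and anomaly rules above reduce to simple aggregates per entity. The minimal in-memory sketch below assumes one entity's prices have already been selected from price_history. In SQL, the same numbers would come from min, max and percentile_cont. The function names are hypothetical.

```python
import statistics


def benchmark(prices: list[float]) -> dict[str, float]:
    """Market stats for one product entity across merchants (needs at least 2 prices)."""
    p25, p50, _p75 = statistics.quantiles(prices, n=4, method="inclusive")
    return {"market_min": min(prices), "market_p25": p25,
            "market_p50": p50, "market_max": max(prices)}


def positioning(merchant_price: float, market_p50: float) -> float:
    """Percent above (+) or below (-) the market median, e.g. 12.0 → "12% above median"."""
    return round((merchant_price - market_p50) / market_p50 * 100, 1)


def is_anomalous_drop(old_price: float, new_price: float, threshold: float = 0.30) -> bool:
    """Flag drops larger than 30%: possible error, promo, or distressed stock."""
    return old_price > 0 and (old_price - new_price) / old_price > threshold
```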

Demand Forecaster

Velocity signals before consumers see the trend

  • Velocity score: how many merchants restocked or increased stock for a SKU in last 7d
  • Price momentum: price trend over 30d — rising = demand outpacing supply
  • Out-of-stock correlation: if 3+ merchants go OOS for same SKU → demand spike signal
  • Seasonal pattern detection: time-series ML on historical prices
  • Output: weekly "trending products" report for merchants in that category
TimescaleDB + Prophet/statsmodels
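Two of these signals need no ML at all. The minimal sketch below covers the out-of-stock rule and price momentum. It assumes momentum is measured as an ordinary least-squares slope over daily prices, since the method is not specified above. Seasonal detection would still use Prophet or statsmodels. The function names are hypothetical.

```python
def oos_spike(stock_by_merchant: dict[str, bool], min_merchants: int = 3) -> bool:
    """Demand-spike signal: at least `min_merchants` merchants are out of stock on one SKU."""
    out_of_stock = sum(1 for in_stock in stock_by_merchant.values() if not in_stock)
    return out_of_stock >= min_merchants


def price_momentum(daily_prices: list[float]) -> float:
    """Least-squares slope of price against day index (RM per day); > 0 reads as rising."""
    n = len(daily_prices)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(daily_prices) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(daily_prices))
    var = sum((x - mean_x) ** 2 for x in range(n))
    return cov / var
```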

Alert Engine

Real-time competitive intelligence pushed to merchants

  • Competitor drop alert: competitor drops price on product you also sell → push via webhook or email
  • New competitor alert: new merchant joins the platform and lists competing SKUs
  • OOS opportunity: top competitor goes out of stock → your window to capture demand
  • MAP violation alert: for FMCG brands — reseller priced below agreed floor
  • Alert rules are merchant-configured — choose which triggers to receive
Event-driven · Supabase Realtime
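A competitor-drop trigger can be evaluated once per price event. The minimal sketch below assumes a lookup of which merchants sell each entity and which have enabled this trigger. Its output matches the webhook payload fields (entity_id, event_type, delta), and the webhook engine adds the timestamp. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PriceEvent:
    entity_id: str
    merchant_id: str  # merchant whose price changed
    old_price: float
    new_price: float


def competitor_drop_alerts(event: PriceEvent, sellers_by_entity: dict[str, set[str]],
                           opted_in: set[str]) -> list[dict]:
    """Notify every other merchant on the same entity who enabled this trigger."""
    if event.new_price >= event.old_price:
        return []  # not a drop
    competitors = sellers_by_entity.get(event.entity_id, set()) - {event.merchant_id}
    return [
        {
            "merchant_id": merchant,
            "event_type": "price_alert",
            "entity_id": event.entity_id,
            "delta": round(event.new_price - event.old_price, 2),
        }
        for merchant in sorted(competitors & opted_in)
    ]
```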

NL Search AI

The bridge between human language and structured product data

  • Query parsing: LLM extracts intent from natural language — brand, category, price constraint, urgency
  • Hybrid search: vector similarity (semantic) + keyword (exact) + SQL filter (price range, in_stock)
  • Re-ranking: results re-ranked by price, availability, delivery speed
  • MCP tool: this engine is wrapped as the search_products MCP tool — callable by any AI agent
  • Response: structured JSON with product list, prices, checkout URLs
RAG · pgvector · LLM re-rank
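Hybrid search needs some way to merge the semantic and keyword result lists, and the method is not specified above. The sketch below swaps in Reciprocal Rank Fusion, a common and simple choice, then applies the price and stock filters. It assumes both ranked ID lists already exist. In pgvector, the semantic list could come from ORDER BY embedding <=> $1 LIMIT n. The final re-rank by price and delivery is not shown. The names are hypothetical.

```python
from typing import Optional


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked ID lists: score(id) = sum of 1 / (k + rank) over each list it appears in."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, product_id in enumerate(ranking, start=1):
            scores[product_id] = scores.get(product_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


def hybrid_search(vector_hits: list[str], keyword_hits: list[str], products: dict[str, dict],
                  max_price: Optional[float] = None, in_stock: Optional[bool] = None,
                  limit: int = 5) -> list[str]:
    """Fuse semantic and keyword results, then apply the structured filters."""
    results = []
    for product_id in reciprocal_rank_fusion([vector_hits, keyword_hits]):
        product = products[product_id]
        if max_price is not None and product["price"] > max_price:
            continue
        if in_stock is not None and product["in_stock"] != in_stock:
            continue
        results.append(product_id)
        if len(results) == limit:
            break
    return results
```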

Why this data is valuable to banks and FMCG

A bank underwriting an SME loan wants to know: is this merchant's pricing competitive? Are they losing market share? Is their category growing or declining? The Intelligence Layer produces exactly these signals — and no bank in Malaysia currently has access to cross-platform, cross-merchant price intelligence. That's a data licensing deal at RM50k–500k/year per buyer.

MCP Server — The Moat

Model Context Protocol — the standard that makes you natively queryable by every AI agent

Why MCP is the strategic decision

MCP (Model Context Protocol) is the emerging open standard for AI agents to talk to external tools. Anthropic released it, OpenAI adopted it, and other major LLM providers are adding support. It's the USB-C of AI integrations: build once, work everywhere. A merchant indexed in your MCP server is reachable from ChatGPT Operator, Claude, Gemini, Jarvis, and any future MCP-capable agent. No custom integration per platform. You become the single source of truth for product search in SEA.

How MCP works

1

AI agent needs to shop

User says: "Buy me the cheapest running shoes under RM200, size 10, deliver by Friday". Agent decides it needs to search for products.

2

Agent queries MCP server

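Agent calls your search_products MCP tool with structured parameters: query ("running shoes size 10"), max_price (200), in_stock (true), location. The agent checks the Friday deadline against the estimated delivery in the results.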

3

MCP server queries NL Search AI

Parameters passed to Intelligence Layer. Vector search + SQL filters applied. Top 5 matching products returned with full details.

4

Agent receives structured results

Returns: product name, merchant name, price, stock status, checkout URL, estimated delivery. Agent picks the best match.

5

Phase 2 — Agent transacts

purchase_product MCP tool. Agent passes checkout URL, user payment token. Transaction completes through your platform. You earn GMV commission.

MCP Tool Definitions

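// Tool 1: Product search
{
  "name": "search_products",
  "description": "Search merchant catalogue by natural language query",
  "inputSchema": {
    "type": "object",
    "properties": {
      "query":     { "type": "string",  "description": "Natural-language query" },
      "max_price": { "type": "number",  "description": "Maximum price in RM" },
      "in_stock":  { "type": "boolean" },
      "location":  { "type": "string",  "description": "e.g. KL / MY / SG" },
      "limit":     { "type": "integer", "default": 5 }
    },
    "required": ["query"]
  }
}

// Tool 2: Price comparison
{
  "name": "compare_prices",
  "description": "Get all merchant prices for a product entity",
  "inputSchema": {
    "type": "object",
    "properties": {
      "entity_id": { "type": "string" }
    },
    "required": ["entity_id"]
  }
}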
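Under the hood, MCP messages are JSON-RPC 2.0. An agent lists tools with tools/list and invokes one with tools/call, and the result's content is a list of typed items. The minimal standard-library sketch below handles just those two methods. It assumes in-memory data and skips the initialize handshake, transport and auth, which the official MCP SDK handles. PRICES and handle are hypothetical.

```python
import json
from typing import Any, Callable

# Tool metadata returned by tools/list (MCP tool-definition shape).
TOOLS: dict[str, dict[str, Any]] = {
    "compare_prices": {
        "name": "compare_prices",
        "description": "Get all merchant prices for a product entity",
        "inputSchema": {
            "type": "object",
            "properties": {"entity_id": {"type": "string"}},
            "required": ["entity_id"],
        },
    },
}

# Hypothetical in-memory stand-in for the Price History DB.
PRICES = {
    "nike-am270-uk10-black": [
        {"merchant": "m1", "price": 499.0},
        {"merchant": "m2", "price": 449.0},
    ],
}


def compare_prices(entity_id: str) -> list[dict]:
    """Cheapest merchant first."""
    return sorted(PRICES.get(entity_id, []), key=lambda row: row["price"])


HANDLERS: dict[str, Callable[..., Any]] = {"compare_prices": compare_prices}


def handle(request: dict) -> dict:
    """Answer one JSON-RPC 2.0 request for the tools/list or tools/call method."""
    rid, method = request.get("id"), request.get("method")
    if method == "tools/list":
        return {"jsonrpc": "2.0", "id": rid, "result": {"tools": list(TOOLS.values())}}
    if method == "tools/call":
        params = request.get("params", {})
        name = params.get("name")
        if name not in HANDLERS:
            return {"jsonrpc": "2.0", "id": rid,
                    "error": {"code": -32602, "message": f"Unknown tool: {name}"}}
        data = HANDLERS[name](**params.get("arguments", {}))
        content = [{"type": "text", "text": json.dumps(data)}]
        return {"jsonrpc": "2.0", "id": rid, "result": {"content": content, "isError": False}}
    return {"jsonrpc": "2.0", "id": rid, "error": {"code": -32601, "message": "Method not found"}}
```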

MCP Server Stack

  • Protocol: MCP SDK (TypeScript / Python both available)
  • Transport: Streamable HTTP (successor to the earlier HTTP+SSE transport), works over standard HTTPS
  • Auth: API key per AI platform / per user
  • Rate limiting: per-key, tiered — free / pro / enterprise
  • Hosting: Cloudflare Workers — global edge, <50ms latency anywhere
  • Build time: ~1 week once REST API is stable
1 week to build · Infinite leverage
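Per-key tiered rate limiting is commonly done with a token bucket. The minimal sketch below uses hypothetical tier sizes and an injectable clock for testing. Its in-memory dict only works within one process. On Cloudflare Workers, the buckets would need shared state, for example Durable Objects or an external store.

```python
import time
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical tier limits: (burst capacity, refill rate in requests per second).
TIERS = {"free": (10, 0.2), "pro": (60, 2.0), "enterprise": (600, 20.0)}


@dataclass
class TokenBucket:
    capacity: float
    rate: float
    clock: Callable[[], float] = time.monotonic
    tokens: float = field(init=False)
    updated: float = field(init=False)

    def __post_init__(self) -> None:
        self.tokens = self.capacity  # start full so a new key can burst
        self.updated = self.clock()

    def allow(self) -> bool:
        """Refill by elapsed time, then spend one token if available."""
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


_buckets: dict[str, TokenBucket] = {}


def check_rate_limit(api_key: str, tier: str, clock: Callable[[], float] = time.monotonic) -> bool:
    """Per-key limiter: each API key gets its own bucket sized by its tier."""
    if api_key not in _buckets:
        capacity, rate = TIERS[tier]
        _buckets[api_key] = TokenBucket(capacity, rate, clock)
    return _buckets[api_key].allow()
```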

AI Platforms That Use MCP

  • Claude (Anthropic) — native MCP client, desktop + API
  • ChatGPT — OpenAI adopting MCP for Operator
  • Cursor, Windsurf — developer tools, early adopters
  • Any Hermes agent (this fleet) — MCP tools load automatically
  • Future: every AI assistant will be MCP-compatible within 12–18 months

API Layer

Three interfaces serving three different consumer types

REST API

Standard HTTPS · OpenAPI 3.0

  • GET /products — search catalogue
  • GET /products/:id/prices — all merchant prices
  • GET /categories/:id/benchmark — market stats
  • POST /merchants/connect — onboarding
  • GET /merchants/me/dashboard — SaaS data
  • Auth: JWT (merchant) + API key (data buyers)
  • Versioned: /v1/, /v2/ — no breaking changes on minor updates
Supabase Edge Functions

MCP Server

AI agent protocol · the future interface

  • Tools: search_products, compare_prices
  • Phase 2: purchase_product, track_order
  • Transport: Streamable HTTP (formerly HTTP+SSE), works over any HTTPS endpoint
  • Consumption: per-query billing for high-volume agents
  • Response format: structured JSON the agent can reason about
  • No agent needs to scrape Lazada — they just call your MCP tool
Cloudflare Workers · Global edge

Webhook Engine

Push-based · event-driven

  • Merchant registers a webhook URL at onboarding
  • Events pushed: price_alert, competitor_oos, new_competitor, map_violation
  • Payload: entity_id, event_type, delta, timestamp
  • Retry logic: exponential backoff, 3 attempts, then dead-letter queue
  • Merchant SaaS dashboard also shows all events in-app
  • Data buyers receive daily batch via webhook or SFTP
Supabase Realtime · Edge queue
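The retry policy (exponential backoff, 3 attempts, then dead-letter queue) can be sketched as follows. The sketch assumes send returns True on a 2xx response, and a plain list stands in for the dead-letter queue. The 1s base delay and all names are hypothetical.

```python
import time
from typing import Callable

MAX_ATTEMPTS = 3
BASE_DELAY_S = 1.0


def deliver(event: dict, send: Callable[[dict], bool], dead_letter: list,
            sleep: Callable[[float], None] = time.sleep) -> bool:
    """Try `send` up to 3 times, sleeping 1s then 2s between attempts; then dead-letter."""
    for attempt in range(MAX_ATTEMPTS):
        try:
            if send(event):
                return True
        except OSError:
            pass  # network error: treat like a non-2xx response and retry
        if attempt < MAX_ATTEMPTS - 1:
            sleep(BASE_DELAY_S * 2 ** attempt)
    dead_letter.append(event)  # kept for inspection or manual replay
    return False
```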

Revenue Model

Three phases — each unlocking a new revenue stream as the data pool grows

Phase 1 · Month 1–12
Merchant SaaS
RM99–999/mo
Price intelligence dashboard for merchants. Competitive benchmarks, alert engine, demand signals. 50 merchants at avg RM300/mo = RM15k MRR by month 6.
Immediate revenue · No dependencies
Phase 2 · Month 6–18
API + Data Licensing
RM50k–5M/deal
MCP API per-query billing for AI agents (RM0.01–0.05/query). Data licensing to banks, FMCG, PE funds — aggregated market intelligence. One bank deal = 12 months of SaaS revenue.
High margin · Non-dilutive
Phase 3 · Month 18–36
GMV Commission
0.5–2% of GMV
Transaction layer — AI agents buy through your platform, you earn a cut of every purchase. At RM10M monthly AI-driven GMV, 1% commission = RM100k/mo recurring.
Scale play · Platform moat
Stream | Who pays | Unit economics | Month 6 est. | Month 24 est.
Merchant SaaS | Merchants (direct) | RM99–999/mo per merchant | RM15k MRR | RM120k MRR
API per-query | AI platforms / developers | RM0.01–0.05 / query | - | RM20k MRR
Data licensing | Banks, FMCG, PE | RM50k–5M / annual contract | - | RM500k ARR
Consumer affiliate | Merchants (CPC) | 2–8% per click-through purchase | - | RM30k MRR
GMV commission | Merchants (transaction) | 0.5–2% of AI-driven GMV | - | RM80k MRR

Build Roadmap

10 weeks to a working MVP. 12 months to a fundable business.

Weeks 1–4 · Foundation
Data Ingestion + Storage
  • Shopify app — OAuth, webhook registration, product pull
  • Supabase schema: products, price_history, merchants, product_embeddings
  • CSV upload parser — infers columns with LLM assist
  • Basic merchant onboarding flow (Next.js or Lovable)
Weeks 5–6 · Intelligence
AI Normaliser + Price Engine
  • LLM extraction pipeline (GPT-4o-mini batch job)
  • pgvector embeddings + HNSW index
  • Entity matching logic — merge, review queue, new entity
  • Price Engine: benchmark queries, market_p50, anomaly flags
Weeks 7–9 · Product
REST API + Merchant Dashboard
  • Supabase Edge Functions: /products, /prices, /benchmark, /dashboard
  • OpenAPI spec — documented, testable
  • Merchant SaaS dashboard: price positioning, competitor alerts, trend charts
  • Webhook engine: price_alert push to merchant URLs
Week 10 · The Moat
MCP Server
  • MCP SDK (TypeScript) — wrap search_products, compare_prices
  • Deploy to Cloudflare Workers — global edge, <50ms
  • Test with Claude Desktop: confirm an AI agent can shop from your index
  • API key auth — tiered rate limits ready for commercial use
Month 3–4 · Go to market
First 50 Merchants
  • Shopify App Store listing — organic discovery from SEA merchants
  • Direct outreach to 200 Malaysian Shopify merchants
  • Founding member pricing: RM99/mo locked for first 50 (normally RM299)
  • Pitch: "Your storefront for the agentic internet"
Month 4–6 · Revenue
Add Lazada / Shopee + First Data Deal
  • Apply for Lazada + Shopee ISV status — 4–8 week approval process
  • Expand catalogue coverage — cross-platform price data now possible
  • Approach 1 bank or 1 FMCG brand with data intelligence demo
  • RM15k+ MRR target by month 6
Month 6–12 · Scale
Consumer App + Demand Forecaster
  • Launch consumer-facing price comparison — SEO + organic
  • Demand Forecaster live — weekly trending product reports
  • MCP API published publicly — developers build on top
  • 200+ merchants, RM50k+ MRR, first data licence signed
Month 12–24 · Endgame
Transaction Layer + Raise or Sell
  • purchase_product MCP tool — AI agents transact through platform
  • GMV commission live — new revenue stream
  • SEA's most comprehensive cross-platform product price dataset
  • Strategic interest from: Lazada, Shopee, local banks, regional PE

The 18-month window

No well-funded player has built a merchant-fed, MCP-native commerce index for Southeast Asia yet. OpenAI Operator and Perplexity Shopping are US-centric. The regional incumbents (Lazada, Shopee) have their own data but no cross-platform aggregation and no MCP layer. You have roughly 18 months before a funded competitor notices this gap. The moat compounds fast once you have 200+ merchants — switching to a new platform means re-integrating all their sources. Move now.