AI SEO Collection Optimizer
Autonomous SEO content engine that captures Lebanese organic search demand by generating high-confidence collection landing pages on a parallel VPS layer, grounded in Search Console signals, Shopify orders, and live catalog data, with a self-improving GSC measurement loop.
// AI capabilities
- Claude Sonnet 4.6 production prompt engineering with strict JSON output
- Multi-signal confidence scoring (GSC, sales, catalog, similarity)
- RAG over Shopify catalog and historical performance
- Self-tuning measurement loop driven by GSC outcomes
- Human-in-the-loop governance modes (Shadow, Assisted, Soft auto, Full auto)
// Architecture flow
Objective
Capture organic search demand for sports and retail product queries in Lebanon that lb.mikesport.com currently does not rank for, by autonomously generating SEO-superior collection landing pages on a parallel VPS-hosted layer (seo.mikesport.com/collections/*) without modifying the Shopify storefront.
90-day measurable targets
- Indexed product queries: 4 -> 500 to 1,500
- Collection-page health score: 38/100 -> 90+/100 on engine pages
- Generate 80 to 120 ranking landing pages for high-intent Lebanon queries
- Begin appearing as a citable source in AI Overviews, Perplexity, and ChatGPT for Lebanon retail queries
Approach
1. Signal layer: what to build, proven by data
The engine pulls four independent signals and combines them into a confidence score per candidate collection:
| Signal | Source | Why |
| --- | --- | --- |
| Search demand | Google Search Console API | What Lebanese users actually Google + existing impressions |
| Buyer behavior | Shopify Orders API (sales velocity) | What categories and brands actually sell |
| Catalog readiness | Shopify catalog (read-only) | Gates publishing on >= 12 in-stock SKUs |
| Cannibalization guard | Existing collections in DB | Jaccard similarity check against current pages |
Composite formula:
confidence = impression_volume * 0.40
+ position_lift * 0.20
+ catalog_match * 0.20
+ sales_velocity * 0.20
Threshold to publish: >= 0.70.
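The composite above can be sketched as a small scoring function. Only the weights and the 0.70 threshold come from the formula itself; the signal field names and the assumption that each signal is pre-normalized to the 0..1 range are illustrative:

```typescript
// Composite confidence score for a candidate collection.
// Assumption: each signal has already been normalized to 0..1.
interface CandidateSignals {
  impressionVolume: number; // from GSC
  positionLift: number;     // expected ranking headroom
  catalogMatch: number;     // in-stock SKU coverage
  salesVelocity: number;    // from Shopify orders
}

const PUBLISH_THRESHOLD = 0.7;

function confidence(s: CandidateSignals): number {
  return (
    s.impressionVolume * 0.4 +
    s.positionLift * 0.2 +
    s.catalogMatch * 0.2 +
    s.salesVelocity * 0.2
  );
}

function shouldPublish(s: CandidateSignals): boolean {
  return confidence(s) >= PUBLISH_THRESHOLD;
}
```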
2. Generation layer: content that ranks
For each approved candidate, Claude Sonnet 4.6 generates a full SEO collection page:
- H1 plus 8 H2s plus 4 H3s
- 300+ word "Why Shop" SEO block
- 4-item FAQ accordion
- 3 quick-fact pills (citable snippets for AI engines)
- Internal linking graph: subcategories, related collections, complete-the-look
- Schema bundle: CollectionPage + ItemList + BreadcrumbList + FAQPage + SportingGoodsStore
- hreflang for en-LB, ar-LB, fr-LB, x-default
- Open Graph, Twitter Card, canonical, robots meta
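A minimal sketch of the strict-output gate this layer depends on: parse the model response as JSON and reject anything that breaks the expected shape. The field names and counts here mirror the list above, but this is an illustration, not the production schema:

```typescript
// Reject any LLM response that is not valid JSON or violates basic
// shape constraints, before it ever reaches the database.
interface CollectionPayload {
  title: string;
  metaDescription: string;
  h1: string;
  h2s: string[];
  faq: { question: string; answer: string }[];
}

function validateCollectionPayload(raw: string): CollectionPayload {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    throw new Error("LLM response is not valid JSON");
  }
  const p = parsed as Partial<CollectionPayload>;
  if (typeof p.title !== "string" || p.title.length === 0) throw new Error("missing title");
  if (typeof p.metaDescription !== "string") throw new Error("missing meta description");
  if (typeof p.h1 !== "string") throw new Error("missing h1");
  if (!Array.isArray(p.h2s) || p.h2s.length !== 8) throw new Error("expected exactly 8 H2s");
  if (!Array.isArray(p.faq) || p.faq.length !== 4) throw new Error("expected a 4-item FAQ");
  return p as CollectionPayload;
}
```

Anything that throws here is retried or discarded; malformed output never becomes a draft page.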
3. Indexing layer: fast discovery
On publish the engine pings:
- Bing IndexNow (covers Bing, Yandex, Seznam)
- Google sitemap re-crawl
- Curated priority sitemap with a real <lastmod> per URL
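The publish pings are simple payload builders. A hedged sketch of both (the key value and the example collection URL are placeholders; the POST body fields follow the public IndexNow protocol):

```typescript
// Build the IndexNow POST body (sent to the search engine's
// /indexnow endpoint) and a priority-sitemap <url> entry with a
// real per-URL <lastmod> date.
function indexNowBody(host: string, key: string, urls: string[]): string {
  return JSON.stringify({ host, key, urlList: urls });
}

function sitemapEntry(loc: string, lastmod: Date): string {
  return [
    "<url>",
    `  <loc>${loc}</loc>`,
    `  <lastmod>${lastmod.toISOString().slice(0, 10)}</lastmod>`,
    "</url>",
  ].join("\n");
}
```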
4. Measurement and self-improvement loop
A daily cron classifies every published collection by GSC performance and triggers automated remediation:
| Bucket | Trigger | Auto-action |
| --- | --- | --- |
| Winning | pos <= 10, impressions >= 100 | Amplify (variants, internal links) |
| Climbing | pos <= 20, impressions >= 50 | Monitor |
| Stuck | pos > 30 after 30 days | Regenerate with GSC-gap content |
| Low CTR | CTR < 0.5% at pos < 15 | Rewrite title and meta only |
| Failing | 0 impressions after 60 days | Diagnose, fix or kill |
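The daily classification can be sketched as a function over GSC stats. Thresholds mirror the table; the evaluation order (winning first, failing before low-CTR) is an assumption, since the table does not specify precedence for pages matching several triggers:

```typescript
// Classify a published collection into a remediation bucket from
// its GSC performance window. Field names are illustrative.
interface GscStats {
  position: number;    // average position
  impressions: number; // impressions in the window
  ctr: number;         // click-through rate, 0..1
  daysLive: number;
}

type Bucket = "winning" | "climbing" | "low-ctr" | "stuck" | "failing" | "monitor";

function classify(s: GscStats): Bucket {
  if (s.position <= 10 && s.impressions >= 100) return "winning";
  if (s.impressions === 0 && s.daysLive >= 60) return "failing";
  if (s.ctr < 0.005 && s.position < 15) return "low-ctr"; // 0.5% CTR
  if (s.position > 30 && s.daysLive >= 30) return "stuck";
  if (s.position <= 20 && s.impressions >= 50) return "climbing";
  return "monitor";
}
```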
Scoring weights self-tune from outcomes after 60 days of production data.
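One illustrative way such self-tuning could work (not the production algorithm, which the source does not detail): re-weight each signal by how much higher it averaged on pages that ended up ranking, then renormalize:

```typescript
// Illustrative weight retuning: signals that separate ranking pages
// from non-ranking pages earn more weight; weights stay normalized.
type SignalRow = { signals: number[]; ranked: boolean };

function retuneWeights(history: SignalRow[]): number[] {
  const n = history[0].signals.length;
  const winners = history.filter((r) => r.ranked);
  const losers = history.filter((r) => !r.ranked);
  const avg = (rows: SignalRow[], i: number) =>
    rows.reduce((sum, r) => sum + r.signals[i], 0) / Math.max(rows.length, 1);
  const lift: number[] = [];
  for (let i = 0; i < n; i++) {
    // Floor at 0.01 so no signal is ever zeroed out entirely.
    lift.push(Math.max(avg(winners, i) - avg(losers, i), 0.01));
  }
  const total = lift.reduce((a, b) => a + b, 0);
  return lift.map((x) => x / total);
}
```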
5. Architecture (read-only against Shopify)
Shopify (catalog, orders) --READ--> SEO Engine VPS
|
+-- Postgres (SEO content)
+-- Express + Claude Sonnet 4.6
+-- React admin dashboard
+-- Renders /collections/*
|
v
Google + Bing + AI engines
No write access to Shopify. The engine never modifies the store.
6. Economics
- One-time generation: $33 per 1,000 collections
- Ongoing self-improvement loop: ~$16 per year for 1,000 pages
- Everything else $0: GSC, IndexNow, sitemap pings, hosting on existing VPS
7. Governance
- Collections start as draft with mandatory human review before publish
- Operating modes: Shadow -> Assisted -> Soft auto -> Full auto (configurable per category)
- Cannibalization guard prevents over-publishing similar pages
- Thin-content guard requires >= 12 in-stock SKUs per page
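The cannibalization guard reduces to a Jaccard check over SKU sets. A minimal sketch; the 0.6 similarity cutoff is an assumption for illustration, not the production threshold:

```typescript
// Jaccard similarity between the SKU sets of two collections:
// |A ∩ B| / |A ∪ B|.
function jaccard(a: Set<string>, b: Set<string>): number {
  let intersection = 0;
  a.forEach((sku) => {
    if (b.has(sku)) intersection++;
  });
  const union = a.size + b.size - intersection;
  return union === 0 ? 0 : intersection / union;
}

const SIMILARITY_CUTOFF = 0.6; // illustrative threshold

// Block a candidate if it overlaps too heavily with any existing page.
function wouldCannibalize(candidate: Set<string>, existing: Set<string>[]): boolean {
  return existing.some((col) => jaccard(candidate, col) >= SIMILARITY_CUTOFF);
}
```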
Tech stack
- Runtime: Node.js 20, Express, PM2 process manager
- AI: @anthropic-ai/sdk with Claude Sonnet 4.6, strict JSON schema validation
- Data sources: Google Search Console API, Shopify Storefront API, Shopify Admin Orders API
- Database: PostgreSQL for SEO content, generation history, performance buckets
- Frontend: React 18, Vite, Tailwind CSS for the admin dashboard
- Infra: Hostinger VPS, Nginx reverse proxy, Let's Encrypt TLS
- Indexing: IndexNow protocol, Google Search Console sitemap submission
AI engineering highlights
- Production prompt design that returns deterministic JSON: title, meta, H1, H2s, FAQ, schema fragments, internal link graph
- Validators reject any LLM response that breaks the schema, references out-of-stock SKUs, or fails the cannibalization Jaccard check
- Live grounding via Shopify Storefront API: generated copy is always tied to actual catalog state, never stale assumptions
- Versioned prompts treated as source code, with per-template diff history
- Self-tuning weights: the four-signal scoring formula adjusts after 60 days based on which weight combinations actually produce ranking pages
Status
| Component | State |
| --- | --- |
| Catalog bootstrap | Done (1k sample, 27k pull pending Storefront token) |
| Sales velocity puller | Done |
| GSC reader and measurement | Done |
| Discovery and scoring | Done |
| Claude generation and validators | Done |
| Schema bundle (audit-aligned) | Done |
| Indexing pipeline (IndexNow + sitemap ping) | Done |
| Admin dashboard (Dashboard, Opportunities, Generate, Analytics) | Done |
| First 10 collections | Generated as drafts, awaiting review |
Outcome
Live in production at seo.mikesport.com. Replaces guesswork SEO with evidence-backed automated publishing: every page that ships is justified by real Lebanese search demand, real sales velocity, real catalog readiness, and is observed against Search Console outcomes daily, with the system rewriting under-performers automatically.
Lessons
- LLMs are a step inside a measurement loop, not a content factory. Without GSC feedback the whole thing is a vanity project.
- Strict structured outputs are non-negotiable in production. A free-form prompt is a future incident.
- Live API grounding (Shopify Storefront + Orders) is what separates this from a generic content generator: the copy is always honest about what the store actually sells, and what is actually selling.
- The cannibalization guard saved us from launching 12 nearly-identical pages in the first generation batch. Operational reality first.
- Governance modes (Shadow -> Assisted -> Soft auto -> Full auto) are how AI ships safely in retail: you don't earn full auto, you graduate into it after the loop proves itself.
// related projects
Marketing Intelligence Dashboard
Enterprise marketing analytics platform with real-time dashboards, OpenAI-powered insights via the Vercel AI SDK, and one-click PPTX stakeholder reporting.
Product Data Enrichment Dashboard
AI-assisted product enrichment pipeline with confidence scoring, source-tracked LLM proposals, and a queue-based architecture that never silently overwrites master data.
Linc Consulting Lead App
Lead management app that uses the Anthropic Claude SDK to qualify, score, and route incoming consulting leads.