
AI SEO Collection Optimizer

Autonomous SEO content engine that captures Lebanese organic search demand by generating high-confidence collection landing pages on a parallel VPS layer, grounded in Search Console signals, Shopify orders, and live catalog data, with a self-improving GSC measurement loop.

Role
Full-stack engineer + AI architect (sole builder)
Year
2025
Status
Live
AI · LLM · SEO · full-stack

// AI capabilities

  • Claude Sonnet 4.6 production prompt engineering with strict JSON output
  • Multi-signal confidence scoring (GSC, sales, catalog, similarity)
  • RAG over Shopify catalog and historical performance
  • Self-tuning measurement loop driven by GSC outcomes
  • Human-in-the-loop governance modes (Shadow, Assisted, Soft auto, Full auto)

// Architecture flow

Objective

Capture organic search demand for sports and retail product queries in Lebanon that lb.mikesport.com currently does not rank for, by autonomously generating SEO-superior collection landing pages on a parallel VPS-hosted layer (seo.mikesport.com/collections/*) without modifying the Shopify storefront.

90-day measurable targets

  • Indexed product queries: 4 -> 500 to 1,500
  • Collection-page health score: 38/100 -> 90+/100 on engine pages
  • Generate 80 to 120 ranking landing pages for high-intent Lebanon queries
  • Begin appearing as a citable source in AI Overviews, Perplexity, and ChatGPT for Lebanon retail queries

Approach

1. Signal layer: what to build, proven by data

The engine pulls four independent signals and combines them into a confidence score per candidate collection:

| Signal | Source | Why |
| --- | --- | --- |
| Search demand | Google Search Console API | What Lebanese users actually Google + existing impressions |
| Buyer behavior | Shopify Orders API (sales velocity) | What categories and brands actually sell |
| Catalog readiness | Shopify catalog (read-only) | Gates publishing on >= 12 in-stock SKUs |
| Cannibalization guard | Existing collections in DB | Jaccard similarity check against current pages |
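
The cannibalization guard is worth sketching. A minimal version, assuming each collection is represented as the set of product IDs it contains; the 0.6 cutoff here is illustrative, not the production value:

```typescript
// Jaccard similarity between two collections' product-ID sets:
// |A ∩ B| / |A ∪ B|. Returns 0..1, where 1 means identical assortments.
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 0;
  let intersection = 0;
  for (const id of a) if (b.has(id)) intersection++;
  return intersection / (a.size + b.size - intersection);
}

// Block a candidate if it overlaps too heavily with any live collection.
function isCannibalizing(
  candidate: Set<string>,
  existing: Set<string>[],
  threshold = 0.6, // assumed cutoff; tune against real pages
): boolean {
  return existing.some((page) => jaccard(candidate, page) >= threshold);
}
```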

Composite formula:

confidence = impression_volume * 0.40
           + position_lift     * 0.20
           + catalog_match     * 0.20
           + sales_velocity    * 0.20

Threshold to publish: >= 0.70.
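
In code, the gate is a plain weighted sum. A minimal sketch, assuming all four signals are already normalized to [0, 1] upstream:

```typescript
// Composite confidence score; weights mirror the formula above.
interface CandidateSignals {
  impressionVolume: number; // GSC impressions, normalized to [0, 1]
  positionLift: number;     // ranking headroom estimate, normalized
  catalogMatch: number;     // in-stock SKU coverage, normalized
  salesVelocity: number;    // Shopify order velocity, normalized
}

const PUBLISH_THRESHOLD = 0.7;

function confidence(s: CandidateSignals): number {
  return (
    s.impressionVolume * 0.4 +
    s.positionLift * 0.2 +
    s.catalogMatch * 0.2 +
    s.salesVelocity * 0.2
  );
}

const shouldPublish = (s: CandidateSignals): boolean =>
  confidence(s) >= PUBLISH_THRESHOLD;
```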

2. Generation layer: content that ranks

For each approved candidate, Claude Sonnet 4.6 generates a full SEO collection page:

  • H1 plus 8 H2s plus 4 H3s
  • 300+ word "Why Shop" SEO block
  • 4-item FAQ accordion
  • 3 quick-fact pills (citable snippets for AI engines)
  • Internal linking graph: subcategories, related collections, complete-the-look
  • Schema bundle: CollectionPage + ItemList + BreadcrumbList + FAQPage + SportingGoodsStore
  • hreflang for en-LB, ar-LB, fr-LB, x-default
  • Open Graph, Twitter Card, canonical, robots meta
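
The write-up doesn't name the validation layer; here is a sketch of the strict-JSON contract using zod, with illustrative field names and limits:

```typescript
import { z } from "zod";

// Hypothetical response shape for one generated page; the production
// schema also carries schema.org fragments, hreflang, and the link graph.
const CollectionPage = z.object({
  title: z.string().min(10).max(70),
  metaDescription: z.string().min(50).max(160),
  h1: z.string(),
  h2s: z.array(z.string()).length(8),
  h3s: z.array(z.string()).length(4),
  whyShop: z
    .string()
    .refine((s) => s.trim().split(/\s+/).length >= 300, "needs 300+ words"),
  faq: z
    .array(z.object({ question: z.string(), answer: z.string() }))
    .length(4),
  quickFacts: z.array(z.string()).length(3),
});

// Any model response that breaks the contract is rejected, never published.
function parseGeneration(raw: string) {
  const result = CollectionPage.safeParse(JSON.parse(raw));
  if (!result.success) {
    throw new Error(`LLM output rejected: ${result.error.message}`);
  }
  return result.data;
}
```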

3. Indexing layer: fast discovery

On publish the engine pings:

  • Bing IndexNow (covers Bing, Yandex, Seznam)
  • Google sitemap re-crawl
  • Curated priority sitemap with real <lastmod> per URL
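
The IndexNow half of that is a single documented POST. A sketch with placeholder key values, assuming Node 20's global fetch:

```typescript
// Submit freshly published URLs to IndexNow (fans out to Bing, Yandex,
// Seznam). The key file must be served at keyLocation on the same host.
async function pingIndexNow(urls: string[]): Promise<void> {
  const res = await fetch("https://api.indexnow.org/indexnow", {
    method: "POST",
    headers: { "Content-Type": "application/json; charset=utf-8" },
    body: JSON.stringify({
      host: "seo.mikesport.com",
      key: "YOUR_INDEXNOW_KEY", // placeholder
      keyLocation: "https://seo.mikesport.com/YOUR_INDEXNOW_KEY.txt",
      urlList: urls,
    }),
  });
  if (!res.ok) throw new Error(`IndexNow ping failed: HTTP ${res.status}`);
}
```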

4. Measurement and self-improvement loop

A daily cron classifies every published collection by GSC performance and triggers automated remediation:

| Bucket | Trigger | Auto-action |
| --- | --- | --- |
| Winning | pos <= 10, impressions >= 100 | Amplify (variants, internal links) |
| Climbing | pos <= 20, impressions >= 50 | Monitor |
| Stuck | pos > 30 after 30 days | Regenerate with GSC-gap content |
| Low CTR | CTR < 0.5% at pos < 15 | Rewrite title and meta only |
| Failing | 0 impressions after 60 days | Diagnose, fix or kill |
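
The bucket logic maps directly to code. A sketch with thresholds taken straight from the table; the field names and evaluation order (most decisive triggers first) are assumptions:

```typescript
interface PagePerformance {
  position: number;    // average GSC position over the window
  impressions: number; // impressions over the window
  ctr: number;         // click-through rate, 0..1
  daysLive: number;    // days since publish
}

type Bucket = "winning" | "climbing" | "stuck" | "low_ctr" | "failing" | "watch";

function classify(p: PagePerformance): Bucket {
  if (p.position <= 10 && p.impressions >= 100) return "winning";
  if (p.impressions === 0 && p.daysLive >= 60) return "failing";
  if (p.ctr < 0.005 && p.position < 15) return "low_ctr"; // CTR < 0.5%
  if (p.position > 30 && p.daysLive >= 30) return "stuck";
  if (p.position <= 20 && p.impressions >= 50) return "climbing";
  return "watch"; // no trigger fired; keep observing
}
```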

Scoring weights self-tune from outcomes after 60 days of production data.
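
The write-up doesn't specify the update rule, so take this as one plausible interpretation: nudge each weight toward the signals that separate eventual winners from losers, then renormalize:

```typescript
type SignalKey = "impressionVolume" | "positionLift" | "catalogMatch" | "salesVelocity";
type Outcome = { signals: Record<SignalKey, number>; won: boolean };

// Hypothetical retuning pass, run after the 60-day window.
function retuneWeights(
  outcomes: Outcome[],
  current: Record<SignalKey, number>,
  learningRate = 0.1, // assumed step size
): Record<SignalKey, number> {
  const keys = Object.keys(current) as SignalKey[];
  const winners = outcomes.filter((o) => o.won);
  const losers = outcomes.filter((o) => !o.won);
  const avg = (xs: Outcome[], k: SignalKey) =>
    xs.reduce((sum, o) => sum + o.signals[k], 0) / Math.max(xs.length, 1);

  const next = { ...current };
  for (const k of keys) {
    // Signals that score higher on winners than on losers gain weight.
    next[k] = Math.max(0, current[k] + learningRate * (avg(winners, k) - avg(losers, k)));
  }
  const total = keys.reduce((sum, k) => sum + next[k], 0);
  if (total === 0) return current;
  for (const k of keys) next[k] /= total; // weights sum to 1 again
  return next;
}
```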

5. Architecture (read-only against Shopify)

Shopify (catalog, orders)  --READ-->  SEO Engine VPS
                                       |
                                       +-- Postgres (SEO content)
                                       +-- Express + Claude Sonnet 4.6
                                       +-- React admin dashboard
                                       +-- Renders /collections/*
                                              |
                                              v
                                       Google + Bing + AI engines

No write access to Shopify. The engine never modifies the store.
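
The render path is deliberately boring. A sketch of the Express side, with table and column names assumed:

```typescript
import express from "express";
import { Pool } from "pg";

const app = express();
const db = new Pool({ connectionString: process.env.DATABASE_URL });

// Serve generated pages from Postgres only; nothing here writes to Shopify.
app.get("/collections/:handle", async (req, res) => {
  const { rows } = await db.query(
    "SELECT html FROM seo_collections WHERE handle = $1 AND status = 'published'",
    [req.params.handle],
  );
  if (rows.length === 0) return res.status(404).send("Not found");
  res.type("html").send(rows[0].html);
});

app.listen(3000);
```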

6. Economics

  • One-time generation: $33 per 1,000 collections
  • Ongoing self-improvement loop: ~$16 per year for 1,000 pages
  • Everything else is $0: GSC, IndexNow, sitemap pings, and hosting on the existing VPS

7. Governance

  • Collections start as draft with mandatory human review before publish
  • Operating modes: Shadow -> Assisted -> Soft auto -> Full auto (configurable per category); a gating sketch follows this list
  • Cannibalization guard prevents over-publishing similar pages
  • Thin-content guard requires >= 12 in-stock SKUs per page
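
A sketch of how those guards might compose into a single publish decision; the mode names come from above, while the gating logic is an assumed interpretation:

```typescript
type Mode = "shadow" | "assisted" | "soft_auto" | "full_auto";

function publishDecision(mode: Mode, confidence: number, inStockSkus: number) {
  if (inStockSkus < 12) return "blocked_thin_content";    // thin-content guard
  if (confidence < 0.7) return "rejected_low_confidence"; // scoring threshold
  switch (mode) {
    case "shadow":   return "logged_only";      // nothing goes live
    case "assisted": return "draft_for_review"; // human approves every page
    case "soft_auto": // assumed: auto-publish only above a higher bar
      return confidence >= 0.85 ? "auto_publish" : "draft_for_review";
    case "full_auto": return "auto_publish";
  }
}
```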

Tech stack

  • Runtime: Node.js 20, Express, PM2 process manager
  • AI: @anthropic-ai/sdk with Claude Sonnet 4.6, strict JSON schema validation
  • Data sources: Google Search Console API, Shopify Storefront API, Shopify Admin Orders API
  • Database: PostgreSQL for SEO content, generation history, performance buckets
  • Frontend: React 18, Vite, Tailwind CSS for the admin dashboard
  • Infra: Hostinger VPS, Nginx reverse proxy, Let's Encrypt TLS
  • Indexing: IndexNow protocol, Google Search Console sitemap submission

AI engineering highlights

  • Production prompt design that returns deterministic JSON: title, meta, H1, H2s, FAQ, schema fragments, internal link graph
  • Validators reject any LLM response that breaks the schema, references out-of-stock SKUs, or fails the cannibalization Jaccard check
  • Live grounding via Shopify Storefront API: generated copy is always tied to actual catalog state, never stale assumptions
  • Versioned prompts treated as source code, with per-template diff history
  • Self-tuning weights: the four-signal scoring formula adjusts after 60 days based on which weight combinations actually produce ranking pages

Status

| Component | State |
| --- | --- |
| Catalog bootstrap | Done (1k sample, 27k pull pending Storefront token) |
| Sales velocity puller | Done |
| GSC reader and measurement | Done |
| Discovery and scoring | Done |
| Claude generation and validators | Done |
| Schema bundle (audit-aligned) | Done |
| Indexing pipeline (IndexNow + sitemap ping) | Done |
| Admin dashboard (Dashboard, Opportunities, Generate, Analytics) | Done |
| First 10 collections | Generated as drafts, awaiting review |

Outcome

Live in production at seo.mikesport.com. It replaces guesswork SEO with evidence-backed automated publishing: every page that ships is justified by real Lebanese search demand, real sales velocity, and real catalog readiness, then measured against Search Console outcomes daily, with the system rewriting under-performers automatically.

Lessons

  • LLMs are a step inside a measurement loop, not a content factory. Without GSC feedback the whole thing is a vanity project.
  • Strict structured outputs are non-negotiable in production. A free-form prompt is a future incident.
  • Live API grounding (Shopify Storefront + Orders) is what separates this from a generic content generator: the copy is always honest about what the store actually sells, and what is actually selling.
  • The cannibalization guard saved us from launching 12 nearly-identical pages in the first generation batch. Operational reality first.
  • Governance modes (Shadow -> Assisted -> Soft auto -> Full auto) are how AI ships safely in retail: you don't earn full auto, you graduate into it after the loop proves itself.

Want to dig deeper?

Ask my AI agent anything about how this was built, what tradeoffs I made, or how it could fit your team.
