AI SEO Collection Optimizer
Autonomous SEO content engine that captures Lebanese organic search demand by generating high-confidence collection landing pages on a parallel VPS layer, grounded in Search Console signals, Shopify orders, and live catalog data, with a self-improving GSC measurement loop.
// AI capabilities
- Claude Sonnet 4.6 production prompt engineering with strict JSON output
- Multi-signal confidence scoring (GSC, sales, catalog, similarity)
- RAG over Shopify catalog and historical performance
- Self-tuning measurement loop driven by GSC outcomes
- Human-in-the-loop governance modes (Shadow, Assisted, Soft auto, Full auto)
// Architecture flow
Objective
Capture organic search demand for sports and retail product queries in Lebanon that lb.mikesport.com currently does not rank for, by autonomously generating SEO-superior collection landing pages on a parallel VPS-hosted layer (seo.mikesport.com/collections/*) without modifying the Shopify storefront.
90-day measurable targets
- Indexed product queries: 4 -> 500 to 1,500
- Collection-page health score: 38/100 -> 90+/100 on engine pages
- Generate 80 to 120 ranking landing pages for high-intent Lebanon queries
- Begin appearing as a citable source in AI Overviews, Perplexity, and ChatGPT for Lebanon retail queries
Approach
1. Signal layer: what to build, proven by data
The engine pulls four independent signals and combines them into a confidence score per candidate collection:
| Signal | Source | Why |
| --- | --- | --- |
| Search demand | Google Search Console API | What Lebanese users actually Google + existing impressions |
| Buyer behavior | Shopify Orders API (sales velocity) | What categories and brands actually sell |
| Catalog readiness | Shopify catalog (read-only) | Gates publishing on >= 12 in-stock SKUs |
| Cannibalization guard | Existing collections in DB | Jaccard similarity check against current pages |
Composite formula:
confidence = impression_volume * 0.40
+ position_lift * 0.20
+ catalog_match * 0.20
+ sales_velocity * 0.20
Threshold to publish: >= 0.70.
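The composite above can be sketched as a small scoring function. Only the weights and the 0.70 threshold come from the formula itself; the signal field names and the assumption that each signal is pre-normalized to the 0..1 range are illustrative:

```typescript
// Composite confidence score for a candidate collection.
// Assumption: each signal has already been normalized to 0..1.
interface CandidateSignals {
  impressionVolume: number; // from GSC
  positionLift: number;     // expected ranking headroom
  catalogMatch: number;     // in-stock SKU coverage
  salesVelocity: number;    // from Shopify orders
}

const PUBLISH_THRESHOLD = 0.7;

function confidence(s: CandidateSignals): number {
  return (
    s.impressionVolume * 0.4 +
    s.positionLift * 0.2 +
    s.catalogMatch * 0.2 +
    s.salesVelocity * 0.2
  );
}

function shouldPublish(s: CandidateSignals): boolean {
  return confidence(s) >= PUBLISH_THRESHOLD;
}
```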
2. Generation layer: content that ranks
For each approved candidate, Claude Sonnet 4.6 generates a full SEO collection page:
- H1 plus 8 H2s plus 4 H3s
- 300+ word "Why Shop" SEO block
- 4-item FAQ accordion
- 3 quick-fact pills (citable snippets for AI engines)
- Internal linking graph: subcategories, related collections, complete-the-look
- Schema bundle: CollectionPage + ItemList + BreadcrumbList + FAQPage + SportingGoodsStore
- hreflang for en-LB, ar-LB, fr-LB, x-default
- Open Graph, Twitter Card, canonical, robots meta
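A minimal sketch of the strict-output gate this layer depends on: parse the model response as JSON and reject anything that breaks the expected shape. The field names and counts here mirror the list above, but this is an illustration, not the production schema:

```typescript
// Reject any LLM response that is not valid JSON or violates basic
// shape constraints, before it ever reaches the database.
interface CollectionPayload {
  title: string;
  metaDescription: string;
  h1: string;
  h2s: string[];
  faq: { question: string; answer: string }[];
}

function validateCollectionPayload(raw: string): CollectionPayload {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    throw new Error("LLM response is not valid JSON");
  }
  const p = parsed as Partial<CollectionPayload>;
  if (typeof p.title !== "string" || p.title.length === 0) throw new Error("missing title");
  if (typeof p.metaDescription !== "string") throw new Error("missing meta description");
  if (typeof p.h1 !== "string") throw new Error("missing h1");
  if (!Array.isArray(p.h2s) || p.h2s.length !== 8) throw new Error("expected exactly 8 H2s");
  if (!Array.isArray(p.faq) || p.faq.length !== 4) throw new Error("expected a 4-item FAQ");
  return p as CollectionPayload;
}
```

Anything that throws here is retried or discarded; malformed output never becomes a draft page.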
3. Indexing layer: fast discovery
On publish the engine pings:
- Bing IndexNow (covers Bing, Yandex, Seznam)
- Google sitemap re-crawl
- Curated priority sitemap with a real <lastmod> per URL
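The publish pings are simple payload builders. A hedged sketch of both (the key value and the example collection URL are placeholders; the POST body fields follow the public IndexNow protocol):

```typescript
// Build the IndexNow POST body (sent to the search engine's
// /indexnow endpoint) and a priority-sitemap <url> entry with a
// real per-URL <lastmod> date.
function indexNowBody(host: string, key: string, urls: string[]): string {
  return JSON.stringify({ host, key, urlList: urls });
}

function sitemapEntry(loc: string, lastmod: Date): string {
  return [
    "<url>",
    `  <loc>${loc}</loc>`,
    `  <lastmod>${lastmod.toISOString().slice(0, 10)}</lastmod>`,
    "</url>",
  ].join("\n");
}
```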
4. Measurement and self-improvement loop
A daily cron classifies every published collection by GSC performance and triggers automated remediation:
| Bucket | Trigger | Auto-action |
| --- | --- | --- |
| Winning | pos <= 10, impressions >= 100 | Amplify (variants, internal links) |
| Climbing | pos <= 20, impressions >= 50 | Monitor |
| Stuck | pos > 30 after 30 days | Regenerate with GSC-gap content |
| Low CTR | CTR < 0.5% at pos < 15 | Rewrite title and meta only |
| Failing | 0 impressions after 60 days | Diagnose, fix or kill |
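The daily classification can be sketched as a function over GSC stats. Thresholds mirror the table; the evaluation order (winning first, failing before low-CTR) is an assumption, since the table does not specify precedence for pages matching several triggers:

```typescript
// Classify a published collection into a remediation bucket from
// its GSC performance window. Field names are illustrative.
interface GscStats {
  position: number;    // average position
  impressions: number; // impressions in the window
  ctr: number;         // click-through rate, 0..1
  daysLive: number;
}

type Bucket = "winning" | "climbing" | "low-ctr" | "stuck" | "failing" | "monitor";

function classify(s: GscStats): Bucket {
  if (s.position <= 10 && s.impressions >= 100) return "winning";
  if (s.impressions === 0 && s.daysLive >= 60) return "failing";
  if (s.ctr < 0.005 && s.position < 15) return "low-ctr"; // 0.5% CTR
  if (s.position > 30 && s.daysLive >= 30) return "stuck";
  if (s.position <= 20 && s.impressions >= 50) return "climbing";
  return "monitor";
}
```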
Scoring weights self-tune from outcomes after 60 days of production data.
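One illustrative way such self-tuning could work (not the production algorithm, which the source does not detail): re-weight each signal by how much higher it averaged on pages that ended up ranking, then renormalize:

```typescript
// Illustrative weight retuning: signals that separate ranking pages
// from non-ranking pages earn more weight; weights stay normalized.
type SignalRow = { signals: number[]; ranked: boolean };

function retuneWeights(history: SignalRow[]): number[] {
  const n = history[0].signals.length;
  const winners = history.filter((r) => r.ranked);
  const losers = history.filter((r) => !r.ranked);
  const avg = (rows: SignalRow[], i: number) =>
    rows.reduce((sum, r) => sum + r.signals[i], 0) / Math.max(rows.length, 1);
  const lift: number[] = [];
  for (let i = 0; i < n; i++) {
    // Floor at 0.01 so no signal is ever zeroed out entirely.
    lift.push(Math.max(avg(winners, i) - avg(losers, i), 0.01));
  }
  const total = lift.reduce((a, b) => a + b, 0);
  return lift.map((x) => x / total);
}
```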
5. Architecture (read-only against Shopify)
Shopify (catalog, orders) --READ--> SEO Engine VPS
|
+-- Postgres (SEO content)
+-- Express + Claude Sonnet 4.6
+-- React admin dashboard
+-- Renders /collections/*
|
v
Google + Bing + AI engines
No write access to Shopify. The engine never modifies the store.
6. Economics
- One-time generation: $33 per 1,000 collections
- Ongoing self-improvement loop: ~$16 per year for 1,000 pages
- Everything else $0: GSC, IndexNow, sitemap pings, hosting on existing VPS
7. Governance
- Collections start as draft with mandatory human review before publish
- Operating modes: Shadow -> Assisted -> Soft auto -> Full auto (configurable per category)
- Cannibalization guard prevents over-publishing similar pages
- Thin-content guard requires >= 12 in-stock SKUs per page
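The cannibalization guard reduces to a Jaccard check over SKU sets. A minimal sketch; the 0.6 similarity cutoff is an assumption for illustration, not the production threshold:

```typescript
// Jaccard similarity between the SKU sets of two collections:
// |A ∩ B| / |A ∪ B|.
function jaccard(a: Set<string>, b: Set<string>): number {
  let intersection = 0;
  a.forEach((sku) => {
    if (b.has(sku)) intersection++;
  });
  const union = a.size + b.size - intersection;
  return union === 0 ? 0 : intersection / union;
}

const SIMILARITY_CUTOFF = 0.6; // illustrative threshold

// Block a candidate if it overlaps too heavily with any existing page.
function wouldCannibalize(candidate: Set<string>, existing: Set<string>[]): boolean {
  return existing.some((col) => jaccard(candidate, col) >= SIMILARITY_CUTOFF);
}
```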
Tech stack
- Runtime: Node.js 20, Express, PM2 process manager
- AI: @anthropic-ai/sdk with Claude Sonnet 4.6, strict JSON schema validation
- Data sources: Google Search Console API, Shopify Storefront API, Shopify Admin Orders API
- Database: PostgreSQL for SEO content, generation history, performance buckets
- Frontend: React 18, Vite, Tailwind CSS for the admin dashboard
- Infra: Hostinger VPS, Nginx reverse proxy, Let's Encrypt TLS
- Indexing: IndexNow protocol, Google Search Console sitemap submission
AI engineering highlights
- Production prompt design that returns deterministic JSON: title, meta, H1, H2s, FAQ, schema fragments, internal link graph
- Validators reject any LLM response that breaks the schema, references out-of-stock SKUs, or fails the cannibalization Jaccard check
- Live grounding via Shopify Storefront API: generated copy is always tied to actual catalog state, never stale assumptions
- Versioned prompts treated as source code, with per-template diff history
- Self-tuning weights: the four-signal scoring formula adjusts after 60 days based on which weight combinations actually produce ranking pages
Status
| Component | State |
| --- | --- |
| Catalog bootstrap | Done (1k sample, 27k pull pending Storefront token) |
| Sales velocity puller | Done |
| GSC reader and measurement | Done |
| Discovery and scoring | Done |
| Claude generation and validators | Done |
| Schema bundle (audit-aligned) | Done |
| Indexing pipeline (IndexNow + sitemap ping) | Done |
| Admin dashboard (Dashboard, Opportunities, Generate, Analytics) | Done |
| First 10 collections | Generated as drafts, awaiting review |
Outcome
Live in production at seo.mikesport.com. Replaces guesswork SEO with evidence-backed automated publishing: every page that ships is justified by real Lebanese search demand, real sales velocity, real catalog readiness, and is observed against Search Console outcomes daily, with the system rewriting under-performers automatically.
Lessons
- LLMs are a step inside a measurement loop, not a content factory. Without GSC feedback the whole thing is a vanity project.
- Strict structured outputs are non-negotiable in production. A free-form prompt is a future incident.
- Live API grounding (Shopify Storefront + Orders) is what separates this from a generic content generator: the copy is always honest about what the store actually sells, and what is actually selling.
- The cannibalization guard saved us from launching 12 nearly-identical pages in the first generation batch. Operational reality first.
- Governance modes (Shadow -> Assisted -> Soft auto -> Full auto) are how AI ships safely in retail: you don't earn full auto, you graduate into it after the loop proves itself.
// related projects
Marketing Intelligence Dashboard
Enterprise marketing analytics platform with real-time dashboards, OpenAI-powered insights via the Vercel AI SDK, and one-click PPTX stakeholder reporting.
Product Data Enrichment Dashboard
AI-assisted product enrichment pipeline with confidence scoring, source-tracked LLM proposals, and a queue-based architecture that never silently overwrites master data.
Linc Consulting Lead App
Lead management app that uses the Anthropic Claude SDK to qualify, score, and route incoming consulting leads.