
Product Data Enrichment Dashboard

AI-assisted product enrichment pipeline with confidence scoring, source-tracked LLM proposals, and a queue-based architecture that never silently overwrites master data.

Role
Full-stack architect and lead engineer
Year
2025
Status
Live
AI · LLM · automation · full-stack · data

// AI capabilities

  • Anthropic Claude SDK + provider-agnostic LLM abstraction
  • Structured-output prompting for enrichment fields
  • Confidence scoring per proposed value
  • Source URL attribution for every enriched field
  • Web search and scraping integration (SerpAPI / ScrapingBee)
  • BullMQ + Redis job orchestration for AI pipelines

// Architecture flow

Overview

A queue-driven enrichment dashboard that ingests CSV/Excel product files, looks them up against an internal nine-domain classification scheme, searches brand websites, scrapes structured specs, and uses an LLM to propose SEO-ready titles, descriptions, and HTML blocks. Every proposal lands in a review queue with a confidence score and a source URL. The system never silently overwrites master data.
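As a rough sketch, the flow above is a linear pipeline whose terminal artifact is always a proposal in the review queue, never a write to master data. All names and function bodies here are illustrative stand-ins, not the project's actual code:

```typescript
// Illustrative pipeline skeleton: each stage narrows raw product data toward
// reviewable proposals. Stage names mirror the flow described above; the
// bodies are placeholders, not the real implementation.

interface RawProduct { sku: string; name: string; brand: string }
interface Classified extends RawProduct { domainPath: string[] }   // nine-domain placement
interface Sourced extends Classified { sourceUrls: string[] }      // brand-website search hits
interface Scraped extends Sourced { specs: Record<string, string> }

// The terminal artifact: a proposal awaiting human review.
interface Proposal { sku: string; field: string; value: string; confidence: number; sourceUrl: string }

function classify(p: RawProduct): Classified {
  return { ...p, domainPath: ["Division?", "Category?"] };          // placeholder lookup
}
function search(p: Classified): Sourced {
  return { ...p, sourceUrls: [`https://example.com/${p.sku}`] };    // placeholder search
}
function scrape(p: Sourced): Scraped {
  return { ...p, specs: { material: "unknown" } };                  // placeholder scrape
}
function propose(p: Scraped): Proposal[] {
  return Object.entries(p.specs).map(([field, value]) => ({
    sku: p.sku, field, value, confidence: 0.5, sourceUrl: p.sourceUrls[0],
  }));
}

// End-to-end: raw row in, review-queue proposals out. Master data is untouched.
function enrich(p: RawProduct): Proposal[] {
  return propose(scrape(search(classify(p))));
}
```

In the real system each stage runs as a queued job rather than a direct call, but the contract is the same: the pipeline ends at a proposal, not a write.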

Problem

Mike Sport's multi-brand catalog had thousands of products with sparse, inconsistent attributes spanning Adidas, Asics, Nike, Puma, and dozens of other brands. Pure-LLM enrichment was tempting but dangerous: a single hallucination could pollute master data and propagate to every downstream system. Manual entry was untenable. The team needed AI scale with human governance.

Approach

Treat the LLM as a research assistant, not a writer. Every enrichment becomes a proposal with provenance. The reviewer sees the source URL, confidence, and the LLM's reasoning, and approves or rejects per field. Provider-agnostic abstraction lets the model be swapped (Anthropic by default, OpenAI optional) without any change to callers.
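The proposal-with-provenance idea can be sketched as a record plus a review step. Field names here are assumptions based on the description, not the actual schema:

```typescript
// Hedged sketch of a "proposal with provenance" and the per-field review step.
// The shape is inferred from the description above, not the real Prisma schema.

type ReviewStatus = "pending" | "approved" | "rejected";

interface FieldProposal {
  productId: string;
  field: string;            // e.g. "seoTitle"
  currentValue: string | null;
  proposedValue: string;
  confidence: number;       // 0..1, per proposed value
  sourceUrl: string;        // provenance the reviewer can click
  reasoning: string;        // the LLM's stated rationale
  status: ReviewStatus;
}

// Master data changes only on explicit approval; rejection leaves it untouched.
function applyDecision(
  master: Record<string, string | null>,
  proposal: FieldProposal,
  approve: boolean,
): Record<string, string | null> {
  if (!approve) return master; // never silently overwrite
  return { ...master, [proposal.field]: proposal.proposedValue };
}
```

The key design point is that `applyDecision` is the only path into master data, and it requires an explicit human decision.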

Architecture

  • Frontend: Next.js 14 + Tailwind. Excel/CSV upload, column auto-detection, inline editing, bulk approvals, audit trail.
  • Backend: Express + TypeScript with Prisma ORM and PostgreSQL.
  • Queue: BullMQ on Redis. Each enrichment task is a job with status, retries, and confidence-based routing.
  • AI provider: Pluggable. seoService.ts exposes a single interface; concrete classes implement Anthropic and OpenAI backends.
  • Search and scrape: SerpAPI / Serper for brand web search, ScrapingBee for structured spec extraction. Both pluggable.
  • Reference repository: A nine-domain classification index (Division, Category, Product Group, Family, Brand, Gender, Season, Country of Origin, HS Code) constrains the search space and gives every product a canonical place.
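The pluggable-provider point can be sketched as one interface with concrete adapters behind it. The real `seoService.ts` surface isn't shown here, so these method names and shapes are illustrative assumptions:

```typescript
// Sketch of a provider-agnostic LLM abstraction: callers depend on one
// interface; Anthropic and OpenAI adapters implement it. SDK calls are
// stubbed; in the real service they would go through the respective SDKs.

interface EnrichmentRequest { productName: string; specs: Record<string, string> }
interface EnrichmentResult { seoTitle: string; description: string; confidence: number }

interface LlmProvider {
  enrich(req: EnrichmentRequest): Promise<EnrichmentResult>;
}

class AnthropicProvider implements LlmProvider {
  async enrich(req: EnrichmentRequest): Promise<EnrichmentResult> {
    // Real code would call @anthropic-ai/sdk here; stubbed for the sketch.
    return { seoTitle: `${req.productName} | Official`, description: "…", confidence: 0.8 };
  }
}

class OpenAiProvider implements LlmProvider {
  async enrich(req: EnrichmentRequest): Promise<EnrichmentResult> {
    // Real code would call the OpenAI SDK here; stubbed for the sketch.
    return { seoTitle: `${req.productName} | Shop`, description: "…", confidence: 0.8 };
  }
}

// Swapping models becomes a config change, with no edits to callers.
function makeProvider(name: "anthropic" | "openai"): LlmProvider {
  return name === "anthropic" ? new AnthropicProvider() : new OpenAiProvider();
}
```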

Tech stack

  • Frontend: Next.js 14, React, TypeScript, Tailwind CSS
  • Backend: Express, TypeScript, Prisma, PostgreSQL
  • Queue: BullMQ, Redis
  • AI: @anthropic-ai/sdk (default), OpenAI SDK (swappable)
  • External: SerpAPI / Serper, ScrapingBee, custom web scraper

AI work

  • Structured-output enrichment service with field-level confidence scores.
  • Provider-agnostic adapter so models are swappable without changing callers.
  • Audit trail linking every enriched value to a source URL the reviewer can click.
  • Governance-first design: AI never overwrites verified master data. It always proposes; humans confirm.
  • Prompts versioned in the repo and tested against a sample fixture set.
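Structured output only earns trust if it is validated before it becomes a proposal. The JSON shape below is an assumption; the point is that malformed or out-of-range model output is rejected rather than ingested:

```typescript
// Sketch of validating a structured LLM response. Any missing field, bad
// confidence, or absent source URL fails loudly instead of entering the
// review queue. The response shape is an illustrative assumption.

interface ParsedField { field: string; value: string; confidence: number; sourceUrl: string }

function parseLlmOutput(raw: string): ParsedField[] {
  const data = JSON.parse(raw); // throws on non-JSON model output
  if (!Array.isArray(data)) throw new Error("expected an array of field proposals");
  return data.map((item) => {
    if (typeof item.field !== "string" || typeof item.value !== "string") {
      throw new Error("proposal missing field/value");
    }
    const confidence = Number(item.confidence);
    if (!Number.isFinite(confidence) || confidence < 0 || confidence > 1) {
      throw new Error(`confidence out of range for ${item.field}`);
    }
    if (typeof item.sourceUrl !== "string" || !item.sourceUrl.startsWith("http")) {
      throw new Error(`proposal for ${item.field} lacks a source URL`);
    }
    return { field: item.field, value: item.value, confidence, sourceUrl: item.sourceUrl };
  });
}
```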

Engineering highlights

  • Column auto-detection: drop in a vendor's spreadsheet and the system maps their column names to the canonical schema using fuzzy matching plus an LLM tiebreaker.
  • Confidence-gated review: high-confidence enrichments stream into a fast-approve view; low-confidence go to a deeper review queue.
  • Bulk operations: approve, reject, or override hundreds of proposals at once.
  • Export pipeline: enriched workbook + audit tabs, ready for downstream catalog systems.
  • Retry-aware queue: BullMQ handles rate limits gracefully across SerpAPI, ScrapingBee, and the LLM.
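The column auto-detection idea can be sketched as normalization plus edit-distance matching, with ambiguous names deferred to a tiebreaker (an LLM in the real system). The canonical column list and the distance threshold here are illustrative assumptions:

```typescript
// Sketch of fuzzy column auto-detection: normalize vendor column names, match
// against a canonical schema by Levenshtein distance, and return null for
// ambiguous names so they can go to the LLM tiebreaker. Canonical names and
// the threshold are illustrative, not the project's actual schema.

const CANONICAL = ["sku", "brand", "product_name", "color", "size"];

function normalize(name: string): string {
  return name.toLowerCase().replace(/[^a-z0-9]/g, "");
}

// Classic Levenshtein edit distance.
function editDistance(a: string, b: string): number {
  const dp = Array.from({ length: a.length + 1 }, (_, i) =>
    Array.from({ length: b.length + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0)),
  );
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1,
        dp[i][j - 1] + 1,
        dp[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1),
      );
    }
  }
  return dp[a.length][b.length];
}

// Returns the canonical column, or null to signal "send to the tiebreaker".
function mapColumn(vendorName: string, maxDistance = 2): string | null {
  const n = normalize(vendorName);
  let best: { col: string; d: number } | null = null;
  for (const col of CANONICAL) {
    const d = editDistance(n, normalize(col));
    if (!best || d < best.d) best = { col, d };
  }
  return best && best.d <= maxDistance ? best.col : null;
}
```

Cheap fuzzy matching handles the common cases ("SKU #", "Colour"), and only the genuinely ambiguous headers pay for an LLM call.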

Outcome

Operational across multiple brands. Confidence-gated review reduces manual effort dramatically while keeping the brand's data team in control. The "propose, never overwrite" pattern has become the team's default for any AI-assisted operation on master data.

Lessons

  • The most valuable thing an LLM can produce in a data pipeline is a proposal with provenance, not an answer.
  • BullMQ + Redis is overkill for a hundred items but exactly right for a thousand. The queue is the contract between the AI and the reviewers.
  • The hardest part wasn't the model. It was building a review UX that a non-engineer can trust at speed.

Want to dig deeper?

Ask my AI agent anything about how this was built, what tradeoffs I made, or how it could fit your team.

Ask my AI →