
Product Data Enrichment Dashboard

AI-assisted product enrichment pipeline with confidence scoring, source-tracked LLM proposals, and a queue-based architecture that never silently overwrites master data.

Role
Full-stack architect and lead engineer
Year
2025
Status
Live
AI · LLM · automation · full-stack · data

// AI capabilities

  • Anthropic Claude SDK + provider-agnostic LLM abstraction
  • Structured-output prompting for enrichment fields
  • Confidence scoring per proposed value
  • Source URL attribution for every enriched field
  • Web search and scraping integration (SerpAPI / ScrapingBee)
  • BullMQ + Redis job orchestration for AI pipelines

// Architecture flow

Overview

A queue-driven enrichment dashboard that ingests CSV/Excel product files, looks them up against an internal nine-domain classification scheme, searches brand websites, scrapes structured specs, and uses an LLM to propose SEO-ready titles, descriptions, and HTML blocks. Every proposal lands in a review queue with a confidence score and a source URL. The system never silently overwrites master data.
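As a rough sketch, the flow above is a linear pipeline whose terminal artifact is always a proposal in the review queue, never a write to master data. All names and function bodies here are illustrative stand-ins, not the project's actual code:

```typescript
// Illustrative pipeline skeleton: each stage narrows raw product data toward
// reviewable proposals. Stage names mirror the flow described above; the
// bodies are placeholders, not the real implementation.

interface RawProduct { sku: string; name: string; brand: string }
interface Classified extends RawProduct { domainPath: string[] }   // nine-domain placement
interface Sourced extends Classified { sourceUrls: string[] }      // brand-website search hits
interface Scraped extends Sourced { specs: Record<string, string> }

// The terminal artifact: a proposal awaiting human review.
interface Proposal { sku: string; field: string; value: string; confidence: number; sourceUrl: string }

function classify(p: RawProduct): Classified {
  return { ...p, domainPath: ["Division?", "Category?"] };          // placeholder lookup
}
function search(p: Classified): Sourced {
  return { ...p, sourceUrls: [`https://example.com/${p.sku}`] };    // placeholder search
}
function scrape(p: Sourced): Scraped {
  return { ...p, specs: { material: "unknown" } };                  // placeholder scrape
}
function propose(p: Scraped): Proposal[] {
  return Object.entries(p.specs).map(([field, value]) => ({
    sku: p.sku, field, value, confidence: 0.5, sourceUrl: p.sourceUrls[0],
  }));
}

// End-to-end: raw row in, review-queue proposals out. Master data is untouched.
function enrich(p: RawProduct): Proposal[] {
  return propose(scrape(search(classify(p))));
}
```

In the real system each stage runs as a queued job rather than a direct call, but the contract is the same: the pipeline ends at a proposal, not a write.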

Problem

Mike Sport's multi-brand catalog had thousands of products with sparse, inconsistent attributes spanning Adidas, Asics, Nike, Puma, and dozens of other brands. Pure-LLM enrichment was tempting but dangerous: a single hallucination could pollute master data and propagate to every downstream system. Manual entry was untenable. The team needed AI scale with human governance.

Approach

Treat the LLM as a research assistant, not a writer. Every enrichment becomes a proposal with provenance. The reviewer sees the source URL, confidence, and the LLM's reasoning, and approves or rejects per field. Provider-agnostic abstraction lets the model be swapped (Anthropic by default, OpenAI optional) without any change to callers.
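The proposal-with-provenance idea can be sketched as a record plus a review step. Field names here are assumptions based on the description, not the actual schema:

```typescript
// Hedged sketch of a "proposal with provenance" and the per-field review step.
// The shape is inferred from the description above, not the real Prisma schema.

type ReviewStatus = "pending" | "approved" | "rejected";

interface FieldProposal {
  productId: string;
  field: string;            // e.g. "seoTitle"
  currentValue: string | null;
  proposedValue: string;
  confidence: number;       // 0..1, per proposed value
  sourceUrl: string;        // provenance the reviewer can click
  reasoning: string;        // the LLM's stated rationale
  status: ReviewStatus;
}

// Master data changes only on explicit approval; rejection leaves it untouched.
function applyDecision(
  master: Record<string, string | null>,
  proposal: FieldProposal,
  approve: boolean,
): Record<string, string | null> {
  if (!approve) return master; // never silently overwrite
  return { ...master, [proposal.field]: proposal.proposedValue };
}
```

The key design point is that `applyDecision` is the only path into master data, and it requires an explicit human decision.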

Architecture

  • Frontend: Next.js 14 + Tailwind. Excel/CSV upload, column auto-detection, inline editing, bulk approvals, audit trail.
  • Backend: Express + TypeScript with Prisma ORM and PostgreSQL.
  • Queue: BullMQ on Redis. Each enrichment task is a job with status, retries, and confidence-based routing.
  • AI provider: Pluggable. seoService.ts exposes a single interface; concrete classes implement Anthropic and OpenAI backends.
  • Search and scrape: SerpAPI / Serper for brand web search, ScrapingBee for structured spec extraction. Both pluggable.
  • Reference repository: A nine-domain classification index (Division, Category, Product Group, Family, Brand, Gender, Season, Country of Origin, HS Code) constrains the search space and gives every product a canonical place.
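The pluggable-provider point can be sketched as one interface with concrete adapters behind it. The real `seoService.ts` surface isn't shown here, so these method names and shapes are illustrative assumptions:

```typescript
// Sketch of a provider-agnostic LLM abstraction: callers depend on one
// interface; Anthropic and OpenAI adapters implement it. SDK calls are
// stubbed; in the real service they would go through the respective SDKs.

interface EnrichmentRequest { productName: string; specs: Record<string, string> }
interface EnrichmentResult { seoTitle: string; description: string; confidence: number }

interface LlmProvider {
  enrich(req: EnrichmentRequest): Promise<EnrichmentResult>;
}

class AnthropicProvider implements LlmProvider {
  async enrich(req: EnrichmentRequest): Promise<EnrichmentResult> {
    // Real code would call @anthropic-ai/sdk here; stubbed for the sketch.
    return { seoTitle: `${req.productName} | Official`, description: "…", confidence: 0.8 };
  }
}

class OpenAiProvider implements LlmProvider {
  async enrich(req: EnrichmentRequest): Promise<EnrichmentResult> {
    // Real code would call the OpenAI SDK here; stubbed for the sketch.
    return { seoTitle: `${req.productName} | Shop`, description: "…", confidence: 0.8 };
  }
}

// Swapping models becomes a config change, with no edits to callers.
function makeProvider(name: "anthropic" | "openai"): LlmProvider {
  return name === "anthropic" ? new AnthropicProvider() : new OpenAiProvider();
}
```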

Tech stack

  • Frontend: Next.js 14, React, TypeScript, Tailwind CSS
  • Backend: Express, TypeScript, Prisma, PostgreSQL
  • Queue: BullMQ, Redis
  • AI: @anthropic-ai/sdk (default), OpenAI SDK (swappable)
  • External: SerpAPI / Serper, ScrapingBee, custom web scraper

AI work

  • Structured-output enrichment service with field-level confidence scores.
  • Provider-agnostic adapter so models are swappable without changing callers.
  • Audit trail linking every enriched value to a source URL the reviewer can click.
  • Governance-first design: AI never overwrites verified master data. It always proposes; humans confirm.
  • Prompts versioned in the repo and tested against a sample fixture set.
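Structured output only earns trust if it is validated before it becomes a proposal. The JSON shape below is an assumption; the point is that malformed or out-of-range model output is rejected rather than ingested:

```typescript
// Sketch of validating a structured LLM response. Any missing field, bad
// confidence, or absent source URL fails loudly instead of entering the
// review queue. The response shape is an illustrative assumption.

interface ParsedField { field: string; value: string; confidence: number; sourceUrl: string }

function parseLlmOutput(raw: string): ParsedField[] {
  const data = JSON.parse(raw); // throws on non-JSON model output
  if (!Array.isArray(data)) throw new Error("expected an array of field proposals");
  return data.map((item) => {
    if (typeof item.field !== "string" || typeof item.value !== "string") {
      throw new Error("proposal missing field/value");
    }
    const confidence = Number(item.confidence);
    if (!Number.isFinite(confidence) || confidence < 0 || confidence > 1) {
      throw new Error(`confidence out of range for ${item.field}`);
    }
    if (typeof item.sourceUrl !== "string" || !item.sourceUrl.startsWith("http")) {
      throw new Error(`proposal for ${item.field} lacks a source URL`);
    }
    return { field: item.field, value: item.value, confidence, sourceUrl: item.sourceUrl };
  });
}
```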

Engineering highlights

  • Column auto-detection: drop in a vendor's spreadsheet and the system maps their column names to the canonical schema using fuzzy matching plus an LLM tiebreaker.
  • Confidence-gated review: high-confidence enrichments stream into a fast-approve view; low-confidence go to a deeper review queue.
  • Bulk operations: approve, reject, or override hundreds of proposals at once.
  • Export pipeline: enriched workbook + audit tabs, ready for downstream catalog systems.
  • Retry-aware queue: BullMQ handles rate limits gracefully across SerpAPI, ScrapingBee, and the LLM.
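The column auto-detection idea can be sketched as normalization plus edit-distance matching, with ambiguous names deferred to a tiebreaker (an LLM in the real system). The canonical column list and the distance threshold here are illustrative assumptions:

```typescript
// Sketch of fuzzy column auto-detection: normalize vendor column names, match
// against a canonical schema by Levenshtein distance, and return null for
// ambiguous names so they can go to the LLM tiebreaker. Canonical names and
// the threshold are illustrative, not the project's actual schema.

const CANONICAL = ["sku", "brand", "product_name", "color", "size"];

function normalize(name: string): string {
  return name.toLowerCase().replace(/[^a-z0-9]/g, "");
}

// Classic Levenshtein edit distance.
function editDistance(a: string, b: string): number {
  const dp = Array.from({ length: a.length + 1 }, (_, i) =>
    Array.from({ length: b.length + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0)),
  );
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1,
        dp[i][j - 1] + 1,
        dp[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1),
      );
    }
  }
  return dp[a.length][b.length];
}

// Returns the canonical column, or null to signal "send to the tiebreaker".
function mapColumn(vendorName: string, maxDistance = 2): string | null {
  const n = normalize(vendorName);
  let best: { col: string; d: number } | null = null;
  for (const col of CANONICAL) {
    const d = editDistance(n, normalize(col));
    if (!best || d < best.d) best = { col, d };
  }
  return best && best.d <= maxDistance ? best.col : null;
}
```

Cheap fuzzy matching handles the common cases ("SKU #", "Colour"), and only the genuinely ambiguous headers pay for an LLM call.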

Outcome

Operational across multiple brands. Confidence-gated review reduces manual effort dramatically while keeping the brand's data team in control. The "propose, never overwrite" pattern has become the team's default for any AI-assisted operation on master data.

Lessons

  • The most valuable thing an LLM can produce in a data pipeline is a proposal with provenance, not an answer.
  • BullMQ + Redis is overkill for a hundred items but exactly right for a thousand. The queue is the contract between the AI and the reviewers.
  • The hardest part wasn't the model. It was building a review UX that a non-engineer can trust at speed.

Want to dig deeper?

Ask my AI agent anything about how this was built, what tradeoffs I made, or how it could fit your team.

Ask my AI →