Product Data Enrichment Dashboard
AI-assisted product enrichment pipeline with confidence scoring, source-tracked LLM proposals, and a queue-based architecture that never silently overwrites master data.
// AI capabilities
- Anthropic Claude SDK + provider-agnostic LLM abstraction
- Structured-output prompting for enrichment fields
- Confidence scoring per proposed value
- Source URL attribution for every enriched field
- Web search and scraping integration (SerpAPI / ScrapingBee)
- BullMQ + Redis job orchestration for AI pipelines
// Architecture flow
Overview
A queue-driven enrichment dashboard that ingests CSV/Excel product files, looks them up against an internal nine-domain classification scheme, searches the brand web, scrapes structured specs, and uses an LLM to propose SEO-ready titles, descriptions, and HTML blocks. Every proposal lands in a review queue with a confidence score and a source URL. The system never silently overwrites master data.
Problem
Mike Sport's multi-brand catalog had thousands of products with sparse, inconsistent attributes spanning Adidas, Asics, Nike, Puma, and dozens of other brands. Pure-LLM enrichment was tempting but dangerous: a single hallucination could pollute master data and propagate to every downstream system. Manual entry was untenable. The team needed AI scale with human governance.
Approach
Treat the LLM as a research assistant, not a writer. Every enrichment becomes a proposal with provenance. The reviewer sees the source URL, confidence, and the LLM's reasoning, and approves or rejects per field. Provider-agnostic abstraction lets the model be swapped (Anthropic by default, OpenAI optional) without any change to callers.
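The "proposal with provenance" idea can be sketched as a type plus a confidence gate. This is a minimal sketch: the field names and the 0.85 threshold are illustrative assumptions, not the project's actual schema.

```typescript
// Illustrative proposal shape; field names and threshold are assumptions,
// not the project's actual schema.
type ReviewQueue = "fast-approve" | "deep-review";

interface EnrichmentProposal {
  productId: string;
  field: string;                // e.g. "seoTitle", "descriptionHtml"
  proposedValue: string;
  currentValue: string | null;  // master data is never overwritten in place
  confidence: number;           // 0..1, per proposed value
  sourceUrl: string;            // provenance the reviewer can click
  reasoning: string;            // the LLM's stated rationale
}

// High-confidence proposals stream to fast approval; the rest get deep review.
function route(p: EnrichmentProposal, threshold = 0.85): ReviewQueue {
  return p.confidence >= threshold ? "fast-approve" : "deep-review";
}
```

The key property is that a proposal carries its evidence with it, so the reviewer's decision is per field, not per product.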
Architecture
- Frontend: Next.js 14 + Tailwind. Excel/CSV upload, column auto-detection, inline editing, bulk approvals, audit trail.
- Backend: Express + TypeScript with Prisma ORM and PostgreSQL.
- Queue: BullMQ on Redis. Each enrichment task is a job with status, retries, and confidence-based routing.
- AI provider: Pluggable. seoService.ts exposes a single interface; concrete classes implement Anthropic and OpenAI backends.
- Search and scrape: SerpAPI / Serper for brand web search, ScrapingBee for structured spec extraction. Both pluggable.
- Reference repository: A nine-domain classification index (Division, Category, Product Group, Family, Brand, Gender, Season, Country of Origin, HS Code) constrains the search space and gives every product a canonical place.
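The pluggable provider boundary above can be sketched as a single interface with interchangeable backends. The names below are illustrative, not the actual seoService.ts API; a real backend would wrap @anthropic-ai/sdk or the OpenAI SDK where the stub stands.

```typescript
// Sketch of the provider-agnostic boundary; names are illustrative.
interface LlmProvider {
  enrich(prompt: string): Promise<string>;
}

// Stand-in backend: a real AnthropicProvider / OpenAiProvider would wrap the
// vendor SDK here. Callers never see which one they got.
class StubProvider implements LlmProvider {
  constructor(private readonly name: string) {}
  async enrich(prompt: string): Promise<string> {
    return `[${this.name}] ${prompt}`;
  }
}

// Caller code depends only on the interface, so swapping models is a
// construction-time choice, not a code change here.
async function proposeSeoTitle(llm: LlmProvider, product: string): Promise<string> {
  return llm.enrich(`Propose an SEO title for: ${product}`);
}
```

Because callers hold only the interface, the default-Anthropic / optional-OpenAI switch lives entirely at construction time.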
Tech stack
- Frontend: Next.js 14, React, TypeScript, Tailwind CSS
- Backend: Express, TypeScript, Prisma, PostgreSQL
- Queue: BullMQ, Redis
- AI: @anthropic-ai/sdk (default), OpenAI SDK (swappable)
- External: SerpAPI / Serper, ScrapingBee, custom web scraper
AI work
- Structured-output enrichment service with field-level confidence scores.
- Provider-agnostic adapter so models are swappable without changing callers.
- Audit trail linking every enriched value to a source URL the reviewer can click.
- Governance-first design: AI never overwrites verified master data. It always proposes; humans confirm.
- Prompts versioned in the repo and tested against a sample fixture set.
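A structured-output service only works if malformed model responses fail loudly before they reach reviewers. The sketch below assumes a hypothetical per-field JSON shape (value, confidence, sourceUrl); it is not the actual service schema.

```typescript
// Hypothetical per-field response shape; not the project's real schema.
interface FieldProposal {
  value: string;
  confidence: number;  // must be in [0, 1]
  sourceUrl: string;
}

// Parse and validate a structured LLM response, rejecting anything that
// does not carry a value, a bounded confidence, and a source URL.
function parseEnrichment(raw: string): Record<string, FieldProposal> {
  const parsed = JSON.parse(raw) as Record<string, unknown>;
  const out: Record<string, FieldProposal> = {};
  for (const [field, v] of Object.entries(parsed)) {
    const p = v as Partial<FieldProposal>;
    if (
      typeof p.value !== "string" ||
      typeof p.confidence !== "number" ||
      typeof p.sourceUrl !== "string" ||
      p.confidence < 0 ||
      p.confidence > 1
    ) {
      throw new Error(`Malformed proposal for field "${field}"`);
    }
    out[field] = { value: p.value, confidence: p.confidence, sourceUrl: p.sourceUrl };
  }
  return out;
}
```

Rejecting at the parse boundary keeps the "always proposes, never overwrites" invariant cheap to enforce downstream.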
Engineering highlights
- Column auto-detection: drop in a vendor's spreadsheet and the system maps their column names to the canonical schema using fuzzy matching plus an LLM tiebreaker.
- Confidence-gated review: high-confidence enrichments stream into a fast-approve view; low-confidence go to a deeper review queue.
- Bulk operations: approve, reject, or override hundreds of proposals at once.
- Export pipeline: enriched workbook + audit tabs, ready for downstream catalog systems.
- Retry-aware queue: BullMQ handles rate limits gracefully across SerpAPI, ScrapingBee, and the LLM.
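The column auto-detection step above can be sketched with a normalized edit distance over cleaned header names. The 0.5 cutoff and the hand-off of ambiguous headers to an LLM tiebreaker are assumptions about where the real pipeline draws the line.

```typescript
// Levenshtein distance via dynamic programming.
function editDistance(a: string, b: string): number {
  const dp: number[][] = Array.from({ length: a.length + 1 }, (_, i) =>
    Array.from({ length: b.length + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0))
  );
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1,                                   // deletion
        dp[i][j - 1] + 1,                                   // insertion
        dp[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1)  // substitution
      );
    }
  }
  return dp[a.length][b.length];
}

// Map a vendor header to the closest canonical field, or null when no match
// clears the cutoff (ambiguous headers would go to an LLM tiebreaker).
// The 0.5 cutoff is an assumption, not the project's tuned value.
function mapColumn(header: string, canonical: string[], cutoff = 0.5): string | null {
  const norm = (s: string) => s.toLowerCase().replace(/[^a-z0-9]/g, "");
  const h = norm(header);
  let best: string | null = null;
  let bestScore = -1;
  for (const c of canonical) {
    const n = norm(c);
    const score = 1 - editDistance(h, n) / Math.max(h.length, n.length, 1);
    if (score > bestScore) {
      bestScore = score;
      best = c;
    }
  }
  return bestScore >= cutoff ? best : null;
}
```

Fuzzy matching handles punctuation and casing noise cheaply; the LLM is only consulted for the headers the distance metric cannot settle.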
Outcome
Operational across multiple brands. Confidence-gated review reduces manual effort dramatically while keeping the brand's data team in control. The "propose, never overwrite" pattern has become the team's default for any AI-assisted operation on master data.
Lessons
- The most valuable thing an LLM can produce in a data pipeline is a proposal with provenance, not an answer.
- BullMQ + Redis is overkill for a hundred items but exactly right for a thousand. The queue is the contract between the AI and the reviewers.
- The hardest part wasn't the model. It was building a review UX that a non-engineer can trust at speed.
Want to dig deeper?
Ask my AI agent anything about how this was built, what tradeoffs I made, or how it could fit your team.
Ask my AI →
// related projects
Linc Consulting Lead App
Lead management app that uses the Anthropic Claude SDK to qualify, score, and route incoming consulting leads.
AI SEO Collection Optimizer
Autonomous SEO content engine that captures Lebanese organic search demand by generating high-confidence collection landing pages on a parallel VPS layer, grounded in Search Console signals, Shopify orders, and live catalog data, with a self-improving GSC measurement loop.
Marketing Intelligence Dashboard
Enterprise marketing analytics platform with real-time dashboards, OpenAI-powered insights via the Vercel AI SDK, and one-click PPTX stakeholder reporting.