E-commerce Spec Verification Case Study — 5 of 6 Requirements, One Gap Found

We ran a common e-commerce implementation against the handmade goods marketplace spec. The verification engine found one implementation gap before a single line shipped to users. Here's the full breakdown.

Setup

Spec: 6 functional requirements, 4 non-functional requirements
Implementation submitted: Python backend with repository and service layer
Verdict: ❌ FAILED — 5/6 requirements pass, 1 gap found

The verdict is FAILED because all requirements must pass. 83% is not done.

Layer 1 — Structural check

The architecture engine expected three path segments in the implementation: api/, models/, and migrations/. All three were present.

✅ Structural check passed.
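A structural check of this kind can be approximated in a few lines. The sketch below is an assumption about how such a check might work, not the engine's actual code: the function name `missing_segments` and the file paths are hypothetical, and only the three segment names come from the run above.

```python
from pathlib import PurePosixPath

# The three path segments named in this run.
REQUIRED_SEGMENTS = {"api", "models", "migrations"}

def missing_segments(file_paths, required=REQUIRED_SEGMENTS):
    """Return the required directory names that appear in none of the paths."""
    seen = set()
    for path in file_paths:
        # Only directory components count, so drop the final file name.
        seen.update(PurePosixPath(path).parts[:-1])
    return sorted(set(required) - seen)

# Hypothetical file list for illustration.
submitted = [
    "backend/api/routes.py",
    "backend/models/product.py",
    "backend/migrations/0001_initial.py",
]
print(missing_segments(submitted))  # []
```

An empty list means every expected segment was found; any name in the list would fail the layer.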

Layer 2 — Requirements coverage

This is where the gap was found. Layer 2 searches the stripped source code for specific identifiers (class names and function names). Comments, strings, and docstrings are removed before matching, so an identifier that is only mentioned in a comment or a string literal doesn't produce a false positive.
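One way to get the same effect in Python is to tokenize the source and keep only name tokens, which skips comments and string literals (docstrings included) without a separate stripping pass. This is a swapped-in technique for illustration, not necessarily how the engine does it; `identifiers_in`, `missing_signals`, and the sample source are hypothetical.

```python
import io
import keyword
import tokenize

def identifiers_in(source):
    """Collect identifiers from Python source, ignoring comments and string literals."""
    names = set()
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # NAME tokens cover identifiers and keywords; comments and strings
        # (including docstrings) arrive as COMMENT and STRING tokens.
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            names.add(tok.string)
    return names

def missing_signals(source, required):
    """Return the required identifiers that do not occur in real code."""
    found = identifiers_in(source)
    return [name for name in required if name not in found]

# Hypothetical source: ReviewService and create_review appear only in a
# docstring, a comment, and a string value, never as real identifiers.
sample = '''
class ProductRepository:
    """Docstring that mentions ReviewService."""
    def get_product_details(self, product_id):
        # TODO: call create_review later
        return {"id": product_id, "note": "ReviewService"}
'''
print(missing_signals(sample, ["ProductRepository", "get_product_details"]))  # []
print(missing_signals(sample, ["ReviewService", "create_review"]))
# ['ReviewService', 'create_review']
```

A plain substring search over the raw file would have reported both review signals as present here.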

✅ FR-01: Buyers can view product listings

Signals required: ProductRepository, get_product_details
Result: Both found. Pass.

❌ FR-02: Buyers can leave reviews for purchased products

Signals required: ReviewService, create_review
Result: Neither found.

Requirement: The spec says buyers must be able to leave reviews for purchased products. The acceptance criterion is specific: POST /api/reviews with {order_id, rating, comment} returns HTTP 201 with {review_id}.
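For reference, the smallest shape that would satisfy both the Layer 2 signals and the acceptance criterion looks roughly like the sketch below. It is illustrative only: the in-memory storage, the UUID-based `review_id`, and the framework-free `post_review` handler are assumptions, and it omits the check that the order was actually purchased by the reviewer.

```python
import uuid

class ReviewService:
    """Hypothetical in-memory service; a real one would persist reviews."""

    def __init__(self):
        self._reviews = {}

    def create_review(self, order_id, rating, comment):
        # Assumption: review IDs are random UUID strings.
        review_id = str(uuid.uuid4())
        self._reviews[review_id] = {
            "order_id": order_id,
            "rating": rating,
            "comment": comment,
        }
        return review_id

def post_review(service, payload):
    """Framework-free stand-in for POST /api/reviews: returns (status, body)."""
    review_id = service.create_review(
        payload["order_id"], payload["rating"], payload["comment"]
    )
    return 201, {"review_id": review_id}

status, body = post_review(
    ReviewService(),
    {"order_id": "order-1", "rating": 5, "comment": "Lovely mug"},
)
print(status, sorted(body))  # 201 ['review_id']
```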

What was submitted: The implementation had product listings, search, seller dashboards, and order history. No review functionality.

Gap: This is a common pattern. Reviews feel like a "Phase 2" feature during development, so they get deprioritized. The spec said they were required. The implementation was submitted without them. Without verification, this gap would surface in production, when a buyer tries to leave a review and finds no way to do it.

Production consequence: Seller trust is built on reviews in a marketplace. Shipping without them isn't a minor omission — it's a missing core feature that affects buyer confidence and seller acquisition.

✅ FR-03: Sellers can manage their product listings

Signals required: SellerRepository, get_seller_products
Result: Both found. Pass.

✅ FR-04: Sellers can view their order history

Signals required: OrderRepository, get_seller_orders
Result: Both found. Pass.

✅ FR-05: Buyers can search for products by keyword

Signals required: ProductSearchService, search_products
Result: Both found. Pass.

✅ FR-06: Buyers can filter products by category

Signals required: CategoryRepository, get_category_products
Result: Both found. Pass.
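The six results above reduce to a small table lookup. The sketch below reproduces the tally under one assumption the run does not spell out: a requirement passes only when all of its signals are found. The signal names are the ones listed above; `coverage` and the variable names are hypothetical.

```python
REQUIRED_SIGNALS = {
    "FR-01": ["ProductRepository", "get_product_details"],
    "FR-02": ["ReviewService", "create_review"],
    "FR-03": ["SellerRepository", "get_seller_products"],
    "FR-04": ["OrderRepository", "get_seller_orders"],
    "FR-05": ["ProductSearchService", "search_products"],
    "FR-06": ["CategoryRepository", "get_category_products"],
}

def coverage(found_identifiers, required=REQUIRED_SIGNALS):
    """Map each requirement ID to the list of its signals that were not found."""
    return {
        req: [s for s in signals if s not in found_identifiers]
        for req, signals in required.items()
    }

# Identifiers found in this run: everything except the FR-02 signals.
found = {s for req, sigs in REQUIRED_SIGNALS.items() if req != "FR-02" for s in sigs}

report = coverage(found)
passed = [req for req, missing in report.items() if not missing]
print(f"{len(passed)}/{len(report)} requirements pass ({len(passed) / len(report):.0%})")
# 5/6 requirements pass (83%)
print({req: missing for req, missing in report.items() if missing})
# {'FR-02': ['ReviewService', 'create_review']}
```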

Layer 3 — Semantic audit (advisory)

Layer 3 is an LLM-based semantic review of the code diff against the spec. It is advisory only and never affects the pass/fail verdict. Only the deterministic Layers 1 and 2 gate the result.
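The gating rule can be stated in code. This is a minimal sketch assuming each layer reports a pass flag and an advisory flag; `LayerResult` and `verdict` are hypothetical names, not the engine's API.

```python
from dataclasses import dataclass, field

@dataclass
class LayerResult:
    name: str
    passed: bool
    advisory: bool = False  # advisory layers never gate the verdict
    notes: list = field(default_factory=list)

def verdict(results):
    """PASSED only if every non-advisory layer passed; advisory layers are ignored."""
    gating = [r for r in results if not r.advisory]
    return "PASSED" if all(r.passed for r in gating) else "FAILED"

run = [
    LayerResult("Layer 1: structure", passed=True),
    LayerResult("Layer 2: coverage", passed=False, notes=["FR-02 signals missing"]),
    LayerResult("Layer 3: semantic", passed=False, advisory=True,
                notes=["confirmed FR-02 gap"]),
]
print(verdict(run))  # FAILED
```

Keeping the LLM layer out of the gate means the same input always produces the same verdict, while its notes remain available to reviewers.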

In this run, Layer 3 confirmed the Layer 2 finding: review functionality was absent from the diff. It also flagged that the order history endpoint lacked explicit error handling for the case where a seller has no orders. The spec's acceptance criteria cover that case, but Layer 2 can't catch it without a running server.

Summary

Layer	Result	What it checked
Layer 1 — Structure	✅ Pass	`api/`, `models/`, `migrations/` present
Layer 2 — Coverage	❌ Fail	`ReviewService`, `create_review` not found
Layer 3 — Semantic	Advisory	Confirmed FR-02 gap, flagged error handling
Verdict	❌ FAILED	5/6 requirements (83%)

One requirement. One missing feature. Caught before it shipped.

What this means

The verification engine didn't find a bug. It found a missing feature: something the spec required that the implementation never built. That's a different category of problem. Bugs are usually caught in testing, because tests exercise code that exists. A missing feature has no code to test, so it tends to surface in production, found by users after launch.

83% coverage sounds close. In a marketplace, the missing 17% is the review system — the feature that determines whether buyers trust sellers enough to buy from them.

A verification engine that passes everything is not a verification engine.
