We ran a common e-commerce implementation against the handmade goods marketplace spec. The verification engine found one implementation gap before a single line shipped to users. Here's the full breakdown.
Spec: 6 functional requirements, 4 non-functional requirements
Implementation submitted: Python backend with repository and service layer
Verdict: ❌ FAILED — 5/6 requirements pass, 1 gap found
The verdict is FAILED because all requirements must pass. 83% is not done.
The architecture engine expected three path segments in the implementation: api/, models/, and migrations/. All three were present.
✅ Structural check passed.
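A structural check of this kind can be approximated in a few lines. The sketch below is only an illustration, not the engine's actual code: the function name `missing_segments` and the sample file paths are hypothetical, and it assumes a segment counts as present when it appears as a directory anywhere in a submitted file's path.

```python
from pathlib import PurePosixPath

# The three path segments named in this run.
REQUIRED_SEGMENTS = ("api", "models", "migrations")


def missing_segments(file_paths, required=REQUIRED_SEGMENTS):
    """Return the required segments that no submitted file path contains."""
    seen = set()
    for path in file_paths:
        # Drop the final component (the file name) and keep the directories.
        seen.update(PurePosixPath(path).parts[:-1])
    return [segment for segment in required if segment not in seen]


# Hypothetical submission layout, for illustration only.
submitted = [
    "backend/api/routes.py",
    "backend/models/product.py",
    "backend/migrations/0001_initial.py",
]
print(missing_segments(submitted))
```

An empty list means every expected segment was found, which corresponds to the pass reported here.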
This is where the gap was found. Layer 2 searches the stripped source code for specific identifiers (class names and function names). Comments, strings, and docstrings are removed before matching, so an identifier that appears only in a comment or a string literal doesn't produce a false positive.
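One way to get this behaviour for Python sources is to tokenize the file and keep only identifier tokens, which drops comments and string literals (docstrings included) as a side effect. This is a sketch under that assumption; the write-up does not say how the engine strips source, and the helper names `identifiers_in_code` and `missing_signals` are hypothetical.

```python
import io
import tokenize


def identifiers_in_code(source: str) -> set[str]:
    """Collect NAME tokens; COMMENT and STRING tokens are skipped."""
    names = set()
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME:
            names.add(tok.string)
    return names


def missing_signals(source: str, required: list[str]) -> list[str]:
    """Return the required identifiers that never appear as real code."""
    found = identifiers_in_code(source)
    return [signal for signal in required if signal not in found]


# Illustrative source: ReviewService and create_review appear only in a
# comment, a docstring, and a string literal, so they must not count.
sample = '''
# TODO: add ReviewService later
class ProductRepository:
    """Docstring mentioning create_review should not count."""
    def get_product_details(self, product_id):
        return {"id": product_id, "note": "ReviewService"}
'''

print(missing_signals(sample, ["ProductRepository", "get_product_details"]))
print(missing_signals(sample, ["ReviewService", "create_review"]))
```

Matching on tokens rather than raw text is the design point: a plain substring search over the unstripped file would have found both review signals in the sample and reported a pass.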
Signals required: ProductRepository, get_product_details
Result: Both found. Pass.
Signals required: ReviewService, create_review
Result: Neither found.
Requirement: The spec says buyers must be able to leave reviews for purchased products. The acceptance criterion is specific: POST /api/reviews with {order_id, rating, comment} returns HTTP 201 with {review_id}.
What was submitted: The implementation had product listings, search, seller dashboards, and order history. No review functionality.
Gap: This is a common pattern. Reviews feel like a "Phase 2" feature during development. They get deprioritized. The spec said they were required. The implementation was submitted without them. Without verification, this gap surfaces in production when a buyer tries to leave a review and finds no way to do it.
Production consequence: Seller trust is built on reviews in a marketplace. Shipping without them isn't a minor omission — it's a missing core feature that affects buyer confidence and seller acquisition.
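For reference, the smallest shape that would satisfy both the Layer 2 signals and the acceptance criterion might look like the sketch below. Only `ReviewService`, `create_review`, the request fields, and the 201 response with `review_id` come from the spec as quoted above. The in-memory storage, the purchase check, and the 404 error shape are assumptions made for illustration, and no web framework is wired in.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class ReviewService:
    # Hypothetical in-memory stores; a real backend would sit on the
    # repository layer instead.
    purchased_order_ids: set = field(default_factory=set)
    reviews: dict = field(default_factory=dict)

    def create_review(self, order_id, rating, comment):
        """Handle the body of POST /api/reviews.

        Returns a (status_code, response_body) tuple.
        """
        if order_id not in self.purchased_order_ids:
            # Assumed error shape; the quoted criterion only defines success.
            return 404, {"error": "order not found"}
        review_id = str(uuid.uuid4())
        self.reviews[review_id] = {
            "order_id": order_id,
            "rating": rating,
            "comment": comment,
        }
        return 201, {"review_id": review_id}


service = ReviewService(purchased_order_ids={"order-1"})
status, body = service.create_review("order-1", 5, "Beautifully made.")
print(status, sorted(body))
```

A route handler for `POST /api/reviews` would then pass the parsed JSON body to `create_review` and return the status and body it gets back.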
Signals required: SellerRepository, get_seller_products
Result: Both found. Pass.
Signals required: OrderRepository, get_seller_orders
Result: Both found. Pass.
Signals required: ProductSearchService, search_products
Result: Both found. Pass.
Signals required: CategoryRepository, get_category_products
Result: Both found. Pass.
Layer 3 is an LLM-based semantic review of the code diff against the spec. It is advisory only — it never affects the pass/fail verdict. Deterministic layers 1 and 2 exclusively gate the result.
In this run, Layer 3 confirmed the Layer 2 finding: review functionality was absent from the diff. It also flagged that the order history endpoint lacked explicit error handling for the case where a seller has no orders. The spec's acceptance criteria cover that case, but Layer 2 can't catch it without a running server.
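The gating rule described above (deterministic layers decide, the semantic layer only advises) can be written down compactly. This is a sketch with hypothetical names (`LayerResult`, `verdict`), not the engine's code; it treats any advisory result as irrelevant to the outcome.

```python
from dataclasses import dataclass


@dataclass
class LayerResult:
    name: str
    passed: bool
    gating: bool  # False for advisory layers such as the semantic review


def verdict(results):
    """Return FAILED if any gating layer failed; ignore advisory layers."""
    gating_ok = all(r.passed for r in results if r.gating)
    return "PASSED" if gating_ok else "FAILED"


# The run described in this post: structure passed, coverage failed,
# and the semantic layer is advisory.
run = [
    LayerResult("structure", passed=True, gating=True),
    LayerResult("coverage", passed=False, gating=True),
    LayerResult("semantic", passed=False, gating=False),
]
print(verdict(run))
```

Keeping the LLM layer out of the `all(...)` expression is what makes the verdict reproducible: rerunning the same submission cannot flip the result because of model variance.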
| Layer | Result | What it checked |
|---|---|---|
| Layer 1 — Structure | ✅ Pass | api/, models/, migrations/ present |
| Layer 2 — Coverage | ❌ Fail | ReviewService, create_review not found |
| Layer 3 — Semantic | Advisory | Confirmed FR-02 gap, flagged error handling |
| Verdict | ❌ FAILED | 5/6 requirements (83%) |
One requirement. One missing feature. Caught before it shipped.
The verification engine didn't find a bug. It found a missing feature — something the spec required that the implementation never built. That's a different category of problem. Bugs get caught in testing. Missing features get caught in production, by users, after launch.
83% coverage sounds close. In a marketplace, the missing 17% is the review system — the feature that determines whether buyers trust sellers enough to buy from them.
A verification engine that passes everything is not a verification engine.
View the full spec →
View the architecture decisions →
Run verification on your own implementation →