We ran a common chat implementation against the real-time chat application spec. The verification engine found six missing features before a single line shipped to users. This is the 100% fail case study — what happens when you build a chat app without the spec's required identifiers.
Spec: 6 functional requirements, 4 non-functional requirements
Implementation submitted: Python backend with REST API and basic message storage
Verdict: ❌ FAILED — 0/6 requirements pass, 6 gaps found
The verdict is FAILED because all requirements must pass. 0% is not done.
The architecture engine expected four path segments in the implementation: api/, models/, websocket/, and services/. Only api/ and models/ were present. websocket/ and services/ were missing.
❌ Structural check failed.
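A structural check of this kind can be sketched in a few lines. The segment names come from the case study; the function name `missing_segments` and the sample file paths are hypothetical, and the real engine may resolve paths differently.

```python
from pathlib import PurePosixPath

# Segment names are taken from the case study; everything else is illustrative.
REQUIRED_SEGMENTS = ("api", "models", "websocket", "services")


def missing_segments(file_paths, required=REQUIRED_SEGMENTS):
    """Return the required directory names that appear in none of the paths."""
    present = set()
    for path in file_paths:
        # parts[:-1] drops the file name so only directory segments count.
        present.update(PurePosixPath(path).parts[:-1])
    return [segment for segment in required if segment not in present]


# Hypothetical file list resembling the submission described above.
submitted = ["api/__init__.py", "api/messages.py", "models/message.py"]
print(missing_segments(submitted))  # ['websocket', 'services']
```

Any non-empty result fails the layer, which matches the outcome reported here.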
Layer 2 is where all six gaps were found. It searches the stripped source code for specific identifiers (class names and function names). Comments, strings, and docstrings are removed before matching, so an identifier that appears only in a comment or string literal cannot produce a false positive.
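One way to approximate this stripping step in Python is to tokenize the source and keep only `NAME` tokens, which leaves out comments, string literals, and docstrings. This is a sketch under that assumption, not the engine's actual matcher; `identifiers_in_code` and `missing_signals` are hypothetical helper names.

```python
import io
import tokenize


def identifiers_in_code(source: str) -> set[str]:
    """Collect NAME tokens; comments and string literals are other token types."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return {tok.string for tok in tokens if tok.type == tokenize.NAME}


def missing_signals(source: str, signals: list[str]) -> list[str]:
    """Return the required signals that never occur as real identifiers."""
    found = identifiers_in_code(source)
    return [signal for signal in signals if signal not in found]


sample = '''
# TODO: add WebSocketHandler
class MessageService:
    """Will call send_message later."""
    def save(self, text):
        return "send_message"
'''
print(missing_signals(sample, ["MessageService", "send_message", "WebSocketHandler"]))
# ['send_message', 'WebSocketHandler']
```

In the sample, `send_message` and `WebSocketHandler` appear only in a comment, a docstring, and a string literal, so neither counts as found.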
Signals required: MessageService, send_message, WebSocketHandler
Result: None found.
Requirement: The spec says users must be able to send and receive messages in real-time with p95 latency < 200ms. The acceptance criterion is specific: POST /api/messages returns HTTP 201 and broadcasts the message to all connected clients within 200ms.
What was submitted: The implementation had a REST endpoint for posting messages that wrote to PostgreSQL synchronously. No WebSocket handler. No real-time broadcast. Messages were stored but never pushed to connected clients.
Gap: This is the core feature of a chat application. Without WebSocket infrastructure, there is no real-time messaging. Users would have to poll GET /api/messages every few seconds to see new messages — a pattern that breaks NFR-01 (p95 < 200ms) and NFR-02 (10,000 concurrent users).
Production consequence: This isn't a chat app. It's a message board with a REST API.
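To make the missing piece concrete, here is a minimal in-memory sketch of the push path the spec asks for. The class and method names match the required signals, but the internals are assumptions: each `asyncio.Queue` stands in for one WebSocket connection, and a plain list stands in for the PostgreSQL write.

```python
import asyncio
import time


class WebSocketHandler:
    """Stand-in connection registry; each queue plays the role of one socket."""

    def __init__(self):
        self.clients: set[asyncio.Queue] = set()

    def connect(self) -> asyncio.Queue:
        queue = asyncio.Queue()
        self.clients.add(queue)
        return queue

    async def broadcast(self, payload: dict) -> None:
        for queue in self.clients:
            await queue.put(payload)


class MessageService:
    def __init__(self, handler: WebSocketHandler):
        self.handler = handler
        self.stored = []  # placeholder for the database write

    async def send_message(self, user_id: str, content: str):
        message = {"user_id": user_id, "content": content, "timestamp": time.time()}
        self.stored.append(message)
        # Push to every connected client instead of waiting for a poll.
        await self.handler.broadcast(message)
        return 201, message


async def demo():
    handler = WebSocketHandler()
    alice, bob = handler.connect(), handler.connect()
    status, _ = await MessageService(handler).send_message("u1", "hello")
    return status, (await alice.get())["content"], (await bob.get())["content"]


print(asyncio.run(demo()))  # (201, 'hello', 'hello')
```

The sketch says nothing about the 200ms budget; it only shows that storing and broadcasting happen in the same request path.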
Signals required: RoomService, create_room, join_room
Result: None found.
Requirement: The spec requires room creation and join functionality. The acceptance criterion is specific: POST /api/rooms returns HTTP 201 with {room_id, room_name}, and POST /api/rooms/{room_id}/join adds the user to the room's participant list.
What was submitted: The implementation had no room concept. All messages were stored in a single global table with no room association.
Gap: Without rooms, there is no way to segment conversations. Every user sees every message. This is a single-channel IRC client, not a multi-room chat application.
Production consequence: At 10,000 concurrent users (NFR-02), a single global message stream becomes unusable. Users cannot create private conversations or topic-specific channels.
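For comparison, a minimal in-memory sketch of the two room operations might look like the following. The `RoomService`, `create_room`, and `join_room` names are the required signals; the storage, the 404 for an unknown room, and the 200 returned on join are assumptions, since the spec excerpt only fixes the 201 response for room creation.

```python
import uuid


class RoomService:
    def __init__(self):
        self._rooms = {}  # room_id -> {"room_name": str, "participants": set}

    def create_room(self, room_name: str):
        room_id = uuid.uuid4().hex
        self._rooms[room_id] = {"room_name": room_name, "participants": set()}
        return 201, {"room_id": room_id, "room_name": room_name}

    def join_room(self, room_id: str, user_id: str):
        room = self._rooms.get(room_id)
        if room is None:
            # Status code for a missing room is an assumption, not from the spec.
            return 404, {"error": "room not found"}
        room["participants"].add(user_id)
        return 200, {"room_id": room_id, "participants": sorted(room["participants"])}
```

A message table keyed by `room_id` would then give each conversation its own stream instead of one global feed.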
Signals required: PresenceService, get_online_users
Result: None found.
Requirement: The spec requires presence tracking. The acceptance criterion is specific: GET /api/rooms/{room_id}/presence returns HTTP 200 with {online_users: [{user_id, username, last_seen}]}.
What was submitted: No presence tracking. No way to tell if a user is online or offline.
Gap: Presence is a core feature of real-time chat. Without it, users cannot tell whether anyone is online to read their messages. This is especially critical in support chat or team collaboration scenarios.
Production consequence: Users will ask "is anyone there?" in messages because there's no other way to know.
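One common way to implement presence is a heartbeat with a timeout: clients check in periodically, and anyone who has not checked in recently is treated as offline. The sketch below assumes that approach. The 30-second timeout, the `heartbeat` method, and the injectable clock are all hypothetical; only `PresenceService`, `get_online_users`, and the response shape come from the spec.

```python
import time


class PresenceService:
    def __init__(self, timeout_seconds: float = 30.0, clock=time.time):
        self._timeout = timeout_seconds
        self._clock = clock  # injectable so the logic can be tested without sleeping
        self._last_seen = {}  # (room_id, user_id) -> (username, last_seen)

    def heartbeat(self, room_id: str, user_id: str, username: str) -> None:
        self._last_seen[(room_id, user_id)] = (username, self._clock())

    def get_online_users(self, room_id: str):
        now = self._clock()
        online = [
            {"user_id": uid, "username": name, "last_seen": seen}
            for (rid, uid), (name, seen) in self._last_seen.items()
            if rid == room_id and now - seen <= self._timeout
        ]
        return 200, {"online_users": online}
```

In a WebSocket deployment the heartbeat would typically ride on the connection itself, so a dropped socket ages out of the list on its own.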
Signals required: MessageRepository, get_room_messages
Result: None found.
Requirement: The spec requires message history retrieval. The acceptance criterion is specific: GET /api/rooms/{room_id}/messages returns HTTP 200 with messages ordered by timestamp descending.
What was submitted: The implementation had a GET /api/messages endpoint that returned all messages globally, not scoped to a room.
Gap: Without room-scoped message history, users cannot load past messages for a specific conversation. This breaks the user experience for any chat application where users expect to see previous messages when they join a room.
Production consequence: Users joining a room see no context. They don't know what was discussed before they arrived.
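The difference between the submitted endpoint and the required one comes down to a `WHERE room_id = ?` clause and an explicit ordering. The sketch below uses an in-memory SQLite database as a stand-in for PostgreSQL; the table layout, the `add` helper, and the `limit` parameter are assumptions.

```python
import sqlite3


class MessageRepository:
    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS messages ("
            "id INTEGER PRIMARY KEY, room_id TEXT NOT NULL, user_id TEXT NOT NULL, "
            "content TEXT NOT NULL, timestamp REAL NOT NULL)"
        )

    def add(self, room_id, user_id, content, timestamp):
        self.conn.execute(
            "INSERT INTO messages (room_id, user_id, content, timestamp) VALUES (?, ?, ?, ?)",
            (room_id, user_id, content, timestamp),
        )

    def get_room_messages(self, room_id, limit=50):
        # Scoped to one room and ordered newest first, as the criterion requires.
        rows = self.conn.execute(
            "SELECT id, user_id, content, timestamp FROM messages "
            "WHERE room_id = ? ORDER BY timestamp DESC LIMIT ?",
            (room_id, limit),
        ).fetchall()
        keys = ("message_id", "user_id", "content", "timestamp")
        return [dict(zip(keys, row)) for row in rows]
```

With messages in two rooms, `get_room_messages("r1")` returns only the rows for `r1`, most recent first.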
Signals required: NotificationService, broadcast_message
Result: None found.
Requirement: The spec requires real-time notifications. The acceptance criterion is specific: when a new message is sent to a room, all connected users receive a WebSocket event with {event: 'new_message', message_id, user_id, content, timestamp} within 200ms.
What was submitted: No notification system. No WebSocket events. No way for users to know a new message arrived without polling.
Gap: This is the same gap as FR-01. Without WebSocket infrastructure, there is no real-time notification system.
Production consequence: Users must refresh the page or poll the API to see new messages. This is not a real-time chat application.
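The event shape in the criterion is simple enough to sketch directly. Here each subscriber is a plain callable standing in for a WebSocket send function; the `subscribe` method and the per-room registry are assumptions, while `NotificationService`, `broadcast_message`, and the payload fields come from the spec.

```python
import json
from collections import defaultdict


class NotificationService:
    def __init__(self):
        # room_id -> list of callables, each standing in for one socket's send().
        self._subscribers = defaultdict(list)

    def subscribe(self, room_id, send) -> None:
        self._subscribers[room_id].append(send)

    def broadcast_message(self, room_id, message_id, user_id, content, timestamp) -> int:
        event = json.dumps({
            "event": "new_message",
            "message_id": message_id,
            "user_id": user_id,
            "content": content,
            "timestamp": timestamp,
        })
        recipients = self._subscribers.get(room_id, [])
        for send in recipients:
            send(event)
        return len(recipients)  # number of clients notified
```

Only subscribers of the target room receive the event, which is what separates this from the global stream in the submission.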
Signals required: MessageSearchService, search_messages
Result: None found.
Requirement: The spec requires message search. The acceptance criterion is specific: GET /api/rooms/{room_id}/search with {keyword} returns HTTP 200 with matching messages.
What was submitted: No search functionality. No way to find past messages by keyword.
Gap: Search is a secondary feature, but it's explicitly required in the spec. Without it, users cannot find past conversations or reference previous decisions made in chat.
Production consequence: Users will ask the same questions repeatedly because they can't search for answers in message history.
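A first version of search does not need a dedicated index. The sketch below does a case-insensitive substring match over an in-memory list; the case-insensitivity, the empty result for a blank keyword, and the message dictionary layout are assumptions, since the spec excerpt only says that matching messages are returned.

```python
class MessageSearchService:
    def __init__(self, messages):
        # Each message is a dict with at least "room_id" and "content" keys.
        self._messages = messages

    def search_messages(self, room_id: str, keyword: str):
        needle = keyword.strip().casefold()
        if not needle:
            return []  # assumption: a blank keyword matches nothing
        return [
            m for m in self._messages
            if m["room_id"] == room_id and needle in m["content"].casefold()
        ]
```

At larger volumes this would likely move into the database as a full-text query, but the room scoping stays the same.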
Layer 3 is an LLM-based semantic review of the code diff against the spec. It is advisory only — it never affects the pass/fail verdict. Deterministic layers 1 and 2 exclusively gate the result.
In this run, Layer 3 confirmed all six Layer 2 findings: WebSocket infrastructure, room management, presence tracking, room-scoped message history, notifications, and search were all absent from the diff. It also flagged that the implementation lacked any horizontal scaling strategy, a critical gap given NFR-02 (10,000 concurrent users).
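The gating rule can be written down as a small function: the verdict depends only on the two deterministic layers, and Layer 3 notes are carried along without influencing it. This is a sketch of the rule as described, not the engine's code. The `verdict` function and its return shape are hypothetical, and the FR-02 through FR-06 labels are assumed from the order of the findings.

```python
def verdict(structure_ok: bool, requirements: dict, advisory_notes=()):
    """Combine Layer 1 and Layer 2 results; advisory notes never change the outcome."""
    passed = sum(1 for ok in requirements.values() if ok)
    total = len(requirements)
    status = "PASSED" if structure_ok and passed == total else "FAILED"
    return {
        "verdict": status,
        "coverage": f"{passed}/{total}",
        "advisory": list(advisory_notes),
    }


# Inputs mirroring this run: structure failed and no requirement matched.
run = verdict(
    structure_ok=False,
    requirements={f"FR-0{i}": False for i in range(1, 7)},
    advisory_notes=["no horizontal scaling strategy"],
)
print(run["verdict"], run["coverage"])  # FAILED 0/6
```

Because every requirement must pass, five of six would produce the same verdict as zero of six.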
| Layer | Result | What it checked |
|---|---|---|
| Layer 1 — Structure | ❌ Fail | websocket/, services/ missing |
| Layer 2 — Coverage | ❌ Fail | 0/6 requirements found |
| Layer 3 — Semantic | Advisory | Confirmed all gaps, flagged scaling issues |
| Verdict | ❌ FAILED | 0/6 requirements (0%) |
Six requirements. Six missing features. Caught before it shipped.
The verification engine didn't find bugs. It found missing features — things the spec required that the implementation never built. That's a different category of problem. Bugs get caught in testing. Missing features get caught in production, by users, after launch.
0% coverage means the implementation is not a chat application. It's a REST API for storing messages with no real-time functionality, no room management, no presence tracking, and no notifications. The spec asked for a real-time chat application. The implementation delivered a message board.
A verification engine that passes everything is not a verification engine.