Why Code Fails Spec Verification — Chat App Case Study (0/6 Requirements)

We ran a common chat implementation against the real-time chat application spec. The verification engine found six missing features before a single line shipped to users. This is the 100% fail case study — what happens when you build a chat app without the spec's required identifiers.


Setup

Spec: 6 functional requirements, 4 non-functional requirements
Implementation submitted: Python backend with REST API and basic message storage
Verdict: ❌ FAILED — 0/6 requirements pass, 6 gaps found

The verdict is FAILED because all requirements must pass. 0% is not done.


Layer 1 — Structural check

The architecture engine expected four path segments in the implementation: api/, models/, websocket/, and services/. Only api/ and models/ were present. websocket/ and services/ were missing.

❌ Structural check failed.
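A minimal sketch of what a check like this could look like, assuming the engine receives a flat list of file paths. The function name, the constant, and the example paths are hypothetical; only the four segment names come from the spec.

```python
from pathlib import PurePosixPath

# The four path segments the architecture engine expected (from the spec).
REQUIRED_SEGMENTS = ("api", "models", "websocket", "services")


def missing_segments(file_paths, required=REQUIRED_SEGMENTS):
    """Return the required directory names that appear in none of the paths."""
    present = set()
    for path in file_paths:
        # parts[:-1] drops the file name so only directory names are compared.
        present.update(PurePosixPath(path).parts[:-1])
    return [segment for segment in required if segment not in present]


# Hypothetical file list resembling the submitted implementation.
submitted = ["app/api/messages.py", "app/models/message.py"]
print(missing_segments(submitted))  # ['websocket', 'services']
```

An empty list means the structural check passes; anything else fails it.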


Layer 2 — Requirements coverage

This is where all six gaps were found. Layer 2 searches for specific identifiers (class names, function names) in the stripped source code. Comments, strings, and docstrings are removed before matching, so an identifier that appears only in a comment or a string literal does not count as a match.
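One way to approximate that matching step for Python source is to use the standard tokenize module, which labels comments and string literals as their own token types. This is a sketch under that assumption, not the engine's actual method; identifiers_in_code, missing_signals, and the sample source are hypothetical.

```python
import io
import tokenize


def identifiers_in_code(source):
    """Collect NAME tokens only; comments and string literals
    (docstrings included) are separate token types and are skipped."""
    names = set()
    for token in tokenize.generate_tokens(io.StringIO(source).readline):
        if token.type == tokenize.NAME:
            names.add(token.string)
    return names


def missing_signals(source, signals):
    """Return the required signals that never appear as real identifiers."""
    found = identifiers_in_code(source)
    return [signal for signal in signals if signal not in found]


# The signals are mentioned in a comment and a docstring, but never defined.
sample = '''
# TODO: add MessageService later
def post_message(body):
    """Will call send_message once WebSocketHandler exists."""
    return {"status": 201}
'''

print(missing_signals(sample, ["MessageService", "send_message", "WebSocketHandler"]))
# ['MessageService', 'send_message', 'WebSocketHandler']
```

Mentioning a signal in a TODO is not the same as building it, which is why the sample still reports all three as missing.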

❌ FR-01: Users can send and receive messages in real-time

Signals required: MessageService, send_message, WebSocketHandler
Result: None found.

Requirement: The spec says users must be able to send and receive messages in real-time with p95 latency < 200ms. The acceptance criterion is specific: POST /api/messages returns HTTP 201 and broadcasts the message to all connected clients within 200ms.

What was submitted: The implementation had a REST endpoint for posting messages that wrote to PostgreSQL synchronously. No WebSocket handler. No real-time broadcast. Messages were stored but never pushed to connected clients.

Gap: This is the core feature of a chat application. Without WebSocket infrastructure, there is no real-time messaging. Users would have to poll GET /api/messages every few seconds to see new messages — a pattern that breaks NFR-01 (p95 < 200ms) and NFR-02 (10,000 concurrent users).

Production consequence: This isn't a chat app. It's a message board with a REST API.
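For contrast, here is a rough sketch of the shape the three FR-01 signals could take. It uses only the standard library: an asyncio.Queue stands in for a real WebSocket connection, and a list stands in for the database. Everything beyond the three signal names (the connect and push methods, the message fields) is an assumption, and the sketch says nothing about meeting the 200ms latency target.

```python
import asyncio


class WebSocketHandler:
    """Stand-in for one client connection: an in-memory outbound queue."""

    def __init__(self):
        self.outbox = asyncio.Queue()

    async def push(self, event):
        await self.outbox.put(event)


class MessageService:
    def __init__(self):
        self.handlers = set()
        self.stored = []  # placeholder for real persistence

    def connect(self, handler):
        self.handlers.add(handler)

    async def send_message(self, user_id, content):
        message = {
            "message_id": len(self.stored) + 1,
            "user_id": user_id,
            "content": content,
        }
        self.stored.append(message)
        # Push to every connected client instead of waiting to be polled.
        await asyncio.gather(*(h.push(message) for h in self.handlers))
        return message


async def demo():
    service = MessageService()
    alice, bob = WebSocketHandler(), WebSocketHandler()
    service.connect(alice)
    service.connect(bob)
    await service.send_message("alice", "hello")
    return await bob.outbox.get()


print(asyncio.run(demo()))
# {'message_id': 1, 'user_id': 'alice', 'content': 'hello'}
```

The difference from the submitted code is the fan-out step: the message reaches bob without bob asking for it.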


❌ FR-02: Users can create and join chat rooms

Signals required: RoomService, create_room, join_room
Result: None found.

Requirement: The spec requires room creation and join functionality. The acceptance criterion is specific: POST /api/rooms returns HTTP 201 with {room_id, room_name}, and POST /api/rooms/{room_id}/join adds the user to the room's participant list.

What was submitted: The implementation had no room concept. All messages were stored in a single global table with no room association.

Gap: Without rooms, there is no way to segment conversations. Every user sees every message. This is a single global channel, not a multi-room chat application.

Production consequence: At 10,000 concurrent users (NFR-02), a single global message stream becomes unusable. Users cannot create private conversations or topic-specific channels.
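A minimal in-memory sketch of the two FR-02 signals, assuming a dictionary in place of a database. The response shape {room_id, room_name} follows the acceptance criterion; the UUID room ids and the returned participant list are assumptions.

```python
import uuid


class RoomService:
    def __init__(self):
        self.rooms = {}  # room_id -> room record

    def create_room(self, room_name):
        room_id = str(uuid.uuid4())
        self.rooms[room_id] = {"room_name": room_name, "participants": set()}
        # Same shape the spec expects back from POST /api/rooms.
        return {"room_id": room_id, "room_name": room_name}

    def join_room(self, room_id, user_id):
        # Raises KeyError for an unknown room; a real API would map that to a 404.
        room = self.rooms[room_id]
        room["participants"].add(user_id)
        return sorted(room["participants"])


rooms = RoomService()
created = rooms.create_room("general")
rooms.join_room(created["room_id"], "bob")
print(rooms.join_room(created["room_id"], "alice"))  # ['alice', 'bob']
```

With a room_id on every record, messages can be scoped to a conversation instead of landing in one global table.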


❌ FR-03: Users can see who is currently online in a room

Signals required: PresenceService, get_online_users
Result: None found.

Requirement: The spec requires presence tracking. The acceptance criterion is specific: GET /api/rooms/{room_id}/presence returns HTTP 200 with {online_users: [{user_id, username, last_seen}]}.

What was submitted: No presence tracking. No way to tell if a user is online or offline.

Gap: Presence is a core feature of real-time chat. Without it, users don't know if their messages are being seen. This is especially critical in support chat or team collaboration scenarios.

Production consequence: Users will ask "is anyone there?" in messages because there's no other way to know.
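One common way to track presence is a heartbeat with a timeout, sketched below. The spec only fixes the response shape; the heartbeat method, the 30-second timeout, and the injectable clock are all assumptions made for illustration.

```python
import time


class PresenceService:
    def __init__(self, timeout_seconds=30.0, clock=time.time):
        self.timeout_seconds = timeout_seconds
        self._clock = clock
        self._last_seen = {}  # (room_id, user_id) -> (username, timestamp)

    def heartbeat(self, room_id, user_id, username):
        """Record that a user was just active in a room."""
        self._last_seen[(room_id, user_id)] = (username, self._clock())

    def get_online_users(self, room_id):
        """Users in the room whose last heartbeat is within the timeout."""
        now = self._clock()
        online = [
            {"user_id": uid, "username": name, "last_seen": seen}
            for (rid, uid), (name, seen) in self._last_seen.items()
            if rid == room_id and now - seen <= self.timeout_seconds
        ]
        return {"online_users": online}


# A fake clock keeps the example deterministic.
fake_now = [100.0]
presence = PresenceService(timeout_seconds=30.0, clock=lambda: fake_now[0])
presence.heartbeat("general", "u1", "alice")
fake_now[0] = 120.0
presence.heartbeat("general", "u2", "bob")
fake_now[0] = 140.0  # alice was last seen 40s ago, bob 20s ago
print(presence.get_online_users("general"))
# {'online_users': [{'user_id': 'u2', 'username': 'bob', 'last_seen': 120.0}]}
```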


❌ FR-04: Users can view message history for a room

Signals required: MessageRepository, get_room_messages
Result: None found.

Requirement: The spec requires message history retrieval. The acceptance criterion is specific: GET /api/rooms/{room_id}/messages returns HTTP 200 with messages ordered by timestamp descending.

What was submitted: The implementation had a GET /api/messages endpoint that returned all messages globally, not scoped to a room.

Gap: Without room-scoped message history, users cannot load past messages for a specific conversation. This breaks the user experience for any chat application where users expect to see previous messages when they join a room.

Production consequence: Users joining a room see no context. They don't know what was discussed before they arrived.
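The missing piece here is a room filter and an explicit sort order. A sketch of the two FR-04 signals follows, using an in-memory SQLite database as a stand-in for the PostgreSQL store; the table layout, the add helper, and the limit parameter are assumptions.

```python
import sqlite3


class MessageRepository:
    def __init__(self, conn):
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS messages ("
            "id INTEGER PRIMARY KEY, room_id TEXT, user_id TEXT, "
            "content TEXT, timestamp REAL)"
        )

    def add(self, room_id, user_id, content, timestamp):
        self.conn.execute(
            "INSERT INTO messages (room_id, user_id, content, timestamp) "
            "VALUES (?, ?, ?, ?)",
            (room_id, user_id, content, timestamp),
        )

    def get_room_messages(self, room_id, limit=50):
        """Messages for one room only, newest first."""
        rows = self.conn.execute(
            "SELECT user_id, content, timestamp FROM messages "
            "WHERE room_id = ? ORDER BY timestamp DESC LIMIT ?",
            (room_id, limit),
        ).fetchall()
        return [{"user_id": u, "content": c, "timestamp": t} for u, c, t in rows]


repo = MessageRepository(sqlite3.connect(":memory:"))
repo.add("general", "alice", "first", 1.0)
repo.add("general", "bob", "second", 2.0)
repo.add("random", "carol", "elsewhere", 3.0)
print([m["content"] for m in repo.get_room_messages("general")])
# ['second', 'first']
```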


❌ FR-05: Users receive notifications for new messages

Signals required: NotificationService, broadcast_message
Result: None found.

Requirement: The spec requires real-time notifications. The acceptance criterion is specific: when a new message is sent to a room, all connected users receive a WebSocket event with {event: 'new_message', message_id, user_id, content, timestamp} within 200ms.

What was submitted: No notification system. No WebSocket events. No way for users to know a new message arrived without polling.

Gap: This is the same gap as FR-01. Without WebSocket infrastructure, there is no real-time notification system.

Production consequence: Users must refresh the page or poll the API to see new messages. This is not a real-time chat application.


❌ FR-06: Users can search message history by keyword

Signals required: MessageSearchService, search_messages
Result: None found.

Requirement: The spec requires message search. The acceptance criterion is specific: GET /api/rooms/{room_id}/search with {keyword} returns HTTP 200 with matching messages.

What was submitted: No search functionality. No way to find past messages by keyword.

Gap: Search is a secondary feature, but it's explicitly required in the spec. Without it, users cannot find past conversations or reference previous decisions made in chat.

Production consequence: Users will ask the same questions repeatedly because they can't search for answers in message history.
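Even a naive version of the FR-06 signals is short. The sketch below does a case-insensitive substring match over an in-memory list; the message shape and the matching rule are assumptions, since the spec excerpt only says matching messages are returned. A production system would more likely use a database full-text index.

```python
class MessageSearchService:
    def __init__(self, messages):
        # Each message is a dict with at least "room_id" and "content".
        self.messages = messages

    def search_messages(self, room_id, keyword):
        """Messages in the room whose content contains the keyword, ignoring case."""
        needle = keyword.casefold()
        return [
            message
            for message in self.messages
            if message["room_id"] == room_id
            and needle in message["content"].casefold()
        ]


history = [
    {"room_id": "general", "content": "Deploy is scheduled for Friday"},
    {"room_id": "general", "content": "Lunch?"},
    {"room_id": "random", "content": "deploy went fine"},
]
search = MessageSearchService(history)
print([m["content"] for m in search.search_messages("general", "deploy")])
# ['Deploy is scheduled for Friday']
```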


Layer 3 — Semantic audit (advisory)

Layer 3 is an LLM-based semantic review of the code diff against the spec. It is advisory only — it never affects the pass/fail verdict. Deterministic layers 1 and 2 exclusively gate the result.
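The gating rule can be sketched in a few lines. The Report structure and its field names are hypothetical; the point is only that the advisory notes are carried along but never read when computing the verdict.

```python
from dataclasses import dataclass, field


@dataclass
class Report:
    missing_segments: list        # Layer 1 output
    requirement_results: dict     # Layer 2 output: requirement id -> passed?
    advisory_notes: list = field(default_factory=list)  # Layer 3 output


def verdict(report):
    """Only the deterministic layers decide; advisory notes are ignored."""
    structure_ok = not report.missing_segments
    coverage_ok = all(report.requirement_results.values())
    return "PASSED" if structure_ok and coverage_ok else "FAILED"


this_run = Report(
    missing_segments=["websocket", "services"],
    requirement_results={f"FR-0{n}": False for n in range(1, 7)},
    advisory_notes=["no horizontal scaling strategy"],
)
print(verdict(this_run))  # FAILED
```

Because every requirement must pass, a single False in requirement_results is enough to fail the run.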

In this run, Layer 3 confirmed all six Layer 2 findings: real-time messaging over WebSockets, room management, presence tracking, room-scoped message history, notifications, and search were all absent from the diff. It also flagged that the implementation lacked any horizontal scaling strategy, a critical gap given NFR-02 (10,000 concurrent users).


Summary

Layer               | Result    | Finding
Layer 1: Structure  | ❌ Fail   | websocket/, services/ missing
Layer 2: Coverage   | ❌ Fail   | 0/6 requirements found
Layer 3: Semantic   | Advisory  | Confirmed all gaps, flagged scaling issues
Verdict             | ❌ FAILED | 0/6 requirements (0%)

Six requirements. Six missing features. Caught before it shipped.


What this means

The verification engine didn't find bugs. It found missing features — things the spec required that the implementation never built. That's a different category of problem. Bugs get caught in testing. Missing features get caught in production, by users, after launch.

0% coverage means the implementation is not a chat application. It's a REST API for storing messages with no real-time functionality, no room management, no presence tracking, and no notifications. The spec asked for a real-time chat application. The implementation delivered a message board.


A verification engine that passes everything is not a verification engine.

