The Problem With Most WhatsApp Bots
Most WhatsApp bots are a disgrace. They’re a giant if/else tree disguised as a chatbot, running on a Zapier flow, responding to keywords with canned replies. A user types “price” and the bot replies with a PDF. That’s not an agent. That’s a vending machine.
A real WhatsApp AI agent does three things:
- Understands intent — not just keywords
- Maintains context — across a full conversation, not just the last message
- Takes action — books a call, qualifies a lead, sends a document, escalates to a human
This article breaks down the exact architecture we use at TuVendedor24 to handle 10,000+ daily conversations.
The Stack
- WhatsApp Business API (via Meta Cloud API, not third-party wrappers)
- Node.js for the webhook server and orchestration layer
- OpenAI GPT-4o for intent classification and response generation
- Redis for conversation state persistence
- PostgreSQL for lead storage and CRM sync
Architecture: Three-Layer Design
Layer 1 — Webhook Receiver
Every WhatsApp message hits a webhook endpoint. This layer does one thing: validate the signature, parse the payload, and push the message to a processing queue. Nothing else.
app.post('/webhook', async (req, res) => {
const isValid = verifyWebhookSignature(req);
if (!isValid) return res.sendStatus(403);
const message = parseWhatsAppPayload(req.body);
await messageQueue.push(message);
res.sendStatus(200);
});We return 200 immediately. Meta requires a fast acknowledgment — if you process synchronously and the AI call takes 3 seconds, Meta retries and you get duplicate messages.
Layer 2 — Conversation Manager
This is where context lives. Every phone number gets a conversation object stored in Redis with a 24-hour TTL (matching WhatsApp’s session window):
interface Conversation {
phoneNumber: string;
stage: 'greeting' | 'qualification' | 'scheduling' | 'handoff';
collectedData: {
name?: string;
company?: string;
budget?: string;
timeline?: string;
};
messageHistory: Message[];
lastUpdated: Date;
}The stage machine is the most important piece. It’s what separates a bot from an agent — the system knows WHERE in the sales process the lead is, and routes the AI’s instructions accordingly.
Layer 3 — Agent Router
This is the multi-agent layer. Based on conversation.stage, the router hands the message to a specialized agent:
- GreetingAgent — first contact, collects name and intent
- QualificationAgent — BANT qualification (Budget, Authority, Need, Timeline)
- SchedulingAgent — books a call via Google Calendar API
- HandoffAgent — creates a CRM record and notifies the sales rep
Each agent has its own system prompt, its own tools, and its own success criteria. They don’t share state — they read from and write to the conversation object.
Handling Context Without Overloading the LLM
The biggest mistake I see is passing the full message history to GPT on every request. At 100 messages per conversation, you’re burning tokens and adding latency.
Instead, we summarize:
async function buildAgentContext(conversation: Conversation): Promise<string> {
const recentMessages = conversation.messageHistory.slice(-6);
if (conversation.messageHistory.length > 6) {
const summary = await summarizeConversation(conversation.messageHistory.slice(0, -6));
return `[Summary of earlier conversation]: ${summary}\n\n[Recent messages]: ...`;
}
return formatMessages(recentMessages);
}This keeps the context window tight and response times under 1.5 seconds p95.
The Handoff Trigger
The hardest part is knowing when to hand off to a human. We use a confidence score:
- If the AI’s response confidence is below 0.7 AND the lead has been qualified → trigger handoff
- If the lead explicitly asks for a human → trigger handoff immediately
- If 3 consecutive messages go unresolved → trigger handoff
if (response.confidence < 0.7 && conversation.stage === 'qualification') {
await triggerHumanHandoff(conversation);
return "Let me connect you with one of our team members right now.";
}Results
Since deploying this architecture at TuVendedor24:
- 87% of leads are fully qualified without human intervention
- Average conversation length: 8 messages before a call is booked
- Response time: <2 seconds p95 across 10K+ daily conversations
- Handoff rate: 13% — and those are the high-intent ones that close faster
What I’d Do Differently
The one thing I’d change: use a proper agent framework (like LangGraph or a custom state machine) earlier instead of building the routing logic from scratch. We refactored it at month 3 and saved 40% on complexity.
If you’re building a WhatsApp agent and want to talk architecture, book a call.
