Vapi AI Voice Agent API
Provides a programmable platform for building AI phone agents with configurable LLM, TTS, and STT providers, real-time WebSocket call events, and function calling to external APIs during live calls.
Score Breakdown
⚙ Agent Friendliness
🔒 Security
No per-key scoping: a single Private Key has full account access. The Public/Private key split keeps the Private Key out of client-side code, but a compromised server key exposes the whole account. SOC 2 Type II reported.
⚡ Reliability
Best When
You need a fully managed AI phone agent stack with provider flexibility (choose your own LLM, TTS, STT) and want to avoid stitching together Twilio + OpenAI + ElevenLabs yourself.
Avoid When
Your use case requires extremely custom telephony routing (SIP trunking, complex PBX integration) or you need to run the entire voice stack on-premises.
Use Cases
- AI customer support agents that handle inbound calls, answer questions, and escalate to humans
- Outbound appointment reminder and scheduling agents that call patients or customers
- Lead qualification agents that call prospects and route hot leads to sales reps
- Voice-driven order status and tracking agents integrated with e-commerce backends
- 24/7 AI receptionist that books appointments via calendar API function calls during the call
Not For
- Simple IVR/DTMF touch-tone phone menus without AI
- Batch audio transcription of pre-recorded files (use Whisper or Deepgram directly)
- Text-based chatbots or messaging workflows with no voice component
Interface
Authentication
Single Bearer token (Private Key) used for all server-side API calls. A Public Key is used client-side in the Web/Mobile SDK. Keys are managed in the Vapi dashboard.
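As a rough illustration of the server-side pattern described above, the sketch below builds (but does not send) a request that carries the Private Key as a Bearer token. The base URL, the `/assistant` path, and the `VAPI_PRIVATE_KEY` environment variable name are assumptions made for this example, not details taken from this page; check them against the Vapi documentation before relying on them.

```python
import os
import urllib.request

# Assumption: base URL is illustrative; confirm it in the Vapi documentation.
BASE_URL = "https://api.vapi.ai"


def load_private_key(env_var: str = "VAPI_PRIVATE_KEY") -> str:
    """Read the Private Key from the environment (variable name is hypothetical)."""
    return os.environ.get(env_var, "")


def build_server_request(path: str, private_key: str) -> urllib.request.Request:
    """Build, without sending, a server-side request authenticated with the Private Key."""
    if not private_key:
        # Fail early rather than emit an unauthenticated request.
        raise ValueError("Private Key is empty; refusing to build the request")
    return urllib.request.Request(
        url=f"{BASE_URL}{path}",
        headers={
            "Authorization": f"Bearer {private_key}",
            "Accept": "application/json",
        },
        method="GET",
    )
```

Because the Private Key has full account access, a helper like this belongs only in server code; client-side SDK code should use the Public Key instead.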
Pricing
Costs stack: Vapi platform fee + LLM provider cost + TTS provider cost + STT provider cost. Bring-your-own-keys for LLM/TTS/STT can reduce costs significantly.
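The stacking can be sketched as simple per-minute arithmetic. The class and function names below are hypothetical, and any rates passed in are placeholders the caller supplies; no real Vapi or provider prices are implied.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class PerMinuteRates:
    """Per-minute cost components, in one currency (all values supplied by the caller)."""
    platform: Decimal  # Vapi platform fee
    llm: Decimal       # LLM provider cost
    tts: Decimal       # text-to-speech provider cost
    stt: Decimal       # speech-to-text provider cost

    def total(self) -> Decimal:
        # The four components add up to the effective per-minute rate.
        return self.platform + self.llm + self.tts + self.stt


def call_cost(rates: PerMinuteRates, minutes: Decimal) -> Decimal:
    """Estimated cost of a call of the given length."""
    return rates.total() * minutes
```

Comparing two `PerMinuteRates` instances, one with managed provider pricing and one with bring-your-own-key pricing, gives a quick estimate of how much the provider choice moves the total.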
Agent Metadata
Known Gotchas
- ⚠ Function call (tool call) responses must be returned within a tight timeout or the assistant will continue speaking; agents need to respond to tool call webhooks quickly
- ⚠ LLM provider latency directly impacts turn-taking latency — choosing a slow LLM degrades perceived responsiveness even though Vapi's platform latency is low
- ⚠ Phone number provisioning (buying/assigning numbers) is a separate async step; agents that try to initiate calls immediately after provisioning may race against number readiness
- ⚠ WebSocket server events for real-time call monitoring are separate from REST webhooks; agents integrating both need to handle duplicate event delivery carefully
- ⚠ Transcripts are finalized only after the call ends, and streamed transcripts carry no accuracy guarantee; mid-call function calling works from interim transcripts, so ASR errors can corrupt the arguments extracted for a tool call
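Two of these gotchas lend themselves to small defensive helpers. The sketch below is a minimal illustration, not Vapi's API: `answer_within_deadline` caps how long a tool call handler waits on a backend lookup before returning a fallback, and `EventDeduper` drops an event that arrives a second time through the other channel. The deadline value and the `(call_id, event_type, sequence)` identity key are assumptions; the real timeout and event fields should be taken from the Vapi documentation.

```python
import asyncio
from typing import Awaitable, Callable, Set, Tuple, TypeVar

T = TypeVar("T")


async def answer_within_deadline(
    lookup: Callable[[], Awaitable[T]],
    deadline_s: float,
    fallback: T,
) -> T:
    """Return the lookup result, or the fallback if the deadline passes first."""
    try:
        return await asyncio.wait_for(lookup(), timeout=deadline_s)
    except asyncio.TimeoutError:
        # Answering late is worse than answering with a safe default.
        return fallback


class EventDeduper:
    """Remembers event identities so a repeat delivery can be ignored."""

    def __init__(self) -> None:
        self._seen: Set[Tuple[str, str, int]] = set()

    def first_time(self, call_id: str, event_type: str, sequence: int) -> bool:
        # Hypothetical identity key; use whatever unique fields the events carry.
        key = (call_id, event_type, sequence)
        if key in self._seen:
            return False
        self._seen.add(key)
        return True
```

A handler would wrap its backend call in `answer_within_deadline` and gate side effects on `first_time(...)` returning `True`. In a long-running process the seen set should be bounded or expired, which this sketch leaves out.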
Alternatives
Scores are editorial opinions as of 2026-03-06.