AWS Polly API

AWS Polly converts text to lifelike speech with 60+ voices across 30+ languages using standard and neural TTS engines. It outputs MP3, Ogg Vorbis, or PCM audio, either from a synchronous call or from an asynchronous task.

Evaluated Mar 06, 2026 (version: current)
Category: AI & Machine Learning
Tags: aws, polly, tts, text-to-speech, speech, voice, neural-tts
⚙ Agent Friendliness: 61 / 100 (Can an agent use this?)
🔒 Security: 92 / 100 (Is it safe for agents?)
⚡ Reliability: 91 / 100 (Does it work consistently?)

Score Breakdown

⚙ Agent Friendliness

MCP Quality: --
Documentation: 88
Error Messages: 82
Auth Simplicity: 70
Rate Limits: 82

🔒 Security

TLS Enforcement: 100
Auth Strength: 92
Scope Granularity: 85
Dep. Hygiene: 92
Secret Handling: 90

IAM role-based access. No PII stored by Polly — text is not retained after synthesis. TLS in transit. HIPAA eligible (spoken health content requires BAA with AWS). FedRAMP authorized.

⚡ Reliability

Uptime/SLA: 92
Version Stability: 92
Breaking Changes: 92
Error Recovery: 88

Best When

You're already on AWS and need reliable, affordable TTS with SSML support, good language coverage, and async large-batch audio generation.

Avoid When

You need ultra-realistic voice cloning, need real-time streaming TTS with under 100 ms latency, or are not on AWS infrastructure.

Use Cases

  • Generating audio narration for agent-produced reports and content
  • Text-to-speech for IVR (interactive voice response) systems integrated with Amazon Connect
  • Creating voice responses for agent chatbots deployed via Alexa or Lex
  • Batch audio generation for e-learning content from text-based course materials
  • Real-time speech synthesis for accessibility features in agent-powered applications

Not For

  • Ultra-low latency TTS for real-time conversational AI (Cartesia or ElevenLabs are faster)
  • Voice cloning or custom voice training (Polly uses fixed voices, no custom models)
  • Teams not on AWS who need simpler API access (OpenAI TTS or ElevenLabs are easier)

Interface

REST API: Yes
GraphQL: No
gRPC: No
MCP Server: No
SDK: Yes
Webhooks: No

Authentication

Methods: aws_iam
OAuth: No
Scopes: Yes

Requests are signed with AWS SigV4. IAM policies control the SynthesizeSpeech and StartSpeechSynthesisTask actions. Async jobs write their output to S3, so the caller needs S3 write permissions in addition to Polly permissions.
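As a rough illustration of the permissions described above, the sketch below builds a least-privilege IAM policy document in Python. The bucket name, key prefix, and the helper name `build_polly_policy` are hypothetical. Including `polly:GetSpeechSynthesisTask` is an assumption, made so that the same identity can poll the tasks it starts.

```python
import json


def build_polly_policy(bucket_name, key_prefix=""):
    """Return an IAM policy dict for sync and async Polly synthesis.

    bucket_name and key_prefix are placeholders for wherever async
    output should land; adjust them to your own account layout.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PollySynthesis",
                "Effect": "Allow",
                "Action": [
                    "polly:SynthesizeSpeech",
                    "polly:StartSpeechSynthesisTask",
                    # Assumption: the caller also polls task status.
                    "polly:GetSpeechSynthesisTask",
                ],
                "Resource": "*",
            },
            {
                "Sid": "AsyncOutputToS3",
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                # Restrict writes to one prefix in one bucket.
                "Resource": f"arn:aws:s3:::{bucket_name}/{key_prefix}*",
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical bucket and prefix, for illustration only.
    print(json.dumps(build_polly_policy("example-audio-bucket", "polly/"), indent=2))
```

Scoping the S3 statement to a single prefix keeps an agent from writing elsewhere in the bucket. Whether that is tight enough depends on how the bucket is shared.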

Pricing

Model: pay-as-you-go
Free tier: Yes
Requires CC: Yes

Pricing is per character. Neural TTS costs about 4x as much as standard TTS but sounds noticeably more natural. Long-form neural voices are priced separately.
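To make the 4x difference concrete, here is a minimal cost estimator. The per-million-character rates are assumptions for illustration (USD 4 for standard, USD 16 for neural) and should be checked against the current AWS pricing page. The function name and the rate table are hypothetical.

```python
# Assumed list prices in USD per 1 million billed characters.
# Verify against the current AWS Polly pricing page before relying on them.
ASSUMED_RATES_USD_PER_MILLION = {
    "standard": 4.00,
    "neural": 16.00,
}


def estimate_cost_usd(billed_characters, engine="standard",
                      rates=ASSUMED_RATES_USD_PER_MILLION):
    """Estimate synthesis cost for a number of billed characters."""
    if billed_characters < 0:
        raise ValueError("billed_characters must be non-negative")
    if engine not in rates:
        raise KeyError(f"no assumed rate for engine {engine!r}")
    return billed_characters / 1_000_000 * rates[engine]


if __name__ == "__main__":
    for engine in ("standard", "neural"):
        cost = estimate_cost_usd(250_000, engine)
        print(f"{engine}: ${cost:.2f} for 250,000 characters")
```

Free-tier allowances and long-form voice pricing are deliberately left out. Passing a different `rates` mapping is one way to model them.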

Agent Metadata

Pagination: page_token
Idempotent: Partial
Retry Guidance: Documented

Known Gotchas

  • SynthesizeSpeech accepts at most 3,000 billed characters per request (6,000 characters in total once SSML markup is included), so agents must split long text at natural break points
  • SSML tags are not billed, but they still count toward the total request size, so track billed characters and total characters separately when mixing SSML and text
  • Async StartSpeechSynthesisTask saves to S3 — agents need both Polly and S3 write permissions, and must poll for completion
  • Neural voices are region-specific — some Neural voices are not available in all AWS regions
  • Response audio arrives as a stream in the HTTP response body, not as a URL or JSON field, so agents must read the stream themselves (in chunks or all at once) and close it when done
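Several of the gotchas above can be handled with small helpers. The sketch below assumes a boto3-style Polly client is passed in as `polly`. The helper names, the `Joanna` voice, the 8 KiB read size, and the polling interval are illustrative choices, not recommendations from AWS. The splitter counts Python characters, which only approximates billed characters.

```python
import re
import time


def split_text(text, max_chars=3000):
    """Split text into chunks of at most max_chars, preferring sentence ends."""
    if max_chars < 1:
        raise ValueError("max_chars must be positive")
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Fall back to word boundaries when one sentence is too long.
        pieces = [sentence] if len(sentence) <= max_chars else sentence.split()
        for piece in pieces:
            # Hard-split any single word that still exceeds the limit.
            while len(piece) > max_chars:
                if current:
                    chunks.append(current)
                    current = ""
                chunks.append(piece[:max_chars])
                piece = piece[max_chars:]
            if not piece:
                continue
            candidate = f"{current} {piece}" if current else piece
            if len(candidate) <= max_chars:
                current = candidate
            else:
                chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks


def synthesize_to_file(polly, text, path, voice_id="Joanna", engine="neural"):
    """Call SynthesizeSpeech and copy the audio stream to path in blocks."""
    response = polly.synthesize_speech(
        Text=text, OutputFormat="mp3", VoiceId=voice_id, Engine=engine
    )
    stream = response["AudioStream"]
    written = 0
    try:
        with open(path, "wb") as out:
            while True:
                block = stream.read(8192)
                if not block:
                    break
                out.write(block)
                written += len(block)
    finally:
        stream.close()  # Always release the underlying HTTP connection.
    return written


def wait_for_task(polly, task_id, poll_seconds=5.0, timeout_seconds=600.0,
                  sleep=time.sleep):
    """Poll an async synthesis task; return its S3 output URI when done."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        task = polly.get_speech_synthesis_task(TaskId=task_id)["SynthesisTask"]
        status = task["TaskStatus"]
        if status == "completed":
            return task["OutputUri"]
        if status == "failed":
            raise RuntimeError(task.get("TaskStatusReason", "synthesis task failed"))
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} still {status} after {timeout_seconds}s")
        sleep(poll_seconds)
```

With boto3 installed and credentials configured, `boto3.client("polly")` would be passed as the `polly` argument. A voice that supports the neural engine in the chosen region must be picked separately.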

Scores are editorial opinions as of 2026-03-06.
