AWS Polly API

AWS Polly converts text to lifelike speech with 60+ voices across 30+ languages using standard and neural TTS engines. It outputs MP3, Ogg Vorbis, or PCM audio through either a synchronous call or an asynchronous task.

Evaluated Mar 06, 2026 (version: current)

AI & Machine Learning

Tags: aws, polly, tts, text-to-speech, speech, voice, neural-tts

⚙ Agent Friendliness

/ 100

Can an agent use this?

🔒 Security

/ 100

Is it safe for agents?

⚡ Reliability

/ 100

Does it work consistently?

Score Breakdown

⚙ Agent Friendliness

MCP Quality

Documentation

Error Messages

Auth Simplicity

Rate Limits

🔒 Security

TLS Enforcement

100

Auth Strength

Scope Granularity

Dep. Hygiene

Secret Handling

Access is controlled through IAM roles and policies. Polly does not retain input text after synthesis, so the service stores no PII. Traffic is encrypted with TLS in transit. HIPAA eligible (a BAA with AWS is required before synthesizing health content). FedRAMP authorized.

⚡ Reliability

Uptime/SLA

Version Stability

Breaking Changes

Error Recovery

Best When

You're already on AWS and need reliable, affordable TTS with SSML support, good language coverage, and async large-batch audio generation.

Avoid When

You need ultra-realistic voice cloning, need real-time streaming TTS with under 100 ms latency, or are not on AWS infrastructure.

Use Cases

• Generating audio narration for agent-produced reports and content
• Text-to-speech for IVR (interactive voice response) systems integrated with Amazon Connect
• Creating voice responses for agent chatbots deployed via Alexa or Lex
• Batch audio generation for e-learning content from text-based course materials
• Real-time speech synthesis for accessibility features in agent-powered applications
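
For the batch use case, the asynchronous flow can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: it assumes `polly` is a boto3 Polly client (for example `boto3.client("polly")`), and the helper names `wait_for_task` and `synthesize_batch`, the voice `Joanna`, and the polling interval are illustrative choices rather than anything prescribed by AWS.

```python
import time


def wait_for_task(polly, task_id, poll_seconds=5.0, timeout_seconds=600.0,
                  sleep=time.sleep):
    """Poll GetSpeechSynthesisTask until the task completes or fails.

    Returns the OutputUri of the finished audio object in S3.
    """
    waited = 0.0
    while True:
        task = polly.get_speech_synthesis_task(TaskId=task_id)["SynthesisTask"]
        status = task["TaskStatus"]
        if status == "completed":
            return task["OutputUri"]
        if status == "failed":
            raise RuntimeError(task.get("TaskStatusReason", "synthesis task failed"))
        if waited >= timeout_seconds:
            raise TimeoutError(f"task {task_id} still {status} after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds


def synthesize_batch(polly, text, bucket, voice_id="Joanna", engine="neural",
                     **wait_kwargs):
    """Start an async synthesis task that writes MP3 output to `bucket`."""
    response = polly.start_speech_synthesis_task(
        Text=text,
        OutputFormat="mp3",
        VoiceId=voice_id,          # illustrative voice; pick one available in your region
        Engine=engine,
        OutputS3BucketName=bucket,  # caller needs S3 write access to this bucket
    )
    task_id = response["SynthesisTask"]["TaskId"]
    return wait_for_task(polly, task_id, **wait_kwargs)
```

Passing the client in as a parameter keeps the helpers easy to exercise with a stub, and the injectable `sleep` avoids real delays in tests.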

Not For

• Ultra-low latency TTS for real-time conversational AI (Cartesia or ElevenLabs are faster)
• Voice cloning or custom voice training (Polly uses fixed voices, no custom models)
• Teams not on AWS who need simpler API access (OpenAI TTS or ElevenLabs are easier)

Interface

REST API

Yes

GraphQL

gRPC

MCP Server

SDK

Yes

Webhooks

Authentication

Methods: aws_iam

OAuth: No

Scopes: Yes

Requests are signed with AWS SigV4. IAM policies control the SynthesizeSpeech and StartSpeechSynthesisTask actions. Async jobs write their output to S3, so the caller needs S3 write permission on the output bucket in addition to Polly permissions.
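
The permissions described above could be expressed as an IAM policy along these lines. This is a sketch under stated assumptions: the helper name `polly_policy`, the bucket name, and the key prefix are hypothetical, and `polly:GetSpeechSynthesisTask` is included on the assumption that the caller polls for task completion.

```python
import json


def polly_policy(bucket, prefix=""):
    """Build a least-privilege style IAM policy document for Polly plus S3 output."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PollySynthesis",
                "Effect": "Allow",
                "Action": [
                    "polly:SynthesizeSpeech",
                    "polly:StartSpeechSynthesisTask",
                    "polly:GetSpeechSynthesisTask",  # assumed: needed for polling
                ],
                "Resource": "*",
            },
            {
                "Sid": "AsyncOutputToS3",
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                # Hypothetical bucket and prefix; scope this to your output location.
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(polly_policy("my-audio-bucket", "polly/"), indent=2))
```

Scoping the S3 statement to a single prefix limits what an agent holding these credentials can overwrite.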

Pricing

Model: pay-as-you-go

Free tier: Yes

Requires CC: Yes

Pricing is per character synthesized. Neural TTS costs about 4x as much as standard TTS but sounds significantly better. Long Form neural voices are priced separately.
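
As a rough illustration of character-based pricing, the sketch below estimates cost from a character count. The per-million rate used here is a placeholder assumption, not a quoted AWS price; only the 4x ratio between neural and standard comes from the text above. Check the current AWS pricing page before relying on any figure.

```python
def estimate_cost(characters, price_per_million):
    """Estimate synthesis cost in USD for a given number of billed characters."""
    return characters / 1_000_000 * price_per_million


# Placeholder rate (USD per 1M characters); an assumption for illustration only.
STANDARD_PRICE = 4.00
# The 4x neural multiplier is the ratio stated in the pricing note.
NEURAL_PRICE = STANDARD_PRICE * 4

if __name__ == "__main__":
    chars = 250_000
    print(f"standard: ${estimate_cost(chars, STANDARD_PRICE):.2f}")
    print(f"neural:   ${estimate_cost(chars, NEURAL_PRICE):.2f}")
```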

Agent Metadata

Pagination

page_token

Idempotent

Partial

Retry Guidance

Documented
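
A retry wrapper in the spirit of that guidance might look like the sketch below, which uses exponential backoff with full jitter. The helper name `call_with_backoff` and the `is_retryable` predicate are hypothetical; which errors count as retryable (throttling, transient 5xx responses) is left to the caller. Given the "Partial" idempotency noted above, retrying a call that starts an async task may create a duplicate task, so this is safest around read-only or synchronous calls.

```python
import random
import time


def call_with_backoff(fn, is_retryable, max_attempts=5, base_delay=0.5,
                      max_delay=8.0, sleep=time.sleep):
    """Call fn(), retrying retryable failures with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception as exc:
            # Give up on the last attempt or on errors the caller deems permanent.
            if attempt == max_attempts or not is_retryable(exc):
                raise
            # Full jitter: sleep a random time up to the capped exponential delay.
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))
```

The AWS SDKs also ship their own configurable retry behavior, so a wrapper like this is mainly useful when an agent needs retry logic it can inspect or tune itself.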

Known Gotchas

⚠ SynthesizeSpeech accepts at most 3,000 billed characters per request (6,000 characters in total, including SSML tags): agents must split longer text at natural break points
⚠ SSML tags are not billed, but they do count toward the total request size: track billed characters and total characters separately when mixing SSML and text
⚠ Async StartSpeechSynthesisTask saves its output to S3: agents need both Polly and S3 write permissions, and must poll GetSpeechSynthesisTask for completion (or have Polly notify an SNS topic)
⚠ Neural voices are region-specific — some Neural voices are not available in all AWS regions
⚠ Response audio arrives as a stream in the HTTP response body, not as a URL or file: agents must read the stream themselves and write it out, either buffering it fully or copying it in chunks
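
Two of the gotchas above, the per-request character limit and the streamed response body, can be handled together along these lines. This is a sketch under assumptions: `polly` is a boto3 Polly client, the input is plain text (so every character is billed), the helper names and the voice `Joanna` are illustrative, and appending MP3 segments into one file is assumed to play back acceptably.

```python
import re

MAX_BILLED_CHARS = 3000  # per-request limit noted in the gotchas above


def split_text(text, limit=MAX_BILLED_CHARS):
    """Split plain text into chunks of at most `limit` characters.

    Prefers sentence boundaries, falling back to word boundaries and then
    to a hard cut for a single sentence longer than the limit.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        while len(sentence) > limit:
            cut = sentence.rfind(" ", 0, limit + 1)
            if cut <= 0:
                cut = limit  # no space found: hard cut
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:cut])
            sentence = sentence[cut:].lstrip()
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def synthesize_to_file(polly, text, path, voice_id="Joanna", engine="neural",
                       read_size=8192):
    """Synthesize `text` chunk by chunk and append the audio to `path`."""
    with open(path, "wb") as out:
        for chunk in split_text(text):
            response = polly.synthesize_speech(
                Text=chunk, OutputFormat="mp3", VoiceId=voice_id, Engine=engine
            )
            stream = response["AudioStream"]
            # Copy the streamed body in fixed-size reads instead of one big buffer.
            while True:
                data = stream.read(read_size)
                if not data:
                    break
                out.write(data)
```

For SSML input, each chunk would need its own `<speak>` wrapper and a separate count of billed versus total characters, which this sketch does not attempt.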

Alternatives

openai-tts-api eleven-labs-api cartesia-api google-cloud-speech-api



Scores are editorial opinions as of 2026-03-06.