Hypothesis

Hypothesis is a property-based testing library for Python. Instead of writing specific examples, you describe the shape of valid inputs; Hypothesis then automatically generates hundreds of diverse test inputs, including edge cases humans miss, and shrinks each failing case to a minimal reproducible example.
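The core workflow can be sketched as follows. The `run_length_encode`/`run_length_decode` pair is a hypothetical toy example, used only to illustrate the `@given` pattern; only `hypothesis.given` and `hypothesis.strategies` are real library APIs here:

```python
from hypothesis import given, strategies as st

def run_length_encode(s: str) -> list:
    """Toy encoder: collapse runs of equal characters into (char, count) pairs."""
    out = []
    for ch in s:
        if out and out[-1][0] == ch:
            out[-1] = (ch, out[-1][1] + 1)
        else:
            out.append((ch, 1))
    return out

def run_length_decode(pairs) -> str:
    """Inverse of run_length_encode."""
    return "".join(ch * n for ch, n in pairs)

# Property: decoding an encoding must return the original string,
# for *any* string Hypothesis can generate.
@given(st.text())
def test_round_trip(s):
    assert run_length_decode(run_length_encode(s)) == s

test_round_trip()  # runs the property against ~100 generated strings
```

If the property ever fails, Hypothesis shrinks the failing string toward the shortest input that still triggers the bug before reporting it.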

Evaluated Mar 06, 2026
Category: Developer Tools · Tags: python, testing, property-based-testing, fuzzing, pytest, edge-cases, generative-testing
⚙ Agent Friendliness
70
/ 100
Can an agent use this?
🔒 Security
89
/ 100
Is it safe for agents?
⚡ Reliability
89
/ 100
Does it work consistently?

Score Breakdown

⚙ Agent Friendliness

MCP Quality
--
Documentation
90
Error Messages
88
Auth Simplicity
100
Rate Limits
100

🔒 Security

TLS Enforcement
90
Auth Strength
90
Scope Granularity
85
Dep. Hygiene
88
Secret Handling
90

No network layer. The .hypothesis/ directory stores found examples on disk; exclude it from version control if test inputs could contain sensitive data.
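A minimal exclusion for that directory, assuming a Git repository:

```gitignore
# Keep Hypothesis's local example database out of version control
.hypothesis/
```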

⚡ Reliability

Uptime/SLA
90
Version Stability
90
Breaking Changes
87
Error Recovery
88

Best When

You can describe properties or invariants that must hold for all valid inputs and want automated discovery of edge cases that unit tests with manually chosen examples would miss.
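A concrete invariant of this kind is idempotence. The `normalize_whitespace` function below is a hypothetical example; the property "normalizing twice equals normalizing once" holds for every input, so Hypothesis can check it without any hand-picked examples:

```python
from hypothesis import given, strategies as st

def normalize_whitespace(s: str) -> str:
    """Collapse all runs of whitespace to single spaces and trim the ends."""
    return " ".join(s.split())

# Invariant: applying the normalizer twice must equal applying it once.
@given(st.text())
def test_normalize_is_idempotent(s):
    once = normalize_whitespace(s)
    assert normalize_whitespace(once) == once

test_normalize_is_idempotent()
```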

Avoid When

Your test cases inherently depend on specific hard-coded values, or the function under test has side effects that are expensive, irreversible, or stateful across calls.

Use Cases

  • Finding edge-case bugs in parsing, serialization, and data transformation functions by generating thousands of varied inputs automatically
  • Testing mathematical properties (commutativity, associativity, round-trip encoding) that must hold for all valid inputs
  • Stress-testing REST API handlers or database queries with randomly generated valid inputs to find crashes and constraint violations
  • Replacing large tables of manually written pytest parametrize examples with a single @given strategy that covers more cases
  • Verifying that a refactored function produces identical output to the original for all inputs (differential testing / oracle testing)
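The differential-testing use case above can be sketched like this, with Python's built-in `sorted` standing in as the trusted oracle and `insertion_sort` as a hypothetical reimplementation under test:

```python
from hypothesis import given, strategies as st

def insertion_sort(xs):
    """Hypothetical reimplementation to be checked against the oracle."""
    out = []
    for x in xs:
        i = len(out)
        while i > 0 and out[i - 1] > x:
            i -= 1
        out.insert(i, x)
    return out

# Oracle test: the new implementation must agree with the trusted
# built-in on every generated input.
@given(st.lists(st.integers()))
def test_matches_builtin_sort(xs):
    assert insertion_sort(xs) == sorted(xs)

test_matches_builtin_sort()
```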

Not For

  • Testing behavior that depends on specific known inputs — use pytest with explicit parametrize for that
  • Integration tests that require real external services with specific state — Hypothesis is designed for pure or mockable functions
  • Performance benchmarking — the overhead of input generation and shrinking makes timing unreliable

Interface

REST API
No
GraphQL
No
gRPC
No
MCP Server
No
SDK
Yes
Webhooks
No

Authentication

Methods: none
OAuth: No
Scopes: No

Local Python library; no authentication required. Hypothesis Enterprise (an optional paid service) adds a CI database for example sharing.

Pricing

Model: open_source
Free tier: Yes
Requires CC: No

The core library is completely free under the MPL 2.0 license; Hypothesis Enterprise is an optional paid CI service for teams.

Agent Metadata

Pagination
none
Idempotent
Full
Retry Guidance
Not documented

Known Gotchas

  • Functions under test must be side-effect free or reset state between calls — Hypothesis calls the same function many times and accumulated state from prior calls causes misleading failures
  • Hypothesis stores found examples in a local .hypothesis/ database directory — CI pipelines that delete this directory lose the benefit of example replay and may miss previously found bugs
  • The @settings(max_examples=) default (100) is often too low to find rare edge cases in complex functions; increase to 1000+ for thorough testing but expect slower test runs
  • Strategies must accurately reflect valid input constraints — if the strategy generates values the function is not designed to handle, Hypothesis will find false failures that are actually invalid test setups
  • Hypothesis does not know about external dependencies — if a test calls an external API or database, Hypothesis will make that call hundreds of times; always mock external calls in property-based tests
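Two of the gotchas above, raising `max_examples` beyond the default 100 and keeping the strategy within the function's valid domain, can be addressed together. The `relative_spread` function is a hypothetical example whose documented domain is a non-empty list of positive numbers; the strategy mirrors that domain so Hypothesis does not report spurious failures:

```python
from statistics import mean
from hypothesis import given, settings, strategies as st

def relative_spread(xs):
    """Requires a non-empty list of positive numbers."""
    return (max(xs) - min(xs)) / mean(xs)

# The strategy encodes the function's preconditions (non-empty, positive),
# and max_examples is raised from the default 100 for a more thorough search.
@settings(max_examples=1000)
@given(st.lists(st.integers(min_value=1, max_value=10**6), min_size=1))
def test_spread_is_non_negative(xs):
    assert relative_spread(xs) >= 0

test_spread_is_non_negative()
```

If the strategy instead allowed empty lists or zeros, the resulting `ValueError`/`ZeroDivisionError` reports would be invalid test setups, not bugs in the function.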


Scores are editorial opinions as of 2026-03-06.
