Hypothesis

Property-based testing library for Python — generates test inputs automatically from declarative strategies to find edge cases that example-based tests miss. Features: the @given() decorator with strategies (st.text(), st.integers(), st.lists(), st.from_type()), automatic shrinking to a minimal failing example, a database of failure cases persisted between runs, settings profiles (CI vs. local), assume() for preconditions, st.composite() for custom strategies, deadline control, and pytest integration. For example, decorating def test_agent_name_validation(name) with @given(st.text()) runs the test against hundreds of generated strings per run (100 examples by default), automatically covering empty strings, Unicode, huge inputs, and boundary values.
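A minimal sketch of the workflow described above. The normalize_agent_name function is illustrative, not part of Hypothesis; only @given and st.text() are Hypothesis APIs:

```python
from hypothesis import given, strategies as st

def normalize_agent_name(name: str) -> str:
    # Hypothetical function under test: trims whitespace and lowercases.
    return name.strip().lower()

@given(st.text())
def test_normalize_is_idempotent(name):
    # Property: normalizing an already-normalized name changes nothing.
    once = normalize_agent_name(name)
    assert normalize_agent_name(once) == once
```

Running this under pytest (or calling the decorated function directly) executes the property against many generated strings, and any failure is shrunk to a minimal reproducing input.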

Evaluated Mar 06, 2026 · v6.x
Homepage ↗ Repo ↗ Developer Tools python testing hypothesis property-based fuzzing pytest quickcheck
⚙ Agent Friendliness
70
/ 100
Can an agent use this?
🔒 Security
94
/ 100
Is it safe for agents?
⚡ Reliability
89
/ 100
Does it work consistently?

Score Breakdown

⚙ Agent Friendliness

MCP Quality
--
Documentation
90
Error Messages
92
Auth Simplicity
98
Rate Limits
98

🔒 Security

TLS Enforcement
95
Auth Strength
95
Scope Granularity
92
Dep. Hygiene
90
Secret Handling
95

Testing library — no production security concerns. Hypothesis is excellent for finding security-relevant edge cases (SQL injection patterns, parser crashes, integer overflow) in agent validation code. The failure database in .hypothesis/ may contain sensitive generated test inputs; add it to .gitignore in agent repos whose test fixtures contain sensitive data.

⚡ Reliability

Uptime/SLA
92
Version Stability
88
Breaking Changes
85
Error Recovery
92

Best When

You want to find edge cases in agent parsing, validation, and transformation logic that are hard to enumerate manually — Hypothesis automatically generates boundary values and shrinks failures to minimal reproducing examples.

Avoid When

Your tests involve external services with rate limits (Hypothesis runs each test dozens to hundreds of times), UI testing, or performance-sensitive benchmarks.

Use Cases

  • Agent input validation fuzzing — @given(st.text()) def test_validate_agent_name(name): result = validate_agent_name(name); assert isinstance(result, bool) runs validation against hundreds of generated text inputs, including emoji, null bytes, and extremely long strings that manual tests miss
  • Agent serialization roundtrip — @given(st.from_type(AgentConfig)) def test_config_roundtrip(config): serialized = config.to_json(); assert AgentConfig.from_json(serialized) == config ensures all agent config values survive JSON serialization for arbitrary inputs
  • Agent pagination boundary conditions — @given(st.integers(min_value=0, max_value=10000), st.integers(min_value=1, max_value=100)) def test_pagination(page, per_page): results = get_agents(page=page, per_page=per_page); assert len(results) <= per_page finds off-by-one bugs
  • Agent state machine testing — @initialize + @rule with Hypothesis's stateful RuleBasedStateMachine models agent workflow state transitions; it finds invalid operation sequences that independently generated random inputs would not explore systematically
  • Agent LLM response parser — @given(st.text(alphabet=st.characters(max_codepoint=127))) def test_parse_agent_response(text): asserts parse_llm_response(text) does not raise; Hypothesis finds parser-crashing inputs faster than manual fuzzing
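As a concrete sketch of the serialization roundtrip pattern above, assuming a simple illustrative AgentConfig dataclass (st.builds is used here instead of st.from_type to make the field strategies explicit):

```python
import json
from dataclasses import dataclass, asdict
from hypothesis import given, strategies as st

@dataclass
class AgentConfig:
    # Hypothetical agent configuration for illustration.
    name: str
    max_retries: int

@given(st.builds(AgentConfig,
                 name=st.text(),
                 max_retries=st.integers(min_value=0, max_value=10)))
def test_config_roundtrip(config):
    # Property: serializing then deserializing reproduces the original config.
    restored = AgentConfig(**json.loads(json.dumps(asdict(config))))
    assert restored == config
```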

Not For

  • Testing external API behavior — Hypothesis generates inputs for your code; not for testing how LLM APIs respond to inputs; mock LLM calls in hypothesis tests
  • Performance benchmarks — Hypothesis runs many iterations which distorts performance measurements; use pytest-benchmark or timeit for agent performance profiling
  • UI/E2E testing — Hypothesis is unit/integration testing focused; for agent UI property tests use Playwright with custom generators

Interface

REST API
No
GraphQL
No
gRPC
No
MCP Server
No
SDK
Yes
Webhooks
No

Authentication

Methods: none
OAuth: No Scopes: No

No auth — local testing library.

Pricing

Model: open_source
Free tier: Yes
Requires CC: No

Hypothesis is MPL-2.0 licensed. HypothesisWorks also offers Hypothesis Enterprise for additional features. Core library is free.

Agent Metadata

Pagination
none
Idempotent
Full
Retry Guidance
Not documented

Known Gotchas

  • assume() overuse causes Unsatisfied assumption warning — assume(len(name) > 0) inside @given test filters inputs; too many rejections cause Hypothesis to give up with 'Could not find valid data' or 'filter_too_much' health check failure; use st.text(min_size=1) to generate valid inputs instead of assume() for agent input constraints
  • Hypothesis database persists failures between runs — failed agent tests store examples in .hypothesis/examples/; reruns always test previous failures first; delete .hypothesis/ if test logic changed and old failures are no longer valid; CI caching .hypothesis/ makes past failures replay on unchanged code
  • st.from_type() requires type annotations — @given(st.from_type(AgentConfig)) requires AgentConfig to have complete type annotations and be resolvable by Hypothesis; missing or complex type annotations cause InvalidArgument exception; manually construct st.builds(AgentConfig, name=st.text()) for complex agent types
  • Deadline health check fails slow agent tests — Hypothesis imposes 200ms deadline per example by default; agent tests involving actual DB calls or network mocks may exceed deadline; use @settings(deadline=None) to disable or @settings(deadline=timedelta(seconds=5)) for agent integration tests with real operations
  • Stateful testing requires explicit invariant assertions — RuleBasedStateMachine finds state transitions but only reports failures when invariant() method raises AssertionError; agent state machine tests without invariant checks pass even with invalid state — add @invariant() methods to check agent state consistency after each rule
  • st.composite() functions must use draw — @composite def agent_strategy(draw): name = draw(st.text()); return AgentConfig(name=name) requires draw() for each sub-strategy; calling st.text() directly without draw() inside a composite returns a strategy object rather than a generated value, producing an incorrect composite strategy

Alternatives

Full Evaluation Report

Detailed scoring breakdown, competitive positioning, security analysis, and improvement recommendations for Hypothesis.

$99

Scores are editorial opinions as of 2026-03-06.
