Hypothesis
Property-based testing library for Python — generates test inputs from declarative strategies to find edge cases that example-based tests miss. Key features: the @given() decorator with strategies (st.text(), st.integers(), st.lists(), st.from_type()), automatic shrinking to a minimal failing example, a database of failing examples persisted between runs, settings profiles (CI vs. local), assume() for preconditions, st.composite() for custom strategies, deadline control, and pytest integration. A single test decorated with @given(st.text()) runs validation against hundreds of generated strings, automatically surfacing empty strings, Unicode, huge inputs, and boundary values.
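A minimal sketch of the pattern described above. validate_agent_name() is a hypothetical validator invented for illustration; only the @given/st.text() machinery is Hypothesis's actual API.

```python
from hypothesis import given, strategies as st

def validate_agent_name(name: str) -> bool:
    # Hypothetical validator: non-empty, ASCII-only, at most 64 characters.
    return bool(name) and name.isascii() and len(name) <= 64

@given(st.text())
def test_validate_agent_name_returns_bool(name):
    # Property: the validator never raises and always returns a bool,
    # whatever string Hypothesis generates (empty, Unicode, very long).
    assert isinstance(validate_agent_name(name), bool)

# Calling a @given-decorated test directly runs it against ~100 generated examples.
test_validate_agent_name_returns_bool()
```

Running the decorated function executes the full generate-and-shrink loop; under pytest the same function is collected and run automatically.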
Score Breakdown
⚙ Agent Friendliness
🔒 Security
Testing library — no production security concerns. Hypothesis excels at finding security-relevant edge cases (SQL injection patterns, parser crashes, integer overflow) in agent validation code. The failure database in .hypothesis/ may retain sensitive agent test inputs; add it to .gitignore in agent repos whose test fixtures contain sensitive data.
⚡ Reliability
Best When
You want to find edge cases in agent parsing, validation, and transformation logic that are hard to enumerate manually — Hypothesis automatically generates boundary values and shrinks failures to minimal reproducing examples.
Avoid When
Your tests involve external services with rate limits (Hypothesis executes each test body many times per run), UI testing, or performance-sensitive benchmarks.
Use Cases
- • Agent input validation fuzzing — a test like @given(st.text()) def test_validate_agent_name(name): assert isinstance(validate_agent_name(name), bool) runs validation against thousands of text inputs, including emoji, null bytes, and extremely long strings that manual tests miss
- • Agent serialization roundtrip — @given(st.from_type(AgentConfig)) with assert AgentConfig.from_json(config.to_json()) == config ensures every agent config value survives JSON serialization, for arbitrary generated inputs
- • Agent pagination boundary conditions — @given(st.integers(min_value=0, max_value=10000), st.integers(min_value=1, max_value=100)) with assert len(results) <= per_page finds off-by-one bugs in get_agents(page, per_page)
- • Agent state machine testing — stateful testing with RuleBasedStateMachine (@initialize plus @rule) models agent workflow state transitions and finds invalid state sequences that purely random inputs would not explore systematically
- • Agent LLM response parser — @given(st.text(alphabet=st.characters(max_codepoint=127))) asserts that parse_llm_response(text) never raises; Hypothesis finds parser-crashing inputs faster than manual fuzzing
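The serialization roundtrip case above can be sketched as follows. AgentConfig and its to_json()/from_json() helpers are hypothetical stand-ins; st.from_type() resolving a fully annotated dataclass is real Hypothesis behavior.

```python
import json
from dataclasses import dataclass, asdict
from hypothesis import given, strategies as st

@dataclass(frozen=True)
class AgentConfig:
    # Hypothetical config type; complete annotations let st.from_type() resolve it.
    name: str
    max_retries: int

    def to_json(self) -> str:
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, raw: str) -> "AgentConfig":
        return cls(**json.loads(raw))

@given(st.from_type(AgentConfig))
def test_config_roundtrip(config):
    # Property: serialize-then-deserialize is the identity for any generated config.
    assert AgentConfig.from_json(config.to_json()) == config

test_config_roundtrip()
```

If the dataclass gains a field whose type Hypothesis cannot resolve, the fallback named in the gotchas below is st.builds(AgentConfig, name=st.text(), ...).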
Not For
- • Testing external API behavior — Hypothesis generates inputs for your code, not for probing how LLM APIs respond to inputs; mock LLM calls inside Hypothesis tests
- • Performance benchmarks — Hypothesis runs many iterations which distorts performance measurements; use pytest-benchmark or timeit for agent performance profiling
- • UI/E2E testing — Hypothesis is unit/integration testing focused; for agent UI property tests use Playwright with custom generators
Interface
Authentication
No auth — local testing library.
Pricing
Hypothesis is MPL-2.0 licensed. HypothesisWorks also offers Hypothesis Enterprise for additional features. Core library is free.
Agent Metadata
Known Gotchas
- ⚠ assume() overuse causes health-check failures — assume(len(name) > 0) inside a @given test discards non-matching inputs; too many rejections make Hypothesis give up with an 'Unsatisfied assumption' warning or the filter_too_much health check; prefer st.text(min_size=1) so valid agent inputs are generated directly instead of filtered after the fact
- ⚠ Hypothesis database persists failures between runs — failing agent examples are stored in .hypothesis/examples/ and replayed first on every rerun; delete .hypothesis/ if the test logic changed and old failures are no longer valid; caching .hypothesis/ in CI replays past failures even on unchanged code
- ⚠ st.from_type() requires resolvable type annotations — @given(st.from_type(AgentConfig)) needs AgentConfig to carry complete annotations that Hypothesis can resolve; missing or overly complex annotations raise InvalidArgument; fall back to manually constructing st.builds(AgentConfig, name=st.text()) for complex agent types
- ⚠ Deadline health check fails slow agent tests — Hypothesis imposes a 200 ms deadline per example by default; agent tests hitting a real database or network mocks may exceed it; use @settings(deadline=None) to disable the check, or @settings(deadline=timedelta(seconds=5)) for agent integration tests with real operations
- ⚠ Stateful testing requires explicit invariant assertions — RuleBasedStateMachine explores state transitions but only reports a failure when an assertion is violated; agent state machine tests without invariant checks pass even in invalid states — add @invariant() methods that verify agent state consistency after every rule
- ⚠ st.composite() functions must use draw — in @composite def agent_strategy(draw): every sub-strategy must be resolved via draw(); calling st.text() directly returns a strategy object rather than a string, silently producing an incorrect composite strategy for agent types
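A correct st.composite() strategy that sidesteps the first and last gotchas above: min_size=1 replaces an assume(len(name) > 0) filter, and every sub-strategy goes through draw(). AgentConfig is a hypothetical example type.

```python
from dataclasses import dataclass
from hypothesis import given, strategies as st

@dataclass
class AgentConfig:
    # Hypothetical agent config used for illustration.
    name: str
    max_retries: int

@st.composite
def agent_configs(draw):
    # Each sub-strategy must pass through draw(); calling st.text() alone
    # would return a strategy object, not a string.
    name = draw(st.text(min_size=1))  # min_size=1 instead of assume(len(name) > 0)
    retries = draw(st.integers(min_value=0, max_value=5))
    return AgentConfig(name=name, max_retries=retries)

@given(agent_configs())
def test_generated_configs_are_valid(config):
    assert config.name                      # never empty, thanks to min_size=1
    assert 0 <= config.max_retries <= 5

test_generated_configs_are_valid()
```

Because no inputs are rejected after generation, this strategy cannot trip the filter_too_much health check no matter how many examples Hypothesis draws.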
Alternatives
Full Evaluation Report
Detailed scoring breakdown, competitive positioning, security analysis, and improvement recommendations for Hypothesis.
Scores are editorial opinions as of 2026-03-06.