HoloClean

Statistical inference engine that imputes, cleans, and enriches data using machine learning. Leverages weakly supervised learning signals including quality rules, value correlations, and reference data to handle noisy, incomplete, and erroneous datasets. Built on PyTorch and PostgreSQL. Academic project from the University of Wisconsin-Madison.

Evaluated Mar 08, 2026
Category: Data Processing
Tags: data-cleaning, data-quality, machine-learning, pytorch, postgresql, inference-engine, data-enrichment, imputation
⚙ Agent Friendliness: 45 / 100 (Can an agent use this?)
🔒 Security: 38 / 100 (Is it safe for agents?)
⚡ Reliability: 26 / 100 (Does it work consistently?)

Score Breakdown

⚙ Agent Friendliness

MCP Quality: 0
Documentation: 55
Error Messages: 40
Auth Simplicity: 85
Rate Limits: 70

🔒 Security

TLS Enforcement: 40
Auth Strength: 40
Scope Granularity: 30
Dep. Hygiene: 35
Secret Handling: 45

Unmaintained since July 2023, so dependencies are likely outdated and may carry known CVEs. PostgreSQL credentials are supplied through standard configuration. No network API. Academic project without security hardening.

⚡ Reliability

Uptime/SLA: 20
Version Stability: 30
Breaking Changes: 25
Error Recovery: 30

Best When

You have structured tabular data with quality issues and want to apply statistical inference for cleaning and imputation, particularly in a research or experimental context.

Avoid When

You need real-time data cleaning, agent-friendly APIs, or a production-hardened tool with active maintenance (last commit July 2023).

Use Cases

  • Data quality improvement for noisy or incomplete datasets
  • Automated data imputation using probabilistic models
  • Data enrichment using quality rules and correlations
  • Research and academic work on data cleaning approaches
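
As a rough illustration of the imputation use case, the sketch below fills a missing value with the one that most often co-occurs with another column's value. It is a toy stand-in written for this page, not HoloClean's model or API; the function name `impute_by_cooccurrence` and the sample rows are hypothetical.

```python
from collections import Counter, defaultdict


def impute_by_cooccurrence(rows, target, given):
    """Fill missing `target` values using co-occurrence counts with `given`.

    Toy heuristic only: picks the `target` value seen most often alongside
    the same `given` value. Rows with no evidence are left unchanged.
    """
    # Count how often each target value appears with each given value.
    counts = defaultdict(Counter)
    for row in rows:
        if row.get(target) is not None and row.get(given) is not None:
            counts[row[given]][row[target]] += 1

    repaired = []
    for row in rows:
        fixed = dict(row)  # copy so the input rows are not mutated
        if fixed.get(target) is None and fixed.get(given) in counts:
            fixed[target] = counts[fixed[given]].most_common(1)[0][0]
        repaired.append(fixed)
    return repaired


# Hypothetical sample data: None marks a missing cell.
rows = [
    {"zip": "53703", "city": "Madison"},
    {"zip": "53703", "city": "Madison"},
    {"zip": "53703", "city": None},
    {"zip": "60601", "city": "Chicago"},
    {"zip": "99999", "city": None},
]
repaired = impute_by_cooccurrence(rows, target="city", given="zip")
for row in repaired:
    print(row)
```

The third row gains a city because its zip code appears elsewhere with a known city; the last row stays empty because no other row shares its zip code. A probabilistic engine would weigh many such signals together rather than taking a single majority vote.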

Not For

  • Real-time stream data cleaning (batch processing oriented)
  • Agent/MCP integration (no MCP server, no API)
  • Production workloads without significant engineering effort (academic research tool)
  • Modern Python environments (supports Python 2.7/3.6/3.7, may have compatibility issues with 3.10+)

Interface

REST API: No
GraphQL: No
gRPC: No
MCP Server: No
SDK: No
Webhooks: No

Authentication

Methods: none
OAuth: No
Scopes: No

No authentication. Python library run locally against a PostgreSQL database.

Pricing

Model: free
Free tier: Yes
Requires CC: No

Open-source under Apache-2.0 license.

Agent Metadata

Idempotent: Unknown
Retry Guidance: Not documented

Known Gotchas

  • No MCP server, no API — Python library only, not agent-accessible
  • Last commit July 2023 — effectively unmaintained for 2.5+ years
  • Requires PostgreSQL 9.4+ setup with specific schema ownership configuration
  • Python version support (2.7/3.6/3.7) suggests compatibility issues with modern Python
  • macOS requires Xcode developer tools for installation
  • Academic research project — not designed for production workloads
  • Docker PostgreSQL setup can conflict with existing PostgreSQL installations
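
The Python version gotcha can be checked before attempting an install. This is a minimal sketch, assuming the supported versions are exactly those listed above (2.7, 3.6, 3.7); `LISTED_VERSIONS` and `is_listed_version` are hypothetical names, not part of HoloClean.

```python
import sys

# Versions named in this evaluation; assumed here, not verified against the repo.
LISTED_VERSIONS = {(2, 7), (3, 6), (3, 7)}


def is_listed_version(version_info=None):
    """Return True if the (major, minor) version is one of the listed versions.

    Defaults to the running interpreter when no version tuple is passed.
    """
    info = sys.version_info if version_info is None else version_info
    return (info[0], info[1]) in LISTED_VERSIONS


if __name__ == "__main__":
    current = "{}.{}".format(sys.version_info[0], sys.version_info[1])
    if is_listed_version():
        print("Python {} is among the listed versions.".format(current))
    else:
        print("Python {} is not listed; expect possible compatibility issues.".format(current))
```

On a newer interpreter, an isolated environment pinned to one of the listed versions is a reasonable first thing to try.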

Scores are editorial opinions as of 2026-03-08.
