HoloClean
Statistical inference engine that imputes, cleans, and enriches data using machine learning. Leverages weakly supervised learning signals including quality rules, value correlations, and reference data to handle noisy, incomplete, and erroneous datasets. Built on PyTorch and PostgreSQL. Academic project from the University of Wisconsin-Madison.
Score Breakdown
🔒 Security
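Unmaintained since the last commit in July 2023, so dependencies are likely outdated and may carry unpatched CVEs. PostgreSQL credentials are supplied through the library's standard configuration. The library exposes no network API. As an academic project, it has not been security hardened.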
Best When
You have structured tabular data with quality issues and want to apply statistical inference for cleaning and imputation, particularly in a research or experimental context.
Avoid When
You need real-time data cleaning, agent-friendly APIs, or a production-hardened tool with active maintenance (last commit July 2023).
Use Cases
- Data quality improvement for noisy or incomplete datasets
- Automated data imputation using probabilistic models
- Data enrichment using quality rules and correlations
- Research and academic work on data cleaning approaches
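To make "quality rules" and "value correlations" concrete, here is a toy sketch in plain Python. It is not HoloClean's algorithm or API: HoloClean learns a probabilistic model, whereas this example assumes a single hypothetical rule (zip code determines city) and repairs by simple majority vote over co-occurring values. The table, column names, and helper functions are all invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy table: zip code is expected to determine city.
rows = [
    {"zip": "53703", "city": "Madison"},
    {"zip": "53703", "city": "Madison"},
    {"zip": "53703", "city": "Madsion"},  # typo that breaks the rule
    {"zip": "60601", "city": "Chicago"},
    {"zip": "60601", "city": None},       # missing value
]


def cooccurrence(rows, key, target):
    """Count how often each target value appears alongside each key value."""
    counts = defaultdict(Counter)
    for row in rows:
        if row[target] is not None:
            counts[row[key]][row[target]] += 1
    return counts


def violations(rows, key, target):
    """Return key values that map to more than one distinct target value."""
    counts = cooccurrence(rows, key, target)
    return sorted(k for k, seen in counts.items() if len(seen) > 1)


def repair(rows, key, target):
    """Set each target to the value most often seen with the same key.

    Majority vote is a stand-in for statistical inference, chosen only
    to keep the sketch short. The input rows are not modified.
    """
    counts = cooccurrence(rows, key, target)
    repaired = []
    for row in rows:
        fixed = dict(row)
        seen = counts.get(row[key])
        if seen:
            fixed[target] = seen.most_common(1)[0][0]
        repaired.append(fixed)
    return repaired


bad_keys = violations(rows, "zip", "city")
cleaned = repair(rows, "zip", "city")
```

Here the rule flags zip `53703` as inconsistent, and the repair step both corrects the misspelled city and fills the missing one from rows that share the same zip.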
Not For
- Real-time stream data cleaning (batch processing oriented)
- Agent/MCP integration (no MCP server, no API)
- Production workloads without significant engineering effort (academic research tool)
- Modern Python environments (supports Python 2.7/3.6/3.7, may have compatibility issues with 3.10+)
Interface
Authentication
No authentication. Python library run locally against a PostgreSQL database.
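Because the only credentials involved belong to the local PostgreSQL database, one option is to keep them out of source files by reading them from the environment. The sketch below assumes the standard libpq variable names (`PGHOST`, `PGPORT`, `PGDATABASE`, `PGUSER`, `PGPASSWORD`). The function name and the default values are hypothetical placeholders, and how the resulting settings are handed to HoloClean is not covered by this listing.

```python
import os


def postgres_settings(env=None):
    """Collect PostgreSQL connection settings from environment variables.

    `env` defaults to os.environ; passing a plain dict makes the
    function easy to exercise without touching the real environment.
    """
    env = os.environ if env is None else env
    password = env.get("PGPASSWORD")
    if not password:
        # Fail loudly instead of falling back to a hardcoded password.
        raise RuntimeError("PGPASSWORD is not set")
    return {
        "host": env.get("PGHOST", "localhost"),
        "port": int(env.get("PGPORT", "5432")),
        "dbname": env.get("PGDATABASE", "example_db"),   # placeholder default
        "user": env.get("PGUSER", "example_user"),       # placeholder default
        "password": password,
    }
```

Calling `postgres_settings()` with no arguments reads the current process environment, and the call raises if no password has been exported.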
Pricing
Open-source under Apache-2.0 license.
Agent Metadata
Known Gotchas
- ⚠ No MCP server, no API — Python library only, not agent-accessible
- ⚠ Last commit July 2023 — effectively unmaintained for 2.5+ years
- ⚠ Requires PostgreSQL 9.4+ setup with specific schema ownership configuration
- ⚠ Python version support (2.7/3.6/3.7) suggests compatibility issues with modern Python
- ⚠ macOS requires Xcode developer tools for installation
- ⚠ Academic research project — not designed for production workloads
- ⚠ Docker PostgreSQL setup can conflict with existing PostgreSQL installations
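Two of the gotchas above, the Python version range and the PostgreSQL minimum, can be checked before attempting an install. The following is a minimal preflight sketch that uses only the version numbers stated in this listing. The function names are hypothetical, and the PostgreSQL version string is assumed to come from something like the output of `psql --version`.

```python
import re
import sys

SUPPORTED_PYTHONS = {(2, 7), (3, 6), (3, 7)}  # versions listed in this entry
MIN_POSTGRES = (9, 4)                         # minimum listed in this entry


def python_supported(version_info=None):
    """Return True if the (major, minor) Python version is in the listed set."""
    version_info = sys.version_info if version_info is None else version_info
    return (version_info[0], version_info[1]) in SUPPORTED_PYTHONS


def parse_postgres_version(text):
    """Extract (major, minor) from text such as 'psql (PostgreSQL) 9.6.24'.

    The first number found is taken as the major version. A missing
    minor component is treated as 0.
    """
    match = re.search(r"(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError("no version number found in %r" % text)
    return int(match.group(1)), int(match.group(2) or 0)


def postgres_supported(text):
    """Return True if the reported PostgreSQL version meets the minimum."""
    return parse_postgres_version(text) >= MIN_POSTGRES


if __name__ == "__main__":
    if not python_supported():
        print("Warning: this Python version is outside the listed support range")
```

A check like this only reports whether the environment matches the listed versions. It does not verify the schema ownership configuration or detect a conflicting PostgreSQL installation.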
Scores are editorial opinions as of 2026-03-08.