statsmodels
Python library for statistical modeling and econometrics that provides OLS regression, GLMs, logistic regression, time series models (ARIMA, VAR, SARIMAX), and a comprehensive suite of hypothesis tests with R-style model summaries.
Score Breakdown
⚙ Agent Friendliness
🔒 Security
No network layer; security surface limited to local file I/O for model persistence (results are saved and loaded with pickle, so load only files from trusted sources)
⚡ Reliability
Best When
You need rigorous statistical inference with p-values, confidence intervals, and diagnostic tests rather than pure predictive accuracy, especially for econometric or academic research.
Avoid When
Your goal is maximizing predictive accuracy on held-out data rather than statistical inference — use scikit-learn or XGBoost instead.
Use Cases
- • Fitting OLS regression with full statistical output (coefficients, p-values, confidence intervals, R-squared) in a .summary() table
- • Building ARIMA and SARIMAX time series models for forecasting, with the (p, d, q) order specified explicitly
- • Running hypothesis tests (t-tests, chi-square, Granger causality, cointegration, heteroskedasticity) on data
- • Estimating generalized linear models (Poisson, negative binomial, logit, probit) with link functions for count or binary outcomes
- • Analyzing panel or grouped data for econometric research: random effects via mixed linear models (MixedLM), fixed effects via OLS with categorical dummy variables
Not For
- • Prediction-focused machine learning pipelines — scikit-learn's fit/predict API is better suited for that workflow
- • Deep learning or neural network models
- • Real-time streaming statistical analysis at high throughput
Interface
Authentication
Local Python library — no authentication required
Pricing
BSD 3-Clause license; completely free and open source
Agent Metadata
Known Gotchas
- ⚠ The statsmodels API deliberately differs from scikit-learn: model.fit() returns a separate Results object rather than the fitted estimator itself, so models cannot be dropped directly into an sklearn Pipeline
- ⚠ Formula API (smf.ols('y ~ x', data=df)) and array API (sm.OLS(y, X)) behave differently: the formula API adds an intercept automatically while the array API does not. Forgetting sm.add_constant(X) in the array API silently fits a model with no intercept (regression through the origin), with no warning or error
- ⚠ Convergence warnings from MLE optimization (logit, ARIMA) do not raise exceptions: agents must check result.mle_retvals (for example its 'converged' entry) or capture warnings to detect failed convergence
- ⚠ ARIMA order selection is not automatic by default — agents must specify (p, d, q) order explicitly or use auto_arima from pmdarima as a wrapper
- ⚠ The .summary() output is a human-readable text/HTML object designed for display, not a machine-readable dict — use result.params, result.pvalues, result.conf_int() to extract values programmatically
Scores are editorial opinions as of 2026-03-06.