{"id":"open-compass-opencompass","name":"opencompass","af_score":36.2,"security_score":48.8,"reliability_score":32.5,"what_it_does":"OpenCompass is an open-source LLM evaluation platform. It provides configurable evaluation pipelines (via CLI and Python scripts) to run model benchmarks across many datasets, including support for local/open-source models and API-based models (e.g., OpenAI/Qwen), with optional inference acceleration backends (e.g., vLLM, LMDeploy).","best_when":"You need reproducible offline/batch evaluation of LLMs with configurable datasets and scoring logic, and you are comfortable running Python tooling with model/dataset dependencies.","avoid_when":"You need a simple SaaS-style API with documented HTTP endpoints, or you want turnkey managed hosting without handling datasets, credentials, and compute yourself.","last_evaluated":"2026-03-29T14:56:44.992714+00:00","has_mcp":false,"has_api":false,"auth_methods":["Environment-variable-based API key for OpenAI-style API model evaluation (e.g., OPENAI_API_KEY)"],"has_free_tier":false,"known_gotchas":["Primarily a CLI/Python batch workflow, so an agent must orchestrate runs rather than call a stable request/response API","Large dependency surface (datasets, models, acceleration backends) can cause environment-specific failures; an agent may need careful environment setup and the correct installation extras","Authentication is backend-specific (e.g., API keys in env vars); agents should avoid logging secrets and ensure the correct environment variables are set","Evaluation results and reproducibility depend heavily on configuration files and dataset/model versions; changes in config structure (a breaking change is noted around 0.4.0) can break automation"],"error_quality":0.0}