{"id":"exo-explore-exo","name":"exo","homepage":null,"repo_url":"https://github.com/exo-explore/exo","category":"ai-ml","subcategories":[],"tags":["ai-ml","llm","inference","distributed-inference","local-first","cluster","rdma","tensor-parallel","api-compatible"],"what_it_does":"exo is a local-first system for running LLM inference across multiple devices by automatically discovering peers and distributing model execution (tensor/pipeline parallelism) over the network, with an optional built-in dashboard and an API compatible with common chat/response endpoints. On macOS, the README also describes RDMA-over-Thunderbolt support for reduced inter-device latency.","use_cases":["Run models larger than a single device can fit by sharding across multiple local machines/devices","Low-latency multi-device inference on compatible macOS + Thunderbolt 5 hardware (RDMA)","Use existing client integrations by speaking OpenAI/Anthropic/Ollama-compatible API formats","Manage a local inference cluster via a built-in dashboard","Run offline from locally stored models using environment-based configuration"],"not_for":["Production workloads that require a hosted managed service with SLAs","Environments needing strong, documented authentication/authorization controls for API access","Systems that cannot meet strict hardware/OS requirements for RDMA (when used)"],"best_when":"You have multiple compatible local devices and want to distribute model inference while using the provided localhost API/dashboard; especially effective for macOS clusters with RDMA capability.","avoid_when":"You need robust security controls for a network-exposed API (auth, rate limits, TLS guarantees) but cannot isolate to localhost or a trusted network; also avoid RDMA clusters when device OS versions/hardware connections cannot be kept consistent.","alternatives":["Ollama (single-node/local)","vLLM (single node or multi-GPU with its own deployment patterns)","Tensor/pipeline-parallel frameworks in PyTorch (manual distributed setup)","Triton 
Inference Server / other inference serving stacks","Ray-based model serving for multi-node orchestration"],"af_score":18.8,"security_score":21.2,"reliability_score":25.0,"package_type":"skill","discovery_source":["openclaw"],"priority":"high","status":"evaluated","version_evaluated":null,"last_evaluated":"2026-03-29T12:59:04.571137+00:00","interface":{"has_rest_api":true,"has_graphql":false,"has_grpc":false,"has_mcp_server":false,"mcp_server_url":null,"has_sdk":false,"sdk_languages":[],"openapi_spec_url":null,"webhooks":false},"auth":{"methods":["No authentication mechanisms described in provided README content (only mentions localhost API and environment-based configuration)."],"oauth":false,"scopes":false,"notes":"The README content provided does not describe API authentication (API keys, OAuth, session auth) or authorization scopes. It does state the dashboard/API run on localhost:52415, which may imply local-only usage unless users expose it externally themselves."},"pricing":{"model":null,"free_tier_exists":false,"free_tier_limits":null,"paid_tiers":[],"requires_credit_card":false,"estimated_workload_costs":null,"notes":"No pricing information is provided; exo appears to be self-hosted/local-run software."},"requirements":{"requires_signup":false,"requires_credit_card":false,"domain_verification":false,"data_residency":[],"compliance":[],"min_contract":null},"agent_readiness":{"af_score":18.8,"security_score":21.2,"reliability_score":25.0,"mcp_server_quality":null,"documentation_accuracy":null,"error_message_quality":null,"error_message_notes":null,"auth_complexity":20.0,"rate_limit_clarity":0.0,"tls_enforcement":20.0,"auth_strength":10.0,"scope_granularity":10.0,"dependency_hygiene":45.0,"secret_handling":30.0,"security_notes":"From the provided content, the API/dashboard are stated to run on localhost, but there is no explicit discussion of TLS requirements, authentication/authorization, or rate-limiting. 
The project includes dependencies that integrate with external services (e.g., HuggingFace hub) and uses custom/remote git sources for some components (mlx, mlx-lm), which increases the need for supply-chain review and pinning verification. RDMA operation has strict configuration caveats; misconfigured networking may affect cluster isolation and discoverability.","uptime_documented":0.0,"version_stability":40.0,"breaking_changes_history":30.0,"error_recovery":30.0,"idempotency_support":null,"idempotency_notes":null,"pagination_style":null,"retry_guidance_documented":null,"known_agent_gotchas":["No MCP server is indicated, so agent integrations would rely on the described HTTP API endpoints.","The provided README does not document authentication, authorization, rate limits, pagination, or retry/idempotency semantics; agents should treat these as unknown until verified in the code/docs.","RDMA operation depends on specific macOS version matching (even beta versions) and correct Thunderbolt 5 cabling/port usage; misconfiguration can lead to discovery/connectivity issues."]}}