{"id":"bytedance-ui-tars","name":"UI-TARS","homepage":null,"repo_url":"https://github.com/bytedance/UI-TARS","category":"automation","subcategories":[],"tags":["ai-ml","automation","computer-use","gui","multimodal","research"],"what_it_does":"UI-TARS is an open-source multimodal agent for automated GUI interaction. It uses a vision-language model to parse/ground visual observations and generate structured action instructions that can be translated into automation code (e.g., PyAutoGUI) to operate desktop/mobile UIs.","use_cases":["Automating repetitive desktop GUI tasks (clicking, typing, scrolling, navigation)","Research and benchmarking of multimodal “computer use” agents in virtual environments","Browser/desktop automation workflows via the provided action parsing and coordinate processing guidance","Mobile/Android emulator GUI automation (via mobile-specific action templates)","Evaluation/training for grounding (action-only output via the GROUNDING prompt template)"],"not_for":["Production-grade, unattended automation for security-sensitive or permission-gated systems (e.g., bypassing logins/CAPTCHAs)","High-integrity operations without safety controls (financial transfers, account management, irreversible actions)","Use as a general API service (it is primarily a client-side/offline model + automation pipeline rather than a networked API)"],"best_when":"You need a research/engineering toolkit to generate GUI actions from screenshots/video in desktop or mobile environments, and you can run model inference locally or via a documented deployment route.","avoid_when":"You need a standardized REST/SDK service with strict auth/rate-limit guarantees or you require robust safety/compliance controls for real-world account access.","alternatives":["Midscene/Midscene.js (browser automation with LLM support)","Other GUI agent frameworks (e.g., computer-use agents built on similar action parsers) and generic browser automation tools","Desktop automation libraries like 
PyAutoGUI or Playwright (manual control rather than autonomous planning)"],"af_score":42.5,"security_score":17.5,"reliability_score":32.5,"package_type":"skill","discovery_source":["openclaw"],"priority":"high","status":"evaluated","version_evaluated":null,"last_evaluated":"2026-03-29T14:20:30.226201+00:00","interface":{"has_rest_api":false,"has_graphql":false,"has_grpc":false,"has_mcp_server":false,"mcp_server_url":null,"has_sdk":false,"sdk_languages":["Python"],"openapi_spec_url":null,"webhooks":false},"auth":{"methods":[],"oauth":false,"scopes":false,"notes":"README content does not describe a hosted API requiring authentication. Model deployment is referenced via a “Huggingface endpoint” approach, which typically uses Hugging Face auth tokens, but no auth details are provided in the supplied text."},"pricing":{"model":null,"free_tier_exists":false,"free_tier_limits":null,"paid_tiers":[],"requires_credit_card":false,"estimated_workload_costs":null,"notes":"Costs depend on how inference is deployed (e.g., local hardware vs. Hugging Face endpoint). No pricing tiers or credit-card requirements are stated in the provided README."},"requirements":{"requires_signup":false,"requires_credit_card":false,"domain_verification":false,"data_residency":[],"compliance":[],"min_contract":null},"agent_readiness":{"af_score":42.5,"security_score":17.5,"reliability_score":32.5,"mcp_server_quality":0.0,"documentation_accuracy":60.0,"error_message_quality":0.0,"error_message_notes":null,"auth_complexity":90.0,"rate_limit_clarity":0.0,"tls_enforcement":0.0,"auth_strength":20.0,"scope_granularity":0.0,"dependency_hygiene":30.0,"secret_handling":40.0,"security_notes":"The provided README does not describe transport security (TLS), authentication, or authorization boundaries. The tool is designed to automate GUI interactions and includes a limitation about possible misuse (e.g., automating authentication challenges). 
Running it can generate actions that interact with sensitive user sessions, so it should be sandboxed and constrained (e.g., a least-privilege execution environment and user-confirmation guardrails). Specific dependency and security practices are not shown in the supplied content.","uptime_documented":0.0,"version_stability":55.0,"breaking_changes_history":50.0,"error_recovery":25.0,"idempotency_support":"false","idempotency_notes":null,"pagination_style":"none","retry_guidance_documented":false,"known_agent_gotchas":["GUI agents are sensitive to coordinate systems; the README notes absolute-coordinate grounding (Qwen2.5-VL) and points to a coordinate-processing guide","The system outputs automation actions/code; downstream execution needs sandboxing to avoid unintended clicks/keystrokes","Action generation may fail or misidentify GUI elements in ambiguous environments (noted in the README as a limitation)"]}}