{"id":"bytedance-ui-tars","name":"UI-TARS","af_score":42.5,"security_score":17.5,"reliability_score":32.5,"what_it_does":"UI-TARS is an open-source multimodal agent for automated GUI interaction. It uses a vision-language model to parse and ground visual observations, then generates structured action instructions that can be translated into automation code (e.g., PyAutoGUI) to operate desktop and mobile UIs.","best_when":"You need a research/engineering toolkit to generate GUI actions from screenshots or video in desktop or mobile environments, and you can run model inference locally or via a documented deployment route.","avoid_when":"You need a standardized REST/SDK service with strict auth/rate-limit guarantees, or you require robust safety/compliance controls for real-world account access.","last_evaluated":"2026-03-29T14:20:30.226201+00:00","has_mcp":false,"has_api":false,"auth_methods":[],"has_free_tier":false,"known_gotchas":["GUI agents are sensitive to coordinate systems; the README notes absolute-coordinate grounding (Qwen2.5-VL) and points to a coordinates-processing guide","The system outputs automation actions/code, so downstream execution needs sandboxing to avoid unintended clicks or keystrokes","Action generation may fail or misidentify GUI elements in ambiguous environments (noted as a limitation)"],"error_quality":0.0}