Screenhand


ScreenHand is an open-source MCP server (stdio transport) that gives AI agents local control over macOS and Windows desktop UIs (via Accessibility APIs) and optionally browser automation (Chrome DevTools Protocol). It exposes a large set of tools for UI inspection, interaction (click/type/keys/scroll/menus/drag), perception/OCR fallbacks, job orchestration, and per-app “app mastery map” learning.
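To make the stdio transport concrete, here is a minimal sketch of how an MCP client frames a tool invocation. The `tools/call` method and the JSON-RPC 2.0 envelope come from the MCP protocol itself; the tool name `click` and its `x`/`y` arguments are hypothetical placeholders, not names confirmed from ScreenHand's tool list. In practice an MCP client library would handle this framing.

```typescript
// Shape of an MCP "tools/call" request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Build a request for a named tool. The tool name and arguments passed in
// below are illustrative assumptions, not ScreenHand's documented schema.
function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// The stdio transport sends one JSON message per line, so the serialized
// message must contain no raw newlines; JSON.stringify without an indent
// argument escapes any newline inside string values.
function frameForStdio(message: ToolCallRequest): string {
  return JSON.stringify(message) + "\n";
}

// Hypothetical call: click at screen coordinates (100, 200).
const framedLine = frameForStdio(buildToolCall(1, "click", { x: 100, y: 200 }));
```

The resulting line would be written to the server process's stdin, and the response read back as a single line from its stdout.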

Evaluated Mar 30, 2026 (111d ago)

DevTools mcp mcp-server desktop-automation accessibility ui-automation browser-automation cdp ocr typescript agent-tools local-first

⚙ Agent Friendliness

/ 100

Can an agent use this?

🔒 Security

/ 100

Is it safe for agents?

⚡ Reliability

/ 100

Does it work consistently?

Score Breakdown

⚙ Agent Friendliness

MCP Quality

Documentation

Error Messages

Auth Simplicity

Rate Limits

🔒 Security

TLS Enforcement

Auth Strength

Scope Granularity

Dep. Hygiene

Secret Handling

Security posture appears to rely on local execution and OS permissions rather than network authentication. The README claims local-first operation (no screen data sent externally), blocking of dangerous browser URL schemes (e.g., javascript: and data:), and audit logging of AppleScript and browser JS execution; however, how these are enforced, where logs are written, and whether secrets are redacted cannot be verified from the provided text. TLS is not applicable because the transport is stdio to a local process. Because there is no explicit authentication or authorization, any party that can launch the MCP server and attach to its stdio can drive the desktop.
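Since the enforcement details are not verifiable, the following is only a sketch of what scheme blocking for javascript: and data: URLs could look like on the client side. The allowlist contents and the function name `isNavigationAllowed` are assumptions made for illustration; this is not ScreenHand's implementation.

```typescript
// Hypothetical allowlist: permit only ordinary web navigation.
// ScreenHand's real rules are not documented in the evaluated text.
const ALLOWED_SCHEMES = new Set(["http", "https"]);

function isNavigationAllowed(rawUrl: string): boolean {
  // Browsers ignore tabs and newlines inside a URL and skip leading
  // control characters and spaces, so normalize before checking.
  const cleaned = rawUrl
    .replace(/[\t\n\r]/g, "")
    .replace(/^[\u0000-\u0020]+/, "");

  // Extract the scheme (the part before the first colon).
  const match = /^([a-zA-Z][a-zA-Z0-9+.-]*):/.exec(cleaned);
  if (match === null) {
    return false; // relative or malformed input is rejected in this sketch
  }
  return ALLOWED_SCHEMES.has(match[1].toLowerCase());
}
```

An allowlist is used here rather than a blocklist so that schemes nobody thought to list (for example `file:` or `vbscript:`) are rejected by default.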

⚡ Reliability

Uptime/SLA

Version Stability

Breaking Changes

Error Recovery

Best When

You want a local-first AI agent that can directly operate native apps and (optionally) Chrome with low latency, and you can grant the required local OS permissions.

Avoid When

You cannot safely grant accessibility/automation permissions or you need a networked SaaS API with built-in policy enforcement/audit trails.

Use Cases

• Desktop automation for repetitive UI tasks (forms, menus, navigation)
• Browser automation with CDP for scripted workflows
• Assistive UI operations driven by an AI agent
• Cross-app workflows that move data between the browser and desktop apps
• QA and smoke-test style automation
• Workflow recording/playbooks based on observed successful actions

Not For

• Running in fully headless environments without a desktop session
• Highly sensitive operations without user review/controls (it can trigger real UI actions)
• Environments where installing local tooling or granting accessibility permissions is not allowed
• Controlling devices outside the local machine/session

Interface

REST API

GraphQL

gRPC

MCP Server

Yes

SDK

Webhooks

Authentication

Methods: None explicitly described for MCP/stdio server usage

OAuth: No

Scopes: No

No user authentication described for the local MCP server; access is effectively whoever can run the process and connect via stdio. OS-level permissions (Accessibility on macOS) are required for desktop control.

Pricing

Free tier: No

Requires CC: No

Project is presented as open-source and local-first; ongoing costs would primarily be any downstream LLM/API usage by your AI client (the README claims zero LLM calls for click/typing once tools are invoked).

Agent Metadata

Pagination: none

Idempotent: False

Retry Guidance: Not documented

Known Gotchas

⚠ Requires macOS Accessibility permission for the terminal app to allow UI control.
⚠ Browser automation requires a running Chrome instance launched with --remote-debugging-port=9222.
⚠ Tool calls can have side effects (click/type/JS execution); agents should use confirmation/guardrails for destructive actions.
⚠ Cross-app control assumes UI state stability; dynamic layouts may still require fallback strategies (Accessibility→CDP→OCR→coordinates) and careful recovery.
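The fallback order mentioned in the last gotcha (Accessibility→CDP→OCR→coordinates) can be sketched as a simple ordered chain. Everything below is illustrative: the `Strategy` type, the `actWithFallback` function, and the stub strategies are hypothetical stand-ins, and a real agent would replace each stub with the corresponding MCP tool call.

```typescript
// A strategy tries to act on a target and reports whether it succeeded.
type Strategy = {
  name: string;
  run: (target: string) => Promise<boolean>;
};

// Try each strategy in order; return the name of the first one that
// succeeds, or null if all of them fail. A thrown error counts as failure.
async function actWithFallback(
  target: string,
  strategies: Strategy[]
): Promise<string | null> {
  for (const strategy of strategies) {
    try {
      if (await strategy.run(target)) {
        return strategy.name;
      }
    } catch {
      // Fall through to the next strategy.
    }
  }
  return null;
}

// Stub strategies in the documented order. These bodies are placeholders
// that simulate outcomes; they do not call any real API.
const fallbackChain: Strategy[] = [
  { name: "accessibility", run: async () => false },
  { name: "cdp", run: async () => { throw new Error("no debug port"); } },
  { name: "ocr", run: async () => true },
  { name: "coordinates", run: async () => true },
];
```

Because tool calls have side effects, a strategy should return `true` only after confirming the action took effect; otherwise the chain may repeat a click or keystroke that already landed.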

Alternatives

• Anthropic Computer Use (cloud/screenshot-based)
• MCP servers and automation stacks using OS automation APIs (custom tools)
• Playwright/Selenium for web-only automation
• UI testing tools (e.g., Playwright desktop, Appium, Robot Framework) for deterministic UI automation


Scores are editorial opinions as of 2026-03-30.