PyAutoGUI

Cross-platform GUI automation library that controls the mouse and keyboard and takes screenshots programmatically. Key features: pyautogui.moveTo(x, y), pyautogui.click(), pyautogui.typewrite('text'), pyautogui.press('enter'), pyautogui.hotkey('ctrl', 'c'), pyautogui.scroll(), drag and drop, pyautogui.screenshot(), pyautogui.locateOnScreen() (image template matching), window management, a configurable PAUSE between actions, FAILSAFE (move the mouse to the top-left corner to abort), and support for Windows, macOS, and Linux. A widely used Python library for desktop GUI automation in agent computer-use tasks.

Evaluated Mar 06, 2026, v0.9.x

Developer Tools python pyautogui gui-automation mouse keyboard screenshot desktop-automation

Score Breakdown

⚙ Agent Friendliness

MCP Quality

Documentation

Error Messages

Auth Simplicity

Rate Limits

🔒 Security

TLS Enforcement

Auth Strength

Scope Granularity

Dep. Hygiene

Secret Handling

Full desktop control: pyautogui can drive any application the logged-in user can reach, including password managers, banking apps, and system dialogs. Agent automation must not log or expose keystrokes that contain credentials. FAILSAFE provides an emergency stop but does not sandbox agent actions. Run sensitive automation under a dedicated user account.
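One way to keep credentials out of logs is to route all typing through a wrapper that records only a redacted placeholder. The sketch below is illustrative: `redact`, `type_logged`, the `secret` flag, and the injectable `typer` parameter are hypothetical names, not part of PyAutoGUI. By default the wrapper falls back to `pyautogui.typewrite`.

```python
import logging
from typing import Callable, Optional

log = logging.getLogger("agent.keys")  # hypothetical logger name


def redact(text: str, secret: bool) -> str:
    """Return a log-safe description of the text; secrets are never echoed."""
    return "<redacted>" if secret else repr(text)


def type_logged(text: str, secret: bool = False,
                typer: Optional[Callable[[str], None]] = None) -> None:
    """Type text through `typer`, logging only the redacted form."""
    if typer is None:
        # Imported lazily because importing pyautogui requires a display.
        import pyautogui
        typer = pyautogui.typewrite
    log.info("typing %s", redact(text, secret))
    typer(text)
```

Passing a stub as `typer` lets the logging behaviour be checked without touching a real desktop.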

⚡ Reliability

Uptime/SLA

Version Stability

Breaking Changes

Error Recovery

Best When

Automating desktop applications (non-browser) for agent computer-use tasks — PyAutoGUI provides simple cross-platform mouse/keyboard control for legacy applications, desktop GUIs, and screen-based automation where Playwright/Selenium don't apply.

Avoid When

You're automating web browsers (use Playwright), need headless operation without Xvfb, or require precise timing for fast UI interactions.

Use Cases

• Agent desktop automation: pyautogui.click(500, 300); pyautogui.typewrite('agent task input', interval=0.05); pyautogui.press('enter'). The agent clicks a UI element, types text, and submits it. Suits non-browser and legacy applications where Playwright or Selenium do not apply.
• Agent screenshot and OCR: screenshot = pyautogui.screenshot(); screenshot.save('agent_view.png'). The agent captures the current screen state; pair it with Tesseract or EasyOCR to extract text so the agent can read screen content before deciding what to do.
• Agent image-based clicking: location = pyautogui.locateOnScreen('button.png', confidence=0.9); pyautogui.click(location). Finds a button by visual template matching, so the agent avoids hardcoded coordinates. The confidence parameter enables fuzzy matching and requires OpenCV to be installed; depending on the PyAutoGUI version, a failed match returns None or raises ImageNotFoundException.
• Agent hotkey sequences: pyautogui.hotkey('ctrl', 'alt', 't'); pyautogui.sleep(0.5); pyautogui.typewrite('ls -la\n'). The agent opens a terminal (Ctrl+Alt+T on many Linux desktops) and runs a command; the trailing '\n' sends Enter.
• Agent failsafe automation: pyautogui.PAUSE = 0.5; pyautogui.FAILSAFE = True. Adds a 0.5-second pause after every PyAutoGUI call and aborts when the mouse is moved to the top-left corner, which guards against runaway actions.
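The image-based clicking use case above can be sketched as a helper that tolerates both "not found" behaviours (a None return or an ImageNotFoundException). This is a minimal sketch, not PyAutoGUI's own API: `box_center`, `click_image`, and the injectable `locate` and `click` parameters are hypothetical names added for illustration and testing. The defaults call `pyautogui.locateOnScreen` and `pyautogui.click`.

```python
from typing import Callable, Optional, Tuple

Box = Tuple[int, int, int, int]  # (left, top, width, height)


def box_center(box: Box) -> Tuple[int, int]:
    """Return the integer center point of a (left, top, width, height) box."""
    left, top, width, height = box
    return int(left + width // 2), int(top + height // 2)


def click_image(template_path: str, confidence: float = 0.9,
                locate: Optional[Callable] = None,
                click: Optional[Callable] = None) -> bool:
    """Click the center of the first on-screen match; return False if none."""
    if locate is None or click is None:
        # Imported lazily because importing pyautogui requires a display.
        import pyautogui
        locate = locate or pyautogui.locateOnScreen
        click = click or pyautogui.click
    try:
        box = locate(template_path, confidence=confidence)
    except Exception as exc:
        # Some releases raise ImageNotFoundException instead of returning None.
        if type(exc).__name__ != "ImageNotFoundException":
            raise
        box = None
    if box is None:
        return False
    x, y = box_center(box)
    click(x, y)
    return True
```

Returning a boolean lets the agent decide whether to retry, take a fresh screenshot, or fall back to another strategy instead of crashing on a missing template.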

Not For

• Browser automation: use Playwright or Selenium. pyautogui can drive a browser window but has no DOM access, wait conditions, or other web-automation primitives.
• Fast, precise automation: pyautogui.PAUSE (0.1 s by default) delays every action. For high-frequency automation, use platform-specific APIs (win32api on Windows, Quartz on macOS).
• Headless environments: pyautogui requires a display. For headless agent automation, use a virtual framebuffer (Xvfb on Linux) or Playwright in headless mode.
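Because importing pyautogui on a Linux machine without a display fails at import time, a guard can check for a display first and fail with a clearer message. The sketch below is a best-effort check under stated assumptions: `has_display` and `load_pyautogui` are hypothetical helper names, the check only inspects the X11 DISPLAY variable on Linux, and Windows and macOS sessions are assumed to have a desktop.

```python
import os
import sys
from typing import Mapping, Optional


def has_display(environ: Optional[Mapping[str, str]] = None,
                platform: Optional[str] = None) -> bool:
    """Best-effort check that a GUI session is reachable."""
    environ = os.environ if environ is None else environ
    platform = sys.platform if platform is None else platform
    if platform.startswith("linux"):
        # X11 clients find the server through DISPLAY (e.g. ":99" under Xvfb).
        return bool(environ.get("DISPLAY"))
    # Assumption: Windows and macOS sessions have a desktop available.
    return True


def load_pyautogui():
    """Import pyautogui only after confirming a display is configured."""
    if not has_display():
        raise RuntimeError(
            "No DISPLAY set; start a virtual framebuffer first, e.g. "
            "'Xvfb :99 -screen 0 1280x720x24 &' and 'export DISPLAY=:99'."
        )
    import pyautogui
    return pyautogui
```

An agent entry point would call `load_pyautogui()` once at startup and pass the returned module to the rest of the automation code.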

Interface

REST API

GraphQL

gRPC

MCP Server

SDK

Yes

Webhooks

Authentication

Methods: none

OAuth: No Scopes: No

No auth — local desktop automation library.

Pricing

Model: open_source

Free tier: Yes

Requires CC: No

PyAutoGUI is BSD licensed. Free for all use.

Agent Metadata

Pagination

none

Idempotent

Partial

Retry Guidance

Not documented

Known Gotchas

⚠ PyAutoGUI requires a display: on a headless Linux server (Docker, CI) it fails with an X connection error such as 'Cannot connect to X server'. Agent GUI automation in CI must run under Xvfb: Xvfb :99 -screen 0 1280x720x24 & export DISPLAY=:99. Docker agent containers need Xvfb started before the automation script runs; --privileged is not normally required for Xvfb itself.
⚠ locateOnScreen can fail on HiDPI displays: on macOS Retina and scaled Windows displays, screenshots can be captured at a higher pixel density (typically 2x) than the coordinate space used for mouse movement. A template captured at a different scale will not match, and match coordinates may need dividing by the scale factor before clicking. Capture templates with pyautogui.screenshot() on the target display, or rescale the template to match the screenshot.
⚠ pyautogui.PAUSE slows all operations: the default pyautogui.PAUSE = 0.1 adds 100 ms after every mouse/keyboard call, so 100 actions take at least 10 seconds. For loops with many actions, lower PAUSE (down to 0) and add explicit waits only where the UI needs them. With PAUSE = 0 the UI may not keep up with input and there is less time to trigger FAILSAFE by hand, so lower it only in well-tested scripts.
⚠ typewrite() only types characters on its keyboard map, essentially ASCII: pyautogui.typewrite('hello') works, but pyautogui.typewrite('héllo') silently skips the 'é'. To type Unicode, paste through the clipboard: pyperclip.copy(text) followed by pyautogui.hotkey('ctrl', 'v') (use 'command' instead of 'ctrl' on macOS).
⚠ FAILSAFE triggers at the top-left corner (0, 0): with pyautogui.FAILSAFE = True, a PyAutoGUI call raises pyautogui.FailSafeException when the mouse is at that point. On multi-monitor setups where (0, 0) is not the visible corner of the primary monitor, the trigger point may be hard to reach or may be hit accidentally. Keep FAILSAFE = True and test on the target display configuration.
⚠ Screen resolution changes break coordinate-based automation: pyautogui.click(800, 600) hardcodes pixel coordinates, so a resolution change or a moved window breaks the script. Use pyautogui.locateOnScreen() to find elements visually, or pygetwindow to compute window-relative coordinates.
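Three of the workarounds above (HiDPI scaling, Unicode typing, and window-relative coordinates) can be sketched as small helpers. All function names here are hypothetical and chosen for illustration. The scale estimate assumes that the ratio of screenshot width to `pyautogui.size()` width reflects the display scale factor, which holds on typical Retina setups but should be verified on the target machine.

```python
import sys
from typing import Tuple


def physical_to_logical(box: Tuple[int, int, int, int],
                        scale: float) -> Tuple[int, int]:
    """Center of a (left, top, width, height) match box in screenshot
    pixels, converted to mouse coordinates by dividing by the scale."""
    left, top, width, height = box
    return (round((left + width / 2) / scale),
            round((top + height / 2) / scale))


def window_to_screen(window_left: int, window_top: int,
                     dx: int, dy: int) -> Tuple[int, int]:
    """Convert an offset inside a window to absolute screen coordinates."""
    return window_left + dx, window_top + dy


def screen_scale() -> float:
    """Estimate the scale factor as screenshot width over logical width."""
    import pyautogui  # lazy: importing pyautogui requires a display
    return pyautogui.screenshot().width / pyautogui.size().width


def type_text(text: str) -> None:
    """Type ASCII directly; paste anything else through the clipboard."""
    import pyautogui  # lazy: importing pyautogui requires a display
    if text.isascii():
        pyautogui.typewrite(text)
    else:
        import pyperclip
        pyperclip.copy(text)
        modifier = "command" if sys.platform == "darwin" else "ctrl"
        pyautogui.hotkey(modifier, "v")
```

For example, a match box of (200, 100, 40, 20) found in a 2x screenshot corresponds to a click at (110, 55) in mouse coordinates. Note that the clipboard path overwrites whatever the user had copied.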

Alternatives

anthropic-computer-use-api playwright-api

Scores are editorial opinions as of 2026-03-06.