pilot-shell

Pilot Shell is an installation/CLI wrapper around Claude Code that provides a structured “spec-driven development” workflow (plan/approve/implement/verify), enforced quality hooks (lint/format/typecheck), git worktree-based task isolation, and a local dashboard for session/spec/review/status visibility.

Evaluated Mar 30, 2026
Category: DevTools
Tags: ai-agents, ai-coding-tools, claude-code, spec-driven-development, tdd, developer-tools, cli
⚙ Agent Friendliness: 43 / 100 (Can an agent use this?)
🔒 Security: 48 / 100 (Is it safe for agents?)
⚡ Reliability: 36 / 100 (Does it work consistently?)

Score Breakdown

⚙ Agent Friendliness

MCP Quality: 45
Documentation: 70
Error Messages: 0
Auth Simplicity: 75
Rate Limits: 10

🔒 Security

TLS Enforcement: 60
Auth Strength: 55
Scope Granularity: 20
Dep. Hygiene: 60
Secret Handling: 45

Pilot Shell downloads and runs installer scripts (the curl | bash pattern) and installs multiple tools (language servers, browser automation, dependencies). The README does not provide detailed guidance on secret storage or redaction, least-privilege scopes, or threat modeling of local execution. It relies on Claude Code's existing authentication and subscription status rather than defining its own auth mechanism.
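One common way to reduce the risk of the curl | bash pattern noted above is to download the installer to disk, compare its SHA-256 digest against a value pinned in advance, and only then run it. The following is a minimal sketch of the comparison step only. The function names are hypothetical, and nothing here indicates that Pilot Shell publishes checksums for its installer.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the lowercase hex SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def matches_pinned_digest(script: bytes, expected_hex: str) -> bool:
    """Check downloaded script bytes against a digest pinned beforehand.

    expected_hex is assumed to come from a trusted source (for example,
    a value recorded after manually reviewing the script).
    """
    # compare_digest avoids leaking match position through timing.
    return hmac.compare_digest(sha256_hex(script), expected_hex.strip().lower())
```

In use, the script would be saved to a file, read back as bytes, and executed only if matches_pinned_digest returns True.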

⚡ Reliability

Uptime/SLA: 0
Version Stability: 55
Breaking Changes: 30
Error Recovery: 60

Best When

You are already using Claude Code and want stronger engineering discipline (TDD, verification, review gates) with a consistent workflow across sessions and projects.

Avoid When

You need a simple chat interface only, or you cannot accommodate local installation, hooks, and test execution as part of the agent loop.

Use Cases

  • Coding tasks where you want spec-driven planning and verification instead of free-form agent chatting
  • Reducing regressions by enforcing TDD and running full test suites during agent work
  • Review-gated changes with interactive annotations (spec approval and inline diff review)
  • Headless/CI automation of Claude Code workflows using a non-interactive flag
  • Project bootstrapping of rules/hooks/MCP servers via /setup-rules

Not For

  • Teams that cannot run local CLI tooling or that require a purely remote SaaS workflow
  • Workflows that do not use git or where worktree-based isolation is not feasible
  • Environments that cannot install browser automation components or required language servers
  • Use as a general-purpose external API service (it is primarily a local dev tool)

Interface

REST API: No
GraphQL: No
gRPC: No
MCP Server: Yes
SDK: No
Webhooks: No

Authentication

Methods: Claude subscription/auth via claude auth status (as referenced in README for tier detection)
OAuth: No
Scopes: No

Pilot Shell appears to rely on Claude Code’s existing authentication model; README does not specify OAuth flows or fine-grained scope management for Pilot itself.

Pricing

Free tier: No
Requires CC: No

README discusses Claude subscription tiers for usage; it does not state a separate Pilot Shell pricing model.

Agent Metadata

Pagination: none
Idempotent: True
Retry Guidance: Not documented

Known Gotchas

  • Requires local environment setup (dependencies, language servers, hooks); failures may occur if prerequisites can’t be installed/initialized.
  • Worktree-based implementation and browser automation for UI E2E may be slow/fragile in constrained environments.
  • Headless mode changes interaction style; agent assumptions about interactive approval may not hold without proper flags.
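Since the first gotcha is about missing prerequisites, a small preflight check run before installation can surface problems early. The sketch below only tests whether executables are present on PATH. The tool list is an assumption for illustration (git for worktrees, claude for Claude Code), not the installer's actual requirement list, and missing_tools is a hypothetical helper.

```python
import shutil
from typing import Callable, Iterable, List, Optional

# Hypothetical prerequisite list; adjust to what your setup actually needs.
REQUIRED_TOOLS = ("git", "claude")


def missing_tools(
    tools: Iterable[str] = REQUIRED_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the tools that cannot be found on PATH, in input order."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing prerequisites: " + ", ".join(missing))
    else:
        print("All listed prerequisites found on PATH.")
```

The which parameter is injectable so the check can be exercised without depending on what is installed on the machine.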

Scores are editorial opinions as of 2026-03-30.
