verify

A Claude Code plugin (“opslane-verify”) for spec-driven verification. It interprets an acceptance-spec document, extracts the acceptance criteria, and runs one Playwright browser-automation agent per criterion against a local dev server. A judge step then reviews the screenshots and traces and produces a pass/fail result for each criterion, backed by evidence (screenshots and session recordings).
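The flow described above (extract criteria, run one agent per criterion, judge the evidence) can be sketched roughly as follows. This is a minimal illustration of the data flow only, not the plugin's actual code: the `Evidence` and `Verdict` types, the `extract_criteria`, `run_agent`, and `judge` names, and the "one `- ` bullet per criterion" spec format are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Evidence:
    """Hypothetical record of what one browser agent captured."""
    criterion: str
    screenshots: List[str] = field(default_factory=list)
    observed: str = ""


@dataclass
class Verdict:
    """Hypothetical pass/fail result, kept together with its evidence."""
    criterion: str
    passed: bool
    evidence: Evidence


def extract_criteria(spec_text: str) -> List[str]:
    """Assumed spec format: every '- ' bullet line is one acceptance criterion."""
    criteria = []
    for line in spec_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("- "):
            criteria.append(stripped[2:].strip())
    return criteria


def verify(
    spec_text: str,
    run_agent: Callable[[str], Evidence],
    judge: Callable[[Evidence], bool],
) -> List[Verdict]:
    """Run one agent per criterion, then let the judge decide pass/fail."""
    verdicts = []
    for criterion in extract_criteria(spec_text):
        evidence = run_agent(criterion)   # stand-in for a browser-agent run
        passed = judge(evidence)          # stand-in for the judge step
        verdicts.append(Verdict(criterion, passed, evidence))
    return verdicts
```

In this sketch `run_agent` and `judge` are passed in as plain callables, so the agent and judge steps can be swapped or stubbed independently; how the plugin wires these steps together internally is not described in the listing.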

Evaluated Mar 30, 2026

Category: Testing

Tags: claude, claude-code, claude-code-plugins, claude-code-skill, verification, playwright, browser-automation, spec-testing, qa, evidence

⚙ Agent Friendliness (scored out of 100): Can an agent use this?

🔒 Security (scored out of 100): Is it safe for agents?

⚡ Reliability (scored out of 100): Does it work consistently?

Score Breakdown

⚙ Agent Friendliness

• MCP Quality
• Documentation
• Error Messages
• Auth Simplicity
• Rate Limits

🔒 Security

• TLS Enforcement
• Auth Strength
• Scope Granularity
• Dependency Hygiene
• Secret Handling

The provided content does not describe TLS usage, secret handling, or dependency posture for the plugin itself; it mentions only Playwright and the fact that Claude Code uses OAuth login. Because the workflow runs browser agents against a local dev server, those agents can see any application data visible in the UI, so users should make sure the test environment and the data it exposes are appropriate.

⚡ Reliability

• Uptime/SLA
• Version Stability
• Breaking Changes
• Error Recovery

Best When

You have a local dev server and a stable, testable acceptance-spec format, and you want quick feedback with visual evidence for each acceptance criterion.

Avoid When

You need strongly controlled, deterministic testing at scale (e.g., large suites in shared CI environments), or you cannot safely run browser agents that interact with the app under test.

Use Cases

• Validating web-app behavior against a written set of acceptance criteria before pushing changes
• Generating automated evidence (screenshots/video/Playwright traces) for failed acceptance criteria
• Reducing manual QA effort for regression checks driven by spec documents

Not For

• General-purpose CI/CD (the README explicitly states “No CI. No infrastructure.”)
• High-assurance security testing or compliance validation without human review (it’s an automated verification workflow, not a formal verifier)
• Environments where you cannot run browser automation against a local dev server

Interface

REST API

GraphQL

gRPC

MCP Server: Yes

SDK

Webhooks

Authentication

Methods: claude login (OAuth login for Claude Code)

OAuth: Yes

Scopes: No

The README mentions Claude Code with OAuth login via `claude login`, but does not describe scopes or plugin-specific authorization beyond requiring that login.

Pricing

Free tier: No

Requires credit card: No

No pricing information found in the provided content.

Agent Metadata

Pagination: none

Idempotent: False

Retry Guidance: Not documented

Known Gotchas

⚠ Spec interpretation/planning may require the user to provide a testable spec and answer clarifying questions before execution.
⚠ Browser-agent runs can be non-deterministic if the app under test is flaky or uses timing-dependent UI behavior.
⚠ Parallel execution per acceptance criterion (if implemented) may stress shared dev-server state; without explicit isolation, runs may affect each other.
⚠ The workflow is evidence-heavy (screenshots, video, traces), so long sessions or resource-constrained environments may limit what evidence can be captured.
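Two of the gotchas above (timing-dependent flakiness and parallel runs stressing a shared dev server) are commonly mitigated with bounded concurrency and a small retry budget. The sketch below shows that general pattern in plain `asyncio`; it is not how the plugin schedules its agents, which the listing does not document. `check` is a hypothetical stand-in for one browser-agent run, and `max_parallel`, `attempts`, and `delay` are illustrative parameters.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

# Hypothetical signature: one criterion in, pass/fail out.
Check = Callable[[str], Awaitable[bool]]


async def run_with_retry(
    check: Check, criterion: str, attempts: int = 3, delay: float = 0.0
) -> bool:
    """Re-run a check up to `attempts` times; stop at the first pass."""
    for attempt in range(1, attempts + 1):
        if await check(criterion):
            return True
        if attempt < attempts:
            await asyncio.sleep(delay)  # give timing-dependent UI a moment
    return False


async def run_all(
    criteria: List[str],
    check: Check,
    max_parallel: int = 2,
    attempts: int = 3,
) -> Dict[str, bool]:
    """Run every criterion, with at most `max_parallel` checks in flight."""
    gate = asyncio.Semaphore(max_parallel)

    async def guarded(criterion: str) -> bool:
        async with gate:  # limits load on the shared dev server
            return await run_with_retry(check, criterion, attempts)

    results = await asyncio.gather(*(guarded(c) for c in criteria))
    return dict(zip(criteria, results))
```

A call such as `asyncio.run(run_all(criteria, check, max_parallel=1))` serializes the runs entirely, which is the simplest form of isolation when criteria mutate shared state. Because the listing marks the workflow as not idempotent, retries are only appropriate when re-running a criterion is safe for the dev server's data, and a criterion that passes only on retry is still worth flagging as flaky.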

Alternatives

• Playwright test runner with hand-authored test specs
• Cypress + visual diffing
• BDD frameworks (Cucumber/Behave) with Playwright/Selenium backends
• Contract/spec-based testing tools that integrate into CI pipelines

Scores are editorial opinions as of 2026-03-30.