Auto-claude-code-research-in-sleep

Auto-claude-code-research-in-sleep (ARIS) is a collection of Claude Code–style “skills” (Markdown-based workflows) that orchestrate an autonomous ML research pipeline: idea discovery, cross-model review loops (Claude Code as executor, with an external LLM reviewer reached via MCP, e.g. a Codex-style model), experiment planning/bridging, paper writing, and post-submission rebuttal plus slide/poster generation. It also documents adaptations for other agent IDEs (Cursor, Trae, Antigravity, OpenClaw) and supports alternative model combinations via OpenAI-compatible APIs (as the reviewer).

Evaluated Mar 29, 2026
Repo ↗ · Tags: ai-ml, ai-research, autonomous-agents, claude-code, mcp, paper-writing, experimentation, markdown-workflows
⚙ Agent Friendliness: 48/100 (Can an agent use this?)
🔒 Security: 46/100 (Is it safe for agents?)
⚡ Reliability: 25/100 (Does it work consistently?)

Score Breakdown

⚙ Agent Friendliness

MCP Quality: 45
Documentation: 70
Error Messages: 0
Auth Simplicity: 55
Rate Limits: 20

🔒 Security

TLS Enforcement: 60
Auth Strength: 50
Scope Granularity: 20
Dep. Hygiene: 55
Secret Handling: 45

Security details are mostly not specified in the excerpt. The design emphasizes anti-hallucination/verification steps (e.g., DBLP/CrossRef citation fetching) and human-in-the-loop checkpoints (AUTO_PROCEED checkpoints), which can reduce risks of fabricated outputs. However, because workflows may clone repos, run experiments, and call external model providers via MCP/API, the primary security risks are credential handling, supply-chain/code execution, and prompt/tool injection via inputs/papers/repo content. The README excerpt does not provide detailed guidance on sandboxing, safe execution, or structured secret management.
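The AUTO_PROCEED checkpoints mentioned above can be pictured as a simple gate between pipeline stages. A minimal sketch of that pattern, assuming an `AUTO_PROCEED` environment flag and a `checkpoint` helper (both illustrative names, not the repo's actual API):

```python
import os

def checkpoint(stage: str, summary: str) -> bool:
    """Pause the pipeline at a named stage unless unattended mode is on.

    Returns True if the pipeline should continue, False if the
    operator declined at the prompt.
    """
    if os.environ.get("AUTO_PROCEED") == "1":
        # Unattended (overnight) mode: log the decision and continue.
        print(f"[checkpoint:{stage}] auto-proceeding: {summary}")
        return True
    answer = input(f"[checkpoint:{stage}] {summary}. Continue? [y/N] ")
    return answer.strip().lower() in ("y", "yes")
```

Gating every destructive step (repo cloning, experiment execution) behind such a check is one way to keep human-in-the-loop safety while still allowing fully unattended runs.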

⚡ Reliability

Uptime/SLA: 0
Version Stability: 35
Breaking Changes: 30
Error Recovery: 35

Best When

You want an agent workflow you can plug into Claude Code (or adapt to other IDEs) to run structured research-to-paper iterations with a separate reviewer model, while keeping artifacts as plain files (Markdown/outputs).

Avoid When

You need a stable, versioned hosted API with documented SLAs, or you cannot provide external LLM credentials/tooling for the reviewer/execution environments.

Use Cases

  • Autonomous overnight iteration on an ML research direction or paper draft (idea → critique → revise)
  • Cross-model paper review to reduce self-review blind spots
  • Experiment planning and execution handoff, from research ideas to runnable code/tests
  • Drafting ICML/NeurIPS/ACL-style rebuttals with structured coverage checks
  • Generating presentation artifacts (slides/poster) from a paper repository
  • Adapting the same research workflow to different agent IDEs and model providers
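At its core, the idea → critique → revise cycle in the use cases above is an executor/reviewer loop. A provider-agnostic sketch, where `executor`, `reviewer`, and `is_satisfied` are hypothetical stand-ins (not ARIS APIs) for the two models and a stopping rule:

```python
from typing import Callable

def review_loop(
    draft: str,
    executor: Callable[[str, str], str],   # revises a draft given feedback
    reviewer: Callable[[str], str],        # returns critique text
    is_satisfied: Callable[[str], bool],   # stopping rule on the critique
    max_rounds: int = 3,
) -> str:
    """Alternate a separate reviewer model with an executor that revises,
    until the critique passes the stopping rule or rounds run out."""
    feedback = reviewer(draft)
    for _ in range(max_rounds):
        if is_satisfied(feedback):
            break
        draft = executor(draft, feedback)
        feedback = reviewer(draft)
    return draft
```

Using a separate model as `reviewer` is what reduces self-review blind spots: the critique comes from a model that did not produce the draft.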

Not For

  • Production-grade, fully autonomous scientific pipelines without human checkpoints (the repo emphasizes safety gates but still relies on user/agent toolchains)
  • Use as a general-purpose API service (it is primarily a local/workflow skill set)
  • Environments requiring strict compliance attestations or formal security guarantees based on audited code

Interface

REST API: No
GraphQL: No
gRPC: No
MCP Server: Yes
SDK: No
Webhooks: No

Authentication

Methods:
  • codex setup / codex mcp (via Claude Code MCP integration)
  • OpenAI-compatible API credentials for reviewer models (alternative model combinations)
  • IDE-specific configuration for integrations/adaptations (Cursor/Trae/Antigravity/OpenClaw)
OAuth: No
Scopes: No

Authentication appears to be handled by the underlying IDE/tooling and the reviewer model provider (e.g., Codex MCP setup, OpenAI-compatible APIs). The README does not specify OAuth flows or fine-grained scopes for ARIS itself.
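Because credentials live in the surrounding tooling, a reviewer backed by any OpenAI-compatible endpoint can be configured from environment variables. A minimal sketch under that assumption; the variable names (`REVIEWER_API_KEY`, `REVIEWER_BASE_URL`, `REVIEWER_MODEL`) are illustrative, not documented by the repo:

```python
import os

def reviewer_config() -> dict:
    """Build an OpenAI-compatible client config from the environment,
    failing fast if the credential is missing rather than sending
    an unauthenticated request."""
    api_key = os.environ.get("REVIEWER_API_KEY")
    if not api_key:
        raise RuntimeError("REVIEWER_API_KEY is not set")
    return {
        "base_url": os.environ.get("REVIEWER_BASE_URL",
                                   "https://api.openai.com/v1"),
        "api_key": api_key,
        "model": os.environ.get("REVIEWER_MODEL", "gpt-4o-mini"),
    }
```

Keeping the key in the environment (rather than in workflow Markdown files) also limits the chance of it leaking into generated artifacts.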

Pricing

Free tier: Yes
Requires CC: No

Costs depend on the chosen models/providers (Codex/LLM calls and optional experiments). The repo claims a free tier via ModelScope (“zero cost, zero lock-in”), but no quantitative rate/usage limits are shown in the provided excerpt.

Agent Metadata

Pagination: none
Idempotent: No
Retry Guidance: not documented

Known Gotchas

  • Because workflows drive code cloning, experiment execution, and multi-step document generation, failures may require manual intervention (the provided excerpt doesn’t describe robust retry/idempotency semantics).
  • Cross-model pipelines can fail if the reviewer/executor toolchains (MCP/Codex/OpenAI-compatible API) are not correctly configured.
  • Auto-experiment and rebuttal pipelines depend on accurate mapping of claims/concerns; the README mentions safety gates, but operational failure modes and recovery steps aren’t detailed in the provided content.
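Since retry/idempotency semantics are not documented, callers may want to wrap flaky reviewer/executor calls themselves. A generic exponential-backoff sketch (not part of ARIS):

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def with_retries(fn: Callable[[], T], attempts: int = 3,
                 base_delay: float = 1.0) -> T:
    """Retry a transiently failing call (e.g. an MCP or API request)
    with exponential backoff; re-raise the last error if all fail."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")
```

Note that retrying is only safe around steps that are idempotent (e.g. re-requesting a critique), not around steps with side effects like cloning into an existing directory.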

Scores are editorial opinions as of 2026-03-29.
