Auto-claude-code-research-in-sleep
Auto-claude-code-research-in-sleep (ARIS) is a collection of Claude Code–style "skills" (Markdown-based workflows) that orchestrate an autonomous ML research pipeline: idea discovery, cross-model review loops (Claude Code as executor, with an external LLM as reviewer via MCP or a Codex-style interface), experiment planning/bridging, paper writing, and post-submission rebuttal plus slide/poster generation. It also documents adaptations for other agent IDEs (Cursor, Trae, Antigravity, OpenClaw) and supports alternative reviewer models through OpenAI-compatible APIs.
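The executor/reviewer split described above can be pictured as a simple iteration loop. This is a hypothetical sketch, not code from the repo; `run_executor` and `run_reviewer` are stand-in callables for Claude Code and the external reviewer model.

```python
# Hypothetical sketch of an ARIS-style executor/reviewer iteration.
# Function names and the "LGTM" convention are assumptions for illustration.

def research_iteration(draft: str, run_executor, run_reviewer,
                       max_rounds: int = 3) -> str:
    """Alternate reviewer critiques with executor revisions until approval."""
    for _ in range(max_rounds):
        critique = run_reviewer(draft)          # external LLM via MCP/API
        if critique.strip().upper() == "LGTM":  # reviewer is satisfied
            break
        draft = run_executor(draft, critique)   # executor revises the draft
    return draft
```

In practice each round would also persist the critique and revised draft to plain Markdown files, matching the repo's file-based artifact style.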
Score Breakdown
⚙ Agent Friendliness
🔒 Security
Security details are mostly not specified in the excerpt. The design emphasizes anti-hallucination/verification steps (e.g., DBLP/CrossRef citation fetching) and human-in-the-loop checkpoints (AUTO_PROCEED checkpoints), which can reduce risks of fabricated outputs. However, because workflows may clone repos, run experiments, and call external model providers via MCP/API, the primary security risks are credential handling, supply-chain/code execution, and prompt/tool injection via inputs/papers/repo content. The README excerpt does not provide detailed guidance on sandboxing, safe execution, or structured secret management.
⚡ Reliability
Best When
You want an agent workflow you can plug into Claude Code (or adapt to other IDEs) to run structured research-to-paper iterations with a separate reviewer model, while keeping artifacts as plain files (Markdown/outputs).
Avoid When
You need a stable, versioned hosted API with documented SLAs, or you cannot provide external LLM credentials/tooling for the reviewer/execution environments.
Use Cases
- Autonomous overnight iteration on an ML research direction or paper draft (idea → critique → revise)
- Cross-model paper review to reduce self-review blind spots
- Experiment planning and experiment execution handoff from research ideas to runnable code/tests
- Drafting ICML/NeurIPS/ACL-style rebuttals with structured coverage checks
- Generating presentation artifacts (slides/poster) from a paper repository
- Adapting the same research workflow to different agent IDEs and model providers
Not For
- Production-grade, fully autonomous scientific pipelines without human checkpoints (the repo emphasizes safety gates but still relies on user/agent toolchains)
- Use as a general-purpose API service (it is primarily a local/workflow skill set)
- Environments requiring strict compliance attestations or formal security guarantees based on audited code
Interface
Authentication
Authentication appears to be handled by the underlying IDE/tooling and the reviewer model provider (e.g., Codex MCP setup, OpenAI-compatible APIs). The README does not specify OAuth flows or fine-grained scopes for ARIS itself.
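Since credentials flow through the reviewer provider rather than ARIS itself, a reasonable pattern is to read them from environment variables and fail fast when they are absent. The variable names below are illustrative assumptions; ARIS does not document them.

```python
import os

# Hypothetical configuration loader for an OpenAI-compatible reviewer
# endpoint. REVIEWER_* variable names are assumptions, not from the repo.
def reviewer_config() -> dict:
    api_key = os.environ.get("REVIEWER_API_KEY")
    if not api_key:
        raise RuntimeError("REVIEWER_API_KEY is not set; refusing to start")
    return {
        "base_url": os.environ.get("REVIEWER_BASE_URL",
                                   "https://api.openai.com/v1"),
        "api_key": api_key,  # never hardcode secrets in skill files
        "model": os.environ.get("REVIEWER_MODEL", "gpt-4o"),
    }
```

Keeping secrets out of the Markdown skill files also limits the blast radius of the prompt-injection risks noted in the Security section.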
Pricing
Costs depend on the chosen models/providers (Codex/LLM calls and optional experiments). The repo claims a zero-cost, zero-lock-in option via a ModelScope free tier, but no quantitative rate or usage limits appear in the provided excerpt.
Agent Metadata
Known Gotchas
- ⚠ Because workflows drive code cloning, experiment execution, and multi-step document generation, failures may require manual intervention (the provided excerpt doesn’t describe robust retry/idempotency semantics).
- ⚠ Cross-model pipelines can fail if the reviewer/executor toolchains (MCP/Codex/OpenAI-compatible API) are not correctly configured.
- ⚠ Auto-experiment and rebuttal pipelines depend on accurate mapping of claims/concerns; the README mentions safety gates, but operational failure modes and recovery steps aren’t detailed in the provided content.
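Given that misconfigured toolchains are the most common failure mode above, a small preflight check before launching an overnight run can surface missing prerequisites early. This is a hypothetical sketch; the environment variable and tool names are assumptions, not documented by ARIS.

```python
import os
import shutil

# Hypothetical preflight check before an unattended ARIS run.
# Defaults are illustrative; adjust to your reviewer/executor setup.
def preflight(required_env=("REVIEWER_API_KEY",),
              required_tools=("git",)) -> list:
    """Return a list of missing prerequisites (empty means ready to run)."""
    missing = [v for v in required_env if not os.environ.get(v)]
    missing += [t for t in required_tools if shutil.which(t) is None]
    return missing
```

Running such a check at the start of each skill avoids discovering a missing credential or binary hours into an autonomous session.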
Alternatives
Scores are editorial opinions as of 2026-03-29.