Auto-claude-code-research-in-sleep
Auto-claude-code-research-in-sleep (ARIS) is a collection of Claude Code–style "skills" (Markdown-based workflows) that orchestrate an autonomous ML research pipeline: idea discovery, cross-model review loops (Claude Code as executor, with an external LLM as reviewer via MCP or a Codex-style interface), experiment planning/bridging, paper writing, and post-submission rebuttal plus slide/poster generation. It also documents adaptations for other agent IDEs (Cursor, Trae, Antigravity, OpenClaw) and supports alternative reviewer models through OpenAI-compatible APIs.
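The executor/reviewer split described above can be pictured as a simple iteration loop. This is a hypothetical sketch, not code from the repo; `run_executor` and `run_reviewer` are stand-in callables for Claude Code and the external reviewer model.

```python
# Hypothetical sketch of an ARIS-style executor/reviewer iteration.
# Function names and the "LGTM" convention are assumptions for illustration.

def research_iteration(draft: str, run_executor, run_reviewer,
                       max_rounds: int = 3) -> str:
    """Alternate reviewer critiques with executor revisions until approval."""
    for _ in range(max_rounds):
        critique = run_reviewer(draft)          # external LLM via MCP/API
        if critique.strip().upper() == "LGTM":  # reviewer is satisfied
            break
        draft = run_executor(draft, critique)   # executor revises the draft
    return draft
```

In practice each round would also persist the critique and revised draft to plain Markdown files, matching the repo's file-based artifact style.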
Score Breakdown
⚙ Agent Friendliness
🔒 Security
Security details are mostly not specified in the excerpt. The design emphasizes anti-hallucination/verification steps (e.g., DBLP/CrossRef citation fetching) and human-in-the-loop checkpoints (AUTO_PROCEED checkpoints), which can reduce risks of fabricated outputs. However, because workflows may clone repos, run experiments, and call external model providers via MCP/API, the primary security risks are credential handling, supply-chain/code execution, and prompt/tool injection via inputs/papers/repo content. The README excerpt does not provide detailed guidance on sandboxing, safe execution, or structured secret management.
⚡ Reliability
Best When
You want an agent workflow you can plug into Claude Code (or adapt to other IDEs) to run structured research-to-paper iterations with a separate reviewer model, while keeping artifacts as plain files (Markdown/outputs).
Avoid When
You need a stable, versioned hosted API with documented SLAs, or you cannot provide external LLM credentials/tooling for the reviewer/execution environments.
Use Cases
- Autonomous overnight iteration on an ML research direction or paper draft (idea → critique → revise)
- Cross-model paper review to reduce self-review blind spots
- Experiment planning and experiment execution handoff from research ideas to runnable code/tests
- Drafting ICML/NeurIPS/ACL-style rebuttals with structured coverage checks
- Generating presentation artifacts (slides/poster) from a paper repository
- Adapting the same research workflow to different agent IDEs and model providers
Not For
- Production-grade, fully autonomous scientific pipelines without human checkpoints (the repo emphasizes safety gates but still relies on user/agent toolchains)
- Use as a general-purpose API service (it is primarily a local/workflow skill set)
- Environments requiring strict compliance attestations or formal security guarantees based on audited code
Interface
Authentication
Authentication appears to be handled by the underlying IDE/tooling and the reviewer model provider (e.g., Codex MCP setup, OpenAI-compatible APIs). The README does not specify OAuth flows or fine-grained scopes for ARIS itself.
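Since credentials flow through the reviewer provider rather than ARIS itself, a reasonable pattern is to read them from environment variables and fail fast when they are absent. The variable names below are illustrative assumptions; ARIS does not document them.

```python
import os

# Hypothetical configuration loader for an OpenAI-compatible reviewer
# endpoint. REVIEWER_* variable names are assumptions, not from the repo.
def reviewer_config() -> dict:
    api_key = os.environ.get("REVIEWER_API_KEY")
    if not api_key:
        raise RuntimeError("REVIEWER_API_KEY is not set; refusing to start")
    return {
        "base_url": os.environ.get("REVIEWER_BASE_URL",
                                   "https://api.openai.com/v1"),
        "api_key": api_key,  # never hardcode secrets in skill files
        "model": os.environ.get("REVIEWER_MODEL", "gpt-4o"),
    }
```

Keeping secrets out of the Markdown skill files also limits the blast radius of the prompt-injection risks noted in the Security section.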
Pricing
Costs depend on the chosen models/providers (Codex/LLM calls and optional experiments). The repo claims a zero-cost, zero-lock-in option via a ModelScope free tier, but no quantitative rate or usage limits appear in the provided excerpt.
Agent Metadata
Known Gotchas
- ⚠ Because workflows drive code cloning, experiment execution, and multi-step document generation, failures may require manual intervention (the provided excerpt doesn’t describe robust retry/idempotency semantics).
- ⚠ Cross-model pipelines can fail if the reviewer/executor toolchains (MCP/Codex/OpenAI-compatible API) are not correctly configured.
- ⚠ Auto-experiment and rebuttal pipelines depend on accurate mapping of claims/concerns; the README mentions safety gates, but operational failure modes and recovery steps aren’t detailed in the provided content.
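Given that misconfigured toolchains are the most common failure mode above, a small preflight check before launching an overnight run can surface missing prerequisites early. This is a hypothetical sketch; the environment variable and tool names are assumptions, not documented by ARIS.

```python
import os
import shutil

# Hypothetical preflight check before an unattended ARIS run.
# Defaults are illustrative; adjust to your reviewer/executor setup.
def preflight(required_env=("REVIEWER_API_KEY",),
              required_tools=("git",)) -> list:
    """Return a list of missing prerequisites (empty means ready to run)."""
    missing = [v for v in required_env if not os.environ.get(v)]
    missing += [t for t in required_tools if shutil.which(t) is None]
    return missing
```

Running such a check at the start of each skill avoids discovering a missing credential or binary hours into an autonomous session.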
Alternatives
Scores are editorial opinions as of 2026-03-29.