Tianshu (天枢)

Tianshu is an enterprise AI data preprocessing platform that converts unstructured documents (PDF, Word, Excel, PPT), images, audio, and video into AI-ready Markdown/JSON formats using MinerU and PaddleOCR-VL engines. It exposes document parsing capabilities via an MCP server for integration with AI assistants.

Evaluated Mar 06, 2026. Version: latest
Category: Data Processing
Tags: pdf, ocr, markdown, rag, document-processing, mcp, enterprise, multimodal
⚙ Agent Friendliness: 62 / 100 (Can an agent use this?)
🔒 Security: 70 / 100 (Is it safe for agents?)
⚡ Reliability: 64 / 100 (Does it work consistently?)

Score Breakdown

⚙ Agent Friendliness

MCP Quality: 60
Documentation: 65
Error Messages: 50
Auth Simplicity: 68
Rate Limits: 55

🔒 Security

TLS Enforcement: 80
Auth Strength: 75
Scope Granularity: 60
Dep. Hygiene: 70
Secret Handling: 65

Community/specialized tool. Apply the standard security practices for this category, and review the project documentation for any specific security requirements.

⚡ Reliability

Uptime/SLA: 70
Version Stability: 65
Breaking Changes: 60
Error Recovery: 60

Best When

When you need enterprise-grade, multi-format document ingestion with GPU acceleration and role-based access control for RAG or data pipeline work.

Avoid When

When you need a lightweight, cloud-hosted solution with no self-hosting overhead or when you only process a handful of documents occasionally.

Use Cases

  • Preparing large document corpora for RAG pipelines
  • Enterprise document digitization and OCR at scale (109+ languages)
  • Integrating document parsing into AI assistant workflows via MCP
  • Bioinformatics data extraction from FASTA and GenBank files
  • Audio/video transcription with speaker identification for knowledge bases

Not For

  • Simple single-file PDF text extraction (overkill)
  • Teams without Docker/GPU infrastructure
  • Real-time sub-second document processing requirements

Interface

REST API: Yes
GraphQL: No
gRPC: No
MCP Server: Yes
SDK: No
Webhooks: No

Authentication

Methods: JWT, API key
OAuth: Yes
Scopes: Yes

JWT access and refresh tokens with role-based access control (admin/user). Self-service API keys for third-party integration. Optional SSO via OIDC/SAML.
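
A minimal sketch of how a client might attach either credential type to a request. The Bearer scheme is the common convention for JWT access tokens; the `X-API-Key` header name and the `build_auth_headers` helper are assumptions for illustration, not taken from the Tianshu documentation, so check the project docs for the actual header.

```python
from typing import Dict, Optional


def build_auth_headers(jwt_token: Optional[str] = None,
                       api_key: Optional[str] = None) -> Dict[str, str]:
    """Return HTTP headers for one of the two listed auth methods."""
    if jwt_token:
        # Bearer is the usual scheme for sending a JWT access token.
        return {"Authorization": f"Bearer {jwt_token}"}
    if api_key:
        # ASSUMPTION: hypothetical header name; the real one may differ.
        return {"X-API-Key": api_key}
    raise ValueError("provide either jwt_token or api_key")
```

The JWT takes precedence when both are supplied, on the assumption that a user session carries the more specific role information.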

Pricing

Model: open_source
Free tier: Yes
Requires CC: No

Apache 2.0 license. Self-hosted only; GPU infrastructure costs are on the operator.

Agent Metadata

Pagination: none
Idempotent: Unknown
Retry Guidance: Not documented
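
Because retry guidance is not documented, an agent has to bring its own policy. The sketch below shows one conservative option, capped exponential backoff on connection errors. The function names, the delay values, and the choice of `ConnectionError` are assumptions, not Tianshu behavior. Since idempotency is unknown, it is probably safest to wrap only read-style calls (such as polling a task's status) and not document submissions.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def backoff_delays(retries: int, base: float = 1.0, cap: float = 30.0) -> List[float]:
    """Delays in seconds: base, 2*base, 4*base, ... each limited to cap."""
    return [min(cap, base * (2 ** i)) for i in range(retries)]


def call_with_retry(fn: Callable[[], T], attempts: int = 4,
                    base: float = 1.0, cap: float = 30.0,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn up to `attempts` times, sleeping between failed tries."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    delays = backoff_delays(attempts - 1, base, cap)
    for i in range(attempts):
        try:
            return fn()
        except ConnectionError:
            if i == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(delays[i])
    raise RuntimeError("unreachable")
```

The `sleep` parameter is injectable so the policy can be exercised in tests without real waiting.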

Known Gotchas

  • Requires self-hosting with Docker and, optionally, a CUDA GPU; there is no cloud endpoint
  • MCP server runs on a separate port (8002) from the REST API (8000)
  • Large PDFs auto-split but processing latency can be high on CPU-only deployments
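
The split-port gotcha above is easy to trip over in client configuration. One way to keep the two services apart is a small settings object, sketched here. The ports 8000 and 8002 come from this listing; the class name, the `localhost` default, and the `http` scheme are assumptions for a local Docker deployment, and no endpoint paths are implied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TianshuEndpoints:
    """Base URLs for a self-hosted deployment (hypothetical helper)."""
    host: str = "localhost"   # ASSUMPTION: local Docker deployment
    rest_port: int = 8000     # REST API port, per the listing
    mcp_port: int = 8002      # MCP server port, per the listing
    scheme: str = "http"      # ASSUMPTION: use https behind a TLS proxy

    @property
    def rest_base(self) -> str:
        return f"{self.scheme}://{self.host}:{self.rest_port}"

    @property
    def mcp_base(self) -> str:
        return f"{self.scheme}://{self.host}:{self.mcp_port}"
```

For example, `TianshuEndpoints(host="gpu-box").mcp_base` yields the MCP address for a remote host while the REST base stays on port 8000.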

Scores are editorial opinions as of 2026-03-06.