Our Research

Scientific AI needs more than models. It needs shared knowledge, trainable scientific models, agent harnesses, rigorous evaluation, and public discovery artifacts — built as open infrastructure.

  • 01 Knowledge: Shared methods, protocols, and research skills.
  • 02 Models: Trained models for professional scientific reasoning.
  • 03 Agents: AI systems for real research workflows.
  • 04 Evaluation: Benchmarks and environments for reliable scientific work.
  • 05 Discovery: Public outputs that improve the infrastructure.
§ 01 / 05 · Knowledge

Scientific Knowledge

A commons of research skills — authored by human experts, executed by AI agents.

§ 01.01 · Knowledge

ResearchSkills

ResearchSkills.ai is an open platform that turns real scientific workflows into reusable skills for AI agents. Instead of storing raw chat logs, it reconstructs research sessions as decision trajectories — how researchers form hypotheses, diagnose failures, choose methods, and decide when to pivot — and distills them into portable skills that agents can retrieve and execute. Contributions are extracted locally, automatically de-identified, reviewed by domain experts for scientific accuracy, and published to an open library spanning 155+ scientific subdomains under CC BY 4.0. The goal is to give AI systems not just raw capability, but the tacit research judgment that determines which experiments matter, which dead ends to avoid, and when to persist versus change direction.

§ 01.02 · Knowledge

Scaling Behaviors of LLM Reinforcement Learning Post-Training: An Empirical Study in Mathematical Reasoning

This paper systematically reveals how model scale, data volume, and compute jointly govern the scaling behavior of RL post-training for mathematical reasoning in large language models.

§ 02 / 05 · Models

Scientific Models

Recipes that turn open base models into Skills-fluent scientific agent brains.

§ 02.01 · Models

The Unified Model for Computational Biology and Drug Discovery

BioUniGen.xyz is an integrated model platform for computational biology and drug discovery that unifies recognition and generation within a shared biological representation framework. Instead of treating tasks like molecule design, protein folding, function annotation, and de novo sequence generation as separate problems, it connects molecular sequences, 3D structures, and functional mechanisms in one adaptable system. By combining multi-modal biological inputs with joint predictive analysis and generative design, BioUniGen supports end-to-end research workflows such as molecular optimization, structural simulation, and functional mining. The goal is to overcome the fragmentation of existing biological AI tools and provide a more coherent engine for life science research.

§ 03 / 05 · Agents

Scientific Agents

An agent harness whose primitives are lab- and literature-native.

§ 03.01 · Agents

SUDP: A Protocol for Secret-Use Delegation in Agentic Systems

SUDP (Secret-Use Delegation Protocol) lets AI agents in agentic systems perform secret-backed operations without ever holding the underlying secret. Instead of placing reusable credentials such as API keys or OAuth tokens inside the agent runtime, it keeps secret ownership with the user and delegates only a narrowly scoped, single-use, transaction-bound authorization for a specific action, recipient, and validity window. In practice, it works in three phases: setup, authorization grant, and consumption. An agent requests an operation, the user approves that exact operation with an authenticator-backed gesture, and the system executes it without exposing the raw credential to the agent. This makes credential use more auditable and more resistant to leakage, replay, and misuse.
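To make the three phases concrete, here is a minimal Python sketch of single-use, transaction-bound delegation. It is an illustration under stated assumptions, not the SUDP specification: the names `SecretHolder`, `Grant`, `approve`, and `consume` are hypothetical, the user's authenticator-backed gesture is modeled as a plain call to `approve`, and an HMAC stands in for whatever authorization mechanism the protocol actually defines.

```python
import hashlib
import hmac
import json
import secrets
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    """Hypothetical single-use authorization bound to one exact operation."""
    operation_hash: str  # digest of the approved action and recipient
    expires_at: float    # end of the validity window (Unix seconds)
    nonce: str           # unique per grant, used to detect replay
    tag: str             # MAC computed on the user's side


def _digest(action: str, recipient: str) -> str:
    payload = json.dumps({"action": action, "recipient": recipient}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


class SecretHolder:
    """Stands in for the user-side component that owns the secret (assumed design)."""

    def __init__(self, api_key: str):
        # Setup phase: the secret and the grant-signing key stay on this side.
        self._api_key = api_key
        self._grant_key = secrets.token_bytes(32)
        self._used = set()

    def _mac(self, operation_hash: str, expires_at: float, nonce: str) -> str:
        msg = f"{operation_hash}|{expires_at}|{nonce}".encode()
        return hmac.new(self._grant_key, msg, hashlib.sha256).hexdigest()

    def approve(self, action, recipient, ttl_seconds=60.0, now=None) -> Grant:
        """Authorization grant phase: the user approves one exact operation."""
        now = time.time() if now is None else now
        op = _digest(action, recipient)
        expires_at = now + ttl_seconds
        nonce = secrets.token_hex(16)
        return Grant(op, expires_at, nonce, self._mac(op, expires_at, nonce))

    def consume(self, grant: Grant, action, recipient, now=None) -> str:
        """Consumption phase: execute once without exposing the secret."""
        now = time.time() if now is None else now
        expected = self._mac(grant.operation_hash, grant.expires_at, grant.nonce)
        if not hmac.compare_digest(expected, grant.tag):
            raise PermissionError("grant was not issued by this holder")
        if grant.operation_hash != _digest(action, recipient):
            raise PermissionError("grant does not cover this operation")
        if now > grant.expires_at:
            raise PermissionError("grant expired")
        if grant.nonce in self._used:
            raise PermissionError("grant already used")
        self._used.add(grant.nonce)
        # A real system would call the external service with self._api_key here.
        return f"executed {action} for {recipient}"
```

In this sketch the agent only ever handles a `Grant`, which is useless for any other action or recipient, after its validity window, or after it has been consumed once. The raw API key never leaves `SecretHolder`.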

§ 03.02 · Agents

OASIS: Open Agent Social Interaction Simulations with One Million Agents

OASIS is the most popular social simulation framework, with 4.4k GitHub stars. It is the first large-scale agent society simulator and AI social scientist framework.

§ 03.03 · Agents

CoMAS: Co-Evolving Multi-Agent Systems via Interaction Rewards

CoMAS is a self-evolving multi-agent framework in which agents co-evolve through interaction rewards.

§ 03.04 · Agents

Eigen-Agent: Adaptive Multi-Agent Scientific Reasoning with Monitor-Based RAG

This work introduces a unified scientific reasoning framework that combines token-level implicit retrieval with structured multi-agent refinement to improve accuracy while substantially reducing token usage and interaction steps.

§ 04 / 05 · Evaluation

Scientific Evaluation

A pluralistic research environment where AI agent scientists actually work.

§ 04.01 · Evaluation

SciAgentGym: Benchmarking Multi-Step Scientific Tool-use in LLM Agents

SciAgentGym provides a scalable scientific tool-use environment with 1,780 domain-specific tools, a tiered benchmark for long-horizon agent evaluation, and SciForge for synthesizing logic-aware training trajectories to advance autonomous scientific agents.

§ 04.02 · Evaluation

LabUtopia: High-Fidelity Simulation and Hierarchical Benchmark for Scientific Embodied Agents

LabUtopia is the first comprehensive laboratory-scale embodied intelligence platform that unifies multi-physics simulation, chemically meaningful interactions, procedural scientific scene generation, and hierarchical long-horizon benchmarks to push scientific agents from simple manipulation toward generalizable experimental reasoning.

§ 05 / 05 · Discovery

Scientific Discovery

The papers and findings we produce using our own stack.

§ 05.01 · Discovery

AI-Driven Automation Can Become the Foundation of Next-Era Science of Science Research

This paper envisions AI-driven Science of Science as a new paradigm for automatically discovering large-scale research patterns, simulating scientific societies, and revealing the hidden mechanisms that drive innovation beyond the reach of traditional statistical and rule-based methods.

§ 05.02 · Discovery

The Landscape of Agentic Reinforcement Learning for LLMs: A Survey

Agentic Reinforcement Learning (Agentic RL) reframes large language models from passive text generators into autonomous agents that learn to make decisions in dynamic, partially observable environments. This survey synthesizes over 500 recent works, systematically organizing core agentic capabilities, application domains, open-source environments, benchmarks, and frameworks to guide the development of scalable general-purpose AI agents.

Add your research to the open stack.

Every output here links back to the agent runs, skills, and evaluation tasks that produced it. Help us extend the stack into your subdomain.
