Recursive Dynamics — AI Safety & Alignment Research

Core Research

Our research spans four interconnected domains

Primary Focus

AI Alignment

Developing provable methods to ensure advanced AI systems remain beneficial and aligned with human values as they scale in capability. Our work spans reward modeling, scalable oversight, debate-based training, and formal verification of alignment properties under distribution shift.

23 active projects · 14 researchers

Interpretability

Mechanistic Interpretability

Reverse-engineering the internal representations of neural networks to build tools that let humans understand and predict model behavior.

14 publications this year

Governance

Governance & Policy

Designing institutional frameworks and evaluation benchmarks for responsible frontier AI deployment across jurisdictions.

8 policy briefs released

Evaluation

Capabilities Evaluation

Rigorous measurement and prediction of emergent behaviors in scaled systems, including dangerous capability detection.

6 benchmarks released

Recent Publications

Advancing the Field

NeurIPS 2024

Recursive Oversight Through Hierarchical Verification

Chen, Wei, Park

ICML 2024

Geometric Interpretability of Latent Representations

Nakamura, Reyes

AAAI 2024

Cross-Jurisdictional Governance Frameworks for Frontier AI

Okonkwo, Müller

ICLR 2024

Quantifying Emergent Capabilities in Scaled Architectures

Lindqvist, Das

arXiv 2024

Debate-Based Alignment via Red-Team Evaluation

Torres, Gupta, Kim

NeurIPS 2023

Reward Model Verification Under Distribution Shift

Zhai, Andersen

SAFEML 2024

Formal Methods for Constitutional AI Constraints

Petrov, Liu

Residency Program

Join the Lab

Our residency program brings together PhD researchers, postdoctoral scholars, and independent investigators to collaborate on the most pressing problems in AI safety. Residents have full access to compute infrastructure, mentorship from senior researchers, and a cross-disciplinary community.

Access to a 10,000+ GPU cluster
Dedicated mentorship pairing
Publication support and review
Travel funding for conferences
Cross-institutional collaboration
Flexible remote participation

Apply for Residency

3 – 6 months

Rolling applications

Remote-friendly


Designing Intelligence That Remains Aligned
