
Research at BAISH

We help students in Buenos Aires go from curious to published. Join research sprints, workshops, and a community of aspiring AI safety researchers.

Join a Program · View Publications

Your Research Journey

From first steps to published researcher — here's how it works

01 Learn: AI Safety Fundamentals
02 Practice: AIS Research Workshop
03 Research: Research Sprints (you're here)
04 Launch: AISAR & Careers

Community Publications

Work by researchers connected to BAISH

Apart Research Hackathon · Nov 2025

Table Top Agents

Luca De Leo (BAISH)

An AI-powered framework that accelerates AI governance scenario exploration through autonomous-agent tabletop exercises, compressing preparation cycles from years to minutes.

Master's Thesis · Oct 2025

Explorando AI Safety via Debate: un estudio sobre capacidades asimétricas y jueces débiles en el entorno MNIST (Exploring AI Safety via Debate: a study of asymmetric capabilities and weak judges in the MNIST setting)

Joaquín Machulsky (BAISH)

A master's thesis exploring AI safety via debate mechanisms, studying asymmetric capabilities and weak judges in the MNIST setting. Includes an interactive demo.

arXiv · Oct 2025

Measuring Chain-of-Thought Monitorability Through Faithfulness and Verbosity

Austin Meek, Eitan Sprejer (BAISH), Iván Arcuschin, Austin J. Brockmeier, Steven Basart

Investigating how well chain-of-thought reasoning can be monitored for safety through faithfulness and verbosity metrics.

arXiv · Oct 2025

AI Debaters are More Persuasive when Arguing in Alignment with Their Own Beliefs

María Victoria Carro, Denise Mester, Facundo Nieto, Oscar Stanchi, Guido Bergman, Mario Leiva, Eitan Sprejer (BAISH), Luca Forziati Gangi, et al.

A study of how AI systems' internal beliefs affect their persuasiveness in debate scenarios, with implications for AI safety and deception.

NeurIPS Workshop · Sep 2025

Approximating Human Preferences Using a Multi-Judge Learned System

Eitan Sprejer (BAISH), Fernando Avalos, Augusto Mariano Bernardi, José Pedro Brito de Azevedo Faustino, Jacob Haimes, Narmeen Fatimah Oozeer

A multi-judge approach to better approximate human preferences in AI systems, improving alignment evaluation.

Apart Research Hackathon · Sep 2025 · 2nd Place

RobustCBRN Eval: A Practical Benchmark Robustification Toolkit

Luca De Leo (BAISH), James Sykes, Balázs László, Ewura Ama Etruwaa Sam

A pipeline addressing CBRN evaluation vulnerabilities through consensus detection, verified cloze scoring, and statistical evaluation with bootstrap confidence intervals.

Apart Research Hackathon · Jun 2025 · 1st Place

Four Paths to Failure: Red Teaming ASI Governance

Luca De Leo (BAISH), Zoé Roy-Stang, Heramb Podar, Damin Curtis, Vishakha Agrawal, Ben Smyth

Stress-tested the Phase 0 ASI moratorium proposed in A Narrow Path, identifying four circumvention routes and proposing ten mutually reinforcing policy amendments.


Our community continues to grow — we're building our publication track record through programs like AISAR and Apart Research sprints.

What You Could Work On

Current research directions in our community

Mechanistic Interpretability

Understanding how neural networks process information internally. What circuits implement specific behaviors? How can we reverse-engineer model cognition?

LLM Evaluations

Building benchmarks and testing methodologies for frontier models. How do we measure alignment? What capabilities emerge at scale?

Alignment Theory

Fundamental questions about making AI systems beneficial. How do we specify human values? What oversight mechanisms work?

Get Involved

Express Interest in Research

Want to contribute to AI safety research? Use our contact form to tell us about your background and research interests, and we'll connect you with relevant projects and collaborators.

Contact Us

We review messages regularly and reach out when there's a good fit.

Ready to start your research journey?

Book a call with one of our co-founders to discuss your interests and find the right path.

Eitan Sprejer

Interpretability & Evaluations

Book with Eitan
Luca De Leo

Operations & Strategy

Book with Luca

Buenos Aires AI Safety Hub

© 2025 BAISH. All rights reserved.
