candid thoughts on ReplicationBench

November 24, 2025

ReplicationBench was a project I worked on primarily from March through May 2025. One issue I have with publishing the standard arXiv paper / codebase / Tweet thread package is that it leaves little room for unpolished, but hopefully interesting, personal opinions. The ideas below are mostly unorganized and supported only by my qualitative observations.

More background on the project

On differences between model performance

On qualitative feedback

On the future of AI-assisted science

On experiments I’d like to do