This collection focuses on tests-as-truth evaluation, diff-based coding, and agentic workflow learning.
Note: This is the final update, made after training on the first datasets in the series.