Emergent Misalignment via In-Context Learning: Narrow in-context examples can produce broadly misaligned LLMs Paper • 2510.11288 • Published 21 days ago • 46
When Models Lie, We Learn: Multilingual Span-Level Hallucination Detection with PsiloQA Paper • 2510.04849 • Published 28 days ago • 110
OrtSAE: Orthogonal Sparse Autoencoders Uncover Atomic Features Paper • 2509.22033 • Published Sep 26 • 18
<think> So let's replace this phrase with insult... </think> Lessons learned from generation of toxic texts with LLMs Paper • 2509.08358 • Published Sep 10 • 13
When Punctuation Matters: A Large-Scale Comparison of Prompt Robustness Methods for LLMs Paper • 2508.11383 • Published Aug 15 • 40
HeroBench: A Benchmark for Long-Horizon Planning and Structured Reasoning in Virtual Worlds Paper • 2508.12782 • Published Aug 18 • 25
SONAR-LLM: Autoregressive Transformer that Thinks in Sentence Embeddings and Speaks in Tokens Paper • 2508.05305 • Published Aug 7 • 46
A Data-Centric Framework for Addressing Phonetic and Prosodic Challenges in Russian Speech Generative Models Paper • 2507.13563 • Published Jul 17 • 52
RiemannLoRA: A Unified Riemannian Framework for Ambiguity-Free LoRA Optimization Paper • 2507.12142 • Published Jul 16 • 36
T-LoRA: Single Image Diffusion Model Customization Without Overfitting Paper • 2507.05964 • Published Jul 8 • 118
Heeding the Inner Voice: Aligning ControlNet Training via Intermediate Features Feedback Paper • 2507.02321 • Published Jul 3 • 39