Latent Refinement Decoding: Enhancing Diffusion-Based Language Models by Refining Belief States Paper • 2510.11052 • Published 18 days ago • 51
OpenToM: A Comprehensive Benchmark for Evaluating Theory-of-Mind Reasoning Capabilities of Large Language Models Paper • 2402.06044 • Published Feb 8, 2024 • 1
RAID: A Shared Benchmark for Robust Evaluation of Machine-Generated Text Detectors Paper • 2405.07940 • Published May 13, 2024
Calibrating LLMs with Preference Optimization on Thought Trees for Generating Rationale in Science Question Scoring Paper • 2406.19949 • Published Jun 28, 2024 • 1
Causal Reasoning of Entities and Events in Procedural Texts Paper • 2301.10896 • Published Jan 26, 2023
Large Language Models Fall Short: Understanding Complex Relationships in Detective Narratives Paper • 2402.11051 • Published Feb 16, 2024 • 1
EnigmaToM: Improve LLMs' Theory-of-Mind Reasoning Capabilities with Neural Knowledge Base of Entity States Paper • 2503.03340 • Published Mar 5 • 1
EvolvTrip: Enhancing Literary Character Understanding with Temporal Theory-of-Mind Graphs Paper • 2506.13641 • Published Jun 16
Towards Unified Task Embeddings Across Multiple Models: Bridging the Gap for Prompt-Based Large Language Models and Beyond Paper • 2402.14522 • Published Feb 22, 2024
When Thinking Backfires: Mechanistic Insights Into Reasoning-Induced Misalignment Paper • 2509.00544 • Published Aug 30 • 11
IntrEx: A Dataset for Modeling Engagement in Educational Conversations Paper • 2509.06652 • Published Sep 8 • 24
Analysing Chain of Thought Dynamics: Active Guidance or Unfaithful Post-hoc Rationalisation? Paper • 2508.19827 • Published Aug 27 • 33