Hypothesis Generation for Materials Discovery and Design Using Goal-Driven and Constraint-Guided LLM Agents Paper • 2501.13299 • Published Jan 23
ThinkTuning: Instilling Cognitive Reflections without Distillation Paper • 2508.07616 • Published Aug 11
BuildBench: Benchmarking LLM Agents on Compiling Real-World Open-Source Software Paper • 2509.25248 • Published Sep 27 • 2
UnSeenTimeQA: Time-Sensitive Question-Answering Beyond LLMs' Memorization Paper • 2407.03525 • Published Jul 3, 2024 • 3
Triple Preference Optimization: Achieving Better Alignment with Less Data in a Single Step Optimization Paper • 2405.16681 • Published May 26, 2024 • 1
When "Competency" in Reasoning Opens the Door to Vulnerability: Jailbreaking LLMs via Novel Complex Ciphers Paper • 2402.10601 • Published Feb 16, 2024 • 1
How Can Input Reformulation Improve Tool Usage Accuracy in a Complex Dynamic Environment? A Study on $τ$-bench Paper • 2508.20931 • Published Aug 28 • 15
Polymath: A Challenging Multi-modal Mathematical Reasoning Benchmark Paper • 2410.14702 • Published Oct 6, 2024 • 1
Insights into Alignment: Evaluating DPO and its Variants Across Multiple Tasks Paper • 2404.14723 • Published Apr 23, 2024 • 10
$λ$-ECLIPSE: Multi-Concept Personalized Text-to-Image Diffusion Models by Leveraging CLIP Latent Space Paper • 2402.05195 • Published Feb 7, 2024 • 19
ECLIPSE: A Resource-Efficient Text-to-Image Prior for Image Generations Paper • 2312.04655 • Published Dec 7, 2023 • 21
LongBoX: Evaluating Transformers on Long-Sequence Clinical Tasks Paper • 2311.09564 • Published Nov 16, 2023
InstructABSA: Instruction Learning for Aspect Based Sentiment Analysis Paper • 2302.08624 • Published Feb 16, 2023 • 3
TarGEN: Targeted Data Generation with Large Language Models Paper • 2310.17876 • Published Oct 27, 2023