VESTA: Visual Exploration with Statistical Tool Agents Paper • 2606.00384 • Published 21 days ago • 2
Statistically Reliable LLM-Based Ranking Evaluation via Prediction-Powered Inference Paper • 2606.05308 • Published 16 days ago • 2
When Gradients Collide: Failure Modes of Multi-Objective Prompt Optimization for LLM Judges Paper • 2605.26046 • Published 25 days ago • 3
PRECISE: Reducing the Bias of LLM Evaluations Using Prediction-Powered Ranking Estimation Paper • 2601.18777 • Published Jan 26
MSTS: A Multimodal Safety Test Suite for Vision-Language Models Paper • 2501.10057 • Published Jan 17, 2025 • 10
The Ultra-Scale Playbook 🌌 • The ultimate guide to training LLMs on large GPU clusters • 3.9k
Benchmarking datasets for Anomaly-based Network Intrusion Detection: KDD CUP 99 alternatives Paper • 1811.05372 • Published Nov 13, 2018
SynthesizRR: Generating Diverse Datasets with Retrieval Augmentation Paper • 2405.10040 • Published May 16, 2024