Daily Papers

by AK and the research community

Oct 31

Submitted by

GMFTBY

The End of Manual Decoding: Towards Truly End-to-End Language Models

tencent

Submitted by

xinlongwang

Emu3.5: Native Multimodal Models are World Learners

BAAI

Beijing Academy of Artificial Intelligence

Submitted by

taesiri

Kimi Linear: An Expressive, Efficient Attention Architecture

moonshotai

Submitted by

taesiri

Can Agent Conquer Web? Exploring the Frontiers of ChatGPT Atlas Agent in Web Games

3 authors

Submitted by

hsshin98

Exploring Conditions for Diffusion models in Robotic Control

naver-ai

Submitted by

ShengnanAn

AMO-Bench: Large Language Models Still Struggle in High School Math Competitions

meituan-longcat

Submitted by

hamza-hcompany

Surfer 2: The Next Generation of Cross-Platform Computer Use Agents

Hcompany

Submitted by

ZrrSkywalker

Are Video Models Ready as Zero-Shot Reasoners? An Empirical Study with the MME-CoF Benchmark

CUHK-CSE

The Chinese University of Hong Kong

Submitted by

wruisi

The Quest for Generalizable Motion Generation: Data, Model, and Evaluation

Nanyang Technological University

Submitted by

alexhsu

Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning

google

Submitted by

CZWin32768

The Era of Agentic Organization: Learning to Organize with Language Models

7 authors

Submitted by

KevinHuang

OmniX: From Unified Panoramic Generation and Perception to Graphics-Ready 3D Scenes

hkuhk

The University of Hong Kong

Submitted by

nicolas-dufour

MIRO: MultI-Reward cOnditioned pretraining improves T2I quality and efficiency

5 authors

Submitted by

chaoyi-wu

EHR-R1: A Reasoning-Enhanced Foundational Language Model for Electronic Health Record Analysis

SJTU

Shanghai Jiao Tong University

Submitted by

khr0516

OmniLayout: Enabling Coarse-to-Fine Learning with LLMs for Universal Document Layout Generation

7 authors

Submitted by

akshaynambi

Magentic Marketplace: An Open-Source Environment for Studying Agentic Markets

Microsoft Research

Submitted by

xk-huang

MedVLSynther: Synthesizing High-Quality Visual Question Answering from Medical Documents with Generator-Verifier LMMs

UCSC-VLAA

Submitted by

taesiri

Remote Labor Index: Measuring AI Automation of Remote Work

47 authors

Submitted by

taesiri

Counteracting Matthew Effect in Self-Improvement of LVLMs through Head-Tail Re-balancing

9 authors

Submitted by

taesiri

CRAG-MM: Multi-modal Multi-turn Comprehensive RAG Benchmark

Submitted by

dscdyc

FullPart: Generating each 3D Part at Full Resolution

13 authors

Submitted by

harliwu

PORTool: Tool-Use LLM Training with Rewarded Tree

apple

Submitted by

acharkq

EnzyControl: Adding Functional and Substrate-Specific Control for Enzyme Backbone Generation

National University of Singapore

Submitted by

alessandrobondielli

CLASS-IT: Conversational and Lecture-Aligned Small-Scale Instruction Tuning for BabyLMs

colinglab

CoLingLab | Computational Linguistics Laboratory - University of Pisa

Submitted by

JJ-TMT

CityRiSE: Reasoning Urban Socio-Economic Status in Vision-Language Models via Reinforcement Learning

6 authors

Submitted by

jtlicardo

Performance Trade-offs of Optimizing Small Language Models for E-Commerce

2 authors

Submitted by

fangwu97

L^2M^3OF: A Large Language Multimodal Model for Metal-Organic Frameworks

7 authors

Submitted by

zhoutianyi

ChartAB: A Benchmark for Chart Grounding & Dense Alignment

UMCP

University of Maryland College Park

Submitted by

shikhar7ssu

POWSM: A Phonetic Open Whisper-Style Speech Foundation Model

cmu-lti