

Gabriele Santilli

giesse


AI & ML interests

None yet



Organizations

None yet

upvoted 2 papers about 2 months ago

Less is More: Recursive Reasoning with Tiny Networks

Paper • 2510.04871 • Published Oct 6 • 485

The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain

Paper • 2509.26507 • Published Sep 30 • 532

upvoted 2 papers 5 months ago

Emergent Symbolic Mechanisms Support Abstract Reasoning in Large Language Models

Paper • 2502.20332 • Published Feb 27 • 2

From Bytes to Ideas: Language Modeling with Autoregressive U-Nets

Paper • 2506.14761 • Published Jun 17 • 17

upvoted 5 papers 6 months ago

Continuous Thought Machines

Paper • 2505.05522 • Published May 8 • 11

The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text

Paper • 2506.05209 • Published Jun 5 • 56

Diffusion vs. Autoregressive Language Models: A Text Embedding Perspective

Paper • 2505.15045 • Published May 21 • 54

R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Token Routing

Paper • 2505.21600 • Published May 27 • 71

Alchemist: Turning Public Text-to-Image Data into Generative Gold

Paper • 2505.19297 • Published May 25 • 84

upvoted a paper 7 months ago

Optimizing LLMs for Italian: Reducing Token Fertility and Enhancing Efficiency Through Vocabulary Adaptation

Paper • 2504.17025 • Published Apr 23 • 17

upvoted a paper 10 months ago

RL + Transformer = A General-Purpose Problem Solver

Paper • 2501.14176 • Published Jan 24 • 28

upvoted 9 papers over 1 year ago

Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model

Paper • 2408.11039 • Published Aug 20, 2024 • 63

MeshAnything V2: Artist-Created Mesh Generation With Adjacent Mesh Tokenization

Paper • 2408.02555 • Published Aug 5, 2024 • 32

Open-Vocabulary Audio-Visual Semantic Segmentation

Paper • 2407.21721 • Published Jul 31, 2024 • 9

Matting by Generation

Paper • 2407.21017 • Published Jul 30, 2024 • 24

Diffusion Augmented Agents: A Framework for Efficient Exploration and Transfer Learning

Paper • 2407.20798 • Published Jul 30, 2024 • 24

Longhorn: State Space Models are Amortized Online Learners

Paper • 2407.14207 • Published Jul 19, 2024 • 18

CLAY: A Controllable Large-scale Generative Model for Creating High-quality 3D Assets

Paper • 2406.13897 • Published May 30, 2024 • 12

GAVEL: Generating Games Via Evolution and Language Models

Paper • 2407.09388 • Published Jul 12, 2024 • 17

Speech Slytherin: Examining the Performance and Efficiency of Mamba for Speech Separation, Recognition, and Synthesis

Paper • 2407.09732 • Published Jul 13, 2024 • 10