Collections including paper arxiv:2401.04088 (Mixtral of Experts)

- Learning to Reason in 13 Parameters
  Paper • 2602.04118 • Published • 6
- LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters
  Paper • 2405.17604 • Published • 3
- mHC-lite: You Don't Need 20 Sinkhorn-Knopp Iterations
  Paper • 2601.05732 • Published • 1
- mHC: Manifold-Constrained Hyper-Connections
  Paper • 2512.24880 • Published • 307

- Attention Is All You Need
  Paper • 1706.03762 • Published • 112
- LoRA: Low-Rank Adaptation of Large Language Models
  Paper • 2106.09685 • Published • 58
- Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
  Paper • 2101.03961 • Published • 13
- Proximal Policy Optimization Algorithms
  Paper • 1707.06347 • Published • 11

- GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
  Paper • 2403.03507 • Published • 189
- Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models
  Paper • 2407.01906 • Published • 46
- QLoRA: Efficient Finetuning of Quantized LLMs
  Paper • 2305.14314 • Published • 59
- LoRA+: Efficient Low Rank Adaptation of Large Models
  Paper • 2402.12354 • Published • 7

- Qwen Technical Report
  Paper • 2309.16609 • Published • 38
- Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models
  Paper • 2311.07919 • Published • 10
- Qwen2 Technical Report
  Paper • 2407.10671 • Published • 168
- Qwen2-Audio Technical Report
  Paper • 2407.10759 • Published • 64

- Flowing from Words to Pixels: A Framework for Cross-Modality Evolution
  Paper • 2412.15213 • Published • 28
- No More Adam: Learning Rate Scaling at Initialization is All You Need
  Paper • 2412.11768 • Published • 43
- Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference
  Paper • 2412.13663 • Published • 160
- Autoregressive Video Generation without Vector Quantization
  Paper • 2412.14169 • Published • 14

- High-Resolution Image Synthesis with Latent Diffusion Models
  Paper • 2112.10752 • Published • 15
- Adding Conditional Control to Text-to-Image Diffusion Models
  Paper • 2302.05543 • Published • 58
- Proximal Policy Optimization Algorithms
  Paper • 1707.06347 • Published • 11
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model
  Paper • 2305.18290 • Published • 64

- Memory Augmented Language Models through Mixture of Word Experts
  Paper • 2311.10768 • Published • 19
- System 2 Attention (is something you might need too)
  Paper • 2311.11829 • Published • 43
- Fine-tuning Language Models for Factuality
  Paper • 2311.08401 • Published • 30
- Orca 2: Teaching Small Language Models How to Reason
  Paper • 2311.11045 • Published • 77