leonardlin's Collections: sota
updated
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
Paper
• 2401.02954
• Published • 53
Qwen Technical Report
Paper
• 2309.16609
• Published • 38
GPT-4 Technical Report
Paper
• 2303.08774
• Published • 7
Gemini: A Family of Highly Capable Multimodal Models
Paper
• 2312.11805
• Published • 49
An In-depth Look at Gemini's Language Abilities
Paper
• 2312.11444
• Published • 1
From Google Gemini to OpenAI Q* (Q-Star): A Survey of Reshaping the Generative Artificial Intelligence (AI) Research Landscape
Paper
• 2312.10868
• Published • 1
Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models
Paper
• 2312.17661
• Published • 15
Mistral 7B
Paper
• 2310.06825
• Published • 58
TinyLlama: An Open-Source Small Language Model
Paper
• 2401.02385
• Published • 95
Textbooks Are All You Need II: phi-1.5 technical report
Paper
• 2309.05463
• Published • 90
Textbooks Are All You Need
Paper
• 2306.11644
• Published • 154
Mixtral of Experts
Paper
• 2401.04088
• Published • 160
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts
Paper
• 2401.04081
• Published • 74
Magicoder: Source Code Is All You Need
Paper
• 2312.02120
• Published • 82
Towards Conversational Diagnostic AI
Paper
• 2401.05654
• Published • 20
MM-LLMs: Recent Advances in MultiModal Large Language Models
Paper
• 2401.13601
• Published • 47
MambaByte: Token-free Selective State Space Model
Paper
• 2401.13660
• Published • 59
From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities
Paper
• 2401.15071
• Published • 37
Language Models can be Logical Solvers
Paper
• 2311.06158
• Published • 20
OLMo: Accelerating the Science of Language Models
Paper
• 2402.00838
• Published • 85
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Paper
• 2402.03300
• Published • 142
BlackMamba: Mixture of Experts for State-Space Models
Paper
• 2402.01771
• Published • 25
BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation
Paper
• 2402.03216
• Published • 7
Matryoshka Representation Learning
Paper
• 2205.13147
• Published • 25
Not all layers are equally as important: Every Layer Counts BERT
Paper
• 2311.02265
• Published • 1
An Interactive Agent Foundation Model
Paper
• 2402.05929
• Published • 29
Advancing State of the Art in Language Modeling
Paper
• 2312.03735
• Published • 1
Large Language Models: A Survey
Paper
• 2402.06196
• Published • 4
ChemLLM: A Chemical Large Language Model
Paper
• 2402.06852
• Published • 32
OS-Copilot: Towards Generalist Computer Agents with Self-Improvement
Paper
• 2402.07456
• Published • 46
Grandmaster-Level Chess Without Search
Paper
• 2402.04494
• Published • 69
Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks
Paper
• 2401.02731
• Published • 3
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases
Paper
• 2402.14905
• Published • 134
Yi: Open Foundation Models by 01.AI
Paper
• 2403.04652
• Published • 65
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
Paper
• 2403.09611
• Published • 129
InternLM2 Technical Report
Paper
• 2403.17297
• Published • 34
Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models
Paper
• 2404.12387
• Published • 40
Your Transformer is Secretly Linear
Paper
• 2405.12250
• Published • 157
Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
Paper
• 2405.12981
• Published • 33
Observational Scaling Laws and the Predictability of Language Model Performance
Paper
• 2405.10938
• Published • 14