leonardlin's Collections: sota
updated
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
Paper
• 2401.02954
• Published • 53
Qwen Technical Report
Paper
• 2309.16609
• Published • 38
GPT-4 Technical Report
Paper
• 2303.08774
• Published • 7
Gemini: A Family of Highly Capable Multimodal Models
Paper
• 2312.11805
• Published • 49
An In-depth Look at Gemini's Language Abilities
Paper
• 2312.11444
• Published • 1
From Google Gemini to OpenAI Q* (Q-Star): A Survey of Reshaping the Generative Artificial Intelligence (AI) Research Landscape
Paper
• 2312.10868
• Published • 1
Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models
Paper
• 2312.17661
• Published • 15
Mistral 7B
Paper
• 2310.06825
• Published • 58
TinyLlama: An Open-Source Small Language Model
Paper
• 2401.02385
• Published • 95
Textbooks Are All You Need II: phi-1.5 technical report
Paper
• 2309.05463
• Published • 90
Textbooks Are All You Need
Paper
• 2306.11644
• Published • 154
Mixtral of Experts
Paper
• 2401.04088
• Published • 160
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts
Paper
• 2401.04081
• Published • 74
Magicoder: Source Code Is All You Need
Paper
• 2312.02120
• Published • 82
Towards Conversational Diagnostic AI
Paper
• 2401.05654
• Published • 20
MM-LLMs: Recent Advances in MultiModal Large Language Models
Paper
• 2401.13601
• Published • 47
MambaByte: Token-free Selective State Space Model
Paper
• 2401.13660
• Published • 59
From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities
Paper
• 2401.15071
• Published • 37
Language Models can be Logical Solvers
Paper
• 2311.06158
• Published • 20
OLMo: Accelerating the Science of Language Models
Paper
• 2402.00838
• Published • 85
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Paper
• 2402.03300
• Published • 142
BlackMamba: Mixture of Experts for State-Space Models
Paper
• 2402.01771
• Published • 25
BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation
Paper
• 2402.03216
• Published • 7
Matryoshka Representation Learning
Paper
• 2205.13147
• Published • 25
Not all layers are equally as important: Every Layer Counts BERT
Paper
• 2311.02265
• Published • 1
An Interactive Agent Foundation Model
Paper
• 2402.05929
• Published • 29
Advancing State of the Art in Language Modeling
Paper
• 2312.03735
• Published • 1
Large Language Models: A Survey
Paper
• 2402.06196
• Published • 4
ChemLLM: A Chemical Large Language Model
Paper
• 2402.06852
• Published • 32
OS-Copilot: Towards Generalist Computer Agents with Self-Improvement
Paper
• 2402.07456
• Published • 46
Grandmaster-Level Chess Without Search
Paper
• 2402.04494
• Published • 69
Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks
Paper
• 2401.02731
• Published • 3
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases
Paper
• 2402.14905
• Published • 134
Yi: Open Foundation Models by 01.AI
Paper
• 2403.04652
• Published • 65
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
Paper
• 2403.09611
• Published • 129
InternLM2 Technical Report
Paper
• 2403.17297
• Published • 34
Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models
Paper
• 2404.12387
• Published • 40
Your Transformer is Secretly Linear
Paper
• 2405.12250
• Published • 157
Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
Paper
• 2405.12981
• Published • 33
Observational Scaling Laws and the Predictability of Language Model Performance
Paper
• 2405.10938
• Published • 14