Zhongzhi Yu
kevin1020

AI & ML interests
Efficient LLM Inference and Tuning

Organizations

Prompting
LLM Agents

- Large Language Models are Superpositions of All Characters: Attaining Arbitrary Role-play via Self-Alignment (arXiv:2401.12474)
- More Agents Is All You Need (arXiv:2402.05120)
- VideoAgent: Long-form Video Understanding with Large Language Model as Agent (arXiv:2403.10517)
- Octopus v4: Graph of language models (arXiv:2404.19296)
Efficient Tuning

Efficient VLM via Image Token Compression

- An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models (arXiv:2403.06764)
- TokenPacker: Efficient Visual Projector for Multimodal LLM (arXiv:2407.02392)
- Efficient Inference of Vision Instruction-Following Models with Elastic Cache (arXiv:2407.18121)
- Don't Look Twice: Faster Video Transformers with Run-Length Tokenization (arXiv:2411.05222)
Long Context

- Extending Llama-3's Context Ten-Fold Overnight (arXiv:2404.19553)
- Model Tells You Where to Merge: Adaptive KV Cache Merging for LLMs on Long-Context Tasks (arXiv:2407.08454)
- VideoLLaMB: Long-context Video Understanding with Recurrent Memory Bridges (arXiv:2409.01071)
- Spinning the Golden Thread: Benchmarking Long-Form Generation in Language Models (arXiv:2409.02076)
Visualizations

- Not All Language Model Features Are Linear (arXiv:2405.14860)
- LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations (arXiv:2410.02707)
- RepVideo: Rethinking Cross-Layer Representation for Video Generation (arXiv:2501.08994)
PEFT

Modular

Efficient LLM

RAG

- Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection (arXiv:2310.11511)
- REST: Retrieval-Based Speculative Decoding (arXiv:2311.08252)
- Active Retrieval Augmented Generation (arXiv:2305.06983)
- Retrieval-Augmented Generation for Large Language Models: A Survey (arXiv:2312.10997)
Inference Acceleration

- BiTA: Bi-Directional Tuning for Lossless Acceleration in Large Language Models (arXiv:2401.12522)
- Hydragen: High-Throughput LLM Inference with Shared Prefixes (arXiv:2402.05099)
- BiLLM: Pushing the Limit of Post-Training Quantization for LLMs (arXiv:2402.04291)
- Shortened LLaMA: A Simple Depth Pruning for Large Language Models (arXiv:2402.02834)
Code Generation

- StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback (arXiv:2402.01391)
- Code Representation Learning At Scale (arXiv:2402.01935)
- Long Code Arena: a Set of Benchmarks for Long-Context Code Models (arXiv:2406.11612)
- Agentless: Demystifying LLM-based Software Engineering Agents (arXiv:2407.01489)
Token Compression

VLM

- Chart-based Reasoning: Transferring Capabilities from LLMs to VLMs (arXiv:2403.12596)
- Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models (arXiv:2404.13013)
- PLLaVA: Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning (arXiv:2404.16994)
- AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability (arXiv:2405.14129)
Reasoning

- DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data (arXiv:2405.14333)
- Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing (arXiv:2404.12253)
- Improve Mathematical Reasoning in Language Models by Automated Process Supervision (arXiv:2406.06592)
- Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B (arXiv:2406.07394)
Forward tuning

ViT

Benchmarks

Data