Zhongzhi Yu
kevin1020

AI & ML interests
Efficient LLM Inference and Tuning

Organizations

Prompting
LLM Agents

- Large Language Models are Superpositions of All Characters: Attaining Arbitrary Role-play via Self-Alignment (arXiv:2401.12474)
- More Agents Is All You Need (arXiv:2402.05120)
- VideoAgent: Long-form Video Understanding with Large Language Model as Agent (arXiv:2403.10517)
- Octopus v4: Graph of language models (arXiv:2404.19296)
Efficient Tuning

Efficient VLM via Image Token Compression

- An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models (arXiv:2403.06764)
- TokenPacker: Efficient Visual Projector for Multimodal LLM (arXiv:2407.02392)
- Efficient Inference of Vision Instruction-Following Models with Elastic Cache (arXiv:2407.18121)
- Don't Look Twice: Faster Video Transformers with Run-Length Tokenization (arXiv:2411.05222)
Long Context

- Extending Llama-3's Context Ten-Fold Overnight (arXiv:2404.19553)
- Model Tells You Where to Merge: Adaptive KV Cache Merging for LLMs on Long-Context Tasks (arXiv:2407.08454)
- VideoLLaMB: Long-context Video Understanding with Recurrent Memory Bridges (arXiv:2409.01071)
- Spinning the Golden Thread: Benchmarking Long-Form Generation in Language Models (arXiv:2409.02076)
Visualizations

- Not All Language Model Features Are Linear (arXiv:2405.14860)
- LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations (arXiv:2410.02707)
- RepVideo: Rethinking Cross-Layer Representation for Video Generation (arXiv:2501.08994)
PEFT

Modular

Efficient LLM

RAG

- Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection (arXiv:2310.11511)
- REST: Retrieval-Based Speculative Decoding (arXiv:2311.08252)
- Active Retrieval Augmented Generation (arXiv:2305.06983)
- Retrieval-Augmented Generation for Large Language Models: A Survey (arXiv:2312.10997)
Inference Acceleration

- BiTA: Bi-Directional Tuning for Lossless Acceleration in Large Language Models (arXiv:2401.12522)
- Hydragen: High-Throughput LLM Inference with Shared Prefixes (arXiv:2402.05099)
- BiLLM: Pushing the Limit of Post-Training Quantization for LLMs (arXiv:2402.04291)
- Shortened LLaMA: A Simple Depth Pruning for Large Language Models (arXiv:2402.02834)
Code Generation

- StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback (arXiv:2402.01391)
- Code Representation Learning At Scale (arXiv:2402.01935)
- Long Code Arena: a Set of Benchmarks for Long-Context Code Models (arXiv:2406.11612)
- Agentless: Demystifying LLM-based Software Engineering Agents (arXiv:2407.01489)
Token Compression

VLM

- Chart-based Reasoning: Transferring Capabilities from LLMs to VLMs (arXiv:2403.12596)
- Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models (arXiv:2404.13013)
- PLLaVA: Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning (arXiv:2404.16994)
- AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability (arXiv:2405.14129)
Reasoning

- DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data (arXiv:2405.14333)
- Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing (arXiv:2404.12253)
- Improve Mathematical Reasoning in Language Models by Automated Process Supervision (arXiv:2406.06592)
- Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B (arXiv:2406.07394)
Forward tuning

ViT

Benchmarks

Data