william cody stanford
williamcstanford
AI & ML interests
None yet

Organizations
None yet

RL

LLMs
- DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
  Paper • 2401.02954 • Published • 48
- Perspectives on the State and Future of Deep Learning - 2023
  Paper • 2312.09323 • Published • 8
- Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization
  Paper • 2405.15071 • Published • 41
- Sibyl: Simple yet Effective Agent Framework for Complex Real-world Reasoning
  Paper • 2407.10718 • Published • 19

Autonomous agents
- WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models
  Paper • 2401.13919 • Published • 32
- Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities
  Paper • 2401.14405 • Published • 13
- Design2Code: How Far Are We From Automating Front-End Engineering?
  Paper • 2403.03163 • Published • 97
- LLM Agent Operating System
  Paper • 2403.16971 • Published • 72

Music gen

brain

relighting

Depth Estimation

Code Understanding

diffusion
- A Picture is Worth a Thousand Words: Principled Recaptioning Improves Image Generation
  Paper • 2310.16656 • Published • 50
- CommonCanvas: An Open Diffusion Model Trained with Creative-Commons Images
  Paper • 2310.16825 • Published • 36
- Matryoshka Diffusion Models
  Paper • 2310.15111 • Published • 43
- I2VGen-XL: High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models
  Paper • 2311.04145 • Published • 35

robotics

video gen
- MagicVideo-V2: Multi-Stage High-Aesthetic Video Generation
  Paper • 2401.04468 • Published • 49
- Anything in Any Scene: Photorealistic Video Object Insertion
  Paper • 2401.17509 • Published • 17
- Memory Consolidation Enables Long-Context Video Understanding
  Paper • 2402.05861 • Published • 10
- Magic-Me: Identity-Specific Video Customized Diffusion
  Paper • 2402.09368 • Published • 30

Transformer improvements

video understanding
- VideoPrism: A Foundational Visual Encoder for Video Understanding
  Paper • 2402.13217 • Published • 37
- Sora Generates Videos with Stunning Geometrical Consistency
  Paper • 2402.17403 • Published • 18
- Video as the New Language for Real-World Decision Making
  Paper • 2402.17139 • Published • 21
- VideoHallucer: Evaluating Intrinsic and Extrinsic Hallucinations in Large Video-Language Models
  Paper • 2406.16338 • Published • 26

MUST FOLLOWS
- Explorative Inbetweening of Time and Space
  Paper • 2403.14611 • Published • 13
- MovieLLM: Enhancing Long Video Understanding with AI-Generated Movies
  Paper • 2403.01422 • Published • 29
- DiLightNet: Fine-grained Lighting Control for Diffusion-based Image Generation
  Paper • 2402.11929 • Published • 11
- StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text
  Paper • 2403.14773 • Published • 11

singing portraits

Cellular Automata DL

datasets

video segmentation
- Tracking Anything with Decoupled Video Segmentation
  Paper • 2309.03903 • Published • 28
- City-on-Web: Real-time Neural Rendering of Large-scale Scenes on the Web
  Paper • 2312.16457 • Published • 15
- A Recipe for Scaling up Text-to-Video Generation with Text-free Videos
  Paper • 2312.15770 • Published • 15
