Collections including paper arxiv:2507.21809

---
- EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters (Paper • 2402.04252 • Published • 28)
- Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models (Paper • 2402.03749 • Published • 14)
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding (Paper • 2402.04615 • Published • 44)
- EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss (Paper • 2402.05008 • Published • 23)
---
- FlashWorld: High-quality 3D Scene Generation within Seconds (Paper • 2510.13678 • Published • 69)
- NANO3D: A Training-Free Approach for Efficient 3D Editing Without Masks (Paper • 2510.15019 • Published • 55)
- GeoSVR: Taming Sparse Voxels for Geometrically Accurate Surface Reconstruction (Paper • 2509.18090 • Published • 2)
- Lyra: Generative 3D Scene Reconstruction via Video Diffusion Model Self-Distillation (Paper • 2509.19296 • Published • 22)
---
- HunyuanWorld 1.0: Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels (Paper • 2507.21809 • Published • 131)
- OmniPart: Part-Aware 3D Generation with Semantic Decoupling and Structural Cohesion (Paper • 2507.06165 • Published • 58)
- DINOv3 (Paper • 2508.10104 • Published • 274)
- Qwen-Image Technical Report (Paper • 2508.02324 • Published • 258)
---
- HunyuanWorld 1.0: Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels (Paper • 2507.21809 • Published • 131)
- Yume: An Interactive World Generation Model (Paper • 2507.17744 • Published • 85)
- Diffuman4D: 4D Consistent Human View Synthesis from Sparse-View Videos with Spatio-Temporal Diffusion Models (Paper • 2507.13344 • Published • 56)
- ComfyUI-Copilot: An Intelligent Assistant for Automated Workflow Development (Paper • 2506.05010 • Published • 79)
---
- From One to More: Contextual Part Latents for 3D Generation (Paper • 2507.08772 • Published • 25)
- OmniPart: Part-Aware 3D Generation with Semantic Decoupling and Structural Cohesion (Paper • 2507.06165 • Published • 58)
- SeqTex: Generate Mesh Textures in Video Sequence (Paper • 2507.04285 • Published • 9)
- Ultra3D: Efficient and High-Fidelity 3D Generation with Part Attention (Paper • 2507.17745 • Published • 34)
---
- TextureDreamer: Image-guided Texture Synthesis through Geometry-aware Diffusion (Paper • 2401.09416 • Published • 11)
- SHINOBI: Shape and Illumination using Neural Object Decomposition via BRDF Optimization In-the-wild (Paper • 2401.10171 • Published • 14)
- DMV3D: Denoising Multi-View Diffusion using 3D Large Reconstruction Model (Paper • 2311.09217 • Published • 22)
- GALA: Generating Animatable Layered Assets from a Single Scan (Paper • 2401.12979 • Published • 9)
---
- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits (Paper • 2402.17764 • Published • 625)
- MiniMax-01: Scaling Foundation Models with Lightning Attention (Paper • 2501.08313 • Published • 298)
- Group Sequence Policy Optimization (Paper • 2507.18071 • Published • 306)
- Drivel-ology: Challenging LLMs with Interpreting Nonsense with Depth (Paper • 2509.03867 • Published • 208)
---
- SpatialLM: Training Large Language Models for Structured Indoor Modeling (Paper • 2506.07491 • Published • 50)
- Story2Board: A Training-Free Approach for Expressive Storyboard Generation (Paper • 2508.09983 • Published • 68)
- Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens (Paper • 2503.01710 • Published • 6)
- HunyuanWorld 1.0: Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels (Paper • 2507.21809 • Published • 131)
---
- HoloTime: Taming Video Diffusion Models for Panoramic 4D Scene Generation (Paper • 2504.21650 • Published • 16)
- Scenethesis: A Language and Vision Agentic Framework for 3D Scene Generation (Paper • 2505.02836 • Published • 8)
- ImmerseGen: Agent-Guided Immersive World Generation with Alpha-Textured Proxies (Paper • 2506.14315 • Published • 10)
- HunyuanWorld 1.0: Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels (Paper • 2507.21809 • Published • 131)