Sam
samsam55
AI & ML interests
None yet
Recent Activity
Updated the collection "3D Models & Modeling", 10 days ago
Updated the collection "3D Models & Modeling", 10 days ago
Updated the collection "3D Models & Modeling", 12 days ago
Organizations
None yet
Run on CPU Optimizations
World View Creation (out painting 3D)
Coding LLMs
TTS & Speech to Text
Taming Text-to-Sounding Video Generation via Advanced Modality Condition and Interaction
Paper • 2510.03117 • Published • 11
ResembleAI/chatterbox
Text-to-Speech • Updated • 857k • 1.25k
thewh1teagle/phonikud
0.3B • Updated • 210
UniMoE-Audio: Unified Speech and Music Generation with Dynamic-Capacity MoE
Paper • 2510.13344 • Published • 61
Agents
Self Improving
Deep Search
Computer Use
Visual Multi Modal LLM
NaViL: Rethinking Scaling Properties of Native Multimodal Large Language Models under Data Constraints
Paper • 2510.08565 • Published • 19
Detect Anything via Next Point Prediction
Paper • 2510.12798 • Published • 44
PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model
Paper • 2510.14528 • Published • 79
Misc
UNIDOC-BENCH: A Unified Benchmark for Document-Centric Multimodal RAG
Paper • 2510.03663 • Published • 15
LLM-guided Hierarchical Retrieval
Paper • 2510.13217 • Published • 16
AnyUp: Universal Feature Upsampling
Paper • 2510.12764 • Published • 10
katanemo/Arch-Router-1.5B
Text Generation • 2B • Updated • 3.61k • 216
3D Models & Modeling
Towards Scalable and Consistent 3D Editing
Paper • 2510.02994 • Published • 5
UP2You: Fast Reconstruction of Yourself from Unconstrained Photo Collections
Paper • 2509.24817 • Published • 8
NANO3D: A Training-Free Approach for Efficient 3D Editing Without Masks
Paper • 2510.15019 • Published • 62
Skyfall-GS: Synthesizing Immersive 3D Urban Scenes from Satellite Imagery
Paper • 2510.15869 • Published • 43