Reproducing research code shouldn't take longer than reading the paper. Yet even for papers that ship code, setting up the right environment often means hours of dependency hell and configuration debugging.
At Remyx AI, we built an agent that automatically creates and tests Docker images for research papers, then shares them publicly so anyone can reproduce results with a single command.
We just submitted PR #908 to integrate this directly into arXiv Labs.
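To make the one-command idea concrete, here's a rough sketch of that flow from Python with the docker-py SDK; the image name, tag, and reproduce.py entrypoint are hypothetical placeholders, not our actual naming scheme.

```python
# Hypothetical sketch of the single-command flow, via the docker-py SDK.
# The image name "remyxai/paper-env", the tag, and the reproduce.py
# entrypoint are placeholders, not the actual naming scheme.
import docker

client = docker.from_env()

# Pull the prebuilt, pre-tested image for a paper (hypothetical tag).
client.images.pull("remyxai/paper-env", tag="1234.56789")

# Run the paper's experiments and capture the logs; the container is
# removed once it exits.
logs = client.containers.run(
    "remyxai/paper-env:1234.56789",
    command="python reproduce.py",  # hypothetical entrypoint
    remove=True,
)
print(logs.decode())
```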
Forget everything you know about transcription models - NVIDIA's parakeet-tdt-0.6b-v2 changed the game for me!
Just tested it with Steve Jobs' Stanford speech and was speechless (pun intended). The video isn’t sped up.
3 things that floored me:
- Transcription took just 10 seconds for a 15-min file
- Got a CSV with perfect timestamps, punctuation & capitalization
- Stunning accuracy (correctly captured "Reed College" and other specifics)
NVIDIA also released a demo where you can click any transcribed segment to play it instantly.
The improvement is significant: #1 on the Open ASR Leaderboard with a ~6% word error rate (best in class), plus complete commercial freedom (CC-BY-4.0 license).
Time to update those Whisper pipelines! H/t @Steveeeeeeen for the finding!
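If you want to kick the tires, here's a minimal sketch that follows the model card's NeMo usage, assuming nemo_toolkit is installed and audio.wav is your own file:

```python
# Minimal sketch following the parakeet-tdt-0.6b-v2 model card's NeMo usage;
# assumes nemo_toolkit[asr] is installed and audio.wav is a local file.
import nemo.collections.asr as nemo_asr

asr_model = nemo_asr.models.ASRModel.from_pretrained(
    model_name="nvidia/parakeet-tdt-0.6b-v2"
)

# Transcribe with timestamps enabled to get the segment-level timings
# that the CSV in the post is built from.
output = asr_model.transcribe(["audio.wav"], timestamps=True)
print(output[0].text)

# Each segment carries start/end times in seconds plus its text.
for seg in output[0].timestamp["segment"]:
    print(f"{seg['start']:.2f}s - {seg['end']:.2f}s: {seg['segment']}")
```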
📄 arXiv paper: In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer (2504.20690)
🔥 Why it’s cool:
- Achieves high-quality, multi-task image editing
- Uses only 1% of the training parameters and 0.1% of the training data of existing methods, making it extremely efficient
- Beats several commercial models on background preservation, ID control, and consistency
- Open-source, low-cost, faster, and stronger: think of it as the “DeepSeek of image editing” 👀
We also implemented a Gradio demo app, available directly in our GitHub repo! And we made a flashy demo video — happy to send it your way!
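To give a feel for what such a demo wraps, here's an illustrative Gradio sketch; edit_image is a hypothetical stand-in for the actual model call, so grab the real app from our repo.

```python
# Illustrative Gradio sketch of an instruction-driven editing UI.
# edit_image is a hypothetical stand-in, not ICEdit's actual API;
# the real demo app lives in the GitHub repo.
import gradio as gr
from PIL import Image


def edit_image(image: Image.Image, instruction: str) -> Image.Image:
    # Placeholder: the real demo runs the diffusion-transformer editor
    # on (image, instruction) here and returns the edited result.
    return image


demo = gr.Interface(
    fn=edit_image,
    inputs=[gr.Image(type="pil"), gr.Textbox(label="Edit instruction")],
    outputs=gr.Image(type="pil"),
    title="Instruction-based image editing (sketch)",
)

if __name__ == "__main__":
    demo.launch()
```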