Okay this is insane... WebGPU-accelerated semantic video tracking, powered by DINOv3 and Transformers.js! 🤯 Demo (+ source code): webml-community/DINOv3-video-tracking
This will revolutionize AI-powered video editors... which can now run 100% locally in your browser, no server inference required (costs $0)!
How does it work? 🤔
1️⃣ Generate and cache image features for each frame
2️⃣ Create a list of embeddings for selected patch(es)
3️⃣ Compute cosine similarity between each patch and the selected patch(es)
4️⃣ Highlight those whose score is above some threshold (sketch below)
... et voilà! 🥳
You can also make selections across frames to improve temporal consistency! This is super useful if the object changes its appearance slightly throughout the video.
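For the curious, here's a minimal sketch of steps 2–4 (the data layout and threshold are illustrative, not the demo's actual code): score every patch in a frame against the selected reference embeddings, take the best match when references come from multiple frames, and keep anything above the threshold.

```js
// Minimal sketch of the matching step (illustrative; not the demo's exact code).
// `framePatches`: per-frame DINOv3 patch embeddings (array of Float32Array)
// `referencePatches`: embeddings of the user-selected patches, possibly
//                     collected across several frames for temporal consistency
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; ++i) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function highlightMatches(framePatches, referencePatches, threshold = 0.6) {
  const matches = [];
  for (let p = 0; p < framePatches.length; ++p) {
    // Score each patch by its best similarity to any selected reference patch.
    let best = -Infinity;
    for (const ref of referencePatches) {
      best = Math.max(best, cosineSimilarity(framePatches[p], ref));
    }
    if (best >= threshold) matches.push(p); // patch indices to highlight
  }
  return matches;
}
```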
Introducing Voxtral WebGPU: State-of-the-art audio transcription directly in your browser! 🤯
🗣️ Transcribe videos, meeting notes, songs, and more
🔒 Runs on-device, meaning no data is sent to a server
🌍 Multilingual (8 languages)
🤗 Completely free (forever) & open source
That's right, we're running Mistral's new Voxtral-Mini-3B model 100% locally in-browser on WebGPU, powered by Transformers.js and ONNX Runtime Web! 🔥
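Under the hood it's the usual Transformers.js flow; here's a rough sketch, assuming the standard ASR pipeline API and a hypothetical ONNX model id (the actual demo may wire things up differently):

```js
// Rough sketch of in-browser transcription with Transformers.js
// (model id and options are assumptions; the demo may differ).
import { pipeline } from "@huggingface/transformers";

const transcriber = await pipeline(
  "automatic-speech-recognition",
  "onnx-community/Voxtral-Mini-3B-ONNX", // hypothetical ONNX export id
  { device: "webgpu" },                  // run on WebGPU instead of WASM
);

const { text } = await transcriber("recording.wav");
console.log(text);
```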
NEW: Real-time conversational AI models can now run 100% locally in your browser! 🤯
🔒 Privacy by design (no data leaves your device)
💰 Completely free... forever
📦 Zero installation required, just visit a website
⚡️ Blazingly fast WebGPU-accelerated inference
For those interested, here's how it works (rough sketch below):
- Silero VAD for voice activity detection
- Whisper for speech recognition
- SmolLM2-1.7B for text generation
- Kokoro for text-to-speech
Powered by Transformers.js and ONNX Runtime Web! 🤗 I hope you like it!
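To give a sense of how those pieces chain together, here's a condensed sketch (model ids are assumptions, the Kokoro call is omitted, and in the real app Silero VAD decides when the user has finished speaking before this chain runs):

```js
// Condensed sketch of the speech-to-speech loop (model ids are assumptions;
// the real app also runs Silero VAD to detect when the user stops speaking).
import { pipeline } from "@huggingface/transformers";

const asr = await pipeline("automatic-speech-recognition",
  "onnx-community/whisper-base", { device: "webgpu" });
const llm = await pipeline("text-generation",
  "HuggingFaceTB/SmolLM2-1.7B-Instruct", { device: "webgpu" });

async function respond(userAudio) {
  // 1. Speech -> text
  const { text } = await asr(userAudio);

  // 2. Text -> reply
  const messages = [{ role: "user", content: text }];
  const [output] = await llm(messages, { max_new_tokens: 256 });
  const reply = output.generated_text.at(-1).content;

  // 3. Reply -> speech with Kokoro (TTS call omitted here)
  return reply;
}
```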
hey hey @mradermacher - VB from Hugging Face here, we'd love to onboard you onto our optimised xet backend! 🔥
as you know, we're in the process of upgrading our storage backend to xet (which helps us scale and offer blazingly fast upload/download speeds too): https://huggingface.co/blog/xet-on-the-hub

now that we're certain the backend can scale even with big models like Llama 4 / Qwen 3, we're moving to the next phase of inviting impactful orgs and users on the hub over. as you're a big part of the open source ML community, we'd love to onboard you next and create some excitement about it in the community too!
in terms of actual steps - it should be as simple as one of the org admins joining hf.co/join/xet - we'll take care of the rest.
Introducing the ONNX model explorer: Browse, search, and visualize neural networks directly in your browser. 🤯 A great tool for anyone studying Machine Learning! We're also releasing the entire dataset of graphs so you can use them in your own projects! 🤗
Reasoning models like o3 and o4-mini are advancing faster than ever, but imagine what will be possible when they can run locally in your browser! 🤯
Well, with 🤗 Transformers.js, you can do just that! Here's Zyphra's new ZR1 model running at over 100 tokens/second on WebGPU! ⚡️
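Running something like this yourself looks roughly as follows (model id, dtype, and prompt are placeholders, not the demo's exact setup):

```js
// Rough sketch of running a small reasoning model on WebGPU with Transformers.js
// (model id, dtype, and prompt are placeholders, not the demo's exact config).
import { pipeline, TextStreamer } from "@huggingface/transformers";

const generator = await pipeline("text-generation",
  "Zyphra/ZR1-1.5B", // placeholder; the demo uses an ONNX export of ZR1
  { device: "webgpu", dtype: "q4f16" });

const messages = [{ role: "user", content: "What is 17 * 24? Think step by step." }];
const streamer = new TextStreamer(generator.tokenizer, { skip_prompt: true });
await generator(messages, { max_new_tokens: 512, streamer });
```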
Giving models access to browser APIs (like File System, Screen Capture, and more) could unlock an entirely new class of web experiences that are personalized, interactive, and run locally in a secure, sandboxed environment.
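As a toy illustration of the idea, a locally running model could be handed "tools" that map directly onto real browser APIs; the tool-calling glue below is invented, but showOpenFilePicker and getDisplayMedia are the genuine File System Access and Screen Capture APIs:

```js
// Hypothetical tool handlers a local model could invoke (the dispatch format is
// invented for illustration; the browser APIs themselves are real).
const tools = {
  // File System Access API: let the user pick a file and return its contents.
  async read_file() {
    const [handle] = await window.showOpenFilePicker();
    const file = await handle.getFile();
    return await file.text();
  },
  // Screen Capture API: grab a video stream of the user's screen.
  async capture_screen() {
    return await navigator.mediaDevices.getDisplayMedia({ video: true });
  },
};

// When the model emits a tool call like { name: "read_file", args: {} },
// dispatch it to the matching handler, all inside the browser sandbox.
async function handleToolCall({ name, args }) {
  return await tools[name](args);
}
```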
The most difficult part was getting the model running in the first place, but the next steps are simple:
✂️ Implement sentence splitting, allowing for streamed responses (sketch below)
🌍 Multilingual support (only phonemization left)
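The sentence-splitting part can lean on a built-in browser API; here's a minimal sketch using Intl.Segmenter, where speak() stands in for the actual TTS generation call:

```js
// Minimal sentence-splitting sketch using the built-in Intl.Segmenter
// (`speak` is a placeholder for the actual TTS generation/playback call).
function* splitSentences(text, locale = "en") {
  const segmenter = new Intl.Segmenter(locale, { granularity: "sentence" });
  for (const { segment } of segmenter.segment(text)) {
    const sentence = segment.trim();
    if (sentence) yield sentence;
  }
}

// Stream audio sentence-by-sentence instead of waiting for the full response.
async function streamSpeech(fullText, speak) {
  for (const sentence of splitSentences(fullText)) {
    await speak(sentence); // e.g. generate and play audio for this sentence
  }
}
```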