32 8 20

Daniël de Kok

danieldk

https://danieldk.eu/

AI & ML interests

Fast, low-level ML code, model implementation

Recent Activity

upvoted an article 1 day ago

NVIDIA Releases 8 Million Sample Open Dataset and Tooling for OCR, Image Reasoning, Image and Video QA Tasks

updated a model 1 day ago

kernels-community/paged-attention

updated a model 8 days ago

kernels-community/relu

View all activity

Organizations

posted an update 17 days ago

Post

428

We have released kernel-builder 0.7.0: https://github.com/huggingface/kernel-builder/releases/tag/v0.7.0

Headline features:

* 🔮 Supports building kernels for the brand-new PyTorch 2.9.0.
* 🪟 Experimental support for building Windows kernels.

posted an update 4 months ago

Post

2049

kernels 0.8.0 is out: https://github.com/huggingface/kernels/releases/tag/v0.8.0

This release refines kernel selection in the kernelize function:

• You can now register kernels for certain CUDA capability ranges.
• Rather than doing exact mating of modes, fall back to other compatible modes. If you are kernelizing for inference, but you only registered a training + torch.compile kernel, it will use that kernel since it is compatible with inference as well.

1 reply

posted an update 4 months ago

Post

471

You can get flash-attention 3 ⚡️ directly from the hub now using kernels!

kernels-community/flash-attn3

posted an update 4 months ago

Post

377

Kernels 0.7.0 is out: https://github.com/huggingface/kernels/releases/tag/v0.7.0 🚀

This release makes it possible to register multiple kernels for a layer. Do you have a super-fast kernel for inference and another kernel for training? Register them both and kernelize will pick the kernel depending on whether you are going to do training or inference.

reacted to pagezyhf's post with 🔥 4 months ago

Post

3249

Hackathons in Paris on July 5th and 6th!

Hugging Face just wrapped 4 months of deep work with AMD to push kernel-level optimization on their MI300X GPUs. Now, it's time to share everything we learned.

Join us in Paris at STATION F for a hands-on weekend of workshops and a hackathon focused on making open-source LLMs faster and more efficient on AMD.

Prizes, amazing host speakers, ... if you want more details, navigate to https://lu.ma/fmvdjmur!

2 replies

posted an update 5 months ago

Post

1890

We have been working on a project called kernels. kernels makes it possible to load compute kernels directly from the Hub! 🚀

We plan to give kernels a more proper introduction soon. But for those who have been following along, we are happy to announce a new release:

- New layer API with torch.compile support.
- Experimental support for loading Apple Silicon Metal 🤘 Kernels.
- Generate wheels from Hub kernels for legacy deployments.

Full release notes here: https://github.com/huggingface/kernels/releases/tag/v0.6.0

1 reply

Daniël de Kok

AI & ML interests

Recent Activity

Organizations

danieldk's activity