Lyra, Lune, Cantor, k-simplex, and many relational experiments.
AbstractPhil/sd15-flow-matching-lune
Today I will be updating the space to support all three forms of Lyra, to enable tinkering with various other models like Flux-Schnell and SDXL.
It should be noted that I didn't know NVIDIA had actually released a model named LYRA. This model has no association with NVIDIA's LYRA model; this LYRA is fully MIT-licensed. If necessary I'll rename this model, but I don't think it'll matter.
Unlike a normal VAE, this VAE was intentionally meant to introduce incorrectness into the correctness that already exists. The concept was to pull the embeddings toward a target, with T5-XL being the primary one.
AbstractPhil/vae-lyra Lyra is a multimodal VAE (MM-VAE) prototype meant to fuse multiple types of encodings together. It was tested with circle-of-fifths audio and text, multiple text encoders, a vision encoder paired with a text encoder, and a few other smaller prototypes.
Lyra has a few clip_l and t5 prototypes that directly learned to associate clip_l with t5-base. This version worked, so version 2 expanded the concept.
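As a rough sketch of the idea of associating one encoder's embeddings with another's through a VAE latent (all dimensions, weights, and names here are illustrative placeholders, not the actual Lyra architecture or checkpoints):

```python
import numpy as np

# Illustrative dims only: CLIP-L pooled output is 768-d; the T5 target
# dim and latent size are assumptions for the sketch.
CLIP_L_DIM, T5_DIM, LATENT_DIM = 768, 768, 256

rng = np.random.default_rng(0)

# Randomly initialised weights stand in for trained ones.
W_enc = rng.standard_normal((CLIP_L_DIM, 2 * LATENT_DIM)) * 0.02
W_dec = rng.standard_normal((LATENT_DIM, T5_DIM)) * 0.02

def encode(clip_l_emb):
    """Map a CLIP-L embedding to a Gaussian latent (mu, logvar)."""
    h = clip_l_emb @ W_enc
    return h[..., :LATENT_DIM], h[..., LATENT_DIM:]

def reparameterize(mu, logvar):
    """Standard VAE sampling: z = mu + sigma * eps."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def decode(z):
    """Project the shared latent into the T5 embedding space."""
    return z @ W_dec

clip_l = rng.standard_normal((4, CLIP_L_DIM))  # batch of 4 pooled embeddings
mu, logvar = encode(clip_l)
t5_like = decode(reparameterize(mu, logvar))
print(t5_like.shape)  # (4, 768)
```

Training such a model would add a reconstruction loss against real T5 embeddings plus the usual KL term; the sketch above only shows the forward path.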
AbstractPhil/vae-lyra-sdxl-t5xl is another prototype, fusing CLIP_L and CLIP_G with T5_XL in its first version, using direct projection with minimal geometric and Cantor assistance. The shared layers ended up teaching CLIP_L how to be CLIP_G, and the output warped too much for SDXL or SD15 to understand.
AbstractPhil/vae-lyra-xl-adaptive-cantor
The adaptive Cantor version is the successful prototype: CLIP_L and CLIP_G learned independent internal structures, with the CLIP_L/T5_XL route trained in parallel with the CLIP_G/T5_XL route. This produced two entirely divergent opinions, and thus lets T5-XL manipulate either clip_l or clip_g for models like Flux-Schnell or SDXL.
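The parallel-route idea can be sketched as two encoders that never share layers, so each keeps its own "opinion" of the T5 conditioning (every dimension, weight, and function name below is a hypothetical placeholder, not the real adaptive-Cantor implementation):

```python
import numpy as np

# Illustrative dims only: CLIP-L 768, CLIP-G 1280, an assumed 2048-d T5 conditioning.
CLIP_L_DIM, CLIP_G_DIM, T5_DIM, LATENT_DIM = 768, 1280, 2048, 256

rng = np.random.default_rng(0)

def make_route(enc_dim):
    """Build one independent route; random weights stand in for trained ones."""
    W_in = rng.standard_normal((enc_dim + T5_DIM, LATENT_DIM)) * 0.02
    W_out = rng.standard_normal((LATENT_DIM, enc_dim)) * 0.02
    def route(enc_emb, t5_emb):
        # Fuse the encoder embedding with the T5 conditioning, then
        # project back into the encoder's own space.
        z = np.concatenate([enc_emb, t5_emb], axis=-1) @ W_in
        return z @ W_out
    return route

# Two routes trained in parallel, never sharing parameters, so the
# CLIP-L and CLIP-G structures stay divergent.
route_l = make_route(CLIP_L_DIM)  # T5-XL steers CLIP-L
route_g = make_route(CLIP_G_DIM)  # T5-XL steers CLIP-G

t5 = rng.standard_normal((2, T5_DIM))
steered_l = route_l(rng.standard_normal((2, CLIP_L_DIM)), t5)
steered_g = route_g(rng.standard_normal((2, CLIP_G_DIM)), t5)
print(steered_l.shape, steered_g.shape)  # (2, 768) (2, 1280)
```

Because the routes are independent, a downstream model can consume either steered output alone, which is what makes the same T5-XL conditioning usable for both SDXL-style (CLIP-G) and SD15/Flux-style (CLIP-L) consumers.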
Each lyra has a purpose, and each purpose matters.