Add pipeline tag: feature-extraction
This PR enhances the model card by adding `pipeline_tag: feature-extraction` to the metadata. This accurately describes the model's core function of compressing text into continuous representations and makes it more discoverable for Hugging Face Hub users looking for text feature-extraction models.
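For anyone who wants to verify the change once merged, the snippet below reads the model card metadata back from the Hub. It is a minimal sketch only, and the repository id is a placeholder for wherever `ARC8-Encoder_Llama` is actually hosted.

```python
# Minimal sketch: confirm the pipeline tag is exposed after this PR is merged.
# The repo id is a placeholder; substitute the actual ARC8-Encoder_Llama repository.
from huggingface_hub import model_info

info = model_info("kyutai/ARC8-Encoder_Llama")  # placeholder repo id
print(info.pipeline_tag)  # expected: "feature-extraction" once merged
print(info.tags)          # should still include model_hub_mixin, pytorch_model_hub_mixin
```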
README.md CHANGED

@@ -1,13 +1,14 @@
 ---
-license: cc-by-4.0
 language:
 - en
+license: cc-by-4.0
 tags:
 - model_hub_mixin
 - pytorch_model_hub_mixin
+pipeline_tag: feature-extraction
 ---
 
-
+# ARC-Encoder models
 
 This page houses `ARC8-Encoder_Llama`, one of three versions of pretrained ARC-Encoders. Architectures and methods to train them are described in the paper *ARC-Encoder: learning compressed text representations for large language models*, available [here](https://arxiv.org/abs/2510.20535). Code to reproduce the pretraining, further fine-tune the encoders, or evaluate them on downstream tasks is available at the [ARC-Encoder repository](https://github.com/kyutai-labs/ARC-Encoder/tree/main).
 
@@ -15,7 +16,7 @@ tags:
 
 All the encoders released here are trained on web crawl data filtered using [Dactory](https://github.com/kyutai-labs/dactory), based on a [Llama3.2-3B](https://github.com/meta-llama/llama-cookbook) base backbone. The release consists of two ARC-Encoders each trained specifically for a single decoder and one trained for two decoders at the same time:
 - `ARC8-Encoder_Llama`, trained on 2.6B tokens on [Llama3.1-8B](https://github.com/meta-llama/llama-cookbook) base specifically with a pooling factor of 8.
-- `ARC8-Encoder_Mistral`, trained on 2.6B tokens on [Mistral-7B](https://
+- `ARC8-Encoder_Mistral`, trained on 2.6B tokens on [Mistral-7B](https://www.mistralai.com/tech/mistral-7b/) base specifically with a pooling factor of 8.
 - `ARC8-Encoder_multi`, trained by sampling among the two decoders with a pooling factor of 8.
 
 ### Uses
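For context on the existing `model_hub_mixin` / `pytorch_model_hub_mixin` tags: they indicate the checkpoint was pushed with `huggingface_hub`'s `PyTorchModelHubMixin`, so loading goes through a model class rather than a Transformers pipeline. The sketch below is only illustrative; the real encoder class lives in the [ARC-Encoder repository](https://github.com/kyutai-labs/ARC-Encoder/tree/main), and the class name, constructor arguments, and repo id used here are placeholders.

```python
# Illustrative sketch only: the actual encoder architecture and class name are
# defined in https://github.com/kyutai-labs/ARC-Encoder, not here.
import torch.nn as nn
from huggingface_hub import PyTorchModelHubMixin


class ArcEncoder(nn.Module, PyTorchModelHubMixin):  # hypothetical class name
    def __init__(self, hidden_size: int = 3072, pooling_factor: int = 8):
        super().__init__()
        self.pooling_factor = pooling_factor             # e.g. 8 for the ARC8 variants
        self.proj = nn.Linear(hidden_size, hidden_size)  # stand-in layer, not the real architecture

    def forward(self, hidden_states):
        return self.proj(hidden_states)


# Loading the published weights would then look like (repo id is a placeholder):
# encoder = ArcEncoder.from_pretrained("kyutai/ARC8-Encoder_Llama")
```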