Jina AI: Your Search Foundation, Supercharged!

jina-embeddings-v5-text-small-classification: Classification-Targeted Embedding Distillation

Elastic Inference Service | ArXiv | Release Note | Blog

Model Overview

(Figure: jina-embeddings-v5-text architecture)

`jina-embeddings-v5-text-small-classification` is a compact, high-performance text embedding model designed for classification.

It is part of the jina-embeddings-v5-text model family, which also includes jina-embeddings-v5-text-nano, a smaller model for more resource-constrained use cases.

Trained using a novel approach that combines distillation with task-specific contrastive losses, jina-embeddings-v5-text-small-classification outperforms existing state-of-the-art models of similar size across diverse embedding benchmarks.

| Feature | Value |
|---------|-------|
| Parameters | 677M |
| Supported Tasks | classification |
| Max Sequence Length | 32768 |
| Embedding Dimension | 1024 |
| Matryoshka Dimensions | 32, 64, 128, 256, 512, 768, 1024 |
| Pooling Strategy | Last-token pooling |
| Base Model | jinaai/jina-embeddings-v5-text-small |
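The Matryoshka dimensions above mean the 1024-dimensional output can be truncated to any of the listed prefix lengths and re-normalized, trading accuracy for storage. A minimal sketch with numpy (the random vector stands in for a real model output):

```python
import numpy as np

# Placeholder for a full 1024-dim embedding produced by the model.
full = np.random.default_rng(0).standard_normal(1024).astype(np.float32)

def truncate_matryoshka(embedding: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` components and re-normalize to unit length."""
    truncated = embedding[:dim]
    return truncated / np.linalg.norm(truncated)

small = truncate_matryoshka(full, 256)
print(small.shape)            # (256,)
print(np.linalg.norm(small))  # ~1.0
```

With sentence-transformers, the same effect is available via the `truncate_dim` argument rather than manual slicing.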

(Figure: combined v5 benchmark results)

Training and Evaluation

For training details and evaluation results, see our technical report.

Usage

Requirements

The following Python packages are required:

  • transformers>=5.1.0
  • torch>=2.8.0
  • peft>=0.15.2
  • vllm>=0.15.1

Optional / Recommended

  • flash-attention: recommended for faster, more memory-efficient inference, but not mandatory.
  • sentence-transformers: required only if you want to use the model through the sentence-transformers interface.

via Elastic Inference Service

The fastest way to use v5-text in production. Elastic Inference Service (EIS) provides managed embedding inference with built-in scaling, so you can generate embeddings directly within your Elastic deployment.

PUT _inference/text_embedding/jina-v5
{
  "service": "elastic",
  "service_settings": {
    "model_id": "jina-embeddings-v5-text-small"
  }
}

See the Elastic Inference Service documentation for setup details.

via sentence-transformers
from sentence_transformers import SentenceTransformer
import torch

model = SentenceTransformer(
    "jinaai/jina-embeddings-v5-text-small-classification",
    model_kwargs={"dtype": torch.bfloat16},  # Recommended for GPUs
    config_kwargs={"_attn_implementation": "flash_attention_2"},  # Recommended but optional
)
# Optional: set truncate_dim in encode() to control embedding size

texts = [
    "My order hasn't arrived yet and it's been two weeks.",
    "How do I reset my password?",
    "I'd like a refund for my recent purchase.",
    "Your product exceeded my expectations. Great job!",
]

# Encode texts
embeddings = model.encode(texts)
print(embeddings.shape)
# (4, 1024)

similarity = model.similarity(embeddings, embeddings)
print(similarity)
# tensor([[1.0000, 0.7347, 0.7988, 0.7523],
#         [0.7347, 1.0000, 0.7440, 0.7228],
#         [0.7988, 0.7440, 1.0000, 0.7321],
#         [0.7523, 0.7228, 0.7321, 1.0000]])
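Since this checkpoint targets classification, a common pattern is to put a lightweight classifier on top of the embeddings. A minimal nearest-centroid sketch, with random placeholder vectors standing in for real `model.encode(...)` outputs and three hypothetical support-ticket categories:

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholders: in practice these come from model.encode(...) above.
train_embeddings = rng.standard_normal((6, 1024)).astype(np.float32)
train_labels = np.array([0, 0, 1, 1, 2, 2])  # e.g. shipping / account / refund

def normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# One centroid per class, unit-normalized so dot products are cosines.
centroids = normalize(np.stack([
    train_embeddings[train_labels == c].mean(axis=0) for c in range(3)
]))

query = normalize(rng.standard_normal((1, 1024)).astype(np.float32))
scores = query @ centroids.T  # cosine similarity to each class, shape (1, 3)
predicted = int(scores.argmax(axis=1)[0])
print(predicted)
```

With real embeddings, any off-the-shelf classifier (e.g. logistic regression) can replace the centroid step.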

via vLLM
from vllm import LLM
from vllm.config.pooler import PoolerConfig

# Initialize model
name = "jinaai/jina-embeddings-v5-text-small-classification"
model = LLM(
    model=name,
    dtype="float16",
    runner="pooling",
    pooler_config=PoolerConfig(seq_pooling_type="LAST", normalize=True)
)

# Create text prompts
document1 = "Overview of climate change impacts on coastal cities"
document1_prompt = f"Document: {document1}"

document2 = "The impacts of climate change on large cities"
document2_prompt = f"Document: {document2}"

# Encode all prompts
prompts = [document1_prompt, document2_prompt]
outputs = model.encode(prompts, pooling_task="embed")

embed_document1 = outputs[0].outputs.data
embed_document2 = outputs[1].outputs.data
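Because the pooler is configured with `normalize=True`, the returned vectors are unit-length and cosine similarity reduces to a plain dot product. A sketch with placeholder vectors standing in for `embed_document1` / `embed_document2` above:

```python
import numpy as np

# Placeholders for the normalized embeddings returned by vLLM.
rng = np.random.default_rng(0)
embed_document1 = rng.standard_normal(1024)
embed_document1 /= np.linalg.norm(embed_document1)
embed_document2 = rng.standard_normal(1024)
embed_document2 /= np.linalg.norm(embed_document2)

# Unit-norm vectors: cosine similarity is just the dot product.
similarity = float(np.dot(embed_document1, embed_document2))
print(similarity)  # a value in [-1, 1]
```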

via Text Embeddings Inference
  • Via Docker on CPU:
    docker run -p 8080:80 \
      ghcr.io/huggingface/text-embeddings-inference:cpu-1.9 \
      --model-id jinaai/jina-embeddings-v5-text-small-classification \
      --dtype float32 --pooling last-token
    
  • Via Docker on NVIDIA GPU (Turing, Ampere, Ada Lovelace, Hopper or Blackwell):
    docker run --gpus all --shm-size 1g -p 8080:80 \
      ghcr.io/huggingface/text-embeddings-inference:cuda-1.9 \
      --model-id jinaai/jina-embeddings-v5-text-small-classification \
      --dtype float16 --pooling last-token
    

Alternatively, you can run the server with cargo; see the Text Embeddings Inference documentation for details.

Send a request to /v1/embeddings to generate embeddings via the OpenAI Embeddings API:

curl -X POST http://127.0.0.1:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "jinaai/jina-embeddings-v5-text-small-classification",
    "input": [
      "Document: The impacts of climate change on coastal cities are significant..."
    ]
  }'
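The server replies in the OpenAI Embeddings API response shape, so clients only need standard JSON parsing. A sketch using the stdlib, with a minimal hypothetical response (field names follow the spec; the values are made up):

```python
import json

# A minimal, hypothetical response in the OpenAI Embeddings API shape.
raw = '''{
  "object": "list",
  "data": [
    {"object": "embedding", "index": 0, "embedding": [0.12, -0.034, 0.56]}
  ],
  "model": "jinaai/jina-embeddings-v5-text-small-classification"
}'''

response = json.loads(raw)
vectors = [item["embedding"] for item in response["data"]]
print(len(vectors), len(vectors[0]))  # 1 3
```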

Alternatively, use the native Text Embeddings Inference API, which handles the input formatting for you:

curl -X POST http://127.0.0.1:8080/embed \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": "Overview of climate change impacts on coastal cities",
    "prompt_name": "document"
  }'
via llama.cpp (GGUF)

After installing llama.cpp, run llama-server to host the embedding model as an OpenAI-compatible HTTP server with the desired model version:
llama-server -hf jinaai/jina-embeddings-v5-text-small-classification:F16 --embedding --pooling last -ub 32768

Client:

curl -X POST "http://127.0.0.1:8080/v1/embeddings" \
  -H "Content-Type: application/json" \
  -d '{
    "input": [
      "Document: A beautiful sunset over the beach",
      "Document: Un beau coucher de soleil sur la plage",
      "Document: 海滩上美丽的日落",
      "Document: 浜辺に沈む美しい夕日",
      "Document: Golden sunlight melts into the horizon, painting waves in warm amber and rose, while the sky whispers goodnight to the quiet, endless sea."
    ]
  }'
via Optimum (ONNX)

You can run the ONNX-optimized version of the model locally using Hugging Face's optimum library. Make sure the required dependencies are installed (e.g., `pip install "optimum[onnxruntime]" transformers torch`):

from optimum.onnxruntime import ORTModelForFeatureExtraction
from transformers import AutoTokenizer
import torch

model_id = "jinaai/jina-embeddings-v5-text-small-classification"

# 1. Load tokenizer and ONNX model
# We specify the subfolder 'onnx' where the weights are located
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = ORTModelForFeatureExtraction.from_pretrained(
    model_id,
    subfolder="onnx",
    file_name="model.onnx",
    provider="CPUExecutionProvider",  # Or "CUDAExecutionProvider" for GPU
    trust_remote_code=True,
)

# 2. Prepare input
texts = ["Document: How do I use Jina ONNX models?", "Document: Information about semantic matching."]
inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")


# 3. Inference
with torch.no_grad():
    outputs = model(**inputs)

# 4. Pooling (Crucial for Jina-v5)
# Jina-v5 uses LAST-TOKEN pooling.
# We take the hidden state of the last non-padding token.
last_hidden_state = outputs.last_hidden_state
# Find the indices of the last token (usually the end of the sequence)
sequence_lengths = inputs.attention_mask.sum(dim=1) - 1
embeddings = last_hidden_state[torch.arange(last_hidden_state.size(0)), sequence_lengths]

print('embeddings shape:', embeddings.shape)
print('embeddings:', embeddings)
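Note that the ONNX path returns raw pooled hidden states; if you intend to compare embeddings by cosine similarity, L2-normalize them first. A sketch with a placeholder tensor standing in for the `embeddings` produced above:

```python
import torch

# Placeholder for the pooled `embeddings` tensor produced above.
embeddings = torch.randn(2, 1024)

# L2-normalize so cosine similarity reduces to a matrix product.
normalized = torch.nn.functional.normalize(embeddings, p=2, dim=1)
similarity = normalized @ normalized.T
print(similarity.shape)  # torch.Size([2, 2])
```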

License

The model is licensed under CC BY-NC 4.0. For commercial use, please contact us.

Citation

If you find jina-embeddings-v5-text-small-classification useful in your research, please cite the following paper:

@misc{akram2026jinaembeddingsv5texttasktargetedembeddingdistillation,
      title={jina-embeddings-v5-text: Task-Targeted Embedding Distillation}, 
      author={Mohammad Kalim Akram and Saba Sturua and Nastia Havriushenko and Quentin Herreros and Michael Günther and Maximilian Werk and Han Xiao},
      year={2026},
      eprint={2602.15547},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2602.15547}, 
}