---
library_name: transformers
license: cc-by-nc-sa-4.0
pipeline_tag: text-ranking
---

<div align="center">

# Contextual AI Reranker v2 6B

<img src="Contextual_AI_Brand_Mark_Dark.png" width="10%" alt="Contextual_AI"/>

[Blog post](https://contextual.ai/blog/rerank-v2) | [Model collection](https://huggingface.co/collections/ContextualAI/contextual-ai-reranker-v2)

</div>

<hr>

## Highlights

Contextual AI's reranker is the **first instruction-following reranker**, capable of handling retrieval conflicts and ranking with custom instructions (e.g., prioritizing recent information). It achieves state-of-the-art performance on BEIR and sits on the cost/performance Pareto frontier across:

- Instruction following
- Question answering
- Multilinguality (100+ languages)
- Product search & recommendation
- Real-world use cases

<p align="center">
  <img src="main_benchmark.png" width="1200"/>
</p>

For detailed benchmarks, see our [blog post](https://contextual.ai/blog/rerank-v2).

## Overview

- **Model Type**: Text Reranking
- **Supported Languages**: 100+
- **Parameters**: 6B
- **Context Length**: up to 32K tokens

## When to Use This Model

Use this reranker when you need to:

- Re-rank retrieved documents with custom instructions
- Handle conflicting information in retrieval results
- Prioritize documents by recency or other criteria
- Support multilingual search (100+ languages)
- Process long contexts (up to 32K tokens)

## Quickstart

### Basic Usage

```python
# Choose vLLM (recommended for production) or Transformers (simpler setup).
# The full implementations of infer_w_vllm and infer_w_hf are in the sections below.

model_path = "ContextualAI/reranker_v2_6b"

query = "What are the health benefits of exercise?"
instruction = "Prioritize recent medical research"
documents = [
    "Regular exercise reduces risk of heart disease and improves mental health.",
    "A 2024 study shows exercise enhances cognitive function in older adults.",
    "Ancient Greeks valued physical fitness for military training.",
]

# Using vLLM (see full code below):
infer_w_vllm(model_path, query, instruction, documents)

# OR using Transformers (see full code below):
infer_w_hf(model_path, query, instruction, documents)
```

**Expected Output:**

```
Query: What are the health benefits of exercise?
Instruction: Prioritize recent medical research
Score: -2.2969 | Doc: A 2024 study shows exercise enhances cognitive function in older adults.
Score: -4.6875 | Doc: Regular exercise reduces risk of heart disease and improves mental health.
Score: -12.3750 | Doc: Ancient Greeks valued physical fitness for military training.
```
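
The scores are raw bfloat16 logits, so only their relative order matters: higher means more relevant. If you need values bounded in [0, 1] for thresholding, one option (our assumption, not an official calibration of this model) is a sigmoid over the raw logits:

```python
import math

raw_scores = [-2.2969, -4.6875, -12.3750]  # logits from the example above
probs = [1 / (1 + math.exp(-s)) for s in raw_scores]  # monotone, so the ranking is unchanged
print([f"{p:.4f}" for p in probs])
```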

### vLLM Usage (Recommended for Production)

Requires `vllm==0.10.0` for NVFP4 or `vllm>=0.8.5` for BF16.

```python
import os

os.environ["VLLM_USE_V1"] = "0"  # the v1 engine doesn't support logits processors yet

import torch
from vllm import LLM, SamplingParams


def logits_processor(_, scores):
    """Custom logits processor for vLLM reranking.

    The model emits its relevance score as the logit at vocab index 0.
    Reinterpret that bfloat16 score's bits as a uint16 token id and force
    the model to generate exactly that token, so the score survives
    sampling and can be decoded from the generated token id.
    """
    index = scores[0].view(torch.uint16)
    scores = torch.full_like(scores, float("-inf"))
    scores[index] = 1
    return scores


def format_prompts(query: str, instruction: str, documents: list[str]) -> list[str]:
    """Format query and documents into prompts for reranking."""
    if instruction:
        instruction = f" {instruction}"
    prompts = []
    for doc in documents:
        prompt = f"Check whether a given document contains information helpful to answer the query.\n<Document> {doc}\n<Query> {query}{instruction} ??"
        prompts.append(prompt)
    return prompts


def infer_w_vllm(model_path: str, query: str, instruction: str, documents: list[str]):
    model = LLM(
        model=model_path,
        gpu_memory_utilization=0.85,
        max_model_len=8192,
        dtype="bfloat16",
        max_logprobs=2,
        max_num_batched_tokens=262144,
    )
    sampling_params = SamplingParams(
        temperature=0,
        max_tokens=1,
        logits_processors=[logits_processor],
    )
    prompts = format_prompts(query, instruction, documents)

    outputs = model.generate(prompts, sampling_params, use_tqdm=False)

    # Decode each score: reinterpret the generated token id's bits back into bfloat16
    results = []
    for i, output in enumerate(outputs):
        score = (
            torch.tensor([output.outputs[0].token_ids[0]], dtype=torch.uint16)
            .view(torch.bfloat16)
            .item()
        )
        results.append((score, i, documents[i]))

    # Sort by score (descending)
    results = sorted(results, key=lambda x: x[0], reverse=True)

    print(f"Query: {query}")
    print(f"Instruction: {instruction}")
    for score, doc_id, doc in results:
        print(f"Score: {score:.4f} | Doc: {doc}")


if __name__ == "__main__":
    model_path = "ContextualAI/reranker_v2_6b"
    query = "What are the health benefits of exercise?"
    instruction = "Prioritize recent medical research"
    documents = [
        "Regular exercise reduces risk of heart disease and improves mental health.",
        "A 2024 study shows exercise enhances cognitive function in older adults.",
        "Ancient Greeks valued physical fitness for military training.",
    ]

    infer_w_vllm(model_path, query, instruction, documents)
```
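
The `uint16` trick above is a bit-level round trip: a bfloat16 score and a uint16 token id are both two bytes wide, so the score's bit pattern can travel through vLLM's sampler as a token id and be decoded afterwards. A standalone demonstration (assuming a PyTorch build with `torch.uint16`, which the code above already requires):

```python
import torch

score = torch.tensor([-2.2969], dtype=torch.bfloat16)
token_id = score.view(torch.uint16)               # reinterpret the score's bits as a "token id"
recovered = token_id.view(torch.bfloat16).item()  # decode the bits back into the score
assert recovered == score.item()
```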

### Transformers Usage (Simpler Setup)

Requires `transformers>=4.51.0` for BF16. NVFP4 is not supported with Transformers.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM


def format_prompts(query: str, instruction: str, documents: list[str]) -> list[str]:
    """Format query and documents into prompts for reranking."""
    if instruction:
        instruction = f" {instruction}"
    prompts = []
    for doc in documents:
        prompt = f"Check whether a given document contains information helpful to answer the query.\n<Document> {doc}\n<Query> {query}{instruction} ??"
        prompts.append(prompt)
    return prompts


def infer_w_hf(model_path: str, query: str, instruction: str, documents: list[str]):
    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.bfloat16 if torch.cuda.is_available() else torch.float32

    tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=True)
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
    tokenizer.padding_side = "left"  # so index -1 is the real last token for all prompts

    model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=dtype).to(device)
    model.eval()

    prompts = format_prompts(query, instruction, documents)
    enc = tokenizer(
        prompts,
        return_tensors="pt",
        padding=True,
        truncation=True,
    )
    input_ids = enc["input_ids"].to(device)
    attention_mask = enc["attention_mask"].to(device)

    with torch.no_grad():
        out = model(input_ids=input_ids, attention_mask=attention_mask)

    next_logits = out.logits[:, -1, :]  # [batch, vocab]

    # The relevance score is the next-token logit at vocab index 0;
    # cast through bfloat16 to match the scores produced by the vLLM path
    scores_bf16 = next_logits[:, 0].to(torch.bfloat16)
    scores = scores_bf16.float().tolist()

    # Sort by score (descending)
    results = sorted(
        [(s, i, documents[i]) for i, s in enumerate(scores)],
        key=lambda x: x[0],
        reverse=True,
    )

    print(f"Query: {query}")
    print(f"Instruction: {instruction}")
    for score, doc_id, doc in results:
        print(f"Score: {score:.4f} | Doc: {doc}")
```
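
If you want the ranked documents back as values rather than printed output, a small convenience wrapper over the same pieces is easy to write. A minimal sketch (`rerank` is a hypothetical helper of ours, not part of this model card's API; it reuses `format_prompts` from the block above):

```python
import torch


def rerank(model, tokenizer, query, instruction, documents, device="cuda"):
    """Return (score, document) pairs, most relevant first."""
    prompts = format_prompts(query, instruction, documents)  # defined in the block above
    enc = tokenizer(prompts, return_tensors="pt", padding=True, truncation=True).to(device)
    with torch.no_grad():
        logits = model(**enc).logits[:, -1, 0]  # relevance score = next-token logit at vocab index 0
    scores = logits.to(torch.bfloat16).float().tolist()
    return sorted(zip(scores, documents), key=lambda x: x[0], reverse=True)


# e.g., keep only the top 2 documents:
# top2 = rerank(model, tokenizer, query, instruction, documents)[:2]
```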

## Citation

If you use this model, please cite:

```bibtex
@misc{ctxl_rerank_v2_instruct_multilingual,
  title={Contextual AI Reranker v2},
  author={Halal, George and Agrawal, Sheshansh},
  year={2025},
  url={https://contextual.ai/blog/rerank-v2},
}
```

## License

Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)

## Contact

For questions or issues, please open an issue on the model repository.