DeAR-8B-Reranker-RankNet-v1

Model Description

DeAR-8B-Reranker-RankNet-v1 is an 8B-parameter neural reranker trained with a RankNet loss combined with knowledge distillation from a 13B teacher. It is part of the DeAR framework family. On standard IR benchmarks it matches or slightly exceeds its teacher (74.5 vs. 73.8 NDCG@10 on TREC DL19; 45.2 vs. 44.8 average NDCG@10 on BEIR) while cutting reported inference time from 5.8 s to 2.2 s, roughly 2.6x faster.

Model Details

Model Type: Pointwise Reranker (Sequence Classification)
Base Model: LLaMA-3.1-8B
Parameters: 8 billion
Training Method: Knowledge Distillation + RankNet Loss
Teacher Model: LLaMA2-13B-RankLLaMA
Training Data: MS MARCO
Precision: BFloat16

Key Features

✅ High Performance: Slightly outperforms its 13B teacher on TREC DL19 and BEIR
✅ Fast Inference: 2.2 s to rerank 100 documents on a standard GPU
✅ Memory Efficient: About 18 GB of GPU memory at inference in BF16, so it fits on a single 24 GB GPU
✅ Knowledge Distillation: Distilled from a 13B teacher and enhanced with Chain-of-Thought reasoning
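As a rough check on the memory figure: BF16 stores each parameter in 2 bytes, so the weights of a nominal 8B-parameter model alone take about 16 GB. The sketch below assumes exactly 8 × 10⁹ parameters (the exact count of the checkpoint differs somewhat) and ignores activations and other runtime buffers, which add to this during inference. The helper name `weight_memory_gb` is hypothetical.

```python
BYTES_PER_PARAM_BF16 = 2  # bfloat16 = 16 bits per value

def weight_memory_gb(num_params: float, bytes_per_param: int = BYTES_PER_PARAM_BF16) -> float:
    """Memory for model weights only, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9

weights_gb = weight_memory_gb(8e9)  # nominal parameter count, an assumption
print(f"BF16 weights: {weights_gb:.1f} GB")
print(f"FP32 weights would need: {weight_memory_gb(8e9, 4):.1f} GB")
```

The FP32 line shows why BF16 matters here: at 4 bytes per parameter the weights alone would exceed a 24 GB card.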

Performance

Benchmark	NDCG@10
TREC DL19	74.5
TREC DL20	72.8
BEIR (Avg)	45.2
MS MARCO Dev	68.9
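NDCG@10, the metric in the table above, compares the discounted gain of the top 10 ranked documents with that of an ideal ordering of the judged documents. A minimal sketch, assuming a common formulation with linear gain (the graded label itself) and a log2(rank + 1) discount; evaluation tools differ in such details, so treat this as illustrative. The function name `ndcg_at_k` and the dict-based qrels format are hypothetical.

```python
import math

def ndcg_at_k(ranked_ids, qrels, k=10):
    """NDCG@k with linear gain and a log2(rank + 1) discount.

    ranked_ids: document ids in reranked order (best first)
    qrels: dict mapping document id -> graded relevance label (missing = 0)
    """
    def dcg(gains):
        # rank is 1-based, so 0-based position i is discounted by log2(i + 2)
        return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))

    actual = dcg([qrels.get(doc_id, 0) for doc_id in ranked_ids])
    ideal = dcg(sorted(qrels.values(), reverse=True))
    return actual / ideal if ideal > 0 else 0.0

# The only relevant document is ranked second instead of first
print(ndcg_at_k(["d2", "d1"], {"d1": 1, "d2": 0}))
```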

Usage

Quick Start

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load model
model_path = "abdoelsayed/dear-8b-reranker-ranknet-v1"
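tokenizer = AutoTokenizer.from_pretrained(model_path)
# LLaMA tokenizers often ship without a pad token; reuse EOS so padding works
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForSequenceClassification.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16
)
# Sequence classification needs pad_token_id to find the last real token in padded inputs
model.config.pad_token_id = tokenizer.pad_token_id
model.eval().cuda()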

# Score a query-document pair
query = "What is machine learning?"
document = "Machine learning is a subset of artificial intelligence..."

inputs = tokenizer(
    f"query: {query}",
    f"document: {document}",
    return_tensors="pt",
    truncation=True,
    max_length=228,  # q_max_len(32) + p_max_len(196)
    padding="max_length"
)
inputs = {k: v.cuda() for k, v in inputs.items()}

with torch.no_grad():
    score = model(**inputs).logits.squeeze().item()
    
print(f"Relevance score: {score}")
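The printed score is an unnormalized logit. Under the standard RankNet formulation, which this sketch assumes the training loss follows, only score differences between documents for the same query carry meaning: the modeled probability that document i should rank above document j is the sigmoid of s_i - s_j. The helper name `prob_ranked_above` is hypothetical.

```python
import math

def prob_ranked_above(score_i: float, score_j: float) -> float:
    """RankNet-style probability that doc i outranks doc j: sigmoid(s_i - s_j)."""
    diff = score_i - score_j
    # Branch on the sign so math.exp never overflows for large score gaps
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)

# Equal scores mean the model has no preference between the two documents
print(prob_ranked_above(2.0, 2.0))  # 0.5
```

Because adding the same constant to every score leaves these probabilities unchanged, raw scores from different queries should not be compared directly.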

Batch Reranking

def rerank_documents(query, documents, model, tokenizer, batch_size=64):
    """
    Rerank a list of documents for a query.
    
    Args:
        query: Search query string
        documents: List of (title, text) tuples
        model: Loaded reranker model
        tokenizer: Loaded tokenizer
        batch_size: Batch size for inference
    
    Returns:
        List of (index, score) tuples sorted by relevance
    """
    scores = []
    
    for i in range(0, len(documents), batch_size):
        batch_docs = documents[i:i + batch_size]
        
        # Prepare inputs
        queries = [f"query: {query}"] * len(batch_docs)
        docs = [f"document: {title} {text}" for title, text in batch_docs]
        
        inputs = tokenizer(
            queries,
            docs,
            return_tensors="pt",
            truncation=True,
            max_length=228,
            padding=True
        )
        inputs = {k: v.to(model.device) for k, v in inputs.items()}
        
        # Get scores
        with torch.no_grad():
            logits = model(**inputs).logits.squeeze(-1)
            scores.extend(logits.cpu().tolist())
    
    # Sort by score (descending)
    ranked = sorted(enumerate(scores), key=lambda x: x[1], reverse=True)
    return ranked


# Example usage
query = "When was the Eiffel Tower built?"
documents = [
    ("Eiffel Tower", "The Eiffel Tower was built in 1889 for the World's Fair."),
    ("Paris", "Paris is the capital of France."),
    ("Architecture", "Modern architecture has evolved significantly."),
]

ranking = rerank_documents(query, documents, model, tokenizer)
print(ranking)
# Example output (scores are illustrative): [(0, 8.23), (1, 2.45), (2, -1.87)]
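To evaluate rankings with standard tools such as `trec_eval`, results are usually written in the six-column TREC run format: query id, the literal `Q0`, document id, rank, score, and run tag. A minimal sketch that converts the `(index, score)` list returned by `rerank_documents`; the function name `to_trec_run_lines`, the `doc_ids` mapping, the ids, and the run tag are all hypothetical.

```python
def to_trec_run_lines(query_id, ranked, doc_ids, run_tag="dear-8b-ranknet"):
    """Format (index, score) pairs as TREC run lines: qid Q0 docid rank score tag.

    ranked: list of (index, score) tuples, best first (as rerank_documents returns)
    doc_ids: list mapping each document index to its collection id
    """
    return [
        f"{query_id} Q0 {doc_ids[idx]} {rank} {score:.4f} {run_tag}"
        for rank, (idx, score) in enumerate(ranked, start=1)
    ]

# Toy ids and scores for illustration only
lines = to_trec_run_lines("q1", [(0, 8.23), (2, -1.87)], ["D100", "D200", "D300"])
print("\n".join(lines))
```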

Training Details

Training Data

Primary Dataset: MS MARCO Passage Ranking

Hardware

GPUs: 4x NVIDIA A100 (40GB)
Training Time: ~36 hours
DeepSpeed: ZeRO Stage 2
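A DeepSpeed configuration enabling ZeRO Stage 2 might look like the dict below. Only the ZeRO stage comes from this card; enabling BF16 for training is an assumption based on the released weights' precision, and the batch-size and gradient-accumulation values are placeholders, not the values used to train this model. The keys shown (`zero_optimization.stage`, `bf16.enabled`, `train_micro_batch_size_per_gpu`, `gradient_accumulation_steps`) are standard DeepSpeed config fields. The hardware figures also imply roughly 4 × 36 = 144 A100 GPU-hours.

```python
import json

NUM_GPUS = 4       # from the card: 4x NVIDIA A100 (40GB)
TRAIN_HOURS = 36   # from the card: ~36 hours

ds_config = {
    "zero_optimization": {"stage": 2},    # ZeRO-2: partition optimizer states and gradients
    "bf16": {"enabled": True},            # assumed to match the BF16 release
    "train_micro_batch_size_per_gpu": 1,  # placeholder, not the card's value
    "gradient_accumulation_steps": 8,     # placeholder, not the card's value
}

print(json.dumps(ds_config, indent=2))
print(f"Approximate compute: {NUM_GPUS * TRAIN_HOURS} GPU-hours")
```

With the Hugging Face `Trainer`, such a dict (or a path to it saved as JSON) can be passed through `TrainingArguments(deepspeed=...)`.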

Loss Function

RankNet Loss with Knowledge Distillation:

L_total = (1 - α) * L_RankNet + α * L_KD

where:
- L_RankNet: Pairwise ranking loss
- L_KD: KL divergence with teacher (temperature=2)
- α: 0.1 (distillation weight)
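The combined objective above can be sketched for a single query's candidate list. This is a plain-Python reference under stated assumptions, not the training code: RankNet pairs are formed from graded `labels`, L_KD is taken as KL(teacher || student) between temperature-scaled softmax distributions over the candidate list, and the KL term is multiplied by T² as in Hinton-style distillation. All function names are hypothetical.

```python
import math

def softmax(xs, temperature=1.0):
    """Numerically stable softmax of xs / temperature."""
    scaled = [x / temperature for x in xs]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    z = sum(exps)
    return [e / z for e in exps]

def softplus(x):
    """Stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def ranknet_loss(scores, labels):
    """Mean of log(1 + exp(-(s_i - s_j))) over pairs with labels[i] > labels[j]."""
    pair_losses = [
        softplus(-(scores[i] - scores[j]))
        for i in range(len(scores))
        for j in range(len(scores))
        if labels[i] > labels[j]
    ]
    return sum(pair_losses) / len(pair_losses) if pair_losses else 0.0

def kd_loss(student_scores, teacher_scores, temperature=2.0):
    """KL(teacher || student) over the candidate list, scaled by T^2 (assumption)."""
    p_t = softmax(teacher_scores, temperature)
    p_s = softmax(student_scores, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps)) for pt, ps in zip(p_t, p_s) if pt > 0)
    return kl * temperature ** 2

def total_loss(student_scores, teacher_scores, labels, alpha=0.1, temperature=2.0):
    """L_total = (1 - alpha) * L_RankNet + alpha * L_KD."""
    return ((1 - alpha) * ranknet_loss(student_scores, labels)
            + alpha * kd_loss(student_scores, teacher_scores, temperature))

# One positive and two negatives; the teacher agrees with the labels
print(total_loss([1.5, 0.2, -0.3], [3.0, 0.5, -1.0], [1, 0, 0]))
```

In actual training the same computation would run batched on tensors with autograd; the list version only makes each term explicit.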

Evaluation Results

TREC Deep Learning

Dataset	NDCG@10	NDCG@20	MAP
DL19	74.50	70.23	45.67
DL20	72.80	69.15	43.21
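MAP in the table is the mean over queries of average precision (AP), which needs binary relevance. For the TREC DL passage tasks, graded labels of 2 or higher are conventionally treated as relevant for MAP; the sketch below assumes that convention through its `threshold` parameter. Function names are hypothetical.

```python
def average_precision(ranked_ids, qrels, threshold=2):
    """AP: sum of precision@rank at each relevant hit, divided by the number of relevant docs.

    Documents whose qrels label is >= threshold count as relevant (assumed DL convention).
    """
    relevant = {doc_id for doc_id, label in qrels.items() if label >= threshold}
    if not relevant:
        return 0.0
    hits, precision_sum = 0, 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)

def mean_average_precision(runs, all_qrels, threshold=2):
    """runs: qid -> ranked doc ids; all_qrels: qid -> {doc id: label}."""
    aps = [average_precision(runs[q], all_qrels[q], threshold) for q in runs]
    return sum(aps) / len(aps)
```

Relevant documents that never appear in the ranking still count in the denominator, which is why reranking a shallow candidate list caps achievable MAP.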

BEIR Benchmark

Dataset	NDCG@10
MS MARCO	68.9
NQ	52.3
HotpotQA	61.8
FiQA	47.2
ArguAna	59.4
SciFact	73.6
TREC-COVID	85.2
NFCorpus	39.8

Efficiency

Metric	Value
Inference Time (100 docs)	2.2s
GPU Memory (inference)	18GB
Throughput	~45 docs/sec
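The throughput follows from the latency: 100 documents in 2.2 s is about 45 documents per second. To measure both on your own hardware, a minimal timing sketch can wrap any reranking callable, such as `rerank_documents` from the usage section. The helper name `benchmark_rerank` is hypothetical, and results depend on GPU, batch size, and document length.

```python
import time

def benchmark_rerank(rerank_fn, query, documents, warmup=1, repeats=3):
    """Return (mean latency in seconds, documents per second) for rerank_fn(query, documents)."""
    for _ in range(warmup):
        rerank_fn(query, documents)  # excludes one-time costs such as CUDA initialization
    start = time.perf_counter()
    for _ in range(repeats):
        # If rerank_fn returns CPU values (rerank_documents does), GPU work has finished here
        rerank_fn(query, documents)
    latency = (time.perf_counter() - start) / repeats
    return latency, len(documents) / latency

print(f"Implied by the card: {100 / 2.2:.1f} docs/sec")
```

Usage with the earlier helper: `benchmark_rerank(lambda q, d: rerank_documents(q, d, model, tokenizer), query, documents)`.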

Comparison with Other Models

Model	Size	TREC DL19	BEIR Avg	Inference (s)
MonoT5-3B	3B	71.8	43.5	3.5
DeAR-P-8B-RL (this model)	8B	74.5	45.2	2.2
Teacher (13B)	13B	73.8	44.8	5.8
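The relative speedups implied by the inference column can be computed directly from the table. This only restates the reported numbers and assumes all three timings were measured under the same setup.

```python
# Inference times (s) as reported in the comparison table
reported_seconds = {"MonoT5-3B": 3.5, "DeAR-P-8B-RL": 2.2, "Teacher (13B)": 5.8}
ours = reported_seconds["DeAR-P-8B-RL"]

speedups = {name: s / ours for name, s in reported_seconds.items() if name != "DeAR-P-8B-RL"}
for name, factor in speedups.items():
    print(f"DeAR-P-8B-RL is {factor:.2f}x faster than {name}")
```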

Model Architecture

Input: "query: [Q]" and "document: [D]" encoded as a tokenizer text pair
    ↓
LLaMA-3.1-8B (decoder-only transformer)
    ↓
Hidden State of the Last Non-Padding Token
    ↓
Linear Classification Head
    ↓
Relevance Score (scalar)
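A decoder-only model such as LLaMA has no [CLS] token. Hugging Face's `LlamaForSequenceClassification`, which `AutoModelForSequenceClassification` resolves to for a LLaMA checkpoint, scores each sequence from the hidden state at its rightmost non-padding position. The sketch below mimics that selection on plain lists; `last_token_positions` and `pool_last_token` are hypothetical helpers, not transformers functions.

```python
def last_token_positions(input_ids, pad_token_id):
    """Index of the rightmost non-padding token in each sequence (0 if all padding)."""
    positions = []
    for row in input_ids:
        non_pad = [i for i, tok in enumerate(row) if tok != pad_token_id]
        positions.append(non_pad[-1] if non_pad else 0)
    return positions

def pool_last_token(hidden_states, input_ids, pad_token_id):
    """Pick one hidden vector per sequence; a linear head then maps it to a score."""
    return [hidden_states[b][pos]
            for b, pos in enumerate(last_token_positions(input_ids, pad_token_id))]

batch = [[101, 7, 8, 0, 0], [101, 5, 6, 9, 4]]  # toy ids, 0 = pad
print(last_token_positions(batch, pad_token_id=0))  # [2, 4]
```

This is why `pad_token_id` must be set on the model config: without it, the model cannot tell padding from content in a batch.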

Limitations

Domain Adaptation: Trained primarily on MS MARCO; may require fine-tuning for specialized domains
Query Length: Optimized for queries up to 32 tokens
Document Length: Truncated to 196 tokens; longer documents lose information
Language: English only
Numerical Reasoning: Limited capability for queries requiring calculations
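Because documents are truncated to 196 tokens, one common workaround for long documents is MaxP: split the document into overlapping passages, score each passage against the query, and keep the maximum. This is a general technique, not something this card says was used for this model; `split_into_windows` and `maxp_score` are hypothetical helpers, and the 98-token stride is an arbitrary choice.

```python
def split_into_windows(tokens, window=196, stride=98):
    """Overlapping windows of at most `window` tokens that together cover every token."""
    if len(tokens) <= window:
        return [tokens]
    starts = list(range(0, len(tokens) - window + 1, stride))
    if starts[-1] + window < len(tokens):
        starts.append(len(tokens) - window)  # final window ends exactly at the last token
    return [tokens[s:s + window] for s in starts]

def maxp_score(score_fn, query, tokens, window=196, stride=98):
    """Document score = max passage score; score_fn(query, passage_tokens) -> float."""
    return max(score_fn(query, w) for w in split_into_windows(tokens, window, stride))

words = "the eiffel tower was completed in 1889".split() * 60  # 420 toy tokens
print(len(split_into_windows(words)))
```

`tokens` can be words or tokenizer ids; with ids, decode each window back to text before passing it to the reranker.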

Bias and Fairness

This model inherits biases present in:

Base LLaMA-3.1-8B model
MS MARCO training data
Teacher model annotations

Users should evaluate fairness for their specific use cases.

Ethical Considerations

Search Ranking: Can influence information access and visibility
Training Data: May contain biased or sensitive content
Misuse Potential: Should not be used for surveillance or discriminatory ranking

Related Models

DeAR Family:

DeAR-8B-CE - Binary Cross-Entropy variant
DeAR-8B-Listwise - Listwise reranking
DeAR-8B-RankNet-LoRA - LoRA adapter

Teacher:

LLaMA2-13B-RankLLaMA-Teacher

Dataset:

DeAR-COT

Citation

@article{abdallah2025dear,
  title={DeAR: Dual-Stage Document Reranking with Reasoning Agents via LLM Distillation},
  author={Abdallah, Abdelrahman and Mozafari, Jamshid and Piryani, Bhawna and Jatowt, Adam},
  journal={arXiv preprint arXiv:2508.16998},
  year={2025}
}