---
license: apache-2.0
language:
- en
tags:
- sentence-transformers
- embeddings
- retrieval
- agents
- memory
- rag
- semantic-search
- ai-agents
- llm-memory
- vector-search
library_name: transformers
pipeline_tag: sentence-similarity
datasets:
- custom
metrics:
- mrr
- recall
- ndcg
model-index:
- name: agentrank-base
  results:
  - task:
      type: retrieval
      name: Agent Memory Retrieval
    metrics:
    - type: mrr
      value: 0.6496
      name: MRR
    - type: recall
      value: 0.4440
      name: Recall@1
    - type: recall
      value: 0.9960
      name: Recall@5
    - type: ndcg
      value: 0.6786
      name: NDCG@10
---
# 🧠 AgentRank-Base

### The First Embedding Model Built Specifically for AI Agent Memory Retrieval


**+23% MRR improvement over general-purpose embedders** | **Temporal awareness** | **Memory type understanding**

[🚀 Quick Start](#-quick-start) • [📊 Benchmarks](#-benchmarks) • [🔧 Architecture](#-architecture) • [💡 Why AgentRank?](#-why-agentrank)
---

## 🎯 TL;DR

> **AgentRank-Base** is an embedding model designed for AI agents that need to remember. Unlike generic embedders (OpenAI, Cohere, MiniLM), AgentRank understands:
> - ⏰ **When** something happened (temporal awareness)
> - 📝 **What type** of memory it is (episodic vs. semantic vs. procedural)
> - ⭐ **How important** the memory is

---

## 💡 Why AgentRank?

### The Problem with Current Embedders

AI agents need memory. But when you ask an agent:

> *"What did we discuss about Python **yesterday**?"*

current embedders fail, because they:

- ❌ Don't understand that "yesterday" means recent time
- ❌ Can't distinguish between a preference and an event
- ❌ Treat all memories as equally important

### The AgentRank Solution

| Challenge | OpenAI/Cohere/MiniLM | AgentRank |
|-----------|---------------------|-----------|
| "What did I say **yesterday**?" | Random old results 😕 | Recent memories first ✅ |
| "What's my **preference**?" | Mixed with events 😕 | Only preferences ✅ |
| "What's **most important**?" | No priority 😕 | Importance-aware retrieval ✅ |

---

## 📊 Benchmarks

Evaluated on **AgentMemBench** (500 test samples, 8 candidates each); bold marks the best score per column:

| Model | Parameters | MRR ↑ | Recall@1 ↑ | Recall@5 ↑ | NDCG@10 ↑ |
|-------|------------|-------|------------|------------|-----------|
| **AgentRank-Base** | 149M | **0.6496** | 0.4440 | **0.9960** | 0.6786 |
| AgentRank-Small | 33M | 0.6375 | **0.4460** | 0.9740 | **0.6797** |
| all-mpnet-base-v2 | 109M | 0.5351 | 0.3660 | 0.7960 | 0.6335 |
| all-MiniLM-L6-v2 | 22M | 0.5297 | 0.3720 | 0.7520 | 0.6370 |

### Improvement Over Baselines

| vs Baseline | MRR | Recall@1 | Recall@5 |
|-------------|-----|----------|----------|
| vs MiniLM | **+22.6%** | **+19.4%** | **+32.4%** |
| vs MPNet | **+21.4%** | **+21.3%** | **+25.1%** |

---

## 🚀 Quick Start

### Installation

```bash
pip install transformers torch
```

### Basic Usage

```python
from transformers import AutoModel, AutoTokenizer
import torch

# Load model and tokenizer
model = AutoModel.from_pretrained("vrushket/agentrank-base")
tokenizer = AutoTokenizer.from_pretrained("vrushket/agentrank-base")

def encode(texts, model, tokenizer):
    """Encode texts to L2-normalized embeddings."""
    inputs = tokenizer(
        texts,
        padding=True,
        truncation=True,
        max_length=512,
        return_tensors="pt",
    )
    with torch.no_grad():
        outputs = model(**inputs)
    # Mean pooling over non-padding tokens only, so batched encoding
    # matches encoding texts one at a time
    mask = inputs["attention_mask"].unsqueeze(-1).float()
    embeddings = (outputs.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)
    # L2 normalize so the dot product equals cosine similarity
    embeddings = torch.nn.functional.normalize(embeddings, p=2, dim=1)
    return embeddings

# Your agent's memories
memories = [
    "User prefers Python over JavaScript for backend development",
    "User asked about React frameworks yesterday",
    "User mentioned they have 3 years of coding experience",
    "User is working on an e-commerce project",
]

# A query from the user
query = "What programming language does the user prefer?"

# Encode everything
memory_embeddings = encode(memories, model, tokenizer)
query_embedding = encode([query], model, tokenizer)

# Find the most similar memory
similarities = torch.mm(query_embedding, memory_embeddings.T)[0]
best_match_idx = similarities.argmax().item()

print(f"Query: {query}")
print(f"Best match: {memories[best_match_idx]}")
print(f"Similarity: {similarities[best_match_idx]:.4f}")

# Output:
# Query: What programming language does the user prefer?
# Best match: User prefers Python over JavaScript for backend development
# Similarity: 0.8234
```
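In practice an agent usually wants the top-k memories with scores rather than a single best match. Here is a small extension of the snippet above; it reuses the `encode` helper, `model`, `tokenizer`, and `memories` already defined there:

```python
import torch

def recall_top_k(query, memories, model, tokenizer, k=3):
    """Return the k most similar memories with their cosine scores."""
    memory_embeddings = encode(memories, model, tokenizer)
    query_embedding = encode([query], model, tokenizer)
    # Embeddings are L2-normalized, so the dot product is cosine similarity
    scores = torch.mm(query_embedding, memory_embeddings.T)[0]
    top = torch.topk(scores, k=min(k, len(memories)))
    return [
        (memories[i], score)
        for i, score in zip(top.indices.tolist(), top.values.tolist())
    ]

for text, score in recall_top_k(
    "What did we discuss about Python?", memories, model, tokenizer
):
    print(f"{score:.4f}  {text}")
```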
### Advanced Usage with Metadata

For full temporal and memory type awareness, use the AgentRank package:

```python
# Coming soon: pip install agentrank
import torch
from agentrank import AgentRankEmbedder

model = AgentRankEmbedder.from_pretrained("vrushket/agentrank-base")

# Encode with temporal context
memory_embedding = model.encode(
    text="User mentioned they prefer morning meetings",
    days_ago=7,              # Memory is 1 week old
    memory_type="semantic",  # It's a preference (not an event)
)

# Encode the query (no metadata needed for queries)
query_embedding = model.encode("When does the user like to have meetings?")

# The model now knows this is a week-old preference!
similarity = torch.cosine_similarity(query_embedding, memory_embedding, dim=0)
```

---

## 🔧 Architecture

AgentRank-Base is built on **ModernBERT-base** (149M params) with novel additions:

```
┌──────────────────────────────────────────────────┐
│   ModernBERT Encoder (22 Transformer Layers)     │
│   - RoPE Positional Encoding                     │
│   - Flash Attention                              │
│   - 768 Hidden Dimension                         │
└──────────────────────────────────────────────────┘
                        │
        ┌───────────────┼────────────────┐
        ↓               ↓                ↓
┌───────────────┐ ┌───────────────┐ ┌───────────────┐
│   Temporal    │ │    Memory     │ │  Importance   │
│   Position    │ │     Type      │ │  Prediction   │
│  Embeddings   │ │  Embeddings   │ │     Head      │
│  (10 × 768)   │ │   (4 × 768)   │ │   (768 → 1)   │
└───────────────┘ └───────────────┘ └───────────────┘
        │               │                │
        └───────────────┼────────────────┘
                        ↓
            ┌───────────────────────┐
            │   Projection Layer    │
            │      (768 → 768)      │
            └───────────────────────┘
                        ↓
            ┌───────────────────────┐
            │   L2 Normalization    │
            │   768-dim Embedding   │
            └───────────────────────┘
```

### Novel Components

| Component | Purpose | How It Helps |
|-----------|---------|--------------|
| **Temporal Embeddings** | Encode memory age (today, this week, last month, etc.) | "Yesterday" queries match recent memories |
| **Memory Type Embeddings** | Distinguish episodic/semantic/procedural | "What do I like?" matches preferences, not events |
| **Importance Head** | Auxiliary task predicting memory priority | Helps learn better representations |

### Temporal Buckets

```
Bucket 0: Today       (0-1 days)
Bucket 1: Recent      (1-3 days)
Bucket 2: This week   (3-7 days)
Bucket 3: Last week   (7-14 days)
Bucket 4: This month  (14-30 days)
Bucket 5: Last month  (30-60 days)
Bucket 6: Few months  (60-90 days)
Bucket 7: Half year   (90-180 days)
Bucket 8: This year   (180-365 days)
Bucket 9: Long ago    (365+ days)
```

### Memory Types

```
Type 0: Episodic   → Events, conversations ("We discussed X yesterday")
Type 1: Semantic   → Facts, preferences ("User likes Python")
Type 2: Procedural → Instructions ("To deploy, run npm build")
Type 3: Unknown    → Fallback
```
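Until the `agentrank` package ships, the mapping from raw metadata to bucket and type IDs can be read straight off the tables above. A minimal sketch in plain Python follows; the helper names and the handling of exact boundary values are assumptions, not the package API:

```python
# Illustrative helpers derived from the bucket/type tables above; these
# names are not the official agentrank API, and inclusive/exclusive
# boundary handling is an assumption.
TEMPORAL_UPPER_BOUNDS = [1, 3, 7, 14, 30, 60, 90, 180, 365]  # days

MEMORY_TYPE_IDS = {"episodic": 0, "semantic": 1, "procedural": 2, "unknown": 3}

def temporal_bucket(days_ago: float) -> int:
    """Map a memory's age in days to one of the 10 temporal buckets."""
    for bucket, upper in enumerate(TEMPORAL_UPPER_BOUNDS):
        if days_ago <= upper:
            return bucket
    return 9  # Long ago (365+ days)

def memory_type_id(memory_type: str) -> int:
    """Map a memory type string to its embedding row (Unknown as fallback)."""
    return MEMORY_TYPE_IDS.get(memory_type.lower(), 3)

print(temporal_bucket(7))          # 2 -> "This week" (3-7 days)
print(memory_type_id("semantic"))  # 1
```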
---

## 🎓 Training Details

| Aspect | Details |
|--------|---------|
| **Base Model** | answerdotai/ModernBERT-base (149M params) |
| **Training Data** | 500K synthetic agent memory samples |
| **Memory Distribution** | Episodic (40%), Semantic (35%), Procedural (25%) |
| **Loss Function** | Multiple Negatives Ranking Loss + Importance MSE |
| **Hard Negatives** | 7 per sample (5 types: temporal, type confusion, topic drift, etc.) |
| **Batch Size** | 16-32 per GPU |
| **Hardware** | 2× NVIDIA RTX 6000 Ada (48GB each) |
| **Training Time** | ~12 hours |
| **Precision** | FP16 mixed precision |
| **Final Val Loss** | 0.877 |
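For reference, here is a minimal PyTorch sketch of that objective: Multiple Negatives Ranking Loss over in-batch negatives plus an MSE term for the importance head. The `scale` temperature and `lambda_imp` weight are assumed hyperparameters, not published values:

```python
import torch
import torch.nn.functional as F

def combined_loss(query_emb, memory_emb, pred_importance, true_importance,
                  scale=20.0, lambda_imp=0.1):
    """Multiple Negatives Ranking Loss + importance MSE (sketch).

    query_emb, memory_emb: L2-normalized (batch, dim) tensors where
    memory_emb[i] is the positive for query_emb[i]; every other row in
    the batch serves as an in-batch negative. scale and lambda_imp are
    assumed values, not the published training configuration.
    """
    # Scaled cosine similarity matrix; the diagonal holds the positive pairs
    scores = query_emb @ memory_emb.T * scale
    labels = torch.arange(scores.size(0), device=scores.device)
    ranking_loss = F.cross_entropy(scores, labels)

    # Auxiliary importance-prediction head trained with MSE
    importance_loss = F.mse_loss(pred_importance, true_importance)

    return ranking_loss + lambda_imp * importance_loss
```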
---

## 🏗️ Use Cases

### 1. AI Agents with Long-Term Memory

```python
# Store memories with metadata
agent.remember(
    text="User is allergic to peanuts",
    memory_type="semantic",
    importance=10,  # Critical medical info!
)

# Later, when discussing food...
relevant_memories = agent.recall("What should I know about the user's diet?")
# Returns: "User is allergic to peanuts" (even if stored months ago)
```

*(The `agent` object here is illustrative; a minimal implementation sketch follows these examples.)*

### 2. RAG Systems for Conversational AI

```python
# Better retrieval for chatbots
query = "What did we talk about in our last meeting?"

# AgentRank returns recent, relevant conversations
# Generic embedders return random topically-similar docs
```

### 3. Personal Knowledge Bases

```python
# User's notes and preferences
memories = [
    "I prefer dark mode in all apps",
    "My morning routine starts at 6 AM",
    "Important: Tax deadline April 15",
]

# AgentRank properly handles time-sensitive queries
```
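The `agent` object in the examples above stands in for whatever memory layer wraps the model. Below is a minimal sketch of such a wrapper, reusing the `encode` helper, `model`, and `tokenizer` from Basic Usage; `MemoryStore` and its methods are illustrative, not an official AgentRank API:

```python
import torch

class MemoryStore:
    """Toy in-memory store exposing the remember/recall shape used above."""

    def __init__(self, model, tokenizer):
        self.model = model
        self.tokenizer = tokenizer
        self.texts = []

    def remember(self, text, memory_type="unknown", importance=5):
        # Metadata is accepted but unused here; the agentrank package is
        # what will feed it into the temporal/type embeddings.
        self.texts.append(text)

    def recall(self, query, k=3):
        """Return the k stored memories most similar to the query."""
        memory_embeddings = encode(self.texts, self.model, self.tokenizer)
        query_embedding = encode([query], self.model, self.tokenizer)
        scores = torch.mm(query_embedding, memory_embeddings.T)[0]
        top = torch.topk(scores, k=min(k, len(self.texts)))
        return [self.texts[i] for i in top.indices.tolist()]

agent = MemoryStore(model, tokenizer)
agent.remember("User is allergic to peanuts", memory_type="semantic", importance=10)
agent.remember("User is working on an e-commerce project")
print(agent.recall("What should I know about the user's diet?", k=1))
```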
### โญ If AgentRank helps your project, please star the repo! **Built with โค๏ธ for the AI agent community**