---
license: apple-amlr
base_model:
- mistralai/Mistral-7B-Instruct-v0.2
tags:
- rag
- compression
- retrieval
- generation
---

# CLaRa-7B-Base (Compression-16 & 128)

The CLaRa-7B-Base model is our foundational unified RAG model with built-in semantic document compression (16× and 128×). It provides a base compressor + generator capable of producing answers directly from compressed document representations.

**Training recipe:** Trained using QA-guided semantic compression and paraphrase consistency objectives.

**Benchmarks:** Strong baseline performance across multi-hop QA tasks under a 16× compression ratio.

---

## More details and usage examples:

Paper: [CLaRa: Bridging Retrieval and Generation with Continuous Latent Reasoning](https://arxiv.org/abs/2511.18659)

GitHub: https://github.com/apple/ml-clara

---

## Example Usage

```python
from transformers import AutoModel

# Load the 16x compression checkpoint from a local path.
# trust_remote_code=True is required because the model ships custom code.
unirag = AutoModel.from_pretrained(
    "/mnt/ceph_rbd/model/CLaRa-7B-Base/compression-16",
    trust_remote_code=True
).to("cuda")

# One list of documents per question.
documents = [
    [
        "Weldenia is a monotypic genus of flowering plant in the family Commelinaceae...",
        "Hagsatera is a genus of orchids native to Mexico and Guatemala...",
        "Alsobia is a genus of flowering plants native to Mexico and Central America..."
    ]
]

# One question entry per document list; left empty in this example.
questions = [""]

out = unirag.generate_from_paraphrase(
    questions=questions,
    documents=documents,
    max_new_tokens=64
)

print("Generated answer:", out)
```
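
To make the 16× and 128× compression ratios more concrete, here is a minimal sketch of the arithmetic. It assumes that a ratio of N means roughly N input tokens per compressed vector and that partial chunks round up; both are assumptions for illustration and are not taken from the CLaRa code. The helper `compressed_length` and the 2048-token document length are hypothetical.

```python
import math


def compressed_length(num_tokens: int, ratio: int) -> int:
    """Hypothetical helper: estimate how many compressed vectors a document needs.

    Assumes `ratio` input tokens map to one compressed vector, rounding up
    (an illustrative assumption, not the model's documented behavior).
    """
    if num_tokens < 0 or ratio <= 0:
        raise ValueError("num_tokens must be >= 0 and ratio must be > 0")
    return math.ceil(num_tokens / ratio)


# Illustrative document length only; not a figure from the model card.
for ratio in (16, 128):
    n = compressed_length(2048, ratio)
    print(f"{ratio}x: 2048 tokens -> {n} compressed vectors")
```

Under these assumptions, the 128× checkpoint would hand the generator about one eighth as many vectors per document as the 16× checkpoint.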