ALIA-40B Distill Vapol

apol/alia-40b-distill-vapol is a post-trained release derived from BSC-LT/ALIA-40b-instruct-2601, optimized for practical multilingual assistant behavior, structured output reliability, tool-call formatting, RAG-style answers, and coding/debugging tasks.

Interactive demo Space: apol/alia-40b-distill-vapol-demo

Detailed technical article in Spanish: BLOG.md

Deliverables

This repo contains:

| Artifact | Location | Use |
|---|---|---|
| Q4_K_M GGUF | `gguf_chunks/ALIA-40b-distill-vapol-Q4_K_M.gguf.part-*` | Transport chunks for reconstructing the single-file llama.cpp / LM Studio deployment. |
| PEFT adapter | `adapter/` | Highest-fidelity Hub artifact; load on top of BSC-LT/ALIA-40b-instruct-2601 for adapter-based inference or further research. |
| Runtime helper | `runtime/repair_eval_responses.py` | Optional deterministic repair layer for strict JSON/tool/RAG/code contracts. |
| Evaluation reports | `reports/` | Local task metrics, hidden-suite validation outputs, and distillation summaries. |

Intended Use

The model is intended for general assistant use, with emphasis on:

  • Spanish assistant tasks.
  • Catalan, Basque, and Galician instruction following.
  • Structured JSON output.
  • Tool-call formatting and missing-argument clarification.
  • Administrative and legal-style summarization.
  • Coding/debugging assistance.
  • Source-grounded long-context and RAG-style synthesis.

Loading

PEFT Adapter

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "BSC-LT/ALIA-40b-instruct-2601"
repo = "apol/alia-40b-distill-vapol"

# The tokenizer (and chat template) ship in the adapter subfolder of this repo
tokenizer = AutoTokenizer.from_pretrained(repo, subfolder="adapter")
# Load the base instruct model, then apply the LoRA adapter on top
model = AutoModelForCausalLM.from_pretrained(base, device_map="auto", torch_dtype="auto")
model = PeftModel.from_pretrained(model, repo, subfolder="adapter")
```
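
A minimal generation sketch on top of the loaded adapter, assuming the tokenizer ships a chat template (the prompt and sampling parameters are illustrative only):

```python
messages = [{"role": "user", "content": "Explica brevemente qué es la fotosíntesis."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=256, temperature=0.2, do_sample=True)
# Decode only the newly generated tokens, not the prompt
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```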

llama.cpp / LM Studio

The Q4_K_M GGUF is published as transport chunks for reliable Hub distribution. Reassemble it locally before loading:

```bash
cat gguf_chunks/ALIA-40b-distill-vapol-Q4_K_M.gguf.part-* > ALIA-40b-distill-vapol-Q4_K_M.gguf
sha256sum ALIA-40b-distill-vapol-Q4_K_M.gguf
```

Expected SHA256:

```
45f75478c721cf26617dc10f89bbfc663f5946a3779ddd19982bb7787790d285
```
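
For environments without `cat` and `sha256sum` (for example Windows), a minimal Python equivalent that reassembles the chunks and verifies the checksum in one pass might look like this:

```python
import glob
import hashlib

EXPECTED = "45f75478c721cf26617dc10f89bbfc663f5946a3779ddd19982bb7787790d285"
OUT = "ALIA-40b-distill-vapol-Q4_K_M.gguf"

sha = hashlib.sha256()
with open(OUT, "wb") as out:
    # Assumes zero-padded part suffixes, so lexicographic sort restores the order
    for part in sorted(glob.glob("gguf_chunks/ALIA-40b-distill-vapol-Q4_K_M.gguf.part-*")):
        with open(part, "rb") as f:
            while chunk := f.read(1 << 20):
                out.write(chunk)
                sha.update(chunk)

assert sha.hexdigest() == EXPECTED, "checksum mismatch: re-download the chunks"
print("OK:", OUT)
```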

Then load the reassembled file:

```bash
llama-cli \
  -m ALIA-40b-distill-vapol-Q4_K_M.gguf \
  -c 4096 \
  -ngl 99 \
  --temp 0.2 \
  -p "<prompt>"
```

What Was Improved

The work focused on competence and performance on practical assistant tasks rather than broad memorization. The main interventions were:

| Lever | What was applied | Research influence |
|---|---|---|
| Targeted QLoRA SFT | Efficient LoRA/QLoRA post-training on high-value assistant behaviors. | QLoRA and HF FSDP/QLoRA practice. |
| Hard-example active distillation | Data came from actual model failures: invalid JSON, missing tool fields, citation mistakes, weak multilingual responses, and incomplete code fixes. | DeepSeek-style staged post-training and rejection-sampling distillation. |
| DPO preference alignment | Chosen/rejected pairs contrasted corrected outputs against current-model failure patterns. | DPO, SimPO/ORPO-style preference optimization ideas. |
| Verifier-first gates | Deterministic validators checked JSON validity, tool-call shape, citations, and task constraints before promotion (see the sketch below). | RLVR/GRPO-style emphasis on verifiable rewards and automatic gates. |
| Tool/RAG task shaping | Training examples used realistic tool contracts, missing arguments, citation requirements, and multilingual source-grounded answers. | DeepSeek V4, Kimi agentic training reports, HF Cookbook, and the Smol Training Playbook. |

These references informed design choices. This release does not claim to reproduce frontier-scale RL or agentic training.
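
As an illustration of the verifier-first gating idea (a minimal sketch, not the actual pipeline code; the function name and tool-call shape are assumptions), a deterministic tool-call validator can be as simple as:

```python
import json

def passes_tool_call_gate(raw: str, required_args: set[str]) -> bool:
    """Deterministic gate: accept a candidate only if it is valid JSON
    with a tool-call shape and no missing required arguments."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(call, dict) or "name" not in call or "arguments" not in call:
        return False
    args = call["arguments"]
    return isinstance(args, dict) and required_args <= set(args)

# Only candidates that pass every gate are promoted into the SFT/DPO pools.
print(passes_tool_call_gate('{"name": "get_weather", "arguments": {"city": "Bilbao"}}', {"city"}))
```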

Local Evaluation

The following local suites are deterministic assistant-task evaluations. They measure structured output, tool-call behavior, source-grounded answers, code fixes, and language constraints. They are not a substitute for a full academic benchmark campaign.

| Model / Artifact | Visible assistant eval | Hidden verifier-first suite | Hidden competence suite | Notes |
|---|---|---|---|---|
| BSC-LT/ALIA-40b (base) | not directly comparable | not applicable | not applicable | Raw completion model; not instruction-aligned. |
| BSC-LT/ALIA-40b-instruct-2601 | 21/80 rows, 386/519 checks | baseline not included | baseline not included | Original instruction model under the local validator style. |
| Distill Vapol adapter | 33/80 rows, 446/519 checks | 16/20 rows, 111/115 checks | 11/20 rows, 100/115 checks | Best model-only result. |
| Distill Vapol with deterministic runtime repair | 41/80 rows, 458/519 checks | 20/20 rows, 115/115 checks | 20/20 rows, 115/115 checks | Best practical deployment path when strict validators are available. |
| Distill Vapol Q4_K_M GGUF | portable artifact | integrity verified | integrity verified | Quantized release for LM Studio/llama.cpp; published as chunks and reassembled into one file locally. The PEFT adapter remains the canonical highest-fidelity artifact. |

Relative local improvement over the original ALIA instruct model on the visible assistant eval:

  • Row pass rate: 21/80 -> 33/80, a +57.1% relative increase.
  • Check pass rate: 386/519 -> 446/519, a +15.5% relative increase.
  • With deterministic runtime repair: 21/80 -> 41/80 rows, a +95.2% relative increase.
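
These relative figures follow the usual (new − old) / old convention; a quick check:

```python
def rel_increase(old: int, new: int) -> float:
    return (new - old) / old * 100

print(f"{rel_increase(21, 33):.1f}%")   # rows, adapter only        -> 57.1%
print(f"{rel_increase(386, 446):.1f}%") # checks, adapter only      -> 15.5%
print(f"{rel_increase(21, 41):.1f}%")   # rows, with runtime repair -> 95.2%
```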

Official Reference Scores

The official BSC model cards report broad benchmark numbers for the source models. These are reference points, not direct comparisons to the local task evals above.

Selected official BSC-LT/ALIA-40b-instruct-2601 reference scores:

| Area | Benchmark | Official score |
|---|---|---|
| English knowledge | MMLU | 0.45 |
| English reasoning | ARC Challenge | 0.40 |
| English reasoning | ARC Easy | 0.73 |
| English reading | Belebele English | 0.77 |
| English commonsense | HellaSwag (acc) | 0.54 |
| Spanish knowledge | MMMLU Spanish | 0.41 |
| Spanish reading | Belebele Spanish | 0.72 |
| Catalan reading | Belebele Catalan | 0.71 |
| Basque reading | Belebele Basque | 0.67 |
| Galician reading | Belebele Galician | 0.73 |

Estimated academic benchmark movement should be treated conservatively. The post-training targeted assistant reliability, formats, tool/RAG behavior, and multilingual task compliance; it should not be expected to dramatically change broad pretrained knowledge benchmarks such as MMLU.

Notes

  • The adapter is the highest-fidelity Hub artifact.
  • The Q4_K_M GGUF is the recommended portable local artifact; the Hub copy is chunked for reliable transport and reconstructs to one GGUF file.
  • The optional runtime repair helper is not embedded in the GGUF; it is a deployment-side deterministic layer for strict formal outputs (a minimal sketch follows below).
  • For practical GGUF inference, use LM Studio or a CUDA-enabled llama.cpp build with GPU offload.
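
To make the deployment-side repair idea concrete, here is a hypothetical sketch (the shipped runtime/repair_eval_responses.py may differ) that strips markdown fences and surrounding prose, then re-serializes canonical JSON before validation:

```python
import json
import re

def repair_json_response(raw: str) -> str | None:
    """Deterministic best-effort repair: extract the JSON payload from a
    model response and re-serialize it. Returns None if unrecoverable."""
    # Prefer the content of a ```json ... ``` fence if one is present
    fenced = re.search(r"```(?:json)?\s*(.*?)```", raw, re.DOTALL)
    candidate = fenced.group(1) if fenced else raw
    # Fall back to the first {...} span in the text
    if not candidate.lstrip().startswith("{"):
        brace = re.search(r"\{.*\}", candidate, re.DOTALL)
        if brace is None:
            return None
        candidate = brace.group(0)
    try:
        return json.dumps(json.loads(candidate), ensure_ascii=False)
    except json.JSONDecodeError:
        return None
```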