AetherMind_SRL: Self-Reflective Learning for Robust Natural Language Inference
By: Sameer S. Najm
Project: AetherMind — Advanced Reasoning System
Model Repo: https://huggingface.co/samerzaher80/AetherMind_SRL
Dataset Repo: https://huggingface.co/datasets/samerzaher80/NLI_DataSets
Overview
AetherMind_SRL is a knowledge-distilled, self-reflective Transformer designed for robust Natural Language Inference (NLI) across standard, adversarial, and clinical settings.
It integrates:
- Knowledge Distillation (KD) from DeBERTa-v3-base
- Self-Reflective Learning (SRL) loops
- ANLI adversarial fine-tuning
- ADNI clinical reasoning (Alzheimer’s domain)
- Smart Error Buffers and structured hard-example mining
This model is the Round 12 SRL-ANLI Smart checkpoint, achieving strong generalization across SNLI/MNLI, adversarial ANLI, and clinical Alzheimer’s reasoning.
AetherMind_SRL is part of the broader AetherMind project, a multi-year effort to build an adaptive reasoning engine with human-like error correction.
What Makes AetherMind_SRL Unique?
1. Knowledge Distillation Core
The student is a compact, efficient model distilled from a DeBERTa-v3-base teacher; the distillation objective is sketched below.
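A minimal sketch of a standard distillation objective (temperature-softened KL divergence blended with hard-label cross-entropy); the temperature and mixing weight here are illustrative assumptions, not the values used to train this checkpoint:

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    # Soft targets: teacher distribution softened by temperature
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2          # conventional T^2 scaling
    # Hard targets: ordinary cross-entropy on gold labels
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```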
2. Self-Reflective Learning (SRL) Engine
SRL is a supervision-driven self-improvement loop (a minimal sketch follows the list):
- The model predicts on ANLI + ADNI
- Logs all errors (and correct predictions)
- Builds a structured error buffer
- Retrains on corrected samples
- Repeats this until stability
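A minimal sketch of the loop as a driver function; the four callables are hypothetical stand-ins for the repository scripts, not its actual API:

```python
def srl_loop(model, dev_sets, predict, build_error_buffer, finetune, evaluate,
             max_rounds=12, tol=0.002):
    """Run SRL cycles until dev accuracy stabilizes within `tol`."""
    prev_acc = 0.0
    for _ in range(max_rounds):
        records = predict(model, dev_sets)     # log errors and correct predictions
        buffer = build_error_buffer(records)   # structured error buffer
        model = finetune(model, buffer)        # retrain on corrected samples
        acc = evaluate(model, dev_sets)
        if abs(acc - prev_acc) < tol:          # repeat until stability
            break
        prev_acc = acc
    return model
```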
3. Smart Error Buffer (Round 12)
A carefully engineered dataset built from ANLI R1/R2/R3 + corrected SRL samples:
- Label-balanced
- Error-heavy (60%) + Anchor samples (40%)
- Includes negation, multi-hop, lexical overlap challenges
- Prevents catastrophic forgetting
4. Clinical-Grade Reasoning
The model is aligned with Alzheimer’s NLI tasks (MMSE claims), achieving perfect scores on ADNI Val/Test.
Full SRL Pipeline
Step 1: ANLI Global Error Mining
Model evaluated on:
- ANLI R1 Dev
- ANLI R2 Dev
- ANLI R3 Dev
For each example, logs include:
- premise, hypothesis
- gold label
- predicted label
- logits + confidence
- error flag
- pattern category (if detected)
- failure reason (negation, overlap, etc.)
These logs are merged into:
- `global_error_buffer_anli_round12_train.csv`
- `global_error_buffer_anli_round12_val.csv`
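A hedged sketch of the mining loop behind these files; the columns follow the field list above, label order [entailment, neutral, contradiction] is assumed, and classify_pattern is the hypothetical tagger sketched under Step 2:

```python
import csv
import torch
import torch.nn.functional as F

def mine_errors(model, tokenizer, examples, out_path, device="cuda"):
    """Log every prediction (correct and wrong) with the fields above."""
    fields = ["premise", "hypothesis", "gold_label", "pred_label", "logits",
              "confidence", "is_error", "pattern_category", "failure_reason"]
    rows = []
    model.eval()
    for ex in examples:  # ex: {"premise": str, "hypothesis": str, "label": int}
        enc = tokenizer(ex["premise"], ex["hypothesis"], truncation=True,
                        max_length=192, return_tensors="pt").to(device)
        with torch.no_grad():
            logits = model(**enc).logits.squeeze(0)
        probs = F.softmax(logits, dim=-1)
        pred = int(probs.argmax())
        pattern, reason = classify_pattern(ex["premise"], ex["hypothesis"],
                                           pred, ex["label"])  # hypothetical
        rows.append({"premise": ex["premise"], "hypothesis": ex["hypothesis"],
                     "gold_label": ex["label"], "pred_label": pred,
                     "logits": logits.tolist(), "confidence": float(probs.max()),
                     "is_error": pred != ex["label"],
                     "pattern_category": pattern, "failure_reason": reason})
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fields)
        writer.writeheader()
        writer.writerows(rows)
```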
Step 2: Error Pattern Classification
| Pattern Category | Description | Frequency |
|---|---|---|
| long_premise_multi_hop | Multi-step logic across long sentences | ~28% |
| negation_confusion | Misinterprets “not”, “never”, “no longer” | ~22% |
| lexical_overlap_confusion | Wrongly assumes entailment due to word overlap | ~18% |
| neutral_confusion_other | Needs subtle contextual/world knowledge | ~15% |
| Other (temporal, numeric) | Requires temporal, numeric, or other advanced reasoning | ~17% |
These patterns define next-step training priorities.
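A toy heuristic tagger illustrating how such categories can be assigned; the cue list, length threshold, and overlap cutoff are illustrative assumptions, not the project's actual rules:

```python
import re

NEGATION_RE = re.compile(r"\b(not|never|no longer|none|nobody)\b")

def classify_pattern(premise, hypothesis, pred, gold):
    """Return (category, reason); labels assumed 0=E, 1=N, 2=C."""
    text = f"{premise} {hypothesis}".lower()
    p_tokens = set(premise.lower().split())
    h_tokens = set(hypothesis.lower().split())
    overlap = len(p_tokens & h_tokens) / max(len(h_tokens), 1)
    if NEGATION_RE.search(text):
        return "negation_confusion", "negation cue present"
    if len(premise.split()) > 40:
        return "long_premise_multi_hop", "long premise"
    if overlap > 0.8 and pred == 0 and gold != 0:  # predicted entailment on overlap
        return "lexical_overlap_confusion", "high lexical overlap"
    if gold == 1:
        return "neutral_confusion_other", "subtle context needed"
    return "other", "temporal/numeric or unclassified"
```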
Step 3: SMART Training Buffer Construction
SMART = Structured Misclassification-Aware Retraining Technique
Buffer Stats:
- 1787 training examples
- 448 validation examples
- 60% error samples
- 40% anchor samples
- Label balance: 40% Entailment / 30% Neutral / 30% Contradiction
- Max sequence length: 192 tokens
Anchors stabilize behavior on SNLI/MNLI, while error samples apply adversarial pressure; a sketch of the mixing logic follows.
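A sketch of that mixing under the stated proportions; the sampling strategy itself is an assumption:

```python
import random

def build_smart_buffer(error_pool, anchor_pool, size=1787,
                       error_frac=0.6, label_mix=(0.4, 0.3, 0.3)):
    """Mix 60% error samples with 40% anchors at a 40/30/30 E/N/C balance."""
    n_err = int(size * error_frac)
    buffer = []
    for pool, n in ((error_pool, n_err), (anchor_pool, size - n_err)):
        for label, frac in enumerate(label_mix):   # 0=E, 1=N, 2=C
            candidates = [ex for ex in pool if ex["label"] == label]
            k = min(int(n * frac), len(candidates))
            buffer.extend(random.sample(candidates, k))
    random.shuffle(buffer)
    return buffer
```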
Step 4: Fine-Tuning Strategy (Round 12 Smart)
- Epochs: 1
- LR: 1e-6 to 2e-6
- Optimizer: AdamW
- Loss: Cross-Entropy + Class Weights
- Weight Boost:
- Historically hard classes (Neutral, Contradiction)
- Error-flagged samples receive ×2–×3 weight (see the loss sketch below)
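A hedged sketch of such a weighted objective, assuming label order [entailment, neutral, contradiction]; the class weights and the ×2.5 error boost are placeholders within the stated ×2–×3 range, not the trained configuration:

```python
import torch
import torch.nn as nn

class_weights = torch.tensor([1.0, 1.3, 1.3])   # boost Neutral / Contradiction
ce = nn.CrossEntropyLoss(weight=class_weights, reduction="none")

def weighted_loss(logits, labels, is_error):
    per_sample = ce(logits, labels)              # class-weighted CE per example
    boost = 1.0 + 1.5 * is_error.float()         # error-flagged samples ~×2.5
    return (per_sample * boost).mean()
```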
Base checkpoint:
student_biomed_kd_fast\adni_srl_round11_smart
Step 5: ADNI Clinical SRL Loop (Special Domain)
Pipeline:
- Run on ADNI Cognitive NLI
- Extract ADNI-specific errors
- Boost memory-related reasoning:
- Temporal sequences
- Cognitive score changes
- Decline/stability patterns
- Weighted CE:
- Correct = 1.0
- Errors = 3.0
- Repeat 2–3 micro-rounds (sketched below)
Result: 100% accuracy on ADNI Val/Test without destabilizing SNLI/MNLI/ANLI.
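A minimal sketch of those micro-rounds; train_one_epoch and evaluate are hypothetical helpers standing in for the actual training scripts:

```python
def adni_micro_rounds(model, adni_train, adni_val, train_one_epoch, evaluate,
                      n_rounds=3, error_weight=3.0):
    """Reweight ADNI errors (3.0 vs. 1.0) and repeat until Val is perfect."""
    for _ in range(n_rounds):                    # 2–3 micro-rounds
        weights = [error_weight if ex["is_error"] else 1.0 for ex in adni_train]
        model = train_one_epoch(model, adni_train, sample_weights=weights)
        if evaluate(model, adni_val) >= 1.0:     # stop once Val accuracy is 100%
            break
    return model
```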
Evaluation Results — Round 12 (Final)
General NLI
| Dataset | Accuracy | Macro F1 | Samples |
|---|---|---|---|
| SNLI | 89.64% | 89.55% | 9824 |
| MNLI-M | 90.20% | 90.00% | 9815 |
| MNLI-MM | 89.61% | 89.35% | 9832 |
| XNLI (en) | 90.36% | 90.32% | 2490 |
Adversarial NLI (ANLI)
| Dataset | Accuracy | Macro F1 |
|---|---|---|
| ANLI R1 | 79.90% | 79.89% |
| ANLI R2 | 67.50% | 67.35% |
| ANLI R3 | 67.33% | 66.81% |
Clinical NLI (ADNI)
| Split | Accuracy | Macro F1 |
|---|---|---|
| Train | 100% | 100% |
| Val | 100% | 100% |
| Test | 100% | 100% |
Round-4 → Round-5 SRL Improvements (From Notes)
| Dataset | Acc R4 (%) | Acc R5 (%) | Macro F1 R4 (%) | Macro F1 R5 (%) |
|---|---|---|---|---|
| SNLI | 90.1 | 92.4 | 90.0 | 92.3 |
| MNLI-M | 84.5 | 86.7 | 84.2 | 86.0 |
| MNLI-MM | 84.0 | 86.0 | 83.8 | 85.5 |
| ANLI R1 | 62.0 | 65.0 | 61.5 | 64.0 |
| ANLI R2 | 47.0 | 49.0 | 46.5 | 48.0 |
| ANLI R3 | 45.0 | 47.0 | 44.0 | 46.0 |
| XNLI | 78.0 | 80.0 | 77.0 | 79.0 |
| ADNI | 83.0 | 85.0 | 82.0 | 84.0 |
Repository Contents
Included scripts:
- `build_anli_global_error_buffer_round1.py`
- `analyze_anli_errors_round1.py`
- `evaluate_model_hf_only.py`
- `srl_finetune_round5_smart.py`
These implement the SRL-ANLI training engine.
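For reference, a minimal metric computation in the spirit of evaluate_model_hf_only.py; the actual script may batch differently or report more statistics:

```python
from sklearn.metrics import accuracy_score, f1_score

def report(name, golds, preds):
    """Print accuracy and macro F1 for one dataset split."""
    acc = accuracy_score(golds, preds)
    macro_f1 = f1_score(golds, preds, average="macro")
    print(f"{name}: acc={acc:.4f}  macro_f1={macro_f1:.4f}")
```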
Usage Example
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_id = "samerzaher80/AetherMind_SRL"
device = "cuda" if torch.cuda.is_available() else "cpu"  # fall back to CPU

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id).to(device)
model.eval()

premise = "The patient scored 28 on the MMSE last year."
hypothesis = "The patient shows signs of cognitive decline."

inputs = tokenizer(premise, hypothesis, return_tensors="pt").to(device)
with torch.no_grad():
    logits = model(**inputs).logits
prediction = torch.argmax(logits, dim=-1).item()
print(["entailment", "neutral", "contradiction"][prediction])
```
Intended Use
- Research on self-reflective NLI
- Adversarial reasoning (ANLI)
- Clinical NLP (Alzheimer’s NLI)
- Robust text understanding for downstream tasks
Limitations
- English-only
- ANLI remains challenging, especially R2/R3 (~67% accuracy)
- Clinical generalization beyond ADNI not guaranteed
Acknowledgments
Thanks to:
- Hugging Face
- Open-source research community
- ADNI dataset contributors
- Supporters of the AetherMind project
Repos
Model: https://huggingface.co/samerzaher80/AetherMind_SRL
Dataset: https://huggingface.co/datasets/samerzaher80/NLI_DataSets