AetherMind_SRL: Self-Reflective Learning for Robust Natural Language Inference

Community Article Published November 21, 2025

By: Sameer S. Najm
Project: AetherMind — Advanced Reasoning System
Model Repo: https://huggingface.co/samerzaher80/AetherMind_SRL
Dataset Repo: https://huggingface.co/datasets/samerzaher80/NLI_DataSets

Overview

AetherMind_SRL is a knowledge-distilled, self-reflective Transformer designed for robust, adversarial, and clinical Natural Language Inference (NLI).
It integrates:

Knowledge Distillation (KD) from DeBERTa-v3-base
Self-Reflective Learning (SRL) loops
ANLI adversarial fine-tuning
ADNI clinical reasoning (Alzheimer’s domain)
Smart Error Buffers and structured hard-example mining

This model is the Round 12 SRL-ANLI Smart checkpoint, achieving strong generalization across SNLI/MNLI, adversarial ANLI, and clinical Alzheimer’s reasoning.

AetherMind_SRL is part of the broader AetherMind project, a multi-year effort to build an adaptive reasoning engine with human-like error correction.

What Makes AetherMind_SRL Unique?

1. Knowledge Distillation Core

The student is a compact, efficient version of a DeBERTa-v3-base teacher.
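
The article does not spell out the distillation objective. As a rough illustration only, the sketch below uses a common formulation: a temperature-softened KL term against the teacher blended with ordinary cross-entropy on the gold label. The function names and the `temperature` and `alpha` values are assumptions, not the author's settings.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, gold, temperature=2.0, alpha=0.5):
    """Blend of soft-target KL and hard-label cross-entropy for one example.

    temperature and alpha are illustrative values (assumptions).
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL(teacher || student), scaled by T^2 as in the usual formulation.
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    soft = (temperature ** 2) * kl
    # Hard-label term uses the student's unsoftened distribution.
    hard = -math.log(softmax(student_logits)[gold])
    return alpha * soft + (1.0 - alpha) * hard

# Student agrees with the teacher vs. student contradicts the teacher.
loss_same = kd_loss([2.0, 0.1, -1.0], [2.0, 0.1, -1.0], gold=0)
loss_diff = kd_loss([-1.0, 0.1, 2.0], [2.0, 0.1, -1.0], gold=0)
```

A student that matches the teacher pays only the hard-label term, while one that disagrees is penalized by both terms.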

2. Self-Reflective Learning (SRL) Engine

SRL is a supervision-driven self-improvement loop:

The model predicts on ANLI + ADNI
It logs every prediction, both errors and correct ones
It builds a structured error buffer from the logged errors
It retrains on the corrected samples
It repeats the cycle until performance stabilizes
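
The loop above can be sketched in a few lines. This is a toy outline under stated assumptions: `predict` and `retrain` are hypothetical callables standing in for the real model and trainer, and "stability" is interpreted as "no errors left, or no improvement over the previous round".

```python
def srl_loop(predict, retrain, examples, max_rounds=5):
    """Run predict -> log -> buffer -> retrain and return the error count per round."""
    history = []
    for _ in range(max_rounds):
        # Log every prediction, correct or not.
        log = []
        for ex in examples:
            pred = predict(ex["premise"], ex["hypothesis"])
            log.append({**ex, "pred": pred, "error": pred != ex["label"]})
        # The error buffer keeps misclassified examples with their gold labels.
        buffer = [row for row in log if row["error"]]
        history.append(len(buffer))
        # Assumed stopping rule: nothing left to fix, or no improvement.
        if not buffer or (len(history) > 1 and history[-1] >= history[-2]):
            break
        retrain(buffer)
    return history

# Toy stand-ins: a "model" that memorizes corrected examples.
memory = {}

def toy_predict(premise, hypothesis):
    return memory.get((premise, hypothesis), "neutral")

def toy_retrain(buffer):
    for row in buffer:
        memory[(row["premise"], row["hypothesis"])] = row["label"]

data = [
    {"premise": "A man sleeps.", "hypothesis": "A man is awake.", "label": "contradiction"},
    {"premise": "A dog runs.", "hypothesis": "An animal moves.", "label": "entailment"},
    {"premise": "A girl reads.", "hypothesis": "She likes books.", "label": "neutral"},
]
history = srl_loop(toy_predict, toy_retrain, data)
```

With the memorizing toy model, the first round finds two errors and the second finds none, so the loop stops after two rounds.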

3. Smart Error Buffer (Round 12)

A carefully engineered dataset built from ANLI R1/R2/R3 + corrected SRL samples:

Label-balanced
Error-heavy (60%) + Anchor samples (40%)
Includes negation, multi-hop, lexical overlap challenges
Prevents catastrophic forgetting

4. Clinical-Grade Reasoning

The model is aligned with Alzheimer’s NLI tasks (MMSE claims), reaching 100% accuracy and macro F1 on the ADNI Val/Test splits.

Full SRL Pipeline

Step 1: ANLI Global Error Mining

Model evaluated on:

ANLI R1 Dev
ANLI R2 Dev
ANLI R3 Dev

For each example, logs include:

premise, hypothesis
gold label
predicted label
logits + confidence
error flag
pattern category (if detected)
reason for failure (negation, overlap, etc.)

These logs are merged into:

global_error_buffer_anli_round12_train.csv
global_error_buffer_anli_round12_val.csv
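
To make the log format concrete, here is a minimal sketch of one log record and its CSV serialization. The column names are illustrative (the real CSV schema is not given in the article), confidence is assumed to be the maximum softmax probability, and the example sentence pair is made up.

```python
import csv
import io
import math

LABELS = ["entailment", "neutral", "contradiction"]
# Hypothetical column names mirroring the fields listed above.
FIELDS = ["premise", "hypothesis", "gold", "pred", "logits",
          "confidence", "error", "pattern", "reason"]

def make_log_row(premise, hypothesis, gold, logits, pattern="", reason=""):
    """Build one log record; confidence is the max softmax probability (assumption)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    pred_idx = max(range(len(probs)), key=probs.__getitem__)
    pred = LABELS[pred_idx]
    return {
        "premise": premise, "hypothesis": hypothesis,
        "gold": gold, "pred": pred,
        "logits": " ".join(f"{x:.4f}" for x in logits),
        "confidence": round(probs[pred_idx], 4),
        "error": pred != gold,
        "pattern": pattern, "reason": reason,
    }

def rows_to_csv(rows):
    """Serialize log rows to CSV text (kept in memory here, not written to disk)."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()

row = make_log_row(
    "A man is not sleeping.", "A man is asleep.",
    gold="contradiction", logits=[2.5, 0.1, 0.2],
    pattern="negation_confusion", reason="negation",
)
csv_text = rows_to_csv([row])
```

Writing the same text to a file with `open(path, "w", newline="")` would produce a CSV of the kind named above.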

Step 2: Error Pattern Classification

Pattern Category	Description	Frequency
long_premise_multi_hop	Multi-step logic across long sentences	~28%
negation_confusion	Misinterprets “not”, “never”, “no longer”	~22%
lexical_overlap_confusion	Wrongly assumes entailment due to word overlap	~18%
neutral_confusion_other	Needs subtle contextual/world knowledge	~15%
Other (temporal, numeric)	More advanced cognitive reasoning	~17%

These patterns define next-step training priorities.
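
The article does not say how patterns are detected. One plausible approach is a set of simple heuristics, sketched below. The cue list, the thresholds, and the rule order are all assumptions made for illustration; only the category names come from the table above.

```python
import re

# Assumed negation cues; the real detector may use a different list.
NEGATION_RE = re.compile(r"\b(not|never|no longer|no|nobody|nothing)\b|n't",
                         re.IGNORECASE)

def tokens(text):
    """Lowercased word set used for the overlap check."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def tag_pattern(premise, hypothesis, gold, pred,
                long_premise_tokens=60, overlap_threshold=0.8):
    """Assign a misclassified example to one pattern category (heuristic)."""
    if len(premise.split()) >= long_premise_tokens:
        return "long_premise_multi_hop"
    # Negation present on exactly one side of the pair.
    if bool(NEGATION_RE.search(premise)) != bool(NEGATION_RE.search(hypothesis)):
        return "negation_confusion"
    hyp = tokens(hypothesis)
    overlap = len(hyp & tokens(premise)) / max(len(hyp), 1)
    # Wrongly predicted entailment while most hypothesis words appear in the premise.
    if pred == "entailment" and gold != "entailment" and overlap >= overlap_threshold:
        return "lexical_overlap_confusion"
    if "neutral" in (gold, pred):
        return "neutral_confusion_other"
    return "other"

tag_negation = tag_pattern("The doctor saw the patient.",
                           "The doctor did not see the patient.",
                           gold="contradiction", pred="entailment")
tag_overlap = tag_pattern("The lawyer visited the doctor.",
                          "The doctor visited the lawyer.",
                          gold="neutral", pred="entailment")
tag_neutral = tag_pattern("A woman walks.", "She is a teacher.",
                          gold="neutral", pred="contradiction")
```

Rules are checked in a fixed order, so an example that fits several categories receives the first match.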

Step 3: SMART Training Buffer Construction

SMART = Structured Misclassification-Aware Retraining Technique

Buffer Stats:

1787 training examples
448 validation examples
60% error samples
40% anchor samples
Label mix: 40% entailment (E) / 30% neutral (N) / 30% contradiction (C)
Sequence length: 192

Anchors stabilize behavior on SNLI/MNLI.
Errors provide adversarial pressure.
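
A 60/40 mix of the kind described above could be assembled as follows. This is a sketch under assumptions: the function name, the seeded random sampling, and the synthetic pools are illustrative, and label balancing is left out for brevity.

```python
import random

def build_smart_buffer(errors, anchors, size, error_share=0.6, seed=0):
    """Sample a shuffled buffer with a fixed share of error examples."""
    rng = random.Random(seed)
    n_err = round(size * error_share)
    n_anchor = size - n_err
    if n_err > len(errors) or n_anchor > len(anchors):
        raise ValueError("not enough examples in one of the pools")
    buffer = rng.sample(errors, n_err) + rng.sample(anchors, n_anchor)
    rng.shuffle(buffer)
    return buffer

# Synthetic pools standing in for mined errors and SNLI/MNLI anchors.
error_pool = [("error", i) for i in range(2000)]
anchor_pool = [("anchor", i) for i in range(2000)]

train_buffer = build_smart_buffer(error_pool, anchor_pool, size=1787)
n_errors = sum(1 for kind, _ in train_buffer if kind == "error")
n_anchors = len(train_buffer) - n_errors
```

For a buffer of 1787 examples, a 60/40 split works out to roughly 1072 error samples and 715 anchors; the exact counts in the released buffer may differ.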

Step 4: Fine-Tuning Strategy (Round 12 Smart)

Epochs: 1
LR: 1e-6 to 2e-6
Optimizer: AdamW
Loss: Cross-Entropy + Class Weights
Weight Boost:
- Historically hard classes (Neutral, Contradiction)
- Error-flag samples have ×2–×3 weight

Base checkpoint:

student_biomed_kd_fast\adni_srl_round11_smart
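
The weighting scheme in Step 4 can be illustrated with a small weighted cross-entropy. The class-weight values below are assumptions (the article only says Neutral and Contradiction are boosted), the error boost is picked from the stated ×2–×3 range, and normalizing by the total weight is one common convention rather than a confirmed detail.

```python
import math

LABELS = ["entailment", "neutral", "contradiction"]
# Assumed class weights; only the direction of the boost comes from the article.
CLASS_WEIGHTS = {"entailment": 1.0, "neutral": 1.5, "contradiction": 1.5}
ERROR_BOOST = 2.0  # the article gives a range of x2 to x3

def example_weight(gold, was_error):
    """Class weight, multiplied by the boost for previously misclassified samples."""
    return CLASS_WEIGHTS[gold] * (ERROR_BOOST if was_error else 1.0)

def weighted_cross_entropy(batch):
    """batch: list of (logits, gold_label, was_error). Returns the weighted mean loss."""
    total_loss = 0.0
    total_weight = 0.0
    for logits, gold, was_error in batch:
        m = max(logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        nll = log_z - logits[LABELS.index(gold)]  # negative log-likelihood
        w = example_weight(gold, was_error)
        total_loss += w * nll
        total_weight += w
    return total_loss / total_weight

batch = [
    ([3.0, 0.0, 0.0], "entailment", False),  # easy, correct
    ([3.0, 0.0, 0.0], "neutral", True),      # hard, previously misclassified
]
loss = weighted_cross_entropy(batch)
```

The misclassified neutral example carries three times the weight of the easy entailment example here, so it dominates the batch loss.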

Step 5: ADNI Clinical SRL Loop (Special Domain)

Pipeline:

Run on ADNI Cognitive NLI
Extract ADNI-specific errors
Boost memory-related reasoning:
- Temporal sequences
- Cognitive score changes
- Decline/stability patterns
Weighted CE:
- Correct = 1.0
- Errors = 3.0
Repeat 2–3 micro-rounds

Result: 100% accuracy on ADNI Val/Test without destabilizing SNLI/MNLI/ANLI.
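
To see what the 1.0 / 3.0 weighting means in practice, the short calculation below estimates how much of the total loss weight falls on error samples for a given error rate. The function name is hypothetical, and the calculation assumes the weights enter the loss as simple per-sample multipliers.

```python
def error_weight_share(error_rate, error_weight=3.0, correct_weight=1.0):
    """Fraction of the total sample weight carried by misclassified examples."""
    err = error_rate * error_weight
    ok = (1.0 - error_rate) * correct_weight
    return err / (err + ok)

share_at_10_percent = error_weight_share(0.10)
share_at_2_percent = error_weight_share(0.02)
```

At a 10% error rate the errors carry about a quarter of the total weight, and the share shrinks as the error rate falls in later micro-rounds.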

Evaluation Results — Round 12 (Final)

General NLI

Dataset	Accuracy	Macro F1	Samples
SNLI	89.64%	89.55%	9824
MNLI-M	90.20%	90.00%	9815
MNLI-MM	89.61%	89.35%	9832
XNLI (en)	90.36%	90.32%	2490

Adversarial NLI (ANLI)

Dataset	Accuracy	Macro F1
ANLI R1	79.90%	79.89%
ANLI R2	67.50%	67.35%
ANLI R3	67.33%	66.81%

Clinical NLI (ADNI)

Split	Accuracy	Macro F1
Train	100%	100%
Val	100%	100%
Test	100%	100%
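
For readers unfamiliar with the metrics in these tables, the sketch below computes accuracy and macro F1 from gold and predicted labels. It is a plain-Python illustration, not the repository's evaluation script; macro F1 here is the unweighted mean of per-class F1, which is what libraries such as scikit-learn compute with `average="macro"`.

```python
def accuracy_and_macro_f1(gold, pred,
                          labels=("entailment", "neutral", "contradiction")):
    """Return (accuracy, macro F1) for two equal-length label sequences."""
    correct = sum(g == p for g, p in zip(gold, pred))
    f1_scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        f1_scores.append(f1)
    return correct / len(gold), sum(f1_scores) / len(f1_scores)

gold = ["entailment", "entailment", "neutral", "contradiction"]
pred = ["entailment", "neutral", "neutral", "contradiction"]
acc, macro_f1 = accuracy_and_macro_f1(gold, pred)
```

Because every class counts equally, macro F1 drops below accuracy when one class is predicted noticeably worse than the others, which is consistent with the small gaps between the two columns above.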

Round-4 → Round-5 SRL Improvements (From Notes)

Dataset	Acc R4 (%)	Acc R5 (%)	F1 R4 (%)	F1 R5 (%)
SNLI	90.1	92.4	90.0	92.3
MNLI-M	84.5	86.7	84.2	86.0
MNLI-MM	84.0	86.0	83.8	85.5
ANLI R1	62.0	65.0	61.5	64.0
ANLI R2	47.0	49.0	46.5	48.0
ANLI R3	45.0	47.0	44.0	46.0
XNLI	78.0	80.0	77.0	79.0
ADNI	83.0	85.0	82.0	84.0

Repository Contents

Included scripts:

build_anli_global_error_buffer_round1.py
analyze_anli_errors_round1.py
evaluate_model_hf_only.py
srl_finetune_round5_smart.py

These implement the SRL-ANLI training engine.

Usage Example

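from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_id = "samerzaher80/AetherMind_SRL"

# Fall back to CPU when no GPU is available.
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id).to(device)
model.eval()  # make sure dropout is off for inference

premise = "The patient scored 28 on the MMSE last year."
hypothesis = "The patient shows signs of cognitive decline."

# Truncate to 192 tokens, the sequence length listed for the training buffer.
inputs = tokenizer(
    premise, hypothesis, return_tensors="pt", truncation=True, max_length=192
).to(device)
with torch.no_grad():
    logits = model(**inputs).logits
prediction = torch.argmax(logits, dim=-1).item()

# Label order as given in this article; check model.config.id2label if in doubt.
print(["entailment", "neutral", "contradiction"][prediction])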

Intended Use

Research on self-reflective NLI
Adversarial reasoning (ANLI)
Clinical NLP (Alzheimer’s NLI)
Robust text understanding for downstream tasks

Limitations

English-only
ANLI remains challenging: accuracy on R2 and R3 is about 67%
Clinical generalization beyond ADNI not guaranteed

Acknowledgments

Thanks to:

Hugging Face
Open-source research community
ADNI dataset contributors
Supporters of the AetherMind project

Repos

Model: https://huggingface.co/samerzaher80/AetherMind_SRL
Dataset: https://huggingface.co/datasets/samerzaher80/NLI_DataSets
