GliZNet — DeBERTa-v3-base

GliZNet (Generalized Zero-Shot Network) is a zero-shot text classification model that processes the input text and all candidate labels jointly in a single forward pass. Inference therefore costs one encoder pass regardless of the number of labels, where NLI-style cross-encoders need one pass per label.

Built on top of microsoft/deberta-v3-base, GliZNet encodes text and labels together in one sequence, extracts each label's representation from its [LAB] separator token, builds a label-specific text summary via cross-attention, and scores each pair with a bilinear head. Training uses a hybrid loss: a one-vs-negatives softmax (primary, with an optional additive margin), an auxiliary BCE term, and a partial VICReg regularizer that keeps only the variance and covariance terms (the invariance term is dropped, leaving a pure label-repulsion objective). Together these sharpen discrimination between semantically similar labels.
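The partial VICReg term is easiest to see in code. Below is a minimal sketch, assuming label representations stacked per example; the function name, epsilon, and per-term scaling are illustrative, not the released training code.

import torch
import torch.nn.functional as F

def label_repulsion(label_reprs: torch.Tensor, eps: float = 1e-4) -> torch.Tensor:
    # label_reprs: (num_labels, dim) label embeddings for one example.
    # Partial VICReg: variance + covariance terms only, no invariance term.
    n, d = label_reprs.shape
    x = label_reprs - label_reprs.mean(dim=0)     # center across labels
    std = torch.sqrt(x.var(dim=0) + eps)
    var_loss = F.relu(1.0 - std).mean()           # hinge: spread every dimension
    cov = (x.T @ x) / (n - 1)                     # (dim, dim) covariance matrix
    off_diag = cov - torch.diag(torch.diag(cov))
    cov_loss = off_diag.pow(2).sum() / d          # decorrelate dimensions
    return var_loss + cov_loss

With the invariance term gone there is nothing pulling representations together, so the remaining terms act purely as repulsion, spreading apart the embeddings of near-synonymous labels.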

Paper: GliZNet: A Novel Architecture for Zero-Shot Text Classification
Alex Kameni (Ivalua / Massy, France)

Code: github.com/KameniAlexNea/zero-shot-classification
Synthetic data generation: github.com/KameniAlexNea/generate-gliznet-data


Model Details

| Property | Value |
|---|---|
| Backbone | microsoft/deberta-v3-base (~184 M params) |
| Scoring head | Bilinear (nn.Bilinear(D, D, 1)) |
| Max sequence length | 512 tokens |
| Label separator token | [LAB] |
| Label representation | [LAB] token hidden state |
| Training precision | bfloat16 |
| Model type ID | gliznet |

Architecture Summary

flowchart TD
    A["Input sequence\n[CLS] <text> [SEP] <label_1> [LAB] <label_2> [LAB] … [PAD]"]
    A --> B["DeBERTa-v3-base\nContextual hidden states"]
    B --> C["Label repr\n[LAB] token hidden state per label"]
    B --> D["Label-specific text repr\nCross-attention: label queries text tokens"]
    C --> E["Bilinear scoring head\nlogit = Bilinear(text_repr, label_repr)"]
    D --> E
    E --> F["One-vs-negatives loss (+ optional margin)\n+ auxiliary BCE\n+ optional label repulsion\n(training only)"]
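As a shape-level illustration of the diagram, here is a hedged sketch of the scoring path. The module names, the 8-head attention, and the token positions are placeholders rather than the released implementation; only the wiring (gather [LAB] states, cross-attend, bilinear score) mirrors the flowchart.

import torch
import torch.nn as nn

D = 768                                    # DeBERTa-v3-base hidden size
hidden = torch.randn(1, 512, D)            # backbone output (batch, seq, D)
lab_positions = torch.tensor([140, 160, 180, 200, 220])  # [LAB] indices (illustrative)
label_repr = hidden[0, lab_positions]      # (n_labels, D): one vector per label

# Cross-attention: each label queries the text tokens to build its own summary.
text_tokens = hidden[:, :128]              # the <text> span (illustrative boundary)
attn = nn.MultiheadAttention(D, num_heads=8, batch_first=True)
text_repr, _ = attn(label_repr.unsqueeze(0), text_tokens, text_tokens)

# Bilinear head scores each (label-specific text summary, label) pair.
score_head = nn.Bilinear(D, D, 1)
logits = score_head(text_repr[0], label_repr).squeeze(-1)   # (n_labels,)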

Usage

With ZeroShotClassificationPipeline

import torch
from gliznet.model import GliZNetForSequenceClassification
from gliznet.tokenizer import GliZNETTokenizer
from gliznet.predictor import ZeroShotClassificationPipeline

# Load the model and cast to bfloat16, the precision it was trained in
model = GliZNetForSequenceClassification.from_pretrained("alexneakameni/gliznet-deberta-v3-base")
model = model.to(torch.bfloat16)
tokenizer = GliZNETTokenizer.from_pretrained("alexneakameni/gliznet-deberta-v3-base")

pipeline = ZeroShotClassificationPipeline(
    model, tokenizer,
    classification_type="multi-label",   # or "multi-class"
    device="cuda",
)

text = "Scientists discover a new exoplanet orbiting a distant star."
labels = ["astronomy", "politics", "cooking", "space exploration", "finance"]

result = pipeline(text, labels)
for ls in sorted(result.labels, key=lambda x: -x.score):
    print(f"  {ls.label:<25} {ls.score:.3f}")

With from_pretrained + HuggingFace AutoModel

from transformers import AutoModel, AutoConfig
import gliznet  # importing gliznet registers the custom "gliznet" model type with the Auto* classes

config = AutoConfig.from_pretrained("alexneakameni/gliznet-deberta-v3-base")  # optional: inspect hyperparameters
model = AutoModel.from_pretrained("alexneakameni/gliznet-deberta-v3-base")

Performance

Evaluated on the GLiClass benchmark of 10 standard text-classification datasets, reported as macro F1.
The GLiClass variants are the closest published competitors: like GliZNet, they encode text and labels jointly in a single forward pass.

Macro F1 on GLiClass benchmark datasets

| Dataset | GliZNet (ours) | GLiClass-large-v3 | GLiClass-base-v3 | GLiClass-modern-large | GLiClass-modern-base | GLiClass-edge |
|---|---|---|---|---|---|---|
| CR | 0.8752 | 0.9398 | 0.9127 | 0.8952 | 0.8902 | 0.8215 |
| SST-2 | 0.8839 | 0.9192 | 0.8959 | 0.9330 | 0.8959 | 0.8199 |
| SST-5 | 0.3969 | 0.4606 | 0.3376 | 0.4619 | 0.2756 | 0.2823 |
| IMDb | 0.8855 | 0.9366 | 0.9251 | 0.9402 | 0.9158 | 0.8485 |
| 20-Newsgroups | 0.3877 | 0.5958 | 0.4759 | 0.3905 | 0.3433 | 0.2217 |
| Enron Spam | 0.5164 | 0.7584 | 0.6760 | 0.5813 | 0.6398 | 0.5623 |
| Financial PhraseBank | 0.5945 | 0.9000 | 0.8971 | 0.5929 | 0.4200 | 0.5004 |
| AG News | 0.7332 | 0.7181 | 0.7279 | 0.7269 | 0.6663 | 0.6645 |
| Emotion | 0.3991 | 0.4506 | 0.4447 | 0.4517 | 0.4254 | 0.3851 |
| Rotten Tomatoes | 0.8009 | 0.8411 | 0.7943 | 0.7664 | 0.7070 | 0.7024 |
| AVERAGE | 0.6473 | 0.7520 | 0.7087 | 0.6740 | 0.6179 | 0.5809 |

Δ average macro F1, GliZNet vs GLiClass-large-v3: −0.1047 · vs GLiClass-base-v3: −0.0614 · vs GLiClass-modern-large: −0.0267 · vs GLiClass-modern-base: +0.0294 · vs GLiClass-edge: +0.0664

GliZNet is a DeBERTa-v3-base model trained on synthetic data only, while GLiClass-large-v3 uses a significantly larger backbone.


Training Details

| Setting | Value |
|---|---|
| Dataset | alexneakameni/ZSHOT-HARDSET-v2 |
| Train / Val / Test split | 54,289 / 1,188 / 1,322 |
| Optimizer | AdamW |
| Learning rate | 1e-4 (cosine schedule, 5% warmup) |
| Weight decay | 1e-3 |
| Batch size | 16 × 2 GPUs × 4 grad. accum. = 128 effective |
| Epochs | 10 (early stopping, patience = 3) |
| Precision | bf16 |
| Distributed training | DeepSpeed ZeRO-2 via accelerate launch |
| Hardware | 2 × NVIDIA GPU |
| Loss | One-vs-negatives softmax (weight 0.5, margin 0.5) + auxiliary BCE (weight 0.5) + partial VICReg label repulsion (variance and covariance terms only, no invariance; weight 0.05) |
| Max labels per sample | 20 |
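
The primary loss row is compact, so a sketch may help: each positive label competes against all negatives in its own softmax, with the additive margin subtracted from the positive logit. The function below is a per-example illustration assuming at least one positive label; batch reduction and the 0.5 weighting live in the released training code.

import torch
import torch.nn.functional as F

def one_vs_negatives(logits: torch.Tensor, targets: torch.Tensor, margin: float = 0.5) -> torch.Tensor:
    # logits:  (num_labels,) bilinear scores for one example
    # targets: (num_labels,) binary {0, 1} ground truth, at least one positive
    neg = logits[targets == 0]
    losses = []
    for p in (targets == 1).nonzero(as_tuple=True)[0]:
        # The positive sits at index 0 and must win even after losing `margin`.
        row = torch.cat([(logits[p] - margin).view(1), neg]).unsqueeze(0)
        losses.append(F.cross_entropy(row, torch.zeros(1, dtype=torch.long)))
    return torch.stack(losses).mean()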

Tokenizer

GliZNETTokenizer wraps the DeBERTa-v3 SentencePiece tokenizer and adds a custom [LAB] separator token. The input format is:

[CLS] <text> [SEP] <label_1> [LAB] <label_2> [LAB] ... <label_n> [LAB] [PAD]*

The lmask tensor (label mask) assigns 0 to text tokens and unique integers 1…n to each label's tokens, allowing the model to pool each label independently.
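
A toy example of the mask and the indexing it enables. Token boundaries are simplified, and mean pooling is shown only to demonstrate the mechanism; per the model details above, the released model takes each label's [LAB] hidden state, which the same mask locates equally well.

import torch

# Toy sequence: "[CLS] great movie [SEP] positive [LAB] negative [LAB]"
# lmask gives text/special tokens 0; label i's tokens (here including its
# [LAB] separator; the exact convention may differ) carry the integer i.
lmask = torch.tensor([0, 0, 0, 0, 1, 1, 2, 2])
hidden = torch.randn(8, 768)               # per-token hidden states (toy values)

# Pool each label's tokens independently via boolean masks.
num_labels = int(lmask.max())
label_reprs = torch.stack(
    [hidden[lmask == i].mean(dim=0) for i in range(1, num_labels + 1)]
)                                          # (2, 768): one vector per label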


Limitations

  • Trained on synthetic English data; performance on specialized domains (legal, medical) or non-English text may degrade.
  • Best results when the text and all candidate labels fit within the 512-token maximum sequence length. Very large label sets should be scored in chunks (see the sketch below).
  • The model does not produce calibrated probabilities; scores come from the bilinear head and should be compared relatively rather than read as likelihoods.
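
For the label-batching point above, a hypothetical helper (the name classify_chunked and the chunking policy are mine, built only on the pipeline interface shown earlier); the chunk size of 20 mirrors the training-time cap on labels per sample.

def classify_chunked(pipeline, text, labels, chunk_size=20):
    # Score a large label set in chunks so each forward pass stays within
    # the 512-token budget. Assumes multi-label mode, where per-label
    # scores are independent and therefore comparable across chunks.
    scored = []
    for i in range(0, len(labels), chunk_size):
        scored.extend(pipeline(text, labels[i:i + chunk_size]).labels)
    return sorted(scored, key=lambda ls: -ls.score)

In multi-class mode, scores are presumably normalized within the candidate set, so chunking would distort them; merge chunked results only under multi-label scoring.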

Citation

@misc{kameni2025gliznet,
  title   = {GliZNet: A Novel Architecture for Zero-Shot Text Classification},
  author  = {Alex Kameni},
  year    = {2025},
  note    = {Preprint. Code: https://github.com/KameniAlexNea/zero-shot-classification}
}

License

Apache 2.0
