vit-spoof-detection-pda / README.md

ArchitRastogi

metadata

language:
  - en
library_name: timm
tags:
  - vision
  - image-classification
  - vit
  - anti-spoofing
  - face-liveness
  - celeba-spoof
  - deep-learning
  - pytorch
  - huggingface
datasets:
  - celeba-spoof
model_name: ViT-Base-Patch16-224 Face Anti-Spoofing (CelebA Spoof PDA)
license: mit
tasks:
  - name: Face Anti-Spoofing
    type: image-classification
inference: true
metrics:
  - accuracy
  - f1
  - auc
  - precision
  - recall
  - specificity
  - far
  - frr
  - eer
model-index:
  - name: ViT-Base-Patch16-224 Anti-Spoofing (CelebA Spoof PDA)
    results:
      - task:
          type: image-classification
          name: Face Anti-Spoofing
        dataset:
          name: CelebA Spoof (PDA Splits 19–21)
          type: celeba-spoof
          split: test
          size: 1747
        metrics:
          - name: Accuracy
            type: accuracy
            value: 0.8329
          - name: F1-score
            type: f1
            value: 0.878
          - name: AUC-ROC
            type: auc
            value: 0.9561
          - name: Precision (PPV)
            type: precision
            value: 0.7974
          - name: Recall (TPR)
            type: recall
            value: 0.9768
          - name: Specificity
            type: specificity
            value: 0.6021
          - name: FAR (False Acceptance Rate)
            type: far
            value: 0.3979
          - name: FRR (False Rejection Rate)
            type: frr
            value: 0.0232
          - name: EER (Equal Error Rate)
            type: eer
            value: 0.1083

Vision Transformer for Face Anti-Spoofing (CelebA Spoof PDA)

This repository contains a fine-tuned Vision Transformer (ViT-Base-Patch16-224) model for face anti-spoofing on the CelebA Spoof (PDA) dataset.
The model was trained on the first 18 splits of the dataset and evaluated on splits 19–21, following the standard CelebA Spoof partitioning strategy.

Overview

The goal of this project is a deep learning–based system that distinguishes live from spoofed faces in real-world conditions.
The model is a ViT fine-tuned on CelebA Spoof data that was augmented on the GPU, using the following training techniques:

- Focal Loss for class imbalance
- Threshold optimization
- Weighted regularization
- Early stopping
- Hyperparameter tuning (via W&B sweeps)
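
The threshold optimization step is not described in detail here. As a minimal sketch, assuming it means choosing the decision threshold on the Live probability that maximizes F1 on a validation set, it could look like the following. The function name `best_f1_threshold` and the toy scores are illustrative and are not taken from the repository.

```python
def best_f1_threshold(scores, labels):
    """Return (threshold, f1) maximizing F1, treating label 1 (Live) as positive.

    scores: predicted probability of the Live class for each sample.
    labels: ground-truth labels, 1 = Live, 0 = Spoof.
    """
    best_t, best_f1 = 0.5, -1.0
    # Each distinct score is a candidate threshold; predict Live when score >= t.
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


# Toy validation scores (synthetic, for illustration only).
val_scores = [0.95, 0.90, 0.80, 0.65, 0.60, 0.40, 0.30, 0.10]
val_labels = [1, 1, 1, 0, 1, 0, 0, 0]
threshold, f1 = best_f1_threshold(val_scores, val_labels)
```

On this toy data the search picks a threshold of 0.60 with an F1 of about 0.889. A threshold tuned this way should be selected on validation data only and then held fixed for the test splits.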

Dataset

Dataset: CelebA Spoof (PDA)

- Training splits: 1–18
- Testing splits: 19–21
- Classes: binary classification (Live vs. Spoof)
- Total test samples: 1,747
  - Live: 1,076
  - Spoof: 671

Data Augmentation Pipeline

The augmentation process was GPU-accelerated using Kornia and ran on an NVIDIA RTX A5000 (32 vCPU).
It was designed to improve generalization across lighting conditions, poses, and spoof media.

Augmentation strategy:

| Class | Augmentations per image | Techniques |
| --- | --- | --- |
| Live | 8× | Random flip, rotation, color jitter, Gaussian blur/noise, perspective, elastic transform, sharpness adjustment |
| Spoof | 2× | Same set, applied with lower probability |

Core augmentation methods:

- Heavy, medium, and light pipelines (with variable transform intensity)
- GPU-based batch processing with Kornia
- Normalization aligned with ViT preprocessing (mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])

The complete augmentation logic is implemented in augument_data.py.

Model Architecture

The base model is a ViT-Base-Patch16-224, initialized with pretrained ImageNet weights and fine-tuned for binary classification.
A custom classification head was added:

LayerNorm(embed_dim) → Dropout(0.1) → Linear(512) → GELU → Dropout(0.1) → Linear(2)
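
The head described above can be sketched in PyTorch as follows. This is an illustration rather than the code from train_advanced.py: the class name `SpoofHead` is hypothetical, `embed_dim=768` is the hidden size of ViT-Base, and the way the head is attached to the timm backbone is not shown in this README, so it is left out.

```python
import torch
import torch.nn as nn


class SpoofHead(nn.Module):
    """Hypothetical module mirroring the classification head described above."""

    def __init__(self, embed_dim=768, hidden_dim=512, num_classes=2, dropout=0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.LayerNorm(embed_dim),
            nn.Dropout(dropout),
            nn.Linear(embed_dim, hidden_dim),
            nn.GELU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, features):
        # features: (batch, embed_dim) pooled ViT embeddings, e.g. the class token.
        return self.net(features)


head = SpoofHead().eval()  # eval() disables dropout for a deterministic forward pass
with torch.no_grad():
    logits = head(torch.randn(4, 768))  # random stand-in for ViT features
```

The output has one logit per class, so a softmax over the last dimension gives the Live and Spoof probabilities.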

Model configuration:

- Patch size: 16
- Dropout: 0.1
- Optimizer: AdamW
- Scheduler: Cosine Annealing with warm-up
- Batch size: 128
- Mixed precision: Enabled (AMP)
- Early stopping and F1-based checkpointing

The full training procedure is implemented in train_advanced.py.
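
To make the "Cosine Annealing with warm-up" schedule concrete, here is a small sketch of the learning rate as a function of the training step. The linear warm-up shape, the warm-up length, the total step count, and the function name `lr_at_step` are assumptions for illustration; only the peak learning rate of 3e-4 comes from the training details in this README.

```python
import math


def lr_at_step(step, total_steps, warmup_steps, base_lr=3e-4, min_lr=0.0):
    """Learning rate with linear warm-up followed by cosine annealing."""
    if step < warmup_steps:
        # Linear ramp from base_lr / warmup_steps up to base_lr.
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the post-warm-up schedule completed, in [0, 1).
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Hypothetical schedule: 1000 steps, of which the first 100 are warm-up.
schedule = [lr_at_step(s, total_steps=1000, warmup_steps=100) for s in range(1000)]
```

The rate climbs to its peak at the end of warm-up and then decays smoothly toward `min_lr`, which avoids large updates to the pretrained weights in the first steps.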

Training Details

| Parameter | Value |
| --- | --- |
| Dataset | Augmented CelebA Spoof (Splits 1–18) |
| Optimizer | AdamW |
| Learning Rate | 3e-4 (swept) |
| Weight Decay | 0.05 |
| Batch Size | 128 |
| Epochs | 50 |
| Loss | Focal Loss (α=0.25, γ=2.0) |
| Early Stopping | Patience = 10, Δ = 0.001 |
| Threshold Optimization | Enabled |
| Scheduler | CosineAnnealingLR |
| Mixed Precision | True |
| Device | NVIDIA RTX A5000 |
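
As an illustration of what the focal loss with α=0.25 and γ=2.0 does, the sketch below evaluates the standard binary form, FL = -α_t (1 - p_t)^γ log(p_t), for single samples. It is a plain-Python illustration and not the repository's implementation; in particular, which class receives the weight α is an assumption expressed through the `is_positive` argument.

```python
import math


def focal_loss(p_true, is_positive, alpha=0.25, gamma=2.0):
    """Binary focal loss for one sample.

    p_true: predicted probability assigned to the sample's true class.
    is_positive: True if the sample belongs to the class weighted by alpha
                 (the other class is weighted by 1 - alpha).
    """
    alpha_t = alpha if is_positive else 1.0 - alpha
    # (1 - p_true) ** gamma shrinks the loss of well-classified samples.
    return -alpha_t * (1.0 - p_true) ** gamma * math.log(p_true)


easy = focal_loss(0.9, is_positive=True)  # confident, correct prediction
hard = focal_loss(0.1, is_positive=True)  # confident, wrong prediction
```

With these settings the confident correct prediction contributes roughly 0.0003 to the loss, while the confident wrong one contributes roughly 0.47, so training concentrates on hard examples.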

Training and validation metrics were tracked using Weights & Biases for all runs.
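
The early-stopping rule (patience 10, Δ 0.001) can be sketched as a small helper. The class below is hypothetical, and monitoring validation F1 is an assumption based on the F1-based checkpointing mentioned above; the F1 curve is synthetic.

```python
class EarlyStopping:
    """Stop when the score has not improved by min_delta for `patience` epochs."""

    def __init__(self, patience=10, min_delta=0.001):
        self.patience = patience
        self.min_delta = min_delta
        self.best = None
        self.bad_epochs = 0

    def step(self, score):
        """Record one epoch's score (higher is better); return True to stop."""
        if self.best is None or score > self.best + self.min_delta:
            self.best = score
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Synthetic validation F1 curve: improves for 5 epochs, then plateaus.
stopper = EarlyStopping(patience=10, min_delta=0.001)
val_f1 = [0.70, 0.78, 0.83, 0.86, 0.87] + [0.8705] * 20
stopped_at = None
for epoch, score in enumerate(val_f1, start=1):
    if stopper.step(score):
        stopped_at = epoch
        break
```

In this example training stops after epoch 15, ten epochs after the last improvement larger than Δ at epoch 5.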

Testing Procedure

Testing was conducted on splits 19–21, following the CelebA Spoof PDA protocol. The testing pipeline (test.py) evaluates the model at both the per-image and per-subject level and reports:

- Accuracy, F1, AUC
- Precision, Recall, Specificity, NPV
- FAR, FRR, and EER
- Confusion Matrix
- ROC Curve

Results and plots are automatically exported to disk during testing.
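
For readers unfamiliar with the EER, the sketch below shows one common way to approximate it: sweep the decision threshold and take the point where FAR and FRR are closest. It assumes scores are Live probabilities and that a sample is accepted as Live when its score is at or above the threshold. The function names and the data are illustrative; test.py may compute the EER differently, for example by interpolating the ROC curve.

```python
def far_frr(scores, labels, threshold):
    """FAR and FRR when samples with score >= threshold are accepted as Live.

    labels: 1 = Live, 0 = Spoof. FAR is the share of Spoof samples accepted,
    FRR the share of Live samples rejected.
    """
    spoof = [s for s, y in zip(scores, labels) if y == 0]
    live = [s for s, y in zip(scores, labels) if y == 1]
    far = sum(s >= threshold for s in spoof) / len(spoof)
    frr = sum(s < threshold for s in live) / len(live)
    return far, frr


def equal_error_rate(scores, labels):
    """Approximate the EER as the mean of FAR and FRR where they are closest."""
    best = None
    for t in sorted(set(scores)):
        far, frr = far_frr(scores, labels, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Synthetic Live-probability scores, for illustration only.
scores = [0.95, 0.90, 0.80, 0.65, 0.60, 0.40, 0.30, 0.10]
labels = [1, 1, 1, 0, 1, 0, 0, 0]
eer, eer_threshold = equal_error_rate(scores, labels)
```

On this toy data FAR and FRR both equal 0.25 at a threshold of 0.65. Because the EER is measured at its own threshold, it can differ substantially from the FAR and FRR reported at the model's operating threshold.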

Results

Overall Performance

| Metric | Score |
| --- | --- |
| Accuracy | 83.29% |
| AUC-ROC | 0.9561 |
| F1-Score | 0.8780 |

Detection Metrics

| Metric | Value |
| --- | --- |
| Precision (PPV) | 0.7974 |
| Recall (TPR) | 0.9768 |
| Specificity | 0.6021 |
| NPV | 0.9417 |

Error Rates

| Metric | Value |
| --- | --- |
| False Acceptance Rate (FAR) | 0.3979 |
| False Rejection Rate (FRR) | 0.0232 |
| Equal Error Rate (EER) | 0.1083 |

Confusion Matrix

| | Predicted Spoof | Predicted Live |
| --- | --- | --- |
| Actual Spoof | 404 | 267 |
| Actual Live | 25 | 1051 |
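
The reported metrics can be re-derived from the confusion matrix. The short check below treats Live as the positive class, which is the convention the reported numbers imply (for example, FAR = 267 / 671, the share of spoof samples accepted as live). The EER and AUC-ROC depend on the full score distribution and cannot be recovered from these four counts.

```python
# Counts from the confusion matrix above, with Live as the positive class.
tp = 1051  # Live predicted Live
fn = 25    # Live predicted Spoof
fp = 267   # Spoof predicted Live
tn = 404   # Spoof predicted Spoof

total = tp + fn + fp + tn
metrics = {
    "accuracy": (tp + tn) / total,
    "precision": tp / (tp + fp),
    "recall": tp / (tp + fn),
    "specificity": tn / (tn + fp),
    "npv": tn / (tn + fn),
    "f1": 2 * tp / (2 * tp + fp + fn),
    "far": fp / (fp + tn),  # spoof samples accepted as live
    "frr": fn / (fn + tp),  # live samples rejected as spoof
}

for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Rounded to four decimals, these values match the tables above.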