---
tags:
- image-classification
- timm
- dinov2
library_name: timm
license: apache-2.0
---

# vit_base_patch14_dinov2_sp_ft_in1k

DINOv2 Fine-tuned on ImageNet-1k

This model is a fine-tuned version of `vit_base_patch14_dinov2`, trained on a subset of the ImageNet-1k dataset (10,000 training images).
The classification head predicts the 1,000 ImageNet-1k classes.

## Model Details
- **Base Model**: DINOv2 ViT-Base/14
- **Fine-tuned on**: ImageNet-1k subset (1000 classes)
- **Parameters**: ~86M
- **Input Size**: 224x224
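
As a rough sanity check on the figures above, the token count and the size of the classification head can be worked out by hand. This sketch assumes a single class token and no register tokens, and takes the 224x224 input size and the 768-dimensional ViT-Base embedding as given; the variable names are illustrative only.

```python
# Back-of-the-envelope numbers for a ViT-Base/14 at 224x224.
# Assumptions: one class token, no register tokens, linear head with bias.
image_size = 224
patch_size = 14
embed_dim = 768
num_classes = 1000

patches_per_side = image_size // patch_size  # 224 / 14 = 16
num_patches = patches_per_side ** 2          # 16 * 16 patch tokens
seq_len = num_patches + 1                    # plus the class token

head_params = embed_dim * num_classes + num_classes  # weights + biases

print(f"patch tokens: {num_patches}, sequence length: {seq_len}")
print(f"classification head parameters: {head_params:,}")
```

Under these assumptions the head adds fewer than one million parameters on top of the roughly 86M in the backbone.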

## Usage

```python
import torch
import timm
from PIL import Image
from huggingface_hub import hf_hub_download

# 1. Create the base model
# "vit_base_patch14_dinov2" is the timm model name for the DINOv2 ViT-Base/14 backbone
model_id = "vit_base_patch14_dinov2"
model = timm.create_model(model_id, pretrained=True)  # load the pretrained backbone

# 2. Replace the head with a 1000-class linear layer
# model.num_features is the embedding dimension (768 for ViT-Base)
model.head = torch.nn.Linear(model.num_features, 1000, bias=True)

# 3. Load the fine-tuned weights
# Download the checkpoint from this repo
checkpoint_path = hf_hub_download(
    repo_id="SasikaA073/vit_base_patch14_dinov2_sp_ft_in1k",
    filename="vit_base_patch14_dinov2_sp_ft_in1k.pth",
)
state_dict = torch.load(checkpoint_path, map_location="cpu")

# The checkpoint contains only the state_dict
model.load_state_dict(state_dict)
model.eval()

# 4. Inference
# Build the preprocessing pipeline from the model's pretrained config
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

image = Image.open("your_image.jpg").convert("RGB")
input_tensor = transforms(image).unsqueeze(0)  # add a batch dimension

with torch.no_grad():
    output = model(input_tensor)  # logits for the single image in the batch
    probabilities = torch.nn.functional.softmax(output[0], dim=0)

print(f"Top class index: {probabilities.argmax().item()}")
```
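
The usage snippet prints only the single most likely class index. To make the post-processing step explicit, here is a minimal plain-Python sketch of softmax followed by top-k selection. The `softmax` and `top_k` helpers and the toy logits are illustrative and not part of this repo; the real model outputs 1000 logits per image.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(probs, k=5):
    """Return (index, probability) pairs for the k largest probabilities."""
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Made-up logits for a 6-class toy problem (hypothetical values).
logits = [0.1, 2.5, -1.0, 0.7, 3.2, 0.0]
probs = softmax(logits)

for idx, p in top_k(probs, k=3):
    print(f"class {idx}: {p:.3f}")
```

With the tensors from the usage snippet, `torch.topk(probabilities, k=5)` returns the top values and their indices directly. Mapping indices to class names requires an ImageNet-1k label list, which the snippet does not load.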

## Training Details
- **Training samples**: 10,000
- **Validation samples**: 1,000
- **Epochs**: 20
- **Optimizer**: AdamW
- **Learning Rate**: 0.001
- **Batch Size**: 640 (Effective)
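
For a sense of scale, the numbers above imply only a few hundred optimizer steps in total. The sketch below works this out; whether the last partial batch was kept or dropped is not stated here, so both cases are shown. The split of the effective batch size into a per-device batch and accumulation steps is purely hypothetical.

```python
import math

train_samples = 10_000
effective_batch_size = 640
epochs = 20

# Optimizer steps per epoch, depending on how the last partial batch is handled.
steps_keep_last = math.ceil(train_samples / effective_batch_size)
steps_drop_last = train_samples // effective_batch_size

print(f"steps/epoch: {steps_keep_last} (keep last) or {steps_drop_last} (drop last)")
print(f"total steps: {steps_keep_last * epochs} or {steps_drop_last * epochs}")

# Hypothetical decomposition of the effective batch size (an assumption,
# not a documented training setting): per-device batch x accumulation steps.
per_device_batch = 64
accumulation_steps = 10
assert per_device_batch * accumulation_steps == effective_batch_size
```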

## Citation

```bibtex
@article{oquab2023dinov2,
  title={DINOv2: Learning Robust Visual Features without Supervision},
  author={Oquab, Maxime and Darcet, Timothée and Moutakanni, Theo and Vo, Huy V. and Szafraniec, Marc and Khalidov, Vasil and Fernandez, Pierre and Haziza, Daniel and Massa, Francisco and El-Nouby, Alaaeldin and Howes, Russell and Huang, Po-Yao and Xu, Hu and Sharma, Vasu and Li, Shang-Wen and Galuba, Wojciech and Rabbat, Mike and Assran, Mido and Ballas, Nicolas and Synnaeve, Gabriel and Misra, Ishan and Jegou, Herve and Mairal, Julien and Labatut, Patrick and Joulin, Armand and Bojanowski, Piotr},
  journal={arXiv:2304.07193},
  year={2023}
}
```