SigLIP (shape-optimized model)
SigLIP model pre-trained on WebLi at resolution 384x384. It was introduced in the paper Sigmoid Loss for Language Image Pre-Training by Zhai et al. and first released in this repository.
The original repository is https://huggingface.co/google/siglip-so400m-patch14-384.
This version of SigLIP has been converted to run on the Axera NPU using w8a16 quantization.
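As a rough illustration of what w8a16 refers to (8-bit weights, 16-bit activations), here is a minimal NumPy sketch. It assumes simple per-tensor symmetric scaling, and the helper `quantize_symmetric` is hypothetical; it is not the quantization scheme that Pulsar2 actually applies.

```python
import numpy as np

def quantize_symmetric(x, num_bits):
    """Quantize a float array to signed integers using one per-tensor scale (illustrative only)."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8 bits, 32767 for 16 bits
    max_abs = float(np.max(np.abs(x)))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int64)
    return q, scale

rng = np.random.default_rng(0)
weights = rng.standard_normal((4, 8)).astype(np.float32)    # stand-in for a layer's weights
activations = rng.standard_normal((8,)).astype(np.float32)  # stand-in for its input

w_q, w_scale = quantize_symmetric(weights, num_bits=8)       # "w8"
a_q, a_scale = quantize_symmetric(activations, num_bits=16)  # "a16"

# Integer matrix product, rescaled back to floating point with both scales.
y_quant = (w_q @ a_q).astype(np.float64) * (w_scale * a_scale)
y_float = weights.astype(np.float64) @ activations.astype(np.float64)
max_error = float(np.max(np.abs(y_quant - y_float)))
print("max abs error vs float matmul:", max_error)
```

Keeping activations at 16 bits leaves most of the rounding error on the weight side, which is the usual motivation for mixing the two precisions.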
This model has been optimized with the following LoRA:
Compatible with Pulsar2 version: 3.4
Conversion tool links:
If you are interested in model conversion, you can try exporting the axmodel through
The repo of the AXera Platform, where you can find the detailed guide
Supported Platforms
| Model | Raspberry Pi 5 (CPU only) | Intel i7-13700 | Raspberry Pi 5 + M.2 Card |
|---|---|---|---|
| Image Encoder | 8.3 s | 1.2 s | 0.19 s |
| Text Encoder | 1.3 s | 0.3 s | 0.05 s |
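To put the latencies in the table above in relative terms, the short sketch below computes speedup ratios. The dictionary keys and the `speedup` helper are names chosen for this illustration only; the numbers are copied from the table.

```python
# Latencies in seconds, copied from the table above.
latency = {
    "Image Encoder": {"pi5_cpu": 8.3, "i7_13700": 1.2, "pi5_m2": 0.19},
    "Text Encoder": {"pi5_cpu": 1.3, "i7_13700": 0.3, "pi5_m2": 0.05},
}

def speedup(baseline_s, accelerated_s):
    """Return how many times faster the accelerated run is than the baseline."""
    return baseline_s / accelerated_s

for name, row in latency.items():
    vs_pi = speedup(row["pi5_cpu"], row["pi5_m2"])
    vs_i7 = speedup(row["i7_13700"], row["pi5_m2"])
    print(f"{name}: {vs_pi:.1f}x vs Pi 5 CPU, {vs_i7:.1f}x vs i7-13700")
```

On these figures the M.2 card is roughly 44x faster than the Raspberry Pi 5 CPU for the image encoder and 26x faster for the text encoder.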
How to use
Download all files from this repository to the device
(axcl) axera@raspberrypi:~/samples/siglip $ tree -L 2
.
├── 000000039769.jpg
├── ax650
│   ├── siglip_text_u16.axmodel
│   └── siglip_vision_u16_fcu8.axmodel
├── config.json
├── onnx
│   ├── siglip-so400m-patch14-384_text.onnx
│   └── siglip-so400m-patch14-384_vision.onnx
├── python
│   ├── inference_axmodel.py
│   ├── inference_onnx.py
│   └── requirements.txt
└── tokenizer
    ├── config.json
    ├── preprocessor_config.json
    ├── special_tokens_map.json
    ├── spiece.model
    ├── tokenizer_config.json
    └── tokenizer.json
5 directories, 15 files
Python environment requirements
pyaxengine
https://github.com/AXERA-TECH/pyaxengine
wget https://github.com/AXERA-TECH/pyaxengine/releases/download/0.1.3rc0/axengine-0.1.3-py3-none-any.whl
pip install axengine-0.1.3-py3-none-any.whl
others
pip install -r python/requirements.txt
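After installation, a quick check that the runtime is importable can save a failed first run. This is a minimal sketch: the module name `axengine` is assumed from the wheel file name, `missing_modules` is a hypothetical helper, and the contents of `python/requirements.txt` are not shown here, so extend the list to match your setup.

```python
import importlib.util

def missing_modules(names):
    """Return the module names that cannot be found in the current Python environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# "axengine" is an assumption based on the wheel file name; add the packages
# listed in python/requirements.txt as needed.
required = ["axengine"]

missing = missing_modules(required)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are importable.")
```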
Inputs
Text
"a photo of 2 cats", "a photo of 2 dogs"
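Besides the text prompts, the model takes an image (`000000039769.jpg` in the file listing). The sketch below shows one plausible normalization step. It assumes the defaults of the original google/siglip image processor (rescale by 1/255, then mean 0.5 and std 0.5 per channel) and an NCHW float32 layout. The `.axmodel` files may expect a different input type or layout, so treat `python/inference_axmodel.py` as the reference; `normalize_image` is a hypothetical helper and the image here is random data already at 384x384.

```python
import numpy as np

def normalize_image(image_hwc_uint8, mean=0.5, std=0.5):
    """Scale an HxWx3 uint8 image to [0, 1], normalize it, and return a 1x3xHxW float32 array."""
    x = image_hwc_uint8.astype(np.float32) / 255.0
    x = (x - mean) / std                 # maps [0, 1] to [-1, 1] for mean = std = 0.5
    x = np.transpose(x, (2, 0, 1))       # HWC -> CHW
    return x[np.newaxis, ...]            # add the batch dimension

# Random stand-in for a decoded RGB image that has already been resized to 384x384.
dummy = np.random.default_rng(0).integers(0, 256, size=(384, 384, 3), dtype=np.uint8)
pixel_values = normalize_image(dummy)
print(pixel_values.shape, pixel_values.dtype)
```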
Inference with an AX650 host, such as the M4N-Dock (AXera-Pi Pro)
root@ax650:/mnt/qtang/inner/SigLIP.axera# python3 python/inference_axmodel.py
[INFO] Available providers: ['AxEngineExecutionProvider']
[INFO] Using provider: AxEngineExecutionProvider
[INFO] Chip type: ChipType.MC50
[INFO] VNPU type: VNPUType.DISABLED
[INFO] Engine version: 2.7.2a
[INFO] Model type: 2 (triple core)
[INFO] Compiler version: 3.4-dirty 739e2b35-dirty
Model loading time: 3.86 seconds
[INFO] Using provider: AxEngineExecutionProvider
[INFO] Model type: 2 (triple core)
[INFO] Compiler version: 3.4-dirty 739e2b35-dirty
Model loading time: 3.22 seconds
Total model loading time: 7.08 seconds
Model inference time: 0.19 seconds
Model inference time: 0.05 seconds
Total inference time: 0.24 seconds
49.4% that image 0 is 'a photo of 2 cats'
root@ax650:/mnt/qtang/inner/SigLIP.axera#
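The percentage in the last output line comes from SigLIP's sigmoid scoring: each image-text pair gets an independent probability, so scores across prompts need not sum to 100%. The sketch below follows the formulation used for SigLIP in Hugging Face Transformers (cosine similarity scaled by `exp(logit_scale)` plus `logit_bias`, then a sigmoid). The embeddings, scale, and bias are made-up placeholder values, and the inference script itself may be organized differently.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def siglip_probs(image_embeds, text_embeds, logit_scale, logit_bias):
    """Return a (num_images, num_texts) matrix of independent match probabilities."""
    image_embeds = image_embeds / np.linalg.norm(image_embeds, axis=-1, keepdims=True)
    text_embeds = text_embeds / np.linalg.norm(text_embeds, axis=-1, keepdims=True)
    # logit_scale is stored in log space, so it is exponentiated before use.
    logits_per_image = image_embeds @ text_embeds.T * np.exp(logit_scale) + logit_bias
    return sigmoid(logits_per_image)

# Toy 2-D embeddings and placeholder parameters, not values from the real model.
image_embeds = np.array([[1.0, 0.0]])
text_embeds = np.array([[1.0, 0.0], [0.0, 1.0]])
texts = ["a photo of 2 cats", "a photo of 2 dogs"]

probs = siglip_probs(image_embeds, text_embeds, logit_scale=np.log(10.0), logit_bias=-10.0)
for text, p in zip(texts, probs[0]):
    print(f"{p:.1%} that image 0 is '{text}'")
```

With these toy values the first prompt lands near 50% and the second near 0%, which shows that the two scores are computed independently rather than through a softmax.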
Inference with M.2 Accelerator card
What is the M.2 Accelerator card? This demo runs on a Raspberry Pi 5.
(axcl) axera@raspberrypi:~/samples/siglip $ python python/inference_axmodel.py
[INFO] Available providers: ['AXCLRTExecutionProvider']
[INFO] Using provider: AXCLRTExecutionProvider
[INFO] SOC Name: AX650N
[INFO] VNPU type: VNPUType.DISABLED
[INFO] Compiler version: 3.4-dirty 739e2b35-dirty
Model loading time: 12.31 seconds
[INFO] Using provider: AXCLRTExecutionProvider
[INFO] SOC Name: AX650N
[INFO] VNPU type: VNPUType.DISABLED
[INFO] Compiler version: 3.4-dirty 739e2b35-dirty
Model loading time: 12.37 seconds
Total model loading time: 24.68 seconds
Model inference time: 0.19 seconds
Model inference time: 0.05 seconds
Total inference time: 0.24 seconds
52.5% that image 0 is 'a photo of 2 cats'
(axcl) axera@raspberrypi:~/samples/siglip $