---
language: en
license: apache-2.0
model_name: yolov3-12-int8.onnx
tags:
- validated
- vision
- object_detection_segmentation
- yolov3
---
<!--- SPDX-License-Identifier: MIT -->
# YOLOv3
## Description
This model is a real-time neural network for object detection that detects 80 different classes (the COCO categories), offering a strong balance of speed and accuracy.
## Model
|Model |Download |Download (with sample test data)|ONNX version|Opset version|Accuracy |
|-------------|:--------------|:--------------|:--------------|:--------------|:--------------|
|YOLOv3 |[237 MB](model/yolov3-10.onnx) |[222 MB](model/yolov3-10.tar.gz)|1.5 |10 |mAP of 0.553 |
|YOLOv3-12 |[237 MB](model/yolov3-12.onnx) |[222 MB](model/yolov3-12.tar.gz)|1.9 |12 |mAP of 0.2874 |
|YOLOv3-12-int8 |[60 MB](model/yolov3-12-int8.onnx) |[53 MB](model/yolov3-12-int8.tar.gz)|1.9 |12 |mAP of 0.2693 |
> Compared with YOLOv3-12, YOLOv3-12-int8's mAP declines by 0.0181 while inference performance improves by 2.19x.
>
> Note that performance depends on the test hardware.
>
> The performance data here was collected on an Intel® Xeon® Platinum 8280 Processor (1 socket, 4 cores per instance) running CentOS Linux 8.3, with a data batch size of 1.
<hr>
## Inference
### Input to model
The model takes two inputs:
* a resized image tensor of shape `(1x3x416x416)`
* the original image size `(1x2)`, given as `[image.size[1], image.size[0]]`, i.e. `[height, width]`
### Preprocessing steps
The images have to be loaded into the range [0, 1]. This transformation should preferably happen during preprocessing.
The following code shows how to preprocess a NCHW tensor:
```python
import numpy as np
from PIL import Image
# this function is from yolo3.utils.letterbox_image
def letterbox_image(image, size):
    '''Resize image with unchanged aspect ratio using padding.'''
    iw, ih = image.size
    w, h = size
    scale = min(w / iw, h / ih)
    nw = int(iw * scale)
    nh = int(ih * scale)

    image = image.resize((nw, nh), Image.BICUBIC)
    new_image = Image.new('RGB', size, (128, 128, 128))  # gray padding
    new_image.paste(image, ((w - nw) // 2, (h - nh) // 2))
    return new_image

def preprocess(img):
    model_image_size = (416, 416)
    boxed_image = letterbox_image(img, tuple(reversed(model_image_size)))
    image_data = np.array(boxed_image, dtype='float32')
    image_data /= 255.                                # scale pixels to [0, 1]
    image_data = np.transpose(image_data, [2, 0, 1])  # HWC -> CHW
    image_data = np.expand_dims(image_data, 0)        # add batch dimension
    return image_data

image = Image.open(img_path)  # img_path: path to your input image

# model inputs
image_data = preprocess(image)
# note: some exports expect float32 here; check the model's 'image_shape' input type
image_size = np.array([image.size[1], image.size[0]], dtype=np.int32).reshape(1, 2)
```
### Output of model
The model has 3 outputs.
boxes: `(1x'n_candidates'x4)`, the coordinates of all anchor boxes,
scores: `(1x80x'n_candidates')`, the scores of all anchor boxes per class,
indices: `('nbox'x3)`, selected indices from the boxes tensor. The selected index format is (batch_index, class_index, box_index). The class list is [here](https://github.com/qqwweee/keras-yolo3/blob/master/model_data/coco_classes.txt)
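As a minimal sketch of producing these outputs with onnxruntime, using the `image_data` and `image_size` tensors from the preprocessing code above. The input names `input_1` and `image_shape` are assumptions; verify the names and dtypes on your copy with `session.get_inputs()`:
```python
import onnxruntime as ort

session = ort.InferenceSession("yolov3-12.onnx")

# Input names below are assumptions; check session.get_inputs() for your model.
boxes, scores, indices = session.run(
    None,  # fetch all three outputs, in the order listed above
    {"input_1": image_data, "image_shape": image_size},
)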
### Postprocessing steps
The `indices` output selects the final detections; the following gathers the corresponding boxes, scores, and classes:
```python
out_boxes, out_scores, out_classes = [], [], []
for idx_ in indices:
    out_classes.append(idx_[1])             # class index
    out_scores.append(scores[tuple(idx_)])  # score of the selected box
    idx_1 = (idx_[0], idx_[2])              # (batch_index, box_index)
    out_boxes.append(boxes[idx_1])
```
`out_boxes`, `out_scores`, and `out_classes` are lists of the resulting boxes, scores, and class indices.
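For example, a hypothetical sketch that prints each detection with its COCO class name, assuming `coco_classes.txt` (linked above) has been downloaded to the working directory:
```python
# Assumes coco_classes.txt (linked above) is in the working directory.
with open("coco_classes.txt") as f:
    class_names = [line.strip() for line in f]

for box, score, cls in zip(out_boxes, out_scores, out_classes):
    # boxes are typically in (y1, x1, y2, x2) order for this model
    top, left, bottom, right = box
    print(f"{class_names[cls]}: {score:.2f} at "
          f"({left:.0f}, {top:.0f}), ({right:.0f}, {bottom:.0f})")
```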
<hr>
## Dataset (Train and validation)
The model uses pretrained weights from pjreddie.com, available [here](https://pjreddie.com/media/files/yolov3.weights).
<hr>
## Validation accuracy
YOLOv3:
Metric is COCO box mAP (averaged over IoU thresholds 0.5:0.95), computed over the 2017 COCO val data.
mAP of 0.553, based on the original YOLOv3 model [here](https://pjreddie.com/darknet/yolo/).

YOLOv3-12 & YOLOv3-12-int8:
Metric is COCO box mAP@[IoU=0.50:0.95 | area=all | maxDets=100], computed over the 2017 COCO val data.
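To reproduce this metric, a minimal sketch using pycocotools; the annotation and detection file paths are assumptions, and the detections must first be exported in the standard COCO results JSON format:
```python
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

# Paths below are assumptions: standard COCO 2017 val annotations and
# the model's detections exported as a COCO results JSON file.
coco_gt = COCO("annotations/instances_val2017.json")
coco_dt = coco_gt.loadRes("yolov3_detections.json")

coco_eval = COCOeval(coco_gt, coco_dt, iouType="bbox")
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()  # first line printed is mAP@[IoU=0.50:0.95 | area=all | maxDets=100]
```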
<hr>
## Quantization
YOLOv3-12-int8 is obtained by quantizing YOLOv3-12 model. We use [Intel® Neural Compressor](https://github.com/intel/neural-compressor) with onnxruntime backend to perform quantization. View the [instructions](https://github.com/intel/neural-compressor/blob/master/examples/onnxrt/object_detection/onnx_model_zoo/yolov3/quantization/ptq/README.md) to understand how to use Intel® Neural Compressor for quantization.
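As a rough sketch of what the linked instructions do, using the Intel® Neural Compressor 1.x experimental API; the exact entry points vary between releases, so treat this as an assumption and follow the linked README for the authoritative flow:
```python
from neural_compressor.experimental import Quantization, common

# yolov3.yaml (from the linked example) defines the calibration dataloader,
# accuracy criterion, and tuning strategy.
quantizer = Quantization("yolov3.yaml")
quantizer.model = common.Model("yolov3-12.onnx")
q_model = quantizer.fit()                # run post-training quantization tuning
q_model.save("yolov3-12-int8.onnx")      # write out the int8 model
```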
### Environment
* onnx: 1.9.0
* onnxruntime: 1.10.0
### Prepare model
```shell
wget https://github.com/onnx/models/raw/main/vision/object_detection_segmentation/yolov3/model/yolov3-12.onnx
```
### Model quantize
```bash
# --input_model: path to the *.onnx model downloaded above
bash run_tuning.sh --input_model=path/to/model \
                   --config=yolov3.yaml \
                   --data_path=path/to/COCO2017 \
                   --output_model=path/to/save
```
<hr>
## Publication/Attribution
Joseph Redmon, Ali Farhadi. YOLOv3: An Incremental Improvement, [paper](https://pjreddie.com/media/files/papers/YOLOv3.pdf)
<hr>
## References
* This model is converted from a keras model [repository](https://github.com/qqwweee/keras-yolo3) using keras2onnx converter [repository](https://github.com/onnx/keras-onnx).
* [Intel® Neural Compressor](https://github.com/intel/neural-compressor)
<hr>
## Contributors
* [mengniwang95](https://github.com/mengniwang95) (Intel)
* [airMeng](https://github.com/airMeng) (Intel)
* [ftian1](https://github.com/ftian1) (Intel)
* [hshen14](https://github.com/hshen14) (Intel)
<hr>
## License
MIT License
<hr>