Intel/distilbert-base-uncased-finetuned-sst-2-english-int8-static-inc



	---
	language: en
	license: apache-2.0
	tags: text-classification
	datasets:
	- sst2
	---

	INT8 DistilBERT base uncased finetuned SST-2 (Post-training static quantization)
	===
	This is an INT8 PyTorch model quantized with [intel/nlp-toolkit](https://github.com/intel/nlp-toolkit), using [Intel® Neural Compressor](https://github.com/intel/neural-compressor) as the provider. The original FP32 model is the fine-tuned [distilbert-base-uncased-finetuned-sst-2-english](https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
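To make "post-training static quantization" concrete, here is a minimal, generic sketch of the idea: a scale and zero point are fixed ahead of time from calibration values, then reused unchanged at inference. This is an illustration only and is not Intel® Neural Compressor's implementation; the function names `calibrate`, `quantize`, and `dequantize`, the unsigned 8-bit range, and the calibration values are all assumptions made for the example.

```python
def calibrate(samples):
    """Derive a fixed affine scale and zero point from calibration values (assumed uint8 range)."""
    lo = min(min(samples), 0.0)  # include 0.0 so that zero maps to an exact integer
    hi = max(max(samples), 0.0)
    scale = (hi - lo) / 255.0 or 1.0  # fall back to 1.0 if all samples are zero
    zero_point = round(-lo / scale)
    return scale, zero_point


def quantize(x, scale, zero_point):
    """Map a float to an integer in [0, 255], clamping values outside the calibrated range."""
    q = round(x / scale) + zero_point
    return max(0, min(255, q))


def dequantize(q, scale, zero_point):
    """Map a quantized integer back to an approximate float."""
    return (q - zero_point) * scale


# Hypothetical calibration values standing in for observed activations.
calibration = [-1.0, -0.25, 0.0, 0.5, 1.55]
scale, zero_point = calibrate(calibration)

# "Static": the same scale and zero point are reused for every later input.
restored = [dequantize(quantize(x, scale, zero_point), scale, zero_point) for x in calibration]
```

Values inside the calibrated range round-trip with an error of at most half a quantization step; values outside it are clamped, which is one reason the calibration data should be representative.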

	The test results below come from an [AWS](https://aws.amazon.com/) c6i.xlarge instance (Intel Ice Lake: 4 vCPUs, 8 GB memory).

	| | FP32 | INT8 |
	|---|:---:|:---:|
	| Accuracy | 0.9106 | 0.9037 |
	| Throughput (samples/sec) | ？ | ？ |
	| Model size (MB) | 255 | 66 |
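The trade-off in the table can be worked out directly. The short sketch below uses only the accuracy and model-size figures reported above; the variable names are illustrative.

```python
# Figures copied from the results table.
fp32_acc, int8_acc = 0.9106, 0.9037
fp32_mb, int8_mb = 255, 66

acc_drop = fp32_acc - int8_acc   # absolute accuracy drop
rel_drop = acc_drop / fp32_acc   # drop relative to the FP32 accuracy
size_ratio = fp32_mb / int8_mb   # how many times smaller the INT8 model is

print(f"accuracy drop: {acc_drop:.4f} ({rel_drop:.2%} relative)")
print(f"size reduction: {size_ratio:.2f}x")
```

In other words, quantization costs about 0.0069 in absolute accuracy (roughly 0.76% relative) in exchange for a model about 3.86 times smaller.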


	Load with nlp_toolkit:
	```python
	from nlp_toolkit import OptimizedModel

	# Load the quantized INT8 model by its model ID.
	int8_model = OptimizedModel.from_pretrained(
	    'intel/distilbert-base-uncased-finetuned-sst-2-english-int8-static',
	)
	```
	Notes:
	- The INT8 model performs better than the FP32 model when the CPU is fully loaded. When the CPU is not fully loaded, benchmarks can give the misleading impression that INT8 is inferior to FP32.
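Since the throughput row in the table is unfilled and the note above depends on how the benchmark is run, here is a minimal sketch of a samples/sec measurement. It is not the procedure used for this model card: `measure_throughput` and `dummy_predict` are hypothetical names, and `dummy_predict` is a stand-in for a real model forward pass.

```python
import time


def measure_throughput(predict, batch, warmup=3, iters=20):
    """Return samples/sec for repeated calls to predict(batch)."""
    for _ in range(warmup):  # warm-up calls are excluded from timing
        predict(batch)
    start = time.perf_counter()
    for _ in range(iters):
        predict(batch)
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero interval
    return iters * len(batch) / elapsed


def dummy_predict(batch):
    # Hypothetical stand-in for model inference; replace with a real forward pass.
    return [len(text) % 2 for text in batch]


batch = ["a great movie", "a dull movie"] * 4
throughput = measure_throughput(dummy_predict, batch)
print(f"{throughput:.1f} samples/sec")
```

This times a single process only. To approximate the fully loaded condition described in the note, one would run enough concurrent instances to occupy all cores and sum their throughput, measuring FP32 and INT8 under the same setup.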