whitphx HF Staff commited on Aug 18

Commit

c1ddd58

verified ·

1 Parent(s): b50e2a2

Add/update the quantized ONNX model files and README.md for Transformers.js v3

## Applied Quantizations

### ✅ Based on `decoder_model.onnx` *with* slimming

↳ ✅ `fp16`: `decoder_model_fp16.onnx` (added)
↳ ✅ `int8`: `decoder_model_int8.onnx` (added)
↳ ✅ `uint8`: `decoder_model_uint8.onnx` (added)
↳ ✅ `q4`: `decoder_model_q4.onnx` (added)
↳ ✅ `q4f16`: `decoder_model_q4f16.onnx` (added)
↳ ✅ `bnb4`: `decoder_model_bnb4.onnx` (added)

### ✅ Based on `encoder_model.onnx` *with* slimming

↳ ✅ `int8`: `encoder_model_int8.onnx` (added)
↳ ✅ `uint8`: `encoder_model_uint8.onnx` (added)
↳ ✅ `q4`: `encoder_model_q4.onnx` (added)
↳ ✅ `q4f16`: `encoder_model_q4f16.onnx` (added)
↳ ✅ `bnb4`: `encoder_model_bnb4.onnx` (added)

### ✅ Based on `decoder_with_past_model.onnx` *with* slimming

↳ ✅ `fp16`: `decoder_with_past_model_fp16.onnx` (added)
↳ ✅ `int8`: `decoder_with_past_model_int8.onnx` (added)
↳ ✅ `uint8`: `decoder_with_past_model_uint8.onnx` (added)
↳ ✅ `q4`: `decoder_with_past_model_q4.onnx` (added)
↳ ✅ `q4f16`: `decoder_with_past_model_q4f16.onnx` (added)
↳ ✅ `bnb4`: `decoder_with_past_model_bnb4.onnx` (added)

### ❌ Based on `decoder_model_merged.onnx` *with* slimming

**The base model `decoder_model_merged.onnx` has been renamed to `model.onnx`.**

```
0%| | 0/1 [00:00<?, ?it/s]
Processing /tmp/tmpq5j0xlcb/model.onnx: 0%| | 0/1 [00:00<?, ?it/s]

0%| | 0/6 [00:00<?, ?it/s][A

- Quantizing to fp16: 0%| | 0/6 [00:00<?, ?it/s][A/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 5.960464477539063e-08 will be truncated to 1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -5.960464477539063e-08 will be truncated to -1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:85: UserWarning: the float32 number -3.4028234663852886e+38 will be truncated to -10000.0
warnings.warn(

- Quantizing to fp16: 0%| | 0/6 [00:04<?, ?it/s]

Processing /tmp/tmpq5j0xlcb/model.onnx: 0%| | 0/1 [00:04<?, ?it/s]
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/quantize.py", line 377, in <module>
main()
File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/quantize.py", line 374, in main
quantize(input_folder, output_folder, quantization_args)
File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/quantize.py", line 309, in quantize
quantize_fp16(
File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/quantize.py", line 223, in quantize_fp16
check_and_save_model(model_fp16, save_path)
File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/utils.py", line 29, in check_and_save_model
strict_check_model(model)
File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/utils.py", line 21, in strict_check_model
raise e
File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/utils.py", line 16, in strict_check_model
onnx.checker.check_model(model_or_path, full_check=True)
File "/home/ubuntu/.cache/uv/archive-v0/cQ6A7vyzEBQhtbSuz6CnD/lib/python3.12/site-packages/onnx/checker.py", line 179, in check_model
C.check_model(
onnx.onnx_cpp2py_export.shape_inference.InferenceError: [ShapeInferenceError] Inference error(s): (op_type:If, node name: optimum::if): [ShapeInferenceError] Inference error(s): (op_type:Add, node name: /model/decoder/embed_positions/Add): [ShapeInferenceError] Inferred shape and existing shape differ in rank: (1) vs (0)
```

### ✅ Based on `decoder_model_merged.onnx` *without* slimming

**The base model `decoder_model_merged.onnx` has been renamed to `model.onnx`.**

↳ ✅ `fp16`: `model_fp16.onnx` (replaced because it was invalid)
↳ ✅ `int8`: `model_int8.onnx` (added)
↳ ✅ `uint8`: `model_uint8.onnx` (added)
↳ ✅ `q4`: `model_q4.onnx` (added)
↳ ✅ `q4f16`: `model_q4f16.onnx` (added)
↳ ✅ `bnb4`: `model_bnb4.onnx` (added)

Files changed (25) hide show

README.md +18 -0
onnx/decoder_model_bnb4.onnx +3 -0
onnx/decoder_model_fp16.onnx +3 -0
onnx/decoder_model_int8.onnx +3 -0
onnx/decoder_model_q4.onnx +3 -0
onnx/decoder_model_q4f16.onnx +3 -0
onnx/decoder_model_uint8.onnx +3 -0
onnx/decoder_with_past_model_bnb4.onnx +3 -0
onnx/decoder_with_past_model_fp16.onnx +3 -0
onnx/decoder_with_past_model_int8.onnx +3 -0
onnx/decoder_with_past_model_q4.onnx +3 -0
onnx/decoder_with_past_model_q4f16.onnx +3 -0
onnx/decoder_with_past_model_uint8.onnx +3 -0
onnx/encoder_model_bnb4.onnx +3 -0
onnx/encoder_model_int8.onnx +3 -0
onnx/encoder_model_q4.onnx +3 -0
onnx/encoder_model_q4f16.onnx +3 -0
onnx/encoder_model_uint8.onnx +3 -0
onnx/model.onnx +3 -0
onnx/model_bnb4.onnx +3 -0
onnx/model_fp16.onnx +3 -0
onnx/model_int8.onnx +3 -0
onnx/model_q4.onnx +3 -0
onnx/model_q4f16.onnx +3 -0
onnx/model_uint8.onnx +3 -0

README.md CHANGED Viewed

@@ -5,4 +5,22 @@ library_name: transformers.js
 https://huggingface.co/facebook/blenderbot_small-90M with ONNX weights to be compatible with Transformers.js.
 Note: Having a separate repo for ONNX weights is intended to be a temporary solution until WebML gains more traction. If you would like to make your models web-ready, we recommend converting to ONNX using [🤗 Optimum](https://huggingface.co/docs/optimum/index) and structuring your repo like this one (with ONNX weights located in a subfolder named `onnx`).

 https://huggingface.co/facebook/blenderbot_small-90M with ONNX weights to be compatible with Transformers.js.
+## Usage (Transformers.js)
+If you haven't already, you can install the [Transformers.js](https://huggingface.co/docs/transformers.js) JavaScript library from [NPM](https://www.npmjs.com/package/@huggingface/transformers) using:
+```bash
+npm i @huggingface/transformers
+```
+**Example:** Text-to-text generation.
+```js
+import { pipeline } from '@huggingface/transformers';
+const generator = await pipeline('text2text-generation', 'Xenova/blenderbot_small-90M');
+const output = await generator('how can I become more healthy?', {
+  max_new_tokens: 100,
+});
+```
 Note: Having a separate repo for ONNX weights is intended to be a temporary solution until WebML gains more traction. If you would like to make your models web-ready, we recommend converting to ONNX using [🤗 Optimum](https://huggingface.co/docs/optimum/index) and structuring your repo like this one (with ONNX weights located in a subfolder named `onnx`).

onnx/decoder_model_bnb4.onnx ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b9629a0277d892ee0d44e601fe5656a361882299c7c3375d0735e2bf9f7bae19
+size 133130952

onnx/decoder_model_fp16.onnx ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:55a1a66e6e958025e936955737cbbbafdb0bfed5e4c1fda72ebbd05d338fdf75
+size 124410262

onnx/decoder_model_int8.onnx ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3f4608c2686d2567e64fd17c5480fd4f4ee9b18864192f64bc252941f89b4f10
+size 175240929

onnx/decoder_model_q4.onnx ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:15a6e6a016a7580bcd92a221160d2b3db7abe4650691563b2ac30a73bede8e24
+size 135227464

onnx/decoder_model_q4f16.onnx ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e26c97a9173d553ffce824d7c67f7ddfb983ffaf736cf0c779297be5cdb07477
+size 76187689

onnx/decoder_model_uint8.onnx ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c554aff7ee9167bfd37a8399bf34781a6979c084b67f4d8e8056763ddb9e1044
+size 175240970

onnx/decoder_with_past_model_bnb4.onnx ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:1fe0bb0fef0589ac0d28b5aea839d6bbac9ddda06165d20d3942683f40d71251
+size 130648184

onnx/decoder_with_past_model_fp16.onnx ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:919c1b325341547e03b232144c011efe53e845c6f83ad606c7975acfadc4fa4f
+size 115923394

onnx/decoder_with_past_model_int8.onnx ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:4c593ffd13cb99333aa424d65bd7cf60a2e7fc63dcb46e1cbd85b8ccf15ee0ab
+size 170909844

onnx/decoder_with_past_model_q4.onnx ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:4c91081af0f840045c087497572fa125019c3140115210896e2c819fd9dff991
+size 132482680

onnx/decoder_with_past_model_q4f16.onnx ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:01e3efc7406c68a12d2b1808484da34d10b1daba3682c4535dfceedecc7d2644
+size 73727749

onnx/decoder_with_past_model_uint8.onnx ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:eeecade420d70c23382269f7f19ff3cd26ee0b2747580758de34c893b52904b5
+size 170909876

onnx/encoder_model_bnb4.onnx ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:2f5ec9dfed30d0e9888d45d662ed5d931703ba4388c4311daaed93bf423980e8
+size 128111905

onnx/encoder_model_int8.onnx ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:17ff45c9d82e64ce35d2dfaba83218034e8783a740f5b5ee0fa89c3834a8dc8b
+size 53983283

onnx/encoder_model_q4.onnx ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:33f32a35aa449476e09f4b4ffc90101ef2c9e34a5ac17ec9cc18c94c50bf9f89
+size 129684385

onnx/encoder_model_q4f16.onnx ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c472abf9e8dc2a7c4271a7dcc44abf70866e6b82937dc26b5b4cb2d163604338
+size 71216415

onnx/encoder_model_uint8.onnx ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:bfac525f08c0982f801507c6a9d9e3933a0d25f0d3c0971f05c3b25ca9799b62
+size 53983317

onnx/model.onnx ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e7385d8a799449515b71ac64ed0f9e4c508430299e363f838ea026252a913977
+size 249023782

onnx/model_bnb4.onnx ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b2f0c1184607dd15c4ac6a967499bbe8ebd4697bdbf378488840e73926f3df45
+size 133702361

onnx/model_fp16.onnx ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:9174a4cf48664080d7551dec55138b85dcb5a70517a9ce8186adf9092eedf138
+size 125103397

onnx/model_int8.onnx ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:db92e37da4783df41535ed6c2c0eda059c6771c48a8f03ffd1b1cd79a52b2a7a
+size 63583612

onnx/model_q4.onnx ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:8a8b3c8410c76809186f9b2d0bc7c3fea7fc6eff36ab765df07b8e8f079ed83c
+size 135798297

onnx/model_q4f16.onnx ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:48f14509584a92b546c3d06ef5d91efd210ee8402418ac08760a312d1e0f186d
+size 76878792

onnx/model_uint8.onnx ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ac2d7fa065dda4408ba19a04cc91b35004c331e1b334f0b2e34563b35eadedb9
+size 63583653