Update README.md
README.md CHANGED
@@ -17,6 +17,12 @@ inference: false
 ## Model Summary
 This repository contains optimized versions of the [gemma-2b-it](https://huggingface.co/google/gemma-2b-it) model, designed to accelerate inference using ONNX Runtime. These optimizations are specifically tailored for CPU and DirectML. DirectML is a high-performance, hardware-accelerated DirectX 12 library for machine learning, offering GPU acceleration across a wide range of supported hardware and drivers, including those from AMD, Intel, NVIDIA, and Qualcomm.
 
+## ONNX Models
+
+Here are some of the optimized configurations we have added:
+- **ONNX model for int4 DML:** ONNX model for AMD, Intel, and NVIDIA GPUs on Windows, quantized to int4 using AWQ.
+- **ONNX model for int4 CPU and Mobile:** ONNX model for CPU and mobile using int4 quantization via RTN. There are two versions uploaded to balance latency vs. accuracy. Acc=1 is targeted at improved accuracy, while Acc=4 is for improved performance. For mobile devices, we recommend using the model with acc-level-4.
+
 ## Usage
 
 ### Installation and Setup
@@ -76,12 +82,6 @@ python phi3-qa.py -m .\gemma-2b-it-onnx
 - **GPU:** AMD Ryzen 8000 Series iGPU (DirectML)
 - **CPU:** AMD Ryzen CPU
 
-## ONNX Models
-
-Here are some of the optimized configurations we have added:
-- **ONNX model for int4 DML:** ONNX model for AMD, Intel, and NVIDIA GPUs on Windows, quantized to int4 using AWQ.
-- **ONNX model for int4 CPU and Mobile:** ONNX model for CPU and mobile using int4 quantization via RTN. There are two versions uploaded to balance latency vs. accuracy. Acc=1 is targeted at improved accuracy, while Acc=4 is for improved performance. For mobile devices, we recommend using the model with acc-level-4.
-
 ## Resources and Technical Documentation
 
 - [Responsible Generative AI Toolkit](https://ai.google.dev/responsible)
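The run command in the second hunk header (`python phi3-qa.py -m .\gemma-2b-it-onnx`) drives a generation loop over these models. Here is a minimal sketch of that flow, assuming the `onnxruntime-genai` Python package (the DirectML build is published as `onnxruntime-genai-directml`); `gemma-2b-it-onnx` is the local model folder from the README, and exact generator calls differ between package versions:

```python
# Sketch of a token-by-token generation loop with onnxruntime-genai.
# API details (e.g. append_tokens vs. older params.input_ids) vary by
# package version; check the installed release before relying on this.
import onnxruntime_genai as og

model = og.Model("gemma-2b-it-onnx")  # local ONNX model folder (assumed path)
tokenizer = og.Tokenizer(model)

# gemma-2b-it expects its chat template around user turns.
prompt = ("<start_of_turn>user\n"
          "Why is the sky blue?<end_of_turn>\n"
          "<start_of_turn>model\n")

params = og.GeneratorParams(model)
params.set_search_options(max_length=256)

generator = og.Generator(model, params)
generator.append_tokens(tokenizer.encode(prompt))

# Decode one token at a time until the model emits a stop token.
while not generator.is_done():
    generator.generate_next_token()

print(tokenizer.decode(generator.get_sequence(0)))
```

The execution provider (CPU vs. DirectML) is typically read from the model folder's `genai_config.json`, so the same loop should cover both the CPU and DML variants listed above.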