Run phi3-mini on an AMD NPU
- If `phi3_mini_awq_4bit_no_flash_attention.pt` is missing, use AWQ quantization to produce the quantized model.
- Put the `modeling_phi3.py` from this repo into the phi-3-mini folder.
- Modify the file path in `run_awq.py`.
- Run:
  `python run_awq.py --task decode --target aie --w_bit 4`
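The `--w_bit 4` flag requests 4-bit weights. As a rough illustration of what group-wise 4-bit weight quantization does (the core idea AWQ builds on), here is a minimal, library-free sketch; this is not the actual llm-awq or `run_awq.py` implementation, and the function names are hypothetical:

```python
# Illustrative sketch of asymmetric group-wise 4-bit quantization.
# Not the real llm-awq code; AWQ additionally rescales salient channels
# before quantizing, which this sketch omits.

def quantize_group(weights, n_bits=4):
    """Quantize one group of float weights to integers in [0, 2^n_bits - 1]."""
    w_min, w_max = min(weights), max(weights)
    levels = (1 << n_bits) - 1          # 15 integer levels for 4-bit
    scale = (w_max - w_min) / levels or 1.0
    q = [round((w - w_min) / scale) for w in weights]
    return q, scale, w_min               # per-group scale and zero point

def dequantize_group(q, scale, zero):
    """Reconstruct approximate float weights from the quantized group."""
    return [v * scale + zero for v in q]

group = [0.1, -0.4, 0.25, 0.0]
q, scale, zero = quantize_group(group)
recon = dequantize_group(q, scale, zero)
# Round-trip error per weight is bounded by half a quantization step.
assert all(abs(a - b) <= scale for a, b in zip(group, recon))
```

In the real flow, each weight matrix is split into small groups (e.g. 128 weights), and each group stores its own scale and zero point alongside the packed 4-bit values.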
Reference: https://github.com/amd/RyzenAI-SW/tree/main/example/transformers

For the quantization of Phi-3, see https://github.com/mit-han-lab/llm-awq/pull/183.

PS: The performance is similar to running on the CPU (7640HS).