runtime error

Exit code: 1. Reason:

special_tokens_map.json: 100%|██████████| 491/491 [00:00<00:00, 3.75MB/s]
tokenizer.json: 0%| | 0.00/11.4M [00:00<?, ?B/s]
tokenizer.json: 100%|██████████| 11.4M/11.4M [00:00<00:00, 32.2MB/s]
tokenizer_config.json: 0%| | 0.00/7.36k [00:00<?, ?B/s]
tokenizer_config.json: 100%|██████████| 7.36k/7.36k [00:00<00:00, 38.3MB/s]
vocab.json: 0%| | 0.00/2.78M [00:00<?, ?B/s]
vocab.json: 100%|██████████| 2.78M/2.78M [00:00<00:00, 31.4MB/s]
INFO  Loader: Auto dtype: `torch.float16` due to inference mode. If you wish to use `bfloat16`, please pass in `torch_dtype` arg to `loader()`.
INFO  Estimated Quantization BPW (bits per weight): 4.2875 bpw, based on [bits: 4, group_size: 128]
Traceback (most recent call last):
  File "/home/user/app/app.py", line 33, in <module>
    model = GPTQModel.load(model_name, device='cuda', trust_remote_code=True)
  File "/usr/local/lib/python3.10/site-packages/gptqmodel/models/auto.py", line 237, in load
    return cls.from_quantized(
  File "/usr/local/lib/python3.10/site-packages/gptqmodel/models/auto.py", line 307, in from_quantized
    return MODEL_MAP[model_type].from_quantized(
  File "/usr/local/lib/python3.10/site-packages/gptqmodel/models/loader.py", line 431, in from_quantized
    model = cls.loader.from_config(
  File "/usr/local/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 437, in from_config
    return model_class._from_config(config, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 1544, in _from_config
    model = cls(config, **kwargs)
  File "/home/user/.cache/huggingface/modules/transformers_modules/d16bdfed736b2ff09b452723d5f3a0b3c7254f7a/modeling_ovis.py", line 293, in __init__
    version.parse(importlib.metadata.version("flash_attn")) >= version.parse("2.6.3")), \
AssertionError: Using `flash_attention_2` requires having `flash_attn>=2.6.3` installed.
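The assertion at the bottom is raised inside the model's own `modeling_ovis.py`, which checks the installed `flash_attn` version at construction time; the container simply does not have `flash_attn>=2.6.3` installed. As a hedged sketch (not the Space's actual code), a pre-flight check in `app.py` mirroring that same assertion would surface the problem with a clearer message before the model load is attempted; the `2.6.3` floor is taken from the traceback above:

```python
# Hypothetical pre-flight check for app.py, mirroring the version assertion
# in modeling_ovis.py: fail early if flash_attn is missing or too old.
import importlib.metadata

from packaging import version

try:
    installed = importlib.metadata.version("flash_attn")
except importlib.metadata.PackageNotFoundError:
    installed = None  # package not installed at all

if installed is None or version.parse(installed) < version.parse("2.6.3"):
    raise SystemExit(
        f"flash_attn>=2.6.3 is required by this model's flash_attention_2 "
        f"path (found: {installed}); add `flash-attn>=2.6.3` to the Space's "
        f"requirements.txt or install a wheel matching the container's "
        f"CUDA/PyTorch build."
    )
```

Note that `flash-attn` compiles against the local CUDA toolchain, so on a Space it typically needs either a prebuilt wheel matching the container's CUDA and PyTorch versions or a custom build step.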
