Reduce VRAM consumption by swapping `cuda()` and `to(torch.bfloat16)`

#2
by mingyi456 - opened

When I tested the code locally, converting the weights to bfloat16 only after moving them to the GPU appears to leave the excess VRAM unfreed (calling `torch.cuda.empty_cache()` might release it, but swapping the two calls is simpler).
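For reference, a minimal sketch of the ordering this PR changes, using a stand-in `nn.Linear` rather than the repo's actual model (which isn't shown here):

```python
import torch
import torch.nn as nn

# Stand-in module (an assumption for illustration; the PR applies the same
# reordering to the repo's real model-loading code).
model = nn.Linear(8192, 8192)  # fp32 weights on CPU, ~256 MiB

# Original order: move to the GPU in fp32, then cast. The freed fp32 buffers
# are returned to PyTorch's CUDA caching allocator rather than the driver,
# so tools like nvidia-smi keep reporting that VRAM as used unless
# torch.cuda.empty_cache() is called:
# model = model.cuda().to(torch.bfloat16)

# Swapped order (this PR): cast on the CPU first, then move only the
# half-size bf16 weights, so the fp32 copies never occupy VRAM.
model = model.to(torch.bfloat16).cuda()

print(torch.cuda.memory_allocated())  # ~128 MiB: just the bf16 weights
print(torch.cuda.memory_reserved())   # no cached fp32 buffers left behind
```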

