Spaces: Running on Zero
feat(optim): load the model and tokenizer outside of the spaces wrapped method
#3, opened by raphael-gl (HF Staff)
On the one hand, we lose lazy initialization. On the other hand, we benefit from tensor packing on Zero, so the model has a smaller memory footprint when idle. In addition, callers no longer spend their GPU quota on loading the model: by the time a request arrives, the model is already downloaded, loaded in memory, and prepared for serving.
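
To illustrate the trade-off, here is a minimal, library-free sketch. It is not the Space's actual code: `gpu` is a hypothetical stand-in for the Spaces GPU decorator that only records time spent inside the wrapped call, and `load_model` is a hypothetical loader whose `sleep` stands in for downloading and loading weights. The tensor packing benefit is not modeled here.

```python
import time


def gpu(fn):
    """Hypothetical stand-in for the Spaces GPU decorator.

    It only measures time spent inside the wrapped call, mimicking
    time that would count against a caller's GPU quota.
    """
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            wrapper.billed_seconds += time.perf_counter() - start

    wrapper.billed_seconds = 0.0
    return wrapper


def load_model():
    """Hypothetical loader: the sleep stands in for download + weight loading."""
    time.sleep(0.2)
    return str.upper  # toy "model" that upper-cases its prompt


# Eager: the model is loaded once at import time, outside the wrapped function.
MODEL = load_model()


@gpu
def generate_eager(prompt):
    # The model is already in memory, so only inference time is billed.
    return MODEL(prompt)


_lazy_model = None


@gpu
def generate_lazy(prompt):
    # Lazy init: the first caller pays for loading inside the wrapped call.
    global _lazy_model
    if _lazy_model is None:
        _lazy_model = load_model()
    return _lazy_model(prompt)
```

In this sketch, the first call to `generate_lazy` is billed for the load, while `generate_eager` is billed only for inference. The cost of the eager variant is that loading happens at startup whether or not any request ever arrives.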