FastAPI space


I've started an open-source Python app that exposes a basic completions interface, an uptime check, and potentially other ways of interacting with Apertus through a REST API:
https://huggingface.co/spaces/loleg/fastapi-apertus
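To give a rough idea of the shape of such a service, here is a minimal sketch of a completions-style FastAPI app backed by Hugging Face `transformers`. The endpoint paths, parameter names, and the model ID are my illustrative assumptions, not necessarily what the actual Space uses:

```python
# Minimal sketch of a completions + uptime API (illustrative only;
# routes, fields, and model ID are assumptions, not the app's real API).
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()
# Assumed model ID for illustration.
generator = pipeline("text-generation", model="swiss-ai/Apertus-8B-Instruct-2509")

class CompletionRequest(BaseModel):
    prompt: str
    max_new_tokens: int = 64

@app.get("/health")
def health():
    # Simple liveness/uptime check.
    return {"status": "ok"}

@app.post("/completions")
def completions(req: CompletionRequest):
    out = generator(req.prompt, max_new_tokens=req.max_new_tokens)
    return {"completion": out[0]["generated_text"]}
```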

There is a mirror with open issues on Codeberg:
https://codeberg.org/loleg/fastapi-apertus/

For hosting on Hugging Face, a minimum of 1x L4 (30 GB RAM, 24 GB VRAM) is suggested, although with the memory-saving measures I've added, 16 GB of VRAM should be enough. The same applies to running the model directly in an Inference Endpoint, where I am still struggling to get Apertus working (but I'm getting helpful support from HF - stay tuned).
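For context, one common way to cut VRAM usage with `transformers` is to load the weights in bfloat16 and let `accelerate` place layers automatically. This is a general sketch of that technique under assumed names, not a description of the measures actually used in the app:

```python
# General memory-saving sketch (not the app's actual loading code).
# bfloat16 halves memory vs float32; device_map="auto" (requires the
# accelerate package) spreads layers across available devices if needed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "swiss-ai/Apertus-8B-Instruct-2509"  # assumed model ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
```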

Feedback and pull requests very much appreciated!