Latency Issues
#2 by memorylane - opened
I wanted to check what latency I should expect.
I've set up the 3B model, and I'm consistently getting a latency of 4 seconds (±0.2 seconds). It's deployed on a top-end GPU cluster.
On a small number of images, it takes only 1 second.
What latency did you see on your end during internal testing? And do you have any advice for reducing latency?