Latency Issues

#2
by memorylane - opened

I wanted to check what latency I should expect.

I've set up the 3B model, and I'm consistently getting a latency of 4 seconds (± 0.2 seconds). It's deployed on a top-end GPU cluster.

On a select few images, it takes only 1 second.

What latency did you see on your end during internal testing? And do you have any advice for improving it?
