Some tags appear to be overfitted

#4
by Zeyzilla02

This is a great model, but the predicted probabilities for tags such as "looking at viewer" and "solo" seem to be too high. Even a random picture of a hamburger (no human) scores above 0.4 on these tags.

This phenomenon is not caused by overfitting; it is essentially a case of underfitting. For images from a domain the model was never trained on, inference simply falls back to estimating each tag's probability from that tag's frequency in the training dataset. It is like guessing the answer to an exam question you don't know based on your past experience.
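A quick way to see why: for a sigmoid output trained with binary cross-entropy, the constant prediction that minimizes the loss is exactly the tag's base rate in the training data. Here is a toy sketch of that effect (the 0.62 frequency is made up, and this is not the model's actual training code):

```python
import torch

torch.manual_seed(0)
freq = 0.62                               # hypothetical base rate of "solo"
y = (torch.rand(100_000) < freq).float()  # labels drawn at that frequency

# A single constant logit stands in for "no usable signal in the input".
logit = torch.zeros(1, requires_grad=True)
opt = torch.optim.SGD([logit], lr=1.0)
for _ in range(500):
    opt.zero_grad()
    p = torch.sigmoid(logit).expand_as(y)
    loss = torch.nn.functional.binary_cross_entropy(p, y)
    loss.backward()
    opt.step()

print(f"learned p = {torch.sigmoid(logit).item():.3f}")  # ~0.62, the base rate
```

So for an out-of-domain hamburger photo, a frequent tag like "solo" can sit well above 0.4 without the model having learned anything about hamburgers.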

A simple trick to mitigate this is asymmetric loss, which I am already using to train this model. It suppresses backpropagation for extremely high predictions (the ones that arise from frequency-based inference), but in practice its effectiveness is limited. A sketch is included below.
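For reference, here is a minimal sketch of an asymmetric loss of the kind described in "Asymmetric Loss for Multi-Label Classification" (Ben-Baruch et al., 2021). The focusing and clipping parameters are illustrative defaults, not necessarily the values used for this model:

```python
import torch

def asymmetric_loss(logits, targets, gamma_neg=4.0, gamma_pos=0.0,
                    clip=0.05, eps=1e-8):
    """Asymmetric loss for multi-label tagging (illustrative parameters).

    Negatives get a larger focusing exponent (gamma_neg) and a probability
    shift (clip); together these attenuate the gradient for easy negatives,
    and keep it bounded (vanishing through the sigmoid) for extremely
    confident negative predictions.
    """
    p = torch.sigmoid(logits)
    # Shift negative probabilities down: predictions below `clip` are ignored
    # entirely, and since 1 - p_neg >= clip, the loss derivative stays finite
    # even as p -> 1, so dp/dz = p(1-p) -> 0 kills the gradient there.
    p_neg = (p - clip).clamp(min=0)
    loss_pos = targets * (1 - p) ** gamma_pos * torch.log(p.clamp(min=eps))
    loss_neg = (1 - targets) * p_neg ** gamma_neg * torch.log((1 - p_neg).clamp(min=eps))
    return -(loss_pos + loss_neg).mean()
```

With gamma_neg > gamma_pos, a tag that is predicted merely because it is frequent contributes far less gradient than a genuine, in-domain mistake; but since out-of-domain images never appear as training negatives at all, the loss alone cannot fully fix them.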

To correct this rigorously, I need to manually tag images from less common domains and add them to the training dataset. Given that the dataset already contains over 1 million samples, I estimate that on the order of 10,000 or more images from these untrained domains would be required.

I would be grateful if you could help by contributing accurately tagged image–tag datasets.