Text Inputs
Is there any reason text inputs aren't enabled? For certain all-metal textures it fails to identify metallicity, and it seems like being able to prompt for it would help.
Hello,
Thank you for raising this question. The idea of using text prompts to enhance SVBRDF channel estimation quality is indeed interesting.
Our material estimation architecture is inspired by RGB↔X, which uses static text prompts as switches to define the output channel.
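For concreteness, here is a minimal sketch of what such a prompt switch looks like. The model name, prompt strings, and channel list below are illustrative assumptions, not our exact configuration; the key point is that one fixed prompt per channel is encoded once and used as conditioning, rather than accepting free-form text:

```python
# Sketch of an RGB<->X-style "prompt switch": each output channel is selected
# by a fixed text prompt that conditions the denoising network.
# Model name and prompts are illustrative placeholders.
import torch
from transformers import CLIPTokenizer, CLIPTextModel

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

# One static prompt per SVBRDF channel; the prompt acts as a switch,
# not as free-form language input.
channel_prompts = {
    "albedo": "albedo",
    "normal": "normal map",
    "roughness": "roughness",
    "metallic": "metallicity",
}

switch_embeddings = {}
with torch.no_grad():
    for channel, prompt in channel_prompts.items():
        tokens = tokenizer(prompt, padding="max_length", return_tensors="pt")
        # Per-token hidden states are what a Stable-Diffusion-style UNet
        # would consume via cross-attention as conditioning.
        switch_embeddings[channel] = text_encoder(**tokens).last_hidden_state
```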
We actually conducted experiments adding keywords (e.g., "metal"/"non-metal") to the text prompt switch, but we found that this did not improve the estimation results.
We believe the reason is that our task and training setup at the estimation stage focus on pure image intrinsic decomposition. Consequently, the text prompt embeddings largely overfit and collapse to single points, so the model loses its ability to process the text modality effectively.
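One rough way to observe this kind of collapse is to compare embeddings of contrasting prompts from the estimation-stage text encoder. The sketch below is hypothetical: the checkpoint path is a placeholder for a fine-tuned text encoder, and the prompts are just examples:

```python
# If fine-tuning has made the text pathway degenerate, embeddings of
# semantically different prompts (e.g. "metal" vs "non-metal") end up
# nearly identical. Checkpoint path below is a hypothetical placeholder.
import torch
import torch.nn.functional as F
from transformers import CLIPTokenizer, CLIPTextModel

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("path/to/finetuned_text_encoder")  # placeholder

def pooled_embedding(prompt: str) -> torch.Tensor:
    tokens = tokenizer(prompt, padding="max_length", return_tensors="pt")
    with torch.no_grad():
        return text_encoder(**tokens).pooler_output.squeeze(0)

e_metal = pooled_embedding("metal")
e_nonmetal = pooled_embedding("non-metal")
# Cosine similarity near 1.0 means the two prompts map to (nearly) the same
# point, i.e. "metal" vs "non-metal" carries no usable signal for the model.
print(F.cosine_similarity(e_metal, e_nonmetal, dim=0).item())
```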
To successfully re-integrate language understanding, we would need to set up a true multi-modal architecture and training scheme. However, this approach currently faces significant challenges, such as the scarcity of high-quality PBR material data.
I hope that answers your question.
That makes perfect sense. I apologize, I must've misunderstood the paper (this is not my area of expertise). Thanks for the thorough explanation.
@ksangk I think fine-tuning would solve the problem. Is it possible to release code to train the model?