Text Inputs
Is there any reason text inputs aren't enabled? For certain all-metal textures it fails to identify metallicity, and it seems like being able to prompt for it would help.
Hello,
Thank you for raising this question. The idea of using text prompts to enhance SVBRDF channel estimation quality is indeed interesting.
Our material estimation architecture is inspired by RGB↔X, which uses static text prompts as switches to define the output channel.
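For concreteness, here is a minimal sketch of what such a prompt switch looks like. The model name, prompt strings, and channel list below are illustrative assumptions, not our exact configuration; the key point is that one fixed prompt per channel is encoded once and used as conditioning, rather than accepting free-form text:

```python
# Sketch of an RGB<->X-style "prompt switch": each output channel is selected
# by a fixed text prompt that conditions the denoising network.
# Model name and prompts are illustrative placeholders.
import torch
from transformers import CLIPTokenizer, CLIPTextModel

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

# One static prompt per SVBRDF channel; the prompt acts as a switch,
# not as free-form language input.
channel_prompts = {
    "albedo": "albedo",
    "normal": "normal map",
    "roughness": "roughness",
    "metallic": "metallicity",
}

switch_embeddings = {}
with torch.no_grad():
    for channel, prompt in channel_prompts.items():
        tokens = tokenizer(prompt, padding="max_length", return_tensors="pt")
        # Per-token hidden states are what a Stable-Diffusion-style UNet
        # would consume via cross-attention as conditioning.
        switch_embeddings[channel] = text_encoder(**tokens).last_hidden_state
```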
We actually conducted experiments adding keywords (e.g., "metal"/"non-metal") to the text prompt switch, but we found that this did not improve the estimation results.
We believe the reason is that our task and training setup at the estimation stage focus on pure image intrinsic decomposition. Consequently, the text prompt embeddings largely overfit and collapse to single points, so the model loses its ability to process the text modality effectively.
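One rough way to observe this kind of collapse is to compare embeddings of contrasting prompts from the estimation-stage text encoder. The sketch below is hypothetical: the checkpoint path is a placeholder for a fine-tuned text encoder, and the prompts are just examples:

```python
# If fine-tuning has made the text pathway degenerate, embeddings of
# semantically different prompts (e.g. "metal" vs "non-metal") end up
# nearly identical. Checkpoint path below is a hypothetical placeholder.
import torch
import torch.nn.functional as F
from transformers import CLIPTokenizer, CLIPTextModel

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("path/to/finetuned_text_encoder")  # placeholder

def pooled_embedding(prompt: str) -> torch.Tensor:
    tokens = tokenizer(prompt, padding="max_length", return_tensors="pt")
    with torch.no_grad():
        return text_encoder(**tokens).pooler_output.squeeze(0)

e_metal = pooled_embedding("metal")
e_nonmetal = pooled_embedding("non-metal")
# Cosine similarity near 1.0 means the two prompts map to (nearly) the same
# point, i.e. "metal" vs "non-metal" carries no usable signal for the model.
print(F.cosine_similarity(e_metal, e_nonmetal, dim=0).item())
```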
To successfully re-integrate language understanding, we would need to set up a true multi-modal architecture and training scheme. However, this approach currently faces significant challenges, such as the scarcity of high-quality PBR material data.
I hope that answers your question.
That makes perfect sense. I apologize, I must've misunderstood the paper (this is not my area of expertise). Thanks for the thorough explanation.
@ksangk I think fine-tuning would solve the problem. Is it possible to release code to train the model?