Croissant checker hanging while local validation passes

#2
by SingularityHJY - opened

Hi, I am trying to check the croissant format of our dataset at: https://huggingface.co/datasets/cx-cmu/ClueWeb-Reco

Submitting the generated croissant file or the croissant API URL just hangs forever, while the local test passes very quickly, as shown below:

$ mlcroissant validate --jsonld  https://huggingface.co/api/datasets/cx-cmu/ClueWeb-Reco/croissant

W0511 09:31:53.784890 22735924921344 rdf.py:80] WARNING: The JSON-LD `@context` is not standard. Refer to the official @context (e.g., from the example datasets in https://github.com/mlcommons/croissant/tree/main/datasets/1.0). The different keys are: {'examples', 'rai'}
W0511 09:31:53.813600 22735924921344 datasets.py:41] Found the following 3 warning(s) during the validation:
  -  [Metadata(ClueWeb-Reco)] Property "http://mlcommons.org/croissant/citeAs" is recommended, but does not exist.
  -  [Metadata(ClueWeb-Reco)] Property "https://schema.org/datePublished" is recommended, but does not exist.
  -  [Metadata(ClueWeb-Reco)] Property "https://schema.org/version" is recommended, but does not exist.
I0511 09:31:53.813673 22735924921344 validate.py:53] Done.

Are there any suggestions for this situation, or ideas about why the checker hangs while local validation responds so quickly?
I am not sure what the problem is with the croissant.
Thank you so much!

same problem

Same problem.

Same problem

Seems like we're hitting a rate limit. Talking to HuggingFace now.
If the test passes when you run it locally, you will be fine.

still having a problem

Yes, many people are using the checker at the same time. If you have a HuggingFace dataset, it is possible that it hangs because of rate limiting.
We added several options to check it locally or by duplicating the app. Please try those if you have a problem.
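
For anyone scripting the local check, here is a minimal sketch of downloading the croissant JSON-LD with an explicit timeout and retrying when rate limited, so the request cannot hang indefinitely. It assumes the API signals rate limiting with HTTP 429, which is not confirmed in this thread. The helpers `http_get`, `fetch_with_backoff`, and `save_croissant` are hypothetical names, not part of mlcroissant or the checker.

```python
import json
import time
import urllib.error
import urllib.request

# URL taken from the original post; everything else here is illustrative.
CROISSANT_URL = "https://huggingface.co/api/datasets/cx-cmu/ClueWeb-Reco/croissant"


def http_get(url, timeout=30):
    """Fetch a URL with an explicit timeout so a stalled request cannot hang."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_with_backoff(url, fetch=http_get, retries=4, base_delay=2.0, sleep=time.sleep):
    """Retry on HTTP 429, doubling the wait after each attempt (hypothetical helper)."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    for attempt in range(retries):
        try:
            return fetch(url)
        except urllib.error.HTTPError as err:
            # Assumption: 429 means rate limited. Other errors, and the
            # final failed attempt, are re-raised to the caller.
            if err.code != 429 or attempt == retries - 1:
                raise
            sleep(base_delay * 2 ** attempt)


def save_croissant(url=CROISSANT_URL, path="croissant.json"):
    """Download the JSON-LD, check that it parses as JSON, and write it to disk."""
    body = fetch_with_backoff(url)
    json.loads(body)  # fail early if the response is not valid JSON
    with open(path, "wb") as f:
        f.write(body)
    return path
```

Calling `save_croissant()` writes `croissant.json`, which can then be checked with `mlcroissant validate --jsonld croissant.json`, the same command used in the original post but pointed at the local file.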
