Croissant checker hanging while local validation passes
Hi, I am trying to check the croissant format of our dataset at: https://huggingface.co/datasets/cx-cmu/ClueWeb-Reco
Submitting either the generated Croissant file or the Croissant API URL just hangs forever, while local validation passes very quickly, as shown below:
$ mlcroissant validate --jsonld https://huggingface.co/api/datasets/cx-cmu/ClueWeb-Reco/croissant
W0511 09:31:53.784890 22735924921344 rdf.py:80] WARNING: The JSON-LD `@context` is not standard. Refer to the official @context (e.g., from the example datasets in https://github.com/mlcommons/croissant/tree/main/datasets/1.0). The different keys are: {'examples', 'rai'}
W0511 09:31:53.813600 22735924921344 datasets.py:41] Found the following 3 warning(s) during the validation:
- [Metadata(ClueWeb-Reco)] Property "http://mlcommons.org/croissant/citeAs" is recommended, but does not exist.
- [Metadata(ClueWeb-Reco)] Property "https://schema.org/datePublished" is recommended, but does not exist.
- [Metadata(ClueWeb-Reco)] Property "https://schema.org/version" is recommended, but does not exist.
I0511 09:31:53.813673 22735924921344 validate.py:53] Done.
Are there any suggestions for this situation, or ideas about why the checker hangs while local validation responds so quickly?
I am not sure what the problem is with the croissant.
Thank you so much!
same problem
Same problem.
Same problem
Seems like we're hitting a rate limit. Talking to HuggingFace now.
If the file passes validation when you run it locally, you will be fine.
still having a problem
Yes, many people are using the checker at the same time. If you have a HuggingFace dataset, it is possible that it hangs because of rate limiting.
We added several options for checking it locally or by duplicating the app. Please try those if you have a problem.
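For anyone who wants to script the local route, here is a rough sketch of one way to download the Croissant file with retries, so that validation can then run against a local copy. It is not the checker's own code. The helper names `http_get` and `fetch_with_backoff`, the retry counts, and the assumption that rate limiting shows up as HTTP 429 are all illustrative guesses. The URL is the one from the original post.

```python
import time
import urllib.error
import urllib.request

# URL taken from the original post.
CROISSANT_URL = "https://huggingface.co/api/datasets/cx-cmu/ClueWeb-Reco/croissant"


def http_get(url, timeout=30):
    """Fetch a URL and return the raw response body as bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def fetch_with_backoff(url, fetch=http_get, retries=5, base_delay=1.0, sleep=time.sleep):
    """Hypothetical helper: retry `fetch(url)` with exponential backoff.

    Assumption: a rate-limited request fails with HTTP 429. Any other
    HTTP error, or a 429 on the final attempt, is re-raised unchanged.
    """
    for attempt in range(retries):
        try:
            return fetch(url)
        except urllib.error.HTTPError as err:
            if err.code != 429 or attempt == retries - 1:
                raise
            # Wait 1x, 2x, 4x, ... the base delay before the next attempt.
            sleep(base_delay * 2 ** attempt)


if __name__ == "__main__":
    data = fetch_with_backoff(CROISSANT_URL)
    with open("croissant.json", "wb") as f:
        f.write(data)
```

Once the file is saved, the same command as in the original post can be pointed at the local copy: `mlcroissant validate --jsonld croissant.json`.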