Improve model card with pipeline tag, links, sample usage, and expanded details

#3
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +113 -29
README.md CHANGED
@@ -1,36 +1,99 @@
  ---
  library_name: transformers
  license: apache-2.0
- base_model: facebook/deformable-detr-detic
  tags:
  - generated_from_trainer
- datasets:
- - Voxel51/fisheye8k
  model-index:
  - name: fisheye8k_facebook_deformable-detr-detic
    results: []
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
  # fisheye8k_facebook_deformable-detr-detic

- This model is a fine-tuned version of [facebook/deformable-detr-detic](https://huggingface.co/facebook/deformable-detr-detic) on the generator dataset.
- It achieves the following results on the evaluation set:
  - Loss: 2.1348

  ## Model description

- More information needed

  ## Intended uses & limitations

- More information needed

  ## Training and evaluation data

- More information needed

  ## Training procedure

@@ -49,23 +112,23 @@ The following hyperparameters were used during training:
  ### Training results

  | Training Loss | Epoch | Step | Validation Loss |
- |:-------------:|:-----:|:-----:|:---------------:|
- | 2.435 | 1.0 | 5288 | 2.4832 |
- | 2.2626 | 2.0 | 10576 | 2.6324 |
- | 1.8443 | 3.0 | 15864 | 2.1361 |
- | 2.4834 | 4.0 | 21152 | 2.5269 |
- | 2.3417 | 5.0 | 26440 | 2.5997 |
- | 1.939 | 6.0 | 31728 | 2.1948 |
- | 1.8384 | 7.0 | 37016 | 2.0057 |
- | 1.7235 | 8.0 | 42304 | 2.0182 |
- | 1.728 | 9.0 | 47592 | 1.9454 |
- | 1.621 | 10.0 | 52880 | 1.9876 |
- | 1.539 | 11.0 | 58168 | 1.8862 |
- | 1.7229 | 12.0 | 63456 | 2.2071 |
- | 1.9613 | 13.0 | 68744 | 2.5147 |
- | 1.5238 | 14.0 | 74032 | 1.9836 |
- | 1.5777 | 15.0 | 79320 | 2.0812 |
- | 1.5963 | 16.0 | 84608 | 2.1348 |

  ### Framework versions
@@ -75,4 +138,25 @@ The following hyperparameters were used during training:
  - Datasets 3.2.0
  - Tokenizers 0.21.0

- Mcity Data Engine: https://arxiv.org/abs/2504.21614
  ---
+ base_model: facebook/deformable-detr-detic
+ datasets:
+ - Voxel51/fisheye8k
  library_name: transformers
  license: apache-2.0
  tags:
  - generated_from_trainer
+ - object-detection
+ - zero-shot
+ pipeline_tag: zero-shot-object-detection
  model-index:
  - name: fisheye8k_facebook_deformable-detr-detic
    results: []
  ---

  # fisheye8k_facebook_deformable-detr-detic

+ This model is a fine-tuned version of [facebook/deformable-detr-detic](https://huggingface.co/facebook/deformable-detr-detic) on the [Fisheye8K dataset](https://huggingface.co/datasets/Voxel51/fisheye8k), developed as part of the **Mcity Data Engine** project.
+
+ The model achieves the following results on the evaluation set:
  - Loss: 2.1348

+ 📚 Paper: [Mcity Data Engine: Iterative Model Improvement Through Open-Vocabulary Data Selection](https://huggingface.co/papers/2504.21614)
+ 🌐 Project Page: [Mcity Data Engine Docs](https://mcity.github.io/mcity_data_engine/)
+ 💻 Code: [mcity/mcity_data_engine GitHub Repository](https://github.com/mcity/mcity_data_engine)
+
  ## Model description

+ The `fisheye8k_facebook_deformable-detr-detic` model is a component of the **Mcity Data Engine**, an open-source system for iterative model improvement through open-vocabulary data selection. The engine provides modules for the complete data-based development cycle, from data acquisition to model deployment, and specifically targets the detection of long-tail and novel classes in large amounts of unlabeled data, particularly in Intelligent Transportation Systems (ITS).
+
+ The model builds on its base, `deformable-detr-detic`, to specialize in object detection within the Mcity Data Engine's workflows. Through the base model's open-vocabulary capabilities, it can help identify objects from classes not seen during fine-tuning.
  ## Intended uses & limitations

+ This model is primarily intended for **zero-shot object detection** in Intelligent Transportation Systems (ITS) and related domains. It helps researchers and practitioners identify rare and novel classes of interest in raw visual data, supporting the continuous improvement of AI models.
+
+ **Potential use cases include:**
+ * Detecting various types of road users and vehicles in complex traffic scenarios.
+ * Identifying long-tail or previously unseen objects in automotive perception datasets.
+ * Serving as a component within larger data curation and model training pipelines.
+
+ **Limitations:**
+ * Although the open-vocabulary approach is designed for generalization, performance on highly out-of-distribution scenarios may still vary.
+ * Making full use of the Mcity Data Engine workflows, including those that rely on this model, typically requires a powerful GPU.
+ * Because the model was fine-tuned on fisheye camera data, performance on standard perspective images may differ.
+
+ ## Sample Usage
+
+ You can use this model with the Hugging Face `transformers` library to detect objects in an image. Note that although the DETIC base model is open-vocabulary, this checkpoint is served through the standard `transformers` object-detection API, which does not take text queries at inference time; detections are decoded against the fine-tuned label set (Bus, Bike, Car, Pedestrian, Truck) via `model.config.id2label`.
+
+ ```python
+ from transformers import AutoProcessor, AutoModelForObjectDetection
+ import torch
+ from PIL import Image
+ import requests
+
+ # Load an example image. For best results, use fisheye images similar to the Fisheye8K dataset.
+ # This example uses a general image, but real-world usage should focus on ITS contexts.
+ url = "http://images.cocodataset.org/val2017/000000039769.jpg"  # Example image (two cats on a couch)
+ image = Image.open(requests.get(url, stream=True).raw)
+
+ # Load processor and model from the Hugging Face Hub
+ model_id = "mcity-data-engine/fisheye8k_facebook_deformable-detr-detic"
+ processor = AutoProcessor.from_pretrained(model_id)
+ model = AutoModelForObjectDetection.from_pretrained(model_id)
+
+ # Prepare inputs and run inference
+ inputs = processor(images=image, return_tensors="pt")
+ with torch.no_grad():
+     outputs = model(**inputs)
+
+ # Post-process raw outputs into scored, labeled boxes
+ target_sizes = torch.tensor([image.size[::-1]])  # (height, width)
+ results = processor.post_process_object_detection(outputs, target_sizes=target_sizes, threshold=0.5)[0]
+
+ # Print detected objects with their class names and box coordinates
+ print("Detected objects (threshold=0.5):")
+ for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
+     xmin, ymin, xmax, ymax = [round(c, 2) for c in box.tolist()]
+     print(f"  {model.config.id2label[label.item()]}: {round(score.item(), 3)} "
+           f"[xmin={xmin}, ymin={ymin}, xmax={xmax}, ymax={ymax}]")
+ ```
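The boxes returned by `post_process_object_detection` are in corner format `[xmin, ymin, xmax, ymax]`. If a downstream consumer expects COCO-style `[x, y, width, height]` boxes, a small conversion helper suffices; this is a minimal sketch using plain Python lists rather than tensors, with a hypothetical helper name:

```python
def corners_to_coco(box):
    """Convert a [xmin, ymin, xmax, ymax] box to COCO [x, y, width, height]."""
    xmin, ymin, xmax, ymax = box
    return [xmin, ymin, xmax - xmin, ymax - ymin]

# Example: a detection box in corner format
print(corners_to_coco([10.0, 20.0, 110.0, 70.0]))  # [10.0, 20.0, 100.0, 50.0]
```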

  ## Training and evaluation data

+ This model was fine-tuned on the [Fisheye8K dataset](https://huggingface.co/datasets/Voxel51/fisheye8k). The Mcity Data Engine uses data selection to focus on long-tail and novel classes, which is crucial for ITS applications. The model's `config.json` indicates it was fine-tuned on the classes "Bus", "Bike", "Car", "Pedestrian", and "Truck".
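For reference, a label lookup for the classes listed above can be sketched as follows. The index order here is an assumption for illustration; the authoritative mapping is the `id2label` entry in the model's `config.json`:

```python
# Assumed id2label mapping; verify against the id2label field in config.json
id2label = {0: "Bus", 1: "Bike", 2: "Car", 3: "Pedestrian", 4: "Truck"}
label2id = {name: idx for idx, name in id2label.items()}

print(label2id["Pedestrian"])  # 3
```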

  ## Training procedure

  ### Training results

  | Training Loss | Epoch | Step | Validation Loss |
+ |:-------------:|:-----:|:-----:|:---------------:|
+ | 2.435 | 1.0 | 5288 | 2.4832 |
+ | 2.2626 | 2.0 | 10576 | 2.6324 |
+ | 1.8443 | 3.0 | 15864 | 2.1361 |
+ | 2.4834 | 4.0 | 21152 | 2.5269 |
+ | 2.3417 | 5.0 | 26440 | 2.5997 |
+ | 1.939 | 6.0 | 31728 | 2.1948 |
+ | 1.8384 | 7.0 | 37016 | 2.0057 |
+ | 1.7235 | 8.0 | 42304 | 2.0182 |
+ | 1.728 | 9.0 | 47592 | 1.9454 |
+ | 1.621 | 10.0 | 52880 | 1.9876 |
+ | 1.539 | 11.0 | 58168 | 1.8862 |
+ | 1.7229 | 12.0 | 63456 | 2.2071 |
+ | 1.9613 | 13.0 | 68744 | 2.5147 |
+ | 1.5238 | 14.0 | 74032 | 1.9836 |
+ | 1.5777 | 15.0 | 79320 | 2.0812 |
+ | 1.5963 | 16.0 | 84608 | 2.1348 |
 
133
 
134
  ### Framework versions
 
138
  - Datasets 3.2.0
139
  - Tokenizers 0.21.0
140
 
+ ## Acknowledgements
+
+ Mcity would like to thank Amazon Web Services (AWS) for their pivotal role in providing the cloud infrastructure on which the Data Engine depends. We couldn’t have done it without their tremendous support!
+
+ Special thanks to these amazing people for contributing to the Mcity Data Engine! 🙌
+
+ <a href="https://github.com/mcity/mcity_data_engine/graphs/contributors">
+   <img src="https://contrib.rocks/image?repo=mcity/mcity_data_engine" />
+ </a>
+
+ ## Citation
+
+ If you use the Mcity Data Engine in your research, feel free to cite the project:
+
+ ```bibtex
+ @article{bogdoll2025mcitydataengine,
+   title={Mcity Data Engine},
+   author={Bogdoll, Daniel and Anata, Rajanikant Patnaik and Stevens, Gregory},
+   journal={GitHub. Note: https://github.com/mcity/mcity_data_engine},
+   year={2025}
+ }
+ ```