Improve model card with pipeline tag, links, sample usage, and expanded details

#3
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +113 -29
README.md CHANGED
@@ -1,36 +1,99 @@
  ---
  library_name: transformers
  license: apache-2.0
- base_model: facebook/deformable-detr-detic
  tags:
  - generated_from_trainer
- datasets:
- - Voxel51/fisheye8k
  model-index:
  - name: fisheye8k_facebook_deformable-detr-detic
    results: []
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
  # fisheye8k_facebook_deformable-detr-detic

- This model is a fine-tuned version of [facebook/deformable-detr-detic](https://huggingface.co/facebook/deformable-detr-detic) on the generator dataset.
- It achieves the following results on the evaluation set:
  - Loss: 2.1348

  ## Model description

- More information needed

  ## Intended uses & limitations

- More information needed

  ## Training and evaluation data

- More information needed

  ## Training procedure

@@ -49,23 +112,23 @@ The following hyperparameters were used during training:
  ### Training results

  | Training Loss | Epoch | Step | Validation Loss |
- |:-------------:|:-----:|:-----:|:---------------:|
- | 2.435 | 1.0 | 5288 | 2.4832 |
- | 2.2626 | 2.0 | 10576 | 2.6324 |
- | 1.8443 | 3.0 | 15864 | 2.1361 |
- | 2.4834 | 4.0 | 21152 | 2.5269 |
- | 2.3417 | 5.0 | 26440 | 2.5997 |
- | 1.939 | 6.0 | 31728 | 2.1948 |
- | 1.8384 | 7.0 | 37016 | 2.0057 |
- | 1.7235 | 8.0 | 42304 | 2.0182 |
- | 1.728 | 9.0 | 47592 | 1.9454 |
- | 1.621 | 10.0 | 52880 | 1.9876 |
- | 1.539 | 11.0 | 58168 | 1.8862 |
- | 1.7229 | 12.0 | 63456 | 2.2071 |
- | 1.9613 | 13.0 | 68744 | 2.5147 |
- | 1.5238 | 14.0 | 74032 | 1.9836 |
- | 1.5777 | 15.0 | 79320 | 2.0812 |
- | 1.5963 | 16.0 | 84608 | 2.1348 |

  ### Framework versions
@@ -75,4 +138,25 @@ The following hyperparameters were used during training:
  - Datasets 3.2.0
  - Tokenizers 0.21.0

- Mcity Data Engine: https://arxiv.org/abs/2504.21614
  ---
+ base_model: facebook/deformable-detr-detic
+ datasets:
+ - Voxel51/fisheye8k
  library_name: transformers
  license: apache-2.0
  tags:
  - generated_from_trainer
+ - object-detection
+ - zero-shot
+ pipeline_tag: zero-shot-object-detection
  model-index:
  - name: fisheye8k_facebook_deformable-detr-detic
    results: []
  ---

  # fisheye8k_facebook_deformable-detr-detic

+ This model is a fine-tuned version of [facebook/deformable-detr-detic](https://huggingface.co/facebook/deformable-detr-detic) on the [Fisheye8K dataset](https://huggingface.co/datasets/Voxel51/fisheye8k), developed as part of the **Mcity Data Engine** project.
+
+ The model achieves the following results on the evaluation set:
  - Loss: 2.1348

+ 📚 Paper: [Mcity Data Engine: Iterative Model Improvement Through Open-Vocabulary Data Selection](https://huggingface.co/papers/2504.21614)
+ 🌐 Project Page: [Mcity Data Engine Docs](https://mcity.github.io/mcity_data_engine/)
+ 💻 Code: [mcity/mcity_data_engine GitHub Repository](https://github.com/mcity/mcity_data_engine)
+
  ## Model description

+ The `fisheye8k_facebook_deformable-detr-detic` model is a component of the **Mcity Data Engine**, an open-source system for iterative model improvement through open-vocabulary data selection. The engine provides modules for the complete data-based development cycle, from data acquisition to model deployment, and specifically targets the detection of long-tail and novel classes in large amounts of unlabeled data, particularly in Intelligent Transportation Systems (ITS).
+
+ The model builds on its base, `deformable-detr-detic`, to specialize in object detection within the Mcity Data Engine's workflows. Through the base model's open-vocabulary capabilities, it can help identify objects from classes not seen during fine-tuning.
  ## Intended uses & limitations

+ This model is primarily intended for **zero-shot object detection** in Intelligent Transportation Systems (ITS) and related domains. It helps researchers and practitioners identify rare and novel classes of interest in raw visual data, supporting the continuous improvement of AI models.
+
+ **Potential use cases include:**
+ * Detecting various types of road users and vehicles in complex traffic scenarios.
+ * Identifying long-tail or previously unseen objects in automotive perception datasets.
+ * Serving as a component within larger data curation and model training pipelines.
+
+ **Limitations:**
+ * Although the open-vocabulary approach is designed for generalization, performance on highly out-of-distribution scenarios may still vary.
+ * Making full use of the Mcity Data Engine workflows, including those that rely on this model, typically requires a powerful GPU.
+ * Because the model was fine-tuned on fisheye camera data, performance on standard perspective images may differ.
+
+ ## Sample Usage
+
+ You can use this model with the Hugging Face `transformers` library to detect objects in an image. Note that although the DETIC base model is open-vocabulary, this checkpoint is served through the standard `transformers` object-detection API, which does not take text queries at inference time; detections are decoded against the fine-tuned label set (Bus, Bike, Car, Pedestrian, Truck) via `model.config.id2label`.
+
+ ```python
+ from transformers import AutoProcessor, AutoModelForObjectDetection
+ import torch
+ from PIL import Image
+ import requests
+
+ # Load an example image. For best results, use fisheye images similar to the Fisheye8K dataset.
+ # This example uses a general image, but real-world usage should focus on ITS contexts.
+ url = "http://images.cocodataset.org/val2017/000000039769.jpg"  # Example image (two cats on a couch)
+ image = Image.open(requests.get(url, stream=True).raw)
+
+ # Load processor and model from the Hugging Face Hub
+ model_id = "mcity-data-engine/fisheye8k_facebook_deformable-detr-detic"
+ processor = AutoProcessor.from_pretrained(model_id)
+ model = AutoModelForObjectDetection.from_pretrained(model_id)
+
+ # Prepare inputs and run inference
+ inputs = processor(images=image, return_tensors="pt")
+ with torch.no_grad():
+     outputs = model(**inputs)
+
+ # Post-process raw outputs into scored, labeled boxes
+ target_sizes = torch.tensor([image.size[::-1]])  # (height, width)
+ results = processor.post_process_object_detection(outputs, target_sizes=target_sizes, threshold=0.5)[0]
+
+ # Print detected objects with their class names and box coordinates
+ print("Detected objects (threshold=0.5):")
+ for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
+     xmin, ymin, xmax, ymax = [round(c, 2) for c in box.tolist()]
+     print(f"  {model.config.id2label[label.item()]}: {round(score.item(), 3)} "
+           f"[xmin={xmin}, ymin={ymin}, xmax={xmax}, ymax={ymax}]")
+ ```
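The boxes returned by `post_process_object_detection` are in corner format `[xmin, ymin, xmax, ymax]`. If a downstream consumer expects COCO-style `[x, y, width, height]` boxes, a small conversion helper suffices; this is a minimal sketch using plain Python lists rather than tensors, with a hypothetical helper name:

```python
def corners_to_coco(box):
    """Convert a [xmin, ymin, xmax, ymax] box to COCO [x, y, width, height]."""
    xmin, ymin, xmax, ymax = box
    return [xmin, ymin, xmax - xmin, ymax - ymin]

# Example: a detection box in corner format
print(corners_to_coco([10.0, 20.0, 110.0, 70.0]))  # [10.0, 20.0, 100.0, 50.0]
```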

  ## Training and evaluation data

+ This model was fine-tuned on the [Fisheye8K dataset](https://huggingface.co/datasets/Voxel51/fisheye8k). The Mcity Data Engine uses data selection to focus on long-tail and novel classes, which is crucial for ITS applications. The model's `config.json` indicates it was fine-tuned on the classes "Bus", "Bike", "Car", "Pedestrian", and "Truck".
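For reference, a label lookup for the classes listed above can be sketched as follows. The index order here is an assumption for illustration; the authoritative mapping is the `id2label` entry in the model's `config.json`:

```python
# Assumed id2label mapping; verify against the id2label field in config.json
id2label = {0: "Bus", 1: "Bike", 2: "Car", 3: "Pedestrian", 4: "Truck"}
label2id = {name: idx for idx, name in id2label.items()}

print(label2id["Pedestrian"])  # 3
```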

  ## Training procedure

  ### Training results

  | Training Loss | Epoch | Step | Validation Loss |
+ |:-------------:|:-----:|:-----:|:---------------:|
+ | 2.435 | 1.0 | 5288 | 2.4832 |
+ | 2.2626 | 2.0 | 10576 | 2.6324 |
+ | 1.8443 | 3.0 | 15864 | 2.1361 |
+ | 2.4834 | 4.0 | 21152 | 2.5269 |
+ | 2.3417 | 5.0 | 26440 | 2.5997 |
+ | 1.939 | 6.0 | 31728 | 2.1948 |
+ | 1.8384 | 7.0 | 37016 | 2.0057 |
+ | 1.7235 | 8.0 | 42304 | 2.0182 |
+ | 1.728 | 9.0 | 47592 | 1.9454 |
+ | 1.621 | 10.0 | 52880 | 1.9876 |
+ | 1.539 | 11.0 | 58168 | 1.8862 |
+ | 1.7229 | 12.0 | 63456 | 2.2071 |
+ | 1.9613 | 13.0 | 68744 | 2.5147 |
+ | 1.5238 | 14.0 | 74032 | 1.9836 |
+ | 1.5777 | 15.0 | 79320 | 2.0812 |
+ | 1.5963 | 16.0 | 84608 | 2.1348 |
 
133
 
134
  ### Framework versions
 
138
  - Datasets 3.2.0
139
  - Tokenizers 0.21.0
140
 
+ ## Acknowledgements
+
+ Mcity would like to thank Amazon Web Services (AWS) for their pivotal role in providing the cloud infrastructure on which the Data Engine depends. We couldn’t have done it without their tremendous support!
+
+ Special thanks to these amazing people for contributing to the Mcity Data Engine! 🙌
+
+ <a href="https://github.com/mcity/mcity_data_engine/graphs/contributors">
+   <img src="https://contrib.rocks/image?repo=mcity/mcity_data_engine" />
+ </a>
+
+ ## Citation
+
+ If you use the Mcity Data Engine in your research, feel free to cite the project:
+
+ ```bibtex
+ @article{bogdoll2025mcitydataengine,
+   title={Mcity Data Engine},
+   author={Bogdoll, Daniel and Anata, Rajanikant Patnaik and Stevens, Gregory},
+   journal={GitHub. Note: https://github.com/mcity/mcity_data_engine},
+   year={2025}
+ }
+ ```