mcity-data-engine/fisheye8k_jozhang97_deta-swin-large-o365

@@ -1,35 +1,85 @@
 ---
-library_name: transformers
 base_model: jozhang97/deta-swin-large-o365
-tags:
-- generated_from_trainer
 datasets:
 - Voxel51/fisheye8k
 model-index:
 - name: fisheye8k_jozhang97_deta-swin-large-o365
   results: []
 ---
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
 # fisheye8k_jozhang97_deta-swin-large-o365
-This model is a fine-tuned version of [jozhang97/deta-swin-large-o365](https://huggingface.co/jozhang97/deta-swin-large-o365) on the generator dataset.
-It achieves the following results on the evaluation set:
 - Loss: 1.0247
 ## Model description
-More information needed
 ## Intended uses & limitations
-More information needed
 ## Training and evaluation data
-More information needed
 ## Training procedure
@@ -47,33 +97,30 @@ The following hyperparameters were used during training:
 ### Training results
-| Training Loss | Epoch | Step  | Validation Loss |
-|:-------------:|:-----:|:-----:|:---------------:|
-| 1.3933        | 1.0   | 5288  | 1.6177          |
-| 1.098         | 2.0   | 10576 | 1.2979          |
-| 0.9565        | 3.0   | 15864 | 1.2650          |
-| 0.8734        | 4.0   | 21152 | 1.2495          |
-| 0.8196        | 5.0   | 26440 | 1.1328          |
-| 0.7977        | 6.0   | 31728 | 1.3190          |
-| 0.8448        | 7.0   | 37016 | 1.3999          |
-| 0.7399        | 8.0   | 42304 | 1.3117          |
-| 0.6325        | 9.0   | 47592 | 1.1202          |
-| 0.621         | 10.0  | 52880 | 1.1707          |
-| 0.7134        | 11.0  | 58168 | 1.2353          |
-| 0.6425        | 12.0  | 63456 | 1.0416          |
-| 0.5935        | 13.0  | 68744 | 0.9215          |
-| 0.5798        | 14.0  | 74032 | 1.0827          |
-| 0.5924        | 15.0  | 79320 | 1.0398          |
-| 0.5559        | 16.0  | 84608 | 1.0112          |
-| 0.5783        | 17.0  | 89896 | 1.0434          |
-| 0.5536        | 18.0  | 95184 | 1.0247          |
 ### Framework versions
 - Transformers 4.48.3
 - Pytorch 2.5.1+cu124
 - Datasets 3.2.0
-- Tokenizers 0.21.0
-Mcity Data Engine: https://arxiv.org/abs/2504.21614

 ---
 base_model: jozhang97/deta-swin-large-o365
 datasets:
 - Voxel51/fisheye8k
+library_name: transformers
+tags:
+- generated_from_trainer
+- deta
+- swin
+- traffic
+- automotive
+- ITS
+- computer-vision
+pipeline_tag: object-detection
+license: mit
 model-index:
 - name: fisheye8k_jozhang97_deta-swin-large-o365
   results: []
 ---
 # fisheye8k_jozhang97_deta-swin-large-o365
+This model is a fine-tuned version of [jozhang97/deta-swin-large-o365](https://huggingface.co/jozhang97/deta-swin-large-o365) on the [Voxel51/fisheye8k dataset](https://huggingface.co/datasets/Voxel51/fisheye8k). It achieves the following results on the evaluation set:
 - Loss: 1.0247
+This model is a component of the **Mcity Data Engine**. Related resources:
+*   **Paper:** [Mcity Data Engine: Iterative Model Improvement Through Open-Vocabulary Data Selection](https://huggingface.co/papers/2504.21614)
+*   **Project Documentation:** [Mcity Data Engine Docs](https://mcity.github.io/mcity_data_engine/)
+*   **GitHub Repository:** [mcity/mcity_data_engine](https://github.com/mcity/mcity_data_engine)
 ## Model description
+The `fisheye8k_jozhang97_deta-swin-large-o365` model is part of the **Mcity Data Engine**, an open-source system for iterative model improvement through open-vocabulary data selection. It uses the DETA architecture with a Swin-Large backbone and is fine-tuned for object detection on fisheye camera images, a setting relevant to Intelligent Transportation Systems (ITS).
+The Mcity Data Engine addresses the challenge of detecting long-tail and novel classes in the large volumes of unlabeled data generated by vehicle fleets and roadside perception systems. This model serves as an object detector within that system.
 ## Intended uses & limitations
+This model is intended for object detection tasks within Intelligent Transportation Systems (ITS), particularly for identifying vehicles (Bus, Bike, Car, Truck) and pedestrians from fisheye camera imagery. It is designed to facilitate the continuous improvement of AI models by enabling the detection and curation of rare and novel classes in large, unlabeled datasets.
+**Key Use Cases:**
+*   Object detection in traffic scenes captured with fisheye cameras, including their characteristic wide-angle distortion.
+*   Integration into data pipelines for iterative model retraining and improvement.
+*   Supporting research and development in autonomous driving and transportation perception.
+**Limitations:**
+*   The model is fine-tuned on fisheye imagery; its accuracy on standard camera views is not reported here and may differ.
+*   It may exhibit reduced performance on out-of-distribution data or in environmental conditions not well-represented in its training data.
+*   As with any ML model, real-world deployment in safety-critical applications requires rigorous additional testing and validation.
+## Sample Usage
+You can use this model directly with the Hugging Face `transformers` library for object detection:
+```python
+from transformers import pipeline
+from PIL import Image
+import requests
+from io import BytesIO
+# Load the object detection pipeline
+detector = pipeline("object-detection", model="mcity-data-engine/fisheye8k_jozhang97_deta-swin-large-o365")
+# Example image from the Fisheye8K dataset
+img_url = "https://huggingface.co/datasets/Voxel51/fisheye8k/resolve/main/data/000000_1.png"
+response = requests.get(img_url, timeout=30)
+response.raise_for_status()  # fail early if the image URL is unavailable
+image = Image.open(BytesIO(response.content)).convert("RGB")
+# Perform inference
+detections = detector(image)
+# Print detected objects
+for detection in detections:
+    print(detection)
+# Each detection is a dict with 'score', 'label', and 'box' keys, for example (illustrative values only):
+# {'score': 0.99, 'label': 'Car', 'box': {'xmin': 18, 'ymin': 58, 'xmax': 227, 'ymax': 393}}
+```
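The object-detection pipeline returns a list of dicts, each with `score`, `label`, and `box` keys. The following is a minimal sketch of post-filtering such a list. The helper `filter_detections`, the threshold of 0.5, and the label strings `Car` and `Pedestrian` are assumptions for illustration; the model's actual label strings come from its configuration and are not confirmed here.

```python
from collections import Counter


def filter_detections(detections, min_score=0.5, labels=None):
    """Keep detections scoring at least min_score, optionally limited to given labels.

    Hypothetical helper: it only assumes the dict layout returned by the
    transformers object-detection pipeline ('score', 'label', 'box').
    """
    kept = []
    for det in detections:
        if det["score"] < min_score:
            continue
        if labels is not None and det["label"] not in labels:
            continue
        kept.append(det)
    return kept


# Illustrative values only, not real model output.
sample = [
    {"score": 0.91, "label": "Car",
     "box": {"xmin": 10, "ymin": 20, "xmax": 110, "ymax": 90}},
    {"score": 0.32, "label": "Car",
     "box": {"xmin": 200, "ymin": 40, "xmax": 260, "ymax": 80}},
    {"score": 0.77, "label": "Pedestrian",
     "box": {"xmin": 300, "ymin": 50, "xmax": 330, "ymax": 120}},
]

kept = filter_detections(sample, min_score=0.5)
counts = Counter(det["label"] for det in kept)
print(counts)
```

In practice, `sample` would be replaced by the list returned from `detector(image)`.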
 ## Training and evaluation data
+This model was fine-tuned on the [Voxel51/fisheye8k dataset](https://huggingface.co/datasets/Voxel51/fisheye8k). Fisheye8K consists of images from traffic-monitoring fisheye cameras annotated for road object detection, which makes it suitable for training models that must handle wide-angle distortion and diverse traffic scenarios.
 ## Training procedure
 ### Training results
+| Training Loss | Epoch | Step | Validation Loss |
+|:-------------:|:-----:|:----:|:---------------:|
+| 1.3933 | 1.0 | 5288 | 1.6177 |
+| 1.098 | 2.0 | 10576 | 1.2979 |
+| 0.9565 | 3.0 | 15864 | 1.2650 |
+| 0.8734 | 4.0 | 21152 | 1.2495 |
+| 0.8196 | 5.0 | 26440 | 1.1328 |
+| 0.7977 | 6.0 | 31728 | 1.3190 |
+| 0.8448 | 7.0 | 37016 | 1.3999 |
+| 0.7399 | 8.0 | 42304 | 1.3117 |
+| 0.6325 | 9.0 | 47592 | 1.1202 |
+| 0.621 | 10.0 | 52880 | 1.1707 |
+| 0.7134 | 11.0 | 58168 | 1.2353 |
+| 0.6425 | 12.0 | 63456 | 1.0416 |
+| 0.5935 | 13.0 | 68744 | 0.9215 |
+| 0.5798 | 14.0 | 74032 | 1.0827 |
+| 0.5924 | 15.0 | 79320 | 1.0398 |
+| 0.5559 | 16.0 | 84608 | 1.0112 |
+| 0.5783 | 17.0 | 89896 | 1.0434 |
+| 0.5536 | 18.0 | 95184 | 1.0247 |
 ### Framework versions
 - Transformers 4.48.3
 - Pytorch 2.5.1+cu124
 - Datasets 3.2.0
+- Tokenizers 0.21.0
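
The reported evaluation loss (1.0247) matches the final epoch in the training results table, while the lowest validation loss occurs at an earlier epoch. The card does not state whether the published weights are the final or the best checkpoint, so the sketch below only reads the table; the variable names are illustrative.

```python
# (epoch, training_loss, validation_loss), copied from the training results table.
results = [
    (1, 1.3933, 1.6177), (2, 1.098, 1.2979), (3, 0.9565, 1.2650),
    (4, 0.8734, 1.2495), (5, 0.8196, 1.1328), (6, 0.7977, 1.3190),
    (7, 0.8448, 1.3999), (8, 0.7399, 1.3117), (9, 0.6325, 1.1202),
    (10, 0.621, 1.1707), (11, 0.7134, 1.2353), (12, 0.6425, 1.0416),
    (13, 0.5935, 0.9215), (14, 0.5798, 1.0827), (15, 0.5924, 1.0398),
    (16, 0.5559, 1.0112), (17, 0.5783, 1.0434), (18, 0.5536, 1.0247),
]

# Epoch with the lowest validation loss versus the last epoch trained.
best_epoch, _, best_val = min(results, key=lambda row: row[2])
final_epoch, _, final_val = results[-1]

print(f"best:  epoch {best_epoch}, validation loss {best_val:.4f}")
print(f"final: epoch {final_epoch}, validation loss {final_val:.4f}")
print(f"gap:   {final_val - best_val:.4f}")
```

Validation loss is not a detection metric such as mAP, so this comparison indicates only where the loss curve bottomed out.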