Improve model card: Add pipeline tag, license, and comprehensive details

#3
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +82 -35
README.md CHANGED
@@ -1,35 +1,85 @@
  ---
- library_name: transformers
  base_model: jozhang97/deta-swin-large-o365
- tags:
- - generated_from_trainer
  datasets:
  - Voxel51/fisheye8k
  model-index:
  - name: fisheye8k_jozhang97_deta-swin-large-o365
    results: []
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
  # fisheye8k_jozhang97_deta-swin-large-o365

- This model is a fine-tuned version of [jozhang97/deta-swin-large-o365](https://huggingface.co/jozhang97/deta-swin-large-o365) on the generator dataset.
- It achieves the following results on the evaluation set:
  - Loss: 1.0247

  ## Model description

- More information needed

  ## Intended uses & limitations

- More information needed

  ## Training and evaluation data

- More information needed

  ## Training procedure

@@ -47,33 +97,30 @@ The following hyperparameters were used during training:

  ### Training results

- | Training Loss | Epoch | Step | Validation Loss |
- |:-------------:|:-----:|:-----:|:---------------:|
- | 1.3933 | 1.0 | 5288 | 1.6177 |
- | 1.098 | 2.0 | 10576 | 1.2979 |
- | 0.9565 | 3.0 | 15864 | 1.2650 |
- | 0.8734 | 4.0 | 21152 | 1.2495 |
- | 0.8196 | 5.0 | 26440 | 1.1328 |
- | 0.7977 | 6.0 | 31728 | 1.3190 |
- | 0.8448 | 7.0 | 37016 | 1.3999 |
- | 0.7399 | 8.0 | 42304 | 1.3117 |
- | 0.6325 | 9.0 | 47592 | 1.1202 |
- | 0.621 | 10.0 | 52880 | 1.1707 |
- | 0.7134 | 11.0 | 58168 | 1.2353 |
- | 0.6425 | 12.0 | 63456 | 1.0416 |
- | 0.5935 | 13.0 | 68744 | 0.9215 |
- | 0.5798 | 14.0 | 74032 | 1.0827 |
- | 0.5924 | 15.0 | 79320 | 1.0398 |
- | 0.5559 | 16.0 | 84608 | 1.0112 |
- | 0.5783 | 17.0 | 89896 | 1.0434 |
- | 0.5536 | 18.0 | 95184 | 1.0247 |
-

  ### Framework versions

  - Transformers 4.48.3
  - Pytorch 2.5.1+cu124
  - Datasets 3.2.0
- - Tokenizers 0.21.0
-
- Mcity Data Engine: https://arxiv.org/abs/2504.21614
 
  ---
  base_model: jozhang97/deta-swin-large-o365
  datasets:
  - Voxel51/fisheye8k
+ library_name: transformers
+ tags:
+ - generated_from_trainer
+ - deta
+ - swin
+ - traffic
+ - automotive
+ - ITS
+ - computer-vision
+ pipeline_tag: object-detection
+ license: mit
  model-index:
  - name: fisheye8k_jozhang97_deta-swin-large-o365
    results: []
  ---

  # fisheye8k_jozhang97_deta-swin-large-o365

+ This model is a fine-tuned version of [jozhang97/deta-swin-large-o365](https://huggingface.co/jozhang97/deta-swin-large-o365) on the [Voxel51/fisheye8k dataset](https://huggingface.co/datasets/Voxel51/fisheye8k). It achieves the following results on the evaluation set (final training epoch):
  - Loss: 1.0247

+ This model is a component of the **Mcity Data Engine**, which is described in the following resources:
+ * **Paper:** [Mcity Data Engine: Iterative Model Improvement Through Open-Vocabulary Data Selection](https://huggingface.co/papers/2504.21614)
+ * **Project Documentation:** [Mcity Data Engine Docs](https://mcity.github.io/mcity_data_engine/)
+ * **GitHub Repository:** [mcity/mcity_data_engine](https://github.com/mcity/mcity_data_engine)
+
  ## Model description

+ The `fisheye8k_jozhang97_deta-swin-large-o365` model is part of the **Mcity Data Engine**, an open-source system for iterative model improvement through open-vocabulary data selection. It uses the DETA architecture with a Swin-Large backbone, starts from the Objects365-pretrained `jozhang97/deta-swin-large-o365` checkpoint, and is fine-tuned for object detection on fisheye camera imagery, a sensor type widely used in Intelligent Transportation Systems (ITS).
+
+ The Mcity Data Engine addresses the challenge of detecting long-tail and novel classes within the large amounts of unlabeled data generated by vehicle fleets and roadside perception systems. Within that system, this model provides object detection for real-world ITS scenes captured by fisheye cameras.

  ## Intended uses & limitations

+ This model is intended for object detection tasks within Intelligent Transportation Systems (ITS), particularly for identifying vehicles (Bus, Bike, Car, Truck) and pedestrians in fisheye camera imagery. Within the Mcity Data Engine workflow, it supports continuous model improvement by helping detect and curate rare and novel classes in large, unlabeled datasets.
+
+ **Key Use Cases:**
+ * Object detection in traffic scenes captured by fisheye cameras, including their strong wide-angle distortion.
+ * Integration into data pipelines for iterative model retraining and improvement.
+ * Supporting research and development in autonomous driving and transportation perception.
+
+ **Limitations:**
+ * The model is fine-tuned for fisheye camera perspectives; its performance on standard (rectilinear) camera views is not evaluated here and may be lower.
+ * It may exhibit reduced performance on out-of-distribution data or in environmental conditions not well represented in its training data.
+ * As with any ML model, deployment in safety-critical applications requires rigorous additional testing and validation.
+
+ ## Sample Usage
+
+ You can use this model directly with the Hugging Face `transformers` library for object detection:
+
+ ```python
+ from transformers import pipeline
+ from PIL import Image
+ import requests
+ from io import BytesIO
+
+ # Load the object detection pipeline
+ detector = pipeline("object-detection", model="mcity-data-engine/fisheye8k_jozhang97_deta-swin-large-o365")
+
+ # Example image from the Fisheye8K dataset
+ img_url = "https://huggingface.co/datasets/Voxel51/fisheye8k/resolve/main/data/000000_1.png"
+ response = requests.get(img_url, timeout=30)
+ response.raise_for_status()  # stop early if the image could not be downloaded
+ image = Image.open(BytesIO(response.content)).convert("RGB")
+
+ # Perform inference
+ detections = detector(image)
+
+ # Print detected objects, one dict per box
+ for detection in detections:
+     print(detection)
+
+ # `detections` is a list of dicts like the following (values are illustrative;
+ # label names come from the model's `config.id2label`):
+ # [{'box': {'xmin': 18, 'ymin': 58, 'xmax': 227, 'ymax': 393}, 'score': 0.99, 'label': 'person'}]
+ ```
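
The pipeline already drops low-confidence boxes through its `threshold` argument (default 0.5). When stored results need to be re-thresholded, summarized per class, or compared with ground truth in FiftyOne, a few plain-Python helpers are enough. This is a sketch only: the helper names `filter_detections`, `count_by_label`, and `to_relative_xywh` are hypothetical, the sample detections are made up, and the conversion assumes FiftyOne's documented box convention of relative `[top-left-x, top-left-y, width, height]` values in [0, 1].

```python
from collections import Counter

# Input format: the transformers object-detection pipeline returns a list of
# dicts with "score", "label", and "box" (absolute pixel xmin/ymin/xmax/ymax).
# The helper names below are hypothetical, chosen for this sketch.


def filter_detections(detections, min_score=0.5):
    """Keep only detections whose confidence score is at least `min_score`."""
    return [d for d in detections if d["score"] >= min_score]


def count_by_label(detections):
    """Return a Counter mapping each class label to its number of detections."""
    return Counter(d["label"] for d in detections)


def to_relative_xywh(box, image_width, image_height):
    """Convert an absolute xmin/ymin/xmax/ymax box to relative [x, y, w, h].

    Assumed target: FiftyOne's bounding-box convention, i.e. the top-left
    corner plus width and height, each divided by the image size.
    """
    x = box["xmin"] / image_width
    y = box["ymin"] / image_height
    w = (box["xmax"] - box["xmin"]) / image_width
    h = (box["ymax"] - box["ymin"]) / image_height
    return [x, y, w, h]


if __name__ == "__main__":
    # Made-up detections in the pipeline's output format (not real model output).
    sample = [
        {"score": 0.91, "label": "Car", "box": {"xmin": 100, "ymin": 200, "xmax": 300, "ymax": 400}},
        {"score": 0.42, "label": "Bike", "box": {"xmin": 10, "ymin": 20, "xmax": 30, "ymax": 60}},
        {"score": 0.77, "label": "Car", "box": {"xmin": 500, "ymin": 100, "xmax": 600, "ymax": 180}},
    ]
    kept = filter_detections(sample, min_score=0.5)
    print(count_by_label(kept))
    # With a real PIL image, pass image.width and image.height instead of 1000 x 1000.
    print([to_relative_xywh(d["box"], 1000, 1000) for d in kept])
```

When inference is re-run anyway, passing the cutoff directly, as in `detector(image, threshold=0.3)`, is simpler; the helpers are mainly useful for results that were already saved.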

  ## Training and evaluation data

+ This model was fine-tuned on the [Voxel51/fisheye8k dataset](https://huggingface.co/datasets/Voxel51/fisheye8k). The dataset consists of images captured by fisheye traffic cameras, which makes it suitable for training models that must handle wide-angle distortion and diverse traffic scenarios.

  ## Training procedure

  ### Training results

+ | Training Loss | Epoch | Step | Validation Loss |
+ |:-------------:|:-----:|:----:|:---------------:|
+ | 1.3933 | 1.0 | 5288 | 1.6177 |
+ | 1.098 | 2.0 | 10576 | 1.2979 |
+ | 0.9565 | 3.0 | 15864 | 1.2650 |
+ | 0.8734 | 4.0 | 21152 | 1.2495 |
+ | 0.8196 | 5.0 | 26440 | 1.1328 |
+ | 0.7977 | 6.0 | 31728 | 1.3190 |
+ | 0.8448 | 7.0 | 37016 | 1.3999 |
+ | 0.7399 | 8.0 | 42304 | 1.3117 |
+ | 0.6325 | 9.0 | 47592 | 1.1202 |
+ | 0.621 | 10.0 | 52880 | 1.1707 |
+ | 0.7134 | 11.0 | 58168 | 1.2353 |
+ | 0.6425 | 12.0 | 63456 | 1.0416 |
+ | 0.5935 | 13.0 | 68744 | 0.9215 |
+ | 0.5798 | 14.0 | 74032 | 1.0827 |
+ | 0.5924 | 15.0 | 79320 | 1.0398 |
+ | 0.5559 | 16.0 | 84608 | 1.0112 |
+ | 0.5783 | 17.0 | 89896 | 1.0434 |
+ | 0.5536 | 18.0 | 95184 | 1.0247 |
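
The table shows that the reported evaluation loss (1.0247) is the epoch-18 value rather than the lowest one: validation loss reached its minimum of 0.9215 at epoch 13 and fluctuated afterwards. The short script below restates the table's own numbers to make that comparison explicit and to confirm that every epoch spans the same number of training steps. It is a sketch for reading the table only; it says nothing about which checkpoint was uploaded.

```python
# (training_loss, epoch, step, validation_loss), copied from the "Training results" table.
ROWS = [
    (1.3933, 1.0, 5288, 1.6177),
    (1.098, 2.0, 10576, 1.2979),
    (0.9565, 3.0, 15864, 1.2650),
    (0.8734, 4.0, 21152, 1.2495),
    (0.8196, 5.0, 26440, 1.1328),
    (0.7977, 6.0, 31728, 1.3190),
    (0.8448, 7.0, 37016, 1.3999),
    (0.7399, 8.0, 42304, 1.3117),
    (0.6325, 9.0, 47592, 1.1202),
    (0.621, 10.0, 52880, 1.1707),
    (0.7134, 11.0, 58168, 1.2353),
    (0.6425, 12.0, 63456, 1.0416),
    (0.5935, 13.0, 68744, 0.9215),
    (0.5798, 14.0, 74032, 1.0827),
    (0.5924, 15.0, 79320, 1.0398),
    (0.5559, 16.0, 84608, 1.0112),
    (0.5783, 17.0, 89896, 1.0434),
    (0.5536, 18.0, 95184, 1.0247),
]

# Row with the lowest validation loss.
best = min(ROWS, key=lambda row: row[3])
# Last logged row; its validation loss matches the "Loss" reported at the top of the card.
final = ROWS[-1]
# Steps per epoch implied by each row; a single value means every epoch is the same length.
steps_per_epoch = {step / epoch for _, epoch, step, _ in ROWS}

print(f"Lowest validation loss: {best[3]} at epoch {best[1]:g} (step {best[2]})")
print(f"Final validation loss:  {final[3]} at epoch {final[1]:g} (step {final[2]})")
print(f"Steps per epoch: {sorted(steps_per_epoch)}")
```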
 
  ### Framework versions

  - Transformers 4.48.3
  - Pytorch 2.5.1+cu124
  - Datasets 3.2.0
+ - Tokenizers 0.21.0
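
To check whether a local environment matches the framework versions listed above, the installed packages can be queried with the standard library's `importlib.metadata`. This is a minimal sketch: it assumes the PyPI distribution names `transformers`, `torch` (for PyTorch), `datasets`, and `tokenizers`, ignores local version labels such as `+cu124`, and does not check CUDA compatibility.

```python
from importlib import metadata

# Versions listed under "Framework versions" in this model card.
# Assumption: PyTorch is installed under the distribution name "torch".
EXPECTED = {
    "transformers": "4.48.3",
    "torch": "2.5.1",  # the card lists 2.5.1+cu124; the "+cu124" label is ignored below
    "datasets": "3.2.0",
    "tokenizers": "0.21.0",
}


def check_versions(expected=EXPECTED):
    """Return {package: (installed_or_None, expected, matches)} for each package."""
    report = {}
    for name, want in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Drop a local version label such as "+cu124" before comparing.
        base = installed.split("+", 1)[0] if installed else None
        report[name] = (installed, want, base == want)
    return report


if __name__ == "__main__":
    for name, (installed, want, ok) in check_versions().items():
        status = "OK" if ok else "differs"
        print(f"{name}: installed={installed} expected={want} -> {status}")
```

A mismatch does not necessarily break inference, but matching versions is the safest starting point when reproducing the reported results.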