nielsr HF Staff committed on
Commit 08db289 · verified · 1 Parent(s): dba92d1

Improve model card: Add pipeline tag, links, description, and usage

This PR significantly improves the model card for `fisheye8k_SenseTime_deformable-detr` by:

- Adding the `pipeline_tag: object-detection` to the metadata, which enhances discoverability on the Hub.
- Connecting the model to its foundational paper, [Mcity Data Engine: Iterative Model Improvement Through Open-Vocabulary Data Selection](https://huggingface.co/papers/2504.21614).
- Adding direct links to the project homepage and the GitHub repository for easier access to more information and code.
- Providing a clear sample usage code snippet using the `transformers` library.
- Expanding the "Model description", "Intended uses & limitations", and "Training and evaluation data" sections with details extracted from the paper abstract and GitHub repository information.

This makes the model card much more informative and user-friendly.
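
For anyone who wants to post-process what the new usage snippet prints, here is a minimal sketch, assuming the prediction shape that snippet reads (`label`, `score`, and a `box` with `xmin`/`ymin`/`xmax`/`ymax`). The helper names `filter_detections` and `box_area`, the sample values, and the 0.5 default threshold are all hypothetical and are not part of the model card or of `transformers`.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical sample data shaped like the dictionaries the usage snippet
# iterates over; the numbers are made up for illustration only.
SAMPLE_PREDICTIONS: List[Dict] = [
    {"label": "Car", "score": 0.91,
     "box": {"xmin": 10, "ymin": 20, "xmax": 110, "ymax": 90}},
    {"label": "Pedestrian", "score": 0.42,
     "box": {"xmin": 200, "ymin": 50, "xmax": 230, "ymax": 120}},
    {"label": "Bike", "score": 0.77,
     "box": {"xmin": 300, "ymin": 80, "xmax": 360, "ymax": 140}},
    {"label": "Truck", "score": 0.15,
     "box": {"xmin": 400, "ymin": 10, "xmax": 520, "ymax": 100}},
]


def filter_detections(
    predictions: Iterable[Dict],
    min_score: float = 0.5,
    labels: Optional[Iterable[str]] = None,
) -> List[Dict]:
    """Keep predictions at or above min_score, optionally restricted to labels.

    The result is sorted by descending score.
    """
    wanted = set(labels) if labels is not None else None
    kept = [
        p for p in predictions
        if p["score"] >= min_score and (wanted is None or p["label"] in wanted)
    ]
    return sorted(kept, key=lambda p: p["score"], reverse=True)


def box_area(box: Dict) -> int:
    """Return the pixel area of a box given as xmin/ymin/xmax/ymax."""
    width = max(0, box["xmax"] - box["xmin"])
    height = max(0, box["ymax"] - box["ymin"])
    return width * height


# Example: keep only vulnerable road users above a lower threshold.
for det in filter_detections(SAMPLE_PREDICTIONS, min_score=0.4,
                             labels={"Pedestrian", "Bike"}):
    print(det["label"], det["score"], box_area(det["box"]))
```

In practice the list returned by `detector(image)` would take the place of `SAMPLE_PREDICTIONS`. A sensible threshold depends on the application and is not something this PR specifies.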

Files changed (1)
  1. README.md +70 -13
README.md CHANGED
@@ -1,36 +1,95 @@
1
  ---
 
 
 
2
  library_name: transformers
3
  license: apache-2.0
4
- base_model: SenseTime/deformable-detr
5
  tags:
6
  - generated_from_trainer
7
- datasets:
8
- - Voxel51/fisheye8k
 
 
9
  model-index:
10
  - name: fisheye8k_SenseTime_deformable-detr
11
    results: []
 
12
  ---
13
 
14
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
15
- should probably proofread and complete it, then remove this comment. -->
16
-
17
  # fisheye8k_SenseTime_deformable-detr
18
 
19
- This model is a fine-tuned version of [SenseTime/deformable-detr](https://huggingface.co/SenseTime/deformable-detr) on the generator dataset.
 
 
 
20
  It achieves the following results on the evaluation set:
21
  - Loss: 1.2335
22
 
23
  ## Model description
24
 
25
- More information needed
26
 
27
  ## Intended uses & limitations
28
 
29
- More information needed
30
 
31
  ## Training and evaluation data
32
 
33
- More information needed
34
 
35
  ## Training procedure
36
 
@@ -66,6 +125,4 @@ The following hyperparameters were used during training:
66
  - Transformers 4.48.3
67
  - Pytorch 2.5.1+cu124
68
  - Datasets 3.2.0
69
- - Tokenizers 0.21.0
70
-
71
- Mcity Data Engine: https://arxiv.org/abs/2504.21614
 
1
  ---
2
+ base_model: SenseTime/deformable-detr
3
+ datasets:
4
+ - Voxel51/fisheye8k
5
  library_name: transformers
6
  license: apache-2.0
 
7
  tags:
8
  - generated_from_trainer
9
+ - object-detection
10
+ - computer-vision
11
+ - deformable-detr
12
+ - detr
13
  model-index:
14
  - name: fisheye8k_SenseTime_deformable-detr
15
    results: []
16
+ pipeline_tag: object-detection
17
  ---
18
 
 
 
 
19
  # fisheye8k_SenseTime_deformable-detr
20
 
21
+ This model is a fine-tuned version of [SenseTime/deformable-detr](https://huggingface.co/SenseTime/deformable-detr) on the [Fisheye8K dataset](https://huggingface.co/datasets/Voxel51/fisheye8k). It was developed as part of the [Mcity Data Engine](https://mcity.github.io/mcity_data_engine/) project, described in the paper [Mcity Data Engine: Iterative Model Improvement Through Open-Vocabulary Data Selection](https://huggingface.co/papers/2504.21614).
22
+
23
+ The code for the Mcity Data Engine project is available on [GitHub](https://github.com/mcity/mcity_data_engine).
24
+
25
  It achieves the following results on the evaluation set:
26
  - Loss: 1.2335
27
 
28
  ## Model description
29
 
30
+ This model is a fine-tuned object detection model based on the `SenseTime/deformable-detr` architecture, specifically trained for object detection on fisheye camera imagery. It is a product of the [Mcity Data Engine](https://mcity.github.io/mcity_data_engine/), an open-source system designed for iterative data selection and model improvement in Intelligent Transportation Systems (ITS). The model can detect objects such as "Bus", "Bike", "Car", "Pedestrian", and "Truck", leveraging an open-vocabulary data selection process during its development to focus on rare and novel classes.
31
 
32
  ## Intended uses & limitations
33
 
34
+ This model is intended for object detection tasks within Intelligent Transportation Systems (ITS) that utilize fisheye camera data. Potential applications include traffic monitoring, enhancing autonomous driving perception, and smart city infrastructure, with a focus on detecting long-tail classes of interest and vulnerable road users (VRU).
35
+
36
+ **Limitations:**
37
+ * The model's performance is optimized for fisheye camera data and the specific object classes it was trained on.
38
+ * Performance may vary significantly in out-of-distribution scenarios or when applied to data from different camera types or environments.
39
+ * Users should consider potential biases inherited from the underlying Fisheye8K dataset.
40
+
41
+ ## Sample Usage
42
+
43
+ You can use this model directly with the `transformers` pipeline for object detection:
44
+
45
+ ```python
46
+ from transformers import pipeline
47
+ from PIL import Image
48
+ import requests
49
+ from io import BytesIO
50
+
51
+ # Load the object detection pipeline
52
+ detector = pipeline("object-detection", model="mcity-data-engine/fisheye8k_SenseTime_deformable-detr")
53
+
54
+ # Example image (replace with a relevant fisheye image if available, or a local path)
55
+ # Using a generic example image for demonstration purposes. For best results, use a fisheye image.
56
+ image_url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/bird_sized.jpg"
57
+ try:
58
+     response = requests.get(image_url, stream=True)
59
+     response.raise_for_status() # Raise an HTTPError for bad responses (4xx or 5xx)
60
+     image = Image.open(BytesIO(response.content)).convert("RGB")
61
+ except requests.exceptions.RequestException as e:
62
+     print(f"Could not load example image from URL: {e}. Please provide a local image path.")
63
+     # Fallback/exit if image cannot be loaded
64
+     exit()
65
+
66
+ # Perform inference
67
+ predictions = detector(image)
68
+
69
+ # Print detected objects
70
+ for pred in predictions:
71
+ print(f"Label: {pred['label']}, Score: {pred['score']:.2f}, Box: {pred['box']}")
72
+
73
+ # For visualization (optional, requires matplotlib):
74
+ # from matplotlib import pyplot as plt
75
+ # import matplotlib.patches as patches
76
+ #
77
+ # fig, ax = plt.subplots(1)
78
+ # ax.imshow(image)
79
+ #
80
+ # for p in predictions:
81
+ #     box = p['box']
82
+ #     rect = patches.Rectangle((box['xmin'], box['ymin']), box['xmax'] - box['xmin'], box['ymax'] - box['ymin'],
83
+ #                              linewidth=1, edgecolor='r', facecolor='none')
84
+ #     ax.add_patch(rect)
85
+ #     plt.text(box['xmin'], box['ymin'] - 5, f"{p['label']}: {p['score']:.2f}", color='red', fontsize=8)
86
+ #
87
+ # plt.show()
88
+ ```
89
 
90
  ## Training and evaluation data
91
 
92
+ This model was fine-tuned on the [Fisheye8K dataset](https://huggingface.co/datasets/Voxel51/fisheye8k). The Fisheye8K dataset comprises images captured from fisheye cameras, featuring annotated instances of common road users such as cars, buses, bikes, trucks, and pedestrians. The training process leveraged the capabilities of the [Mcity Data Engine](https://mcity.github.io/mcity_data_engine/), which facilitates iterative model improvement and open-vocabulary data selection, especially for Intelligent Transportation Systems (ITS) applications.
93
 
94
  ## Training procedure
95
 
 
125
  - Transformers 4.48.3
126
  - Pytorch 2.5.1+cu124
127
  - Datasets 3.2.0
128
+ - Tokenizers 0.21.0