Improve model card: Add pipeline tag, links, description, and usage
This PR significantly improves the model card for `fisheye8k_SenseTime_deformable-detr` by:
- Adding the `pipeline_tag: object-detection` to the metadata, which enhances discoverability on the Hub.
- Connecting the model to its foundational paper, [Mcity Data Engine: Iterative Model Improvement Through Open-Vocabulary Data Selection](https://huggingface.co/papers/2504.21614).
- Adding direct links to the project homepage and the GitHub repository for easier access to more information and code.
- Providing a clear sample usage code snippet using the `transformers` library.
- Expanding the "Model description", "Intended uses & limitations", and "Training and evaluation data" sections with details extracted from the paper abstract and GitHub repository information.
This makes the model card much more informative and user-friendly.
---
base_model: SenseTime/deformable-detr
datasets:
- Voxel51/fisheye8k
library_name: transformers
license: apache-2.0
tags:
- generated_from_trainer
- object-detection
- computer-vision
- deformable-detr
- detr
model-index:
- name: fisheye8k_SenseTime_deformable-detr
  results: []
pipeline_tag: object-detection
---

# fisheye8k_SenseTime_deformable-detr

This model is a fine-tuned version of [SenseTime/deformable-detr](https://huggingface.co/SenseTime/deformable-detr) on the [Fisheye8K dataset](https://huggingface.co/datasets/Voxel51/fisheye8k). It was developed as part of the [Mcity Data Engine](https://mcity.github.io/mcity_data_engine/) project, described in the paper [Mcity Data Engine: Iterative Model Improvement Through Open-Vocabulary Data Selection](https://huggingface.co/papers/2504.21614).

The code for the Mcity Data Engine project is available on [GitHub](https://github.com/mcity/mcity_data_engine).

It achieves the following results on the evaluation set:
- Loss: 1.2335

## Model description

This is an object detection model based on the `SenseTime/deformable-detr` architecture, fine-tuned for fisheye camera imagery. It is a product of the [Mcity Data Engine](https://mcity.github.io/mcity_data_engine/), an open-source system for iterative data selection and model improvement in Intelligent Transportation Systems (ITS). The model detects classes such as "Bus", "Bike", "Car", "Pedestrian", and "Truck", and its development leveraged an open-vocabulary data selection process to focus on rare and novel classes.
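The open-vocabulary data selection idea can be illustrated with a toy sketch. This is a hypothetical illustration only, not the Mcity Data Engine implementation: given per-image detections, it selects images whose detections include rare target classes, so labeling effort concentrates on underrepresented categories.

```python
def select_rare(images_predictions, rare_classes, min_score=0.3):
    """Toy sketch of rarity-driven data selection (not the real pipeline).

    images_predictions: mapping of image id -> list of {"label", "score"} dicts.
    Returns ids of images containing any rare class above min_score.
    """
    selected = []
    for image_id, preds in images_predictions.items():
        if any(p["label"] in rare_classes and p["score"] >= min_score for p in preds):
            selected.append(image_id)
    return selected

# Mock detections for two unlabeled images
preds = {
    "img1": [{"label": "Car", "score": 0.9}],
    "img2": [{"label": "Bike", "score": 0.6}, {"label": "Car", "score": 0.8}],
}
print(select_rare(preds, {"Bike", "Pedestrian"}))  # ['img2']
```

The real system is considerably more involved (open-vocabulary detectors, iterative retraining); this only conveys the selection criterion at a high level.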

## Intended uses & limitations

This model is intended for object detection tasks in Intelligent Transportation Systems (ITS) that use fisheye camera data. Potential applications include traffic monitoring, autonomous driving perception, and smart city infrastructure, with a focus on detecting long-tail classes of interest and vulnerable road users (VRUs).

**Limitations:**

* The model is optimized for fisheye camera data and the specific object classes it was trained on.
* Performance may degrade significantly in out-of-distribution scenarios or on data from different camera types or environments.
* Users should consider potential biases inherited from the underlying Fisheye8K dataset.

## Sample Usage

You can use this model directly with the `transformers` pipeline for object detection:

```python
from io import BytesIO

import requests
from PIL import Image
from transformers import pipeline

# Load the object detection pipeline
detector = pipeline("object-detection", model="mcity-data-engine/fisheye8k_SenseTime_deformable-detr")

# Example image (replace with a relevant fisheye image, or a local path).
# A generic image is used here for demonstration; for best results, use a fisheye image.
image_url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/bird_sized.jpg"
try:
    response = requests.get(image_url, stream=True)
    response.raise_for_status()  # Raise an HTTPError for bad responses (4xx or 5xx)
    image = Image.open(BytesIO(response.content)).convert("RGB")
except requests.exceptions.RequestException as e:
    print(f"Could not load example image from URL: {e}. Please provide a local image path.")
    exit()  # Fallback/exit if the image cannot be loaded

# Perform inference
predictions = detector(image)

# Print detected objects
for pred in predictions:
    print(f"Label: {pred['label']}, Score: {pred['score']:.2f}, Box: {pred['box']}")

# For visualization (optional, requires matplotlib):
# import matplotlib.patches as patches
# from matplotlib import pyplot as plt
#
# fig, ax = plt.subplots(1)
# ax.imshow(image)
#
# for p in predictions:
#     box = p["box"]
#     rect = patches.Rectangle(
#         (box["xmin"], box["ymin"]),
#         box["xmax"] - box["xmin"],
#         box["ymax"] - box["ymin"],
#         linewidth=1, edgecolor="r", facecolor="none",
#     )
#     ax.add_patch(rect)
#     plt.text(box["xmin"], box["ymin"] - 5, f"{p['label']}: {p['score']:.2f}", color="red", fontsize=8)
#
# plt.show()
```
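The pipeline returns a list of dictionaries with `label`, `score`, and `box` keys. A minimal, hypothetical helper (not part of this repository) for discarding low-confidence detections:

```python
def filter_predictions(predictions, min_score=0.5):
    """Keep only detections whose confidence is at or above min_score."""
    return [p for p in predictions if p["score"] >= min_score]

# Mock pipeline output for demonstration
sample = [
    {"label": "Car", "score": 0.92, "box": {"xmin": 10, "ymin": 20, "xmax": 110, "ymax": 90}},
    {"label": "Bike", "score": 0.31, "box": {"xmin": 5, "ymin": 5, "xmax": 40, "ymax": 60}},
]
print(filter_predictions(sample))  # only the 0.92 "Car" detection survives
```

Lowering `min_score` trades precision for recall, which may matter in applications where missing vulnerable road users is costly.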

## Training and evaluation data

This model was fine-tuned on the [Fisheye8K dataset](https://huggingface.co/datasets/Voxel51/fisheye8k), which comprises images captured from fisheye cameras with annotated instances of common road users such as cars, buses, bikes, trucks, and pedestrians. Training leveraged the [Mcity Data Engine](https://mcity.github.io/mcity_data_engine/), which enables iterative model improvement and open-vocabulary data selection for Intelligent Transportation Systems (ITS) applications.
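Pipeline outputs use corner coordinates (`xmin`, `ymin`, `xmax`, `ymax`), whereas COCO-style annotations use `[x, y, width, height]` (the assumption here is that your ground truth follows the COCO convention). A small hypothetical converter for comparing predictions against such annotations:

```python
def corners_to_coco(box):
    """Convert a {'xmin','ymin','xmax','ymax'} corner box (pipeline output)
    to a COCO-style [x, y, width, height] list."""
    return [
        box["xmin"],
        box["ymin"],
        box["xmax"] - box["xmin"],
        box["ymax"] - box["ymin"],
    ]

print(corners_to_coco({"xmin": 10, "ymin": 20, "xmax": 110, "ymax": 90}))  # [10, 20, 100, 70]
```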

## Training procedure

### Framework versions

- Transformers 4.48.3
- Pytorch 2.5.1+cu124
- Datasets 3.2.0
- Tokenizers 0.21.0