fisheye8k_facebook_deformable-detr-detic
This model is a fine-tuned version of facebook/deformable-detr-detic on the Fisheye8K dataset, developed as part of the Mcity Data Engine project.
The model achieves the following results on the evaluation set:
- Loss: 2.1348
Paper: Mcity Data Engine: Iterative Model Improvement Through Open-Vocabulary Data Selection
Project Page: Mcity Data Engine Docs
Code: mcity/mcity_data_engine GitHub Repository
Model description
The fisheye8k_facebook_deformable-detr-detic model is a component of the Mcity Data Engine, an open-source system designed for iterative model improvement through open-vocabulary data selection. This engine provides modules for the complete data-based development cycle, from data acquisition to model deployment, specifically addressing the challenge of detecting long-tail and novel classes in large amounts of unlabeled data, particularly in Intelligent Transportation Systems (ITS).
This model fine-tunes its base, deformable-detr-detic, for object detection within the Mcity Data Engine's workflows. As a Deformable DETR checkpoint, it predicts a fixed set of classes chosen at fine-tuning time and does not accept text queries; discovering classes outside that set is the role of the Data Engine's open-vocabulary data selection.
Intended uses & limitations
This model is primarily intended for object detection within Intelligent Transportation Systems (ITS) and related domains. As part of the Mcity Data Engine, it assists researchers and practitioners who search raw visual data for rare and novel classes of interest, facilitating the continuous improvement of AI models.
Potential use cases include:
- Detecting various types of road users and vehicles in complex traffic scenarios.
- Identifying long-tail or previously unseen objects in automotive perception datasets.
- Serving as a component within larger data curation and model training pipelines.
Limitations:
- Performance may degrade on scenes that differ substantially from the Fisheye8K training distribution, and predictions are limited to the classes the model was fine-tuned on.
- Optimal utilization of the Mcity Data Engine workflows, including those leveraging this model, often requires a powerful GPU.
- The model's performance on standard perspective images may differ, as it was fine-tuned on fisheye camera data.
Sample Usage
You can use this model with the Hugging Face transformers library to perform object detection. This example runs the detector on a single image and maps each predicted label index to a class name through the model's id2label configuration.
from transformers import AutoImageProcessor, AutoModelForObjectDetection
import torch
from PIL import Image
import requests
# Load an example image. For best results, use images similar to the Fisheye8K dataset.
# This example uses a general image, but real-world usage should focus on ITS contexts.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"  # Example image (two cats on a couch)
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")
# Load image processor and model from the Hugging Face Hub
model_id = "mcity-data-engine/fisheye8k_facebook_deformable-detr-detic"
processor = AutoImageProcessor.from_pretrained(model_id)
model = AutoModelForObjectDetection.from_pretrained(model_id)
# Class names come from the model's config.json (Bus, Bike, Car, Pedestrian, Truck).
# Deformable DETR takes only images as input, so no text queries are passed.
id2label = model.config.id2label
# Prepare inputs
inputs = processor(images=image, return_tensors="pt")
# Perform inference
with torch.no_grad():
    outputs = model(**inputs)
# Post-process outputs to get detected objects
target_sizes = torch.tensor([image.size[::-1]])  # (height, width)
results = processor.post_process_object_detection(outputs, target_sizes=target_sizes, threshold=0.5)[0]
# Print detected objects (the list may be empty for images without traffic participants)
print("Detected objects in the image (threshold=0.5):")
for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
    print(
        f"  {id2label[label.item()]}: {round(score.item(), 3)}\n"
        f"    [xmin={round(box[0].item(), 2)}, ymin={round(box[1].item(), 2)}, "
        f"xmax={round(box[2].item(), 2)}, ymax={round(box[3].item(), 2)}]"
    )
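The boxes printed in the example are absolute pixel corners. DETR-family models in transformers emit each box as normalized (center_x, center_y, width, height), and post_process_object_detection rescales it to (xmin, ymin, xmax, ymax) using target_sizes. The following is a minimal pure-Python sketch of that conversion for a single box; the helper name cxcywh_to_xyxy is illustrative and not part of the library.

```python
def cxcywh_to_xyxy(box, image_height, image_width):
    """Convert one normalized (cx, cy, w, h) box to absolute (xmin, ymin, xmax, ymax)."""
    cx, cy, w, h = box
    xmin = (cx - w / 2) * image_width
    ymin = (cy - h / 2) * image_height
    xmax = (cx + w / 2) * image_width
    ymax = (cy + h / 2) * image_height
    return (xmin, ymin, xmax, ymax)


# A box centered in a 640x480 image that spans half of each dimension.
print(cxcywh_to_xyxy((0.5, 0.5, 0.5, 0.5), image_height=480, image_width=640))
# → (160.0, 120.0, 480.0, 360.0)
```

This is why target_sizes must be given as (height, width): swapping the two would scale the x and y coordinates by the wrong dimension.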
Training and evaluation data
This model was fine-tuned on the Fisheye8K dataset. The Mcity Data Engine leverages data selection processes to focus on detecting long-tail and novel classes, which is crucial for ITS applications. The model's config.json indicates it was fine-tuned for classes such as "Bus", "Bike", "Car", "Pedestrian", and "Truck".
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 1
- eval_batch_size: 8
- seed: 0
- optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- num_epochs: 36
- mixed_precision_training: Native AMP
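To make the cosine schedule concrete, the sketch below computes the learning rate it would produce at a given step. It assumes a plain half-cosine decay to zero with no warmup (the card lists no warmup setting) and takes the steps per epoch, 5288, from the step count logged after epoch 1.0 in the training results. The function name cosine_lr is illustrative; this is not the training code itself.

```python
import math

BASE_LR = 5e-05          # learning_rate from the hyperparameter list
NUM_EPOCHS = 36          # num_epochs from the hyperparameter list
STEPS_PER_EPOCH = 5288   # step count logged after epoch 1.0 (assumed constant per epoch)
TOTAL_STEPS = NUM_EPOCHS * STEPS_PER_EPOCH


def cosine_lr(step, base_lr=BASE_LR, total_steps=TOTAL_STEPS):
    """Half-cosine decay from base_lr to 0, assuming zero warmup steps."""
    progress = min(step / total_steps, 1.0)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Learning rate at the start, at the last logged epoch (16), and at the planned end (36).
for epoch in (0, 16, 36):
    print(f"epoch {epoch:2d}: lr = {cosine_lr(epoch * STEPS_PER_EPOCH):.3e}")
```

Under these assumptions the learning rate at epoch 16 is still roughly 59% of its initial value, because the schedule is spread over all 36 planned epochs.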
Training results
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:-----:|:---------------:|
| 2.435 | 1.0 | 5288 | 2.4832 |
| 2.2626 | 2.0 | 10576 | 2.6324 |
| 1.8443 | 3.0 | 15864 | 2.1361 |
| 2.4834 | 4.0 | 21152 | 2.5269 |
| 2.3417 | 5.0 | 26440 | 2.5997 |
| 1.939 | 6.0 | 31728 | 2.1948 |
| 1.8384 | 7.0 | 37016 | 2.0057 |
| 1.7235 | 8.0 | 42304 | 2.0182 |
| 1.728 | 9.0 | 47592 | 1.9454 |
| 1.621 | 10.0 | 52880 | 1.9876 |
| 1.539 | 11.0 | 58168 | 1.8862 |
| 1.7229 | 12.0 | 63456 | 2.2071 |
| 1.9613 | 13.0 | 68744 | 2.5147 |
| 1.5238 | 14.0 | 74032 | 1.9836 |
| 1.5777 | 15.0 | 79320 | 2.0812 |
| 1.5963 | 16.0 | 84608 | 2.1348 |
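The reported evaluation loss of 2.1348 matches the last logged epoch in the table, not the lowest validation loss. A short sketch that reads this off the table follows; the values are copied from the rows above, and whether the published weights correspond to the last or the best epoch is not stated in this card.

```python
# (epoch, validation_loss) pairs copied from the training results table.
val_loss = [
    (1, 2.4832), (2, 2.6324), (3, 2.1361), (4, 2.5269),
    (5, 2.5997), (6, 2.1948), (7, 2.0057), (8, 2.0182),
    (9, 1.9454), (10, 1.9876), (11, 1.8862), (12, 2.2071),
    (13, 2.5147), (14, 1.9836), (15, 2.0812), (16, 2.1348),
]

# Epoch with the lowest validation loss versus the last logged epoch.
best_epoch, best_loss = min(val_loss, key=lambda row: row[1])
last_epoch, last_loss = val_loss[-1]

print(f"best: epoch {best_epoch}, validation loss {best_loss}")
print(f"last: epoch {last_epoch}, validation loss {last_loss}")
```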
Framework versions
- Transformers 4.48.3
- Pytorch 2.5.1+cu124
- Datasets 3.2.0
- Tokenizers 0.21.0
Acknowledgements
Mcity would like to thank Amazon Web Services (AWS) for their pivotal role in providing the cloud infrastructure on which the Data Engine depends. We couldn't have done it without their tremendous support!
Special thanks to these amazing people for contributing to the Mcity Data Engine!
Citation
If you use the Mcity Data Engine in your research, feel free to cite the project:
@article{bogdoll2025mcitydataengine,
title={Mcity Data Engine},
author={Bogdoll, Daniel and Anata, Rajanikant Patnaik and Stevens, Gregory},
journal={GitHub. Note: https://github.com/mcity/mcity_data_engine},
year={2025}
}