Room Scene Classifier

DINOv2-based multi-head scene classification model for hotel images.

Model overview

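This is a multi-head deep learning model that classifies hotel images from three perspectives at once: Scene, Concept, and Object. A DINOv2 backbone extracts strong visual features, and each head performs its own specialized classification on top of them.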

Model information

Model name: image_classifier_model_0.2
Base model: facebook/dinov2-large
Image size: 224x224
Channels: RGB (3 channels)
Total parameters: 303,252,502 (backbone frozen)
Trainable parameters: 24,598
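The trainable-parameter count can be checked against the multi-head architecture described later in this card. The following is a minimal arithmetic sketch. It assumes that each head is a single linear layer on the 1,024-dim shared features, plus one shared LayerNorm over those features. The card does not document the head layout, so this is only one configuration that reproduces the number.

```python
FEATURE_DIM = 1024  # shared DINOv2-large feature size (from the card)
HEAD_SIZES = {"scene": 6, "concept": 3, "object": 13}

def linear_params(in_dim: int, out_dim: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return in_dim * out_dim + out_dim

heads = sum(linear_params(FEATURE_DIM, n) for n in HEAD_SIZES.values())
layer_norm = 2 * FEATURE_DIM  # HYPOTHETICAL shared LayerNorm: scale + shift vectors
print(heads, layer_norm, heads + layer_norm)
# prints: 22550 2048 24598
```

Three plain linear heads alone account for 22,550 parameters. The remaining 2,048 therefore come from some other small layer, and a 1,024-wide normalization layer is one possibility.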

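Classification heads

Scene head (6 classes)

Room, bathroom, pool, lobby, restaurant, other

Concept head (3 classes)

Indoor, outdoor, close-up

Object head (13 classes)

Bed, sofa, shower, bathtub, chair, table, TV, refrigerator, sink, dressing table (vanity), mirror, other, unclassified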
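Code that needs human-readable labels can keep the class lists above as Python constants. This sketch assumes that class indices follow the order listed in this card, which the card does not confirm. Check the actual mapping in image_classifier_model_0.2_model_info.json before relying on it. `HEAD_LABELS` and `label_for` are hypothetical names.

```python
# HYPOTHETICAL label tables; index order is ASSUMED to match the lists in this card.
HEAD_LABELS = {
    "scene": ["room", "bathroom", "pool", "lobby", "restaurant", "other"],
    "concept": ["indoor", "outdoor", "close-up"],
    "object": ["bed", "sofa", "shower", "bathtub", "chair", "table", "TV",
               "refrigerator", "sink", "vanity", "mirror", "other", "unclassified"],
}

def label_for(head: str, class_id: int) -> str:
    """Return the (assumed) label for a head's predicted index."""
    return HEAD_LABELS[head][class_id]

# Sanity check: list lengths match the head sizes stated above
assert [len(v) for v in HEAD_LABELS.values()] == [6, 3, 13]
print(label_for("concept", 2))  # prints: close-up
```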

Usage

Using the model from Python

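import json

import numpy as np
import onnxruntime as ort
import torch
from PIL import Image
from torchvision import transforms

# Load model metadata (class mappings, etc.)
with open('image_classifier_model_0.2_model_info.json', 'r', encoding='utf-8') as f:
    model_info = json.load(f)

# Load the PyTorch model. This assumes the .pth file holds the full pickled module
# (it is used directly as a model below). PyTorch 2.6+ defaults to weights_only=True,
# so weights_only=False is needed and the model class must be importable.
# Only unpickle files from a trusted source.
model = torch.load('image_classifier_model_0.2.pth', map_location='cpu', weights_only=False)
model.eval()

# ONNX Runtime session (faster inference)
onnx_session = ort.InferenceSession('image_classifier_model_0.2.onnx',
                                    providers=['CPUExecutionProvider'])

# Image preprocessing with ImageNet normalization
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.CenterCrop(224),  # no-op after the square resize above
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])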

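def classify_image_pytorch(image_path):
    """Classify an image with the PyTorch model."""
    # convert('RGB') handles grayscale/RGBA files; unsqueeze(0) adds the batch dimension
    image = transform(Image.open(image_path).convert('RGB')).unsqueeze(0)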
    
    with torch.no_grad():
        outputs = model(image)
        predictions = {}
        
        for head_name, logits in outputs.items():
            probabilities = torch.softmax(logits, dim=1)
            predicted_class = torch.argmax(probabilities, dim=1).item()
            confidence = probabilities[0, predicted_class].item()
            
            predictions[head_name] = {
                'class_id': predicted_class,
                'confidence': confidence,
                'probabilities': probabilities[0].tolist()
            }
    
    return predictions

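def classify_image_onnx(image_path):
    """Classify an image with the ONNX model (recommended)."""
    # Add the batch dimension: (3, 224, 224) -> (1, 3, 224, 224)
    image = transform(Image.open(image_path).convert('RGB')).unsqueeze(0).numpy()
    
    # Run ONNX inference; look up the input name instead of hard-coding it
    input_name = onnx_session.get_inputs()[0].name
    input_feed = {input_name: image.astype(np.float32)}
    outputs = onnx_session.run(None, input_feed)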
    
    predictions = {}
    head_names = ['scene', 'concept', 'object']  # assumes the ONNX outputs are in this order
    
    for i, head_name in enumerate(head_names):
        logits = outputs[i]
        probabilities = torch.softmax(torch.tensor(logits), dim=1)
        predicted_class = torch.argmax(probabilities, dim=1).item()
        confidence = probabilities[0, predicted_class].item()
        
        predictions[head_name] = {
            'class_id': predicted_class,
            'confidence': confidence,
            'probabilities': probabilities[0].tolist()
        }
    
    return predictions

# Example
predictions = classify_image_onnx("hotel_room.jpg")
print("Classification results:")
for head, result in predictions.items():
    print(f"{head}: class {result['class_id']}, confidence {result['confidence']:.4f}")
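The ONNX path above converts NumPy outputs back to torch tensors only to run softmax. Below is a minimal NumPy-only alternative that avoids needing torch for ONNX inference. `postprocess_logits` is a hypothetical helper name, and it assumes that each output has shape (batch, num_classes).

```python
import numpy as np

def postprocess_logits(logits) -> dict:
    """Softmax + argmax for the first image of a (batch, num_classes) logit array."""
    logits = np.asarray(logits, dtype=np.float64)
    # Subtract the row max before exponentiating for numerical stability
    shifted = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=1, keepdims=True)
    class_id = int(probs[0].argmax())
    return {
        "class_id": class_id,
        "confidence": float(probs[0, class_id]),
        "probabilities": probs[0].tolist(),
    }

result = postprocess_logits(np.array([[1.0, 3.0, 1.0]]))
print(result["class_id"])  # prints: 1
```

Inside `classify_image_onnx`, the line `predictions[head_name] = postprocess_logits(outputs[i])` could replace the torch-based softmax lines.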

Mapping predicted indices to actual class labels

def get_class_names(predictions, model_info):
    """Map each head's predicted index to its entry in model_info['class_mappings']."""
    class_mappings = model_info['class_mappings']
    
    results = {}
    for head, result in predictions.items():
        class_id = result['class_id']
        if head in class_mappings:
            actual_class_id = class_mappings[head][str(class_id)]
            results[head] = {
                'class_id': actual_class_id,
                'confidence': result['confidence']
            }
    
    return results

# Example: map predicted indices to class labels
class_names = get_class_names(predictions, model_info)
print("Mapped class IDs:")
for head, result in class_names.items():
    print(f"{head}: {result['class_id']}")
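The card does not show the layout of model_info.json, so the variant below is run against a toy metadata dict. The toy dict is invented only to match what `get_class_names` indexes (`model_info['class_mappings'][head][str(index)]`), and the real file's keys and values may differ. The `str(...)` is needed because JSON object keys are always strings. This hypothetical `map_predictions` also skips heads or indices missing from the mapping instead of raising `KeyError`.

```python
def map_predictions(predictions: dict, model_info: dict) -> dict:
    """Like get_class_names, but skips heads/indices missing from the mapping."""
    mappings = model_info.get("class_mappings", {})
    results = {}
    for head, result in predictions.items():
        label = mappings.get(head, {}).get(str(result["class_id"]))
        if label is not None:
            results[head] = {"class_id": label, "confidence": result["confidence"]}
    return results

# Toy metadata, invented for illustration only (JSON object keys are strings)
toy_info = {"class_mappings": {"scene": {"0": "room", "1": "bathroom"}}}
toy_predictions = {
    "scene": {"class_id": 1, "confidence": 0.91},
    "concept": {"class_id": 0, "confidence": 0.88},  # no mapping -> skipped
}
print(map_predictions(toy_predictions, toy_info))
# prints: {'scene': {'class_id': 'bathroom', 'confidence': 0.91}}
```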

Batch processing

def classify_batch_images(image_paths):
    """Classify multiple images (one inference call per image)."""
    results = []
    
    for image_path in image_paths:
        predictions = classify_image_onnx(image_path)
        results.append({
            'image_path': image_path,
            'predictions': predictions
        })
    
    return results

# Example
image_paths = ["room1.jpg", "bathroom1.jpg", "lobby1.jpg"]
batch_results = classify_batch_images(image_paths)

for result in batch_results:
    print(f"\nImage: {result['image_path']}")
    for head, pred in result['predictions'].items():
        print(f"  {head}: class {pred['class_id']}, confidence {pred['confidence']:.4f}")
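`classify_batch_images` still makes one ONNX call per image. The exported model may have a dynamic batch dimension, but the card does not say so; check `onnx_session.get_inputs()[0].shape`. If it does, images can be stacked into one (N, 3, 224, 224) array and run together. The sketch below covers only the NumPy side: it stacks inputs and splits per-head outputs, using hypothetical helper names. This makes it testable without the model file.

```python
import numpy as np

def stack_images(arrays):
    """Stack preprocessed (3, H, W) arrays into one float32 (N, 3, H, W) batch."""
    return np.stack(arrays).astype(np.float32)

def split_batch_outputs(outputs, head_names):
    """Turn per-head (N, C) logit arrays into one prediction dict per image."""
    per_image = []
    for i in range(outputs[0].shape[0]):
        preds = {}
        for head, logits in zip(head_names, outputs):
            e = np.exp(logits[i] - logits[i].max())  # numerically stable softmax
            probs = e / e.sum()
            k = int(probs.argmax())
            preds[head] = {"class_id": k, "confidence": float(probs[k])}
        per_image.append(preds)
    return per_image

# Shape check with dummy data standing in for real model outputs
rng = np.random.default_rng(0)
batch = stack_images([np.zeros((3, 224, 224)) for _ in range(4)])
fake_outputs = [rng.normal(size=(4, 6)), rng.normal(size=(4, 3)), rng.normal(size=(4, 13))]
results = split_batch_outputs(fake_outputs, ["scene", "concept", "object"])
print(batch.shape, len(results))  # prints: (4, 3, 224, 224) 4
```

With a real session, `onnx_session.run(None, {onnx_session.get_inputs()[0].name: batch})` would produce the `outputs` list. This only works if the export allows a batch size other than 1.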

Model files

image_classifier_model_0.2.pth: PyTorch model file
image_classifier_model_0.2.onnx: ONNX model file (optimized for inference)
image_classifier_model_0.2_model_info.json: model metadata
image_classifier_model_0.2_inference_example.py: inference example code

Model architecture

Multi-head classification system

Input image (224×224)
    ↓
DINOv2 backbone (frozen)
    ↓
Shared features (1024-dim)
    ├─── Scene head → 6 classes
    ├─── Concept head → 3 classes
    └─── Object head → 13 classes
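The diagram can be read as a small forward pass: one shared 1,024-dim feature vector per image feeds three independent linear classifiers. The following NumPy sketch uses random weights only to make the shapes concrete. The normalization step and the use of plain linear heads are assumptions, not details confirmed by the card. Random features stand in for the frozen DINOv2 output.

```python
import numpy as np

rng = np.random.default_rng(0)
FEATURE_DIM = 1024
HEAD_SIZES = {"scene": 6, "concept": 3, "object": 13}

# Random stand-ins for trained head weights: W has shape (classes, features)
heads = {name: (rng.normal(size=(n, FEATURE_DIM)) * 0.01, np.zeros(n))
         for name, n in HEAD_SIZES.items()}

def layer_norm(x, eps=1e-6):
    """Normalize each feature vector (no learned scale/shift in this sketch)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def multi_head_forward(features):
    """features: (batch, 1024) backbone output -> dict of per-head logits."""
    shared = layer_norm(features)  # ASSUMED normalization step
    return {name: shared @ W.T + b for name, (W, b) in heads.items()}

features = rng.normal(size=(2, FEATURE_DIM))  # stand-in for frozen DINOv2 features
logits = multi_head_forward(features)
print({k: v.shape for k, v in logits.items()})
# prints: {'scene': (2, 6), 'concept': (2, 3), 'object': (2, 13)}
```

Because the backbone is frozen, only the head weights change during training. In principle, backbone features could be computed once and reused across epochs.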

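Key features

DINOv2 backbone: feature extraction based on a strong vision transformer
Frozen backbone: reuses pretrained features and reduces the risk of overfitting
Multi-head: three heads analyze each image from different perspectives
Class weighting: automatically compensates for class imbalance in the training data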

Preprocessing requirements

Image size: 224x224 pixels
Color space: RGB
Normalization: ImageNet statistics (mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
Crop: center crop
Supported formats: JPG/JPEG, PNG
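Some environments lack torchvision, such as an ONNX-only deployment. There, the same normalization can be reproduced in NumPy. This sketch assumes the image has already been resized to 224x224 RGB (e.g. with PIL). It mirrors `ToTensor()` + `Normalize(...)`: scale to [0, 1], subtract the mean, divide by the std, and move channels first. `normalize_rgb` is a hypothetical name.

```python
import numpy as np

MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def normalize_rgb(image_hwc_uint8: np.ndarray) -> np.ndarray:
    """(H, W, 3) uint8 RGB -> (1, 3, H, W) float32, ImageNet-normalized."""
    x = image_hwc_uint8.astype(np.float32) / 255.0  # like ToTensor() scaling
    x = (x - MEAN) / STD                            # per-channel Normalize
    return np.transpose(x, (2, 0, 1))[np.newaxis]   # HWC -> NCHW with batch dim

# A white 224x224 image: channel 0 should become (1 - 0.485) / 0.229
white = np.full((224, 224, 3), 255, dtype=np.uint8)
batch = normalize_rgb(white)
print(batch.shape)  # prints: (1, 3, 224, 224)
```

Calling `normalize_rgb(np.asarray(Image.open(path).convert('RGB').resize((224, 224), Image.BILINEAR)))` would give an ONNX-ready input. Bilinear resampling is chosen because it should closely match torchvision's default `Resize` interpolation.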

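Use cases

Direct use

Automatic hotel image classification: sort images by scene, such as room, bathroom, or lobby
Image metadata generation: automatically extract scene, concept, and object information from each image
Image database management: automatically tag large collections of hotel images
Quality control: check that image classifications are consistent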
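For metadata generation and tagging, the per-head predictions can be turned into tags, keeping only the confident ones. Below is a minimal sketch. The 0.5 threshold, the `head:label` tag format, and the toy label lists are arbitrary illustrative choices, not values from the card.

```python
def predictions_to_tags(predictions, labels, threshold=0.5):
    """Build 'head:label' tags for heads whose confidence meets the threshold."""
    tags = []
    for head, result in predictions.items():
        if result["confidence"] >= threshold:
            tags.append(f"{head}:{labels[head][result['class_id']]}")
    return tags

# Toy labels and predictions for illustration only
example_labels = {"scene": ["room", "bathroom"], "concept": ["indoor", "outdoor"]}
example_predictions = {
    "scene": {"class_id": 1, "confidence": 0.93},
    "concept": {"class_id": 0, "confidence": 0.41},  # below threshold -> no tag
}
print(predictions_to_tags(example_predictions, example_labels))
# prints: ['scene:bathroom']
```

Results that fall below the threshold can be routed to manual review, which supports the quality-control use case.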

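Downstream use

Hotel management systems: automatic classification and management of room images
Travel platforms: filtering images by room type
Real-estate and lodging platforms: automatic extraction of property facility information
Image search engines: image search based on multiple attributes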

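Limitations

Domain specificity: the model is specialized for hotel/lodging images, so performance on other domains is limited.
Image quality: performance may degrade on low-resolution or noisy images.
Viewpoint dependence: performance may vary with the angle from which a photo was taken.
Class imbalance: some classes may perform worse than others.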

License

Apache 2.0 License

Notes

This model was developed as part of the Room Clusterer project. See the project repository for more details.
