🚢 AIS Maritime Anomaly Detection Models

A comprehensive suite of unsupervised machine learning models for detecting anomalous vessel behavior in AIS (Automatic Identification System) data, specifically designed for oil spill detection and maritime safety applications.

🎯 Model Overview

This repository contains 5 trained unsupervised anomaly detection models optimized for maritime AIS data analysis:

| Model | Type | Accuracy | Anomaly Rate | Best For | Size |
|---|---|---|---|---|---|
| IsolationForest 🌟 | Ensemble | 100% contamination match | 10.0% | Recommended - Best overall | 1.1MB |
| LocalOutlierFactor | Density-based | 100% contamination match | 10.0% | Local anomaly detection | 104MB |
| OneClassSVM | SVM-based | 100% contamination match | 10.0% | Non-linear patterns | 3.9MB |
| EllipticEnvelope | Statistical | 100% contamination match | 10.0% | Gaussian distributed data | 3.6MB |
| DBSCAN | Clustering | N/A (no contamination parameter) | 2.1% | Cluster-based anomalies | 39MB |

🛢️ Oil Spill Detection Pipeline

These models are designed as Step 2A in a comprehensive oil spill detection system:

AIS Stream → Anomaly Detection → Sentinel-1 SAR Analysis → Oil Spill Alert

Key Anomaly Patterns Detected:

  • Stationary vessels during potential oil transfer operations
  • Unusual speed patterns (too fast/slow for vessel type)
  • Deep-draught vessels in unexpected locations
  • Course/heading inconsistencies indicating suspicious navigation
  • Loitering behavior in sensitive maritime areas

📊 Performance Metrics

IsolationForest (Recommended Model):

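  • Contamination Accuracy: 100.00% (the share of flagged records matches the configured contamination of 10%; this is not accuracy measured against labeled anomalies)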
  • Score Separation: 0.154 (excellent discrimination)
  • Silhouette Score: 0.415 (good clustering quality)
  • Statistical Significance: 6/6 features significant
  • Processing Speed: ~30 seconds for 358K records

Feature Discrimination Power:

  • Speed (SOG): 51.6% difference between normal/anomalous
  • Draught: 31.2% difference (deep vessels suspicious)
  • Width: 9.5% difference
  • Course Difference: High effect size (0.927)
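
The card does not say how the percentage differences or the effect size were computed. As a hedged illustration only, the sketch below assumes the percentage is the relative difference of group means and the effect size is Cohen's d with a pooled standard deviation. The function names and the sample values are made up for the example.

```python
import numpy as np


def percent_difference(normal, anomalous):
    """Relative difference of group means, as a percentage of the normal mean."""
    return 100.0 * (np.mean(anomalous) - np.mean(normal)) / np.mean(normal)


def cohens_d(normal, anomalous):
    """Standardized mean difference using the pooled sample standard deviation."""
    n1, n2 = len(normal), len(anomalous)
    pooled_var = (
        (n1 - 1) * np.var(normal, ddof=1) + (n2 - 1) * np.var(anomalous, ddof=1)
    ) / (n1 + n2 - 2)
    return (np.mean(anomalous) - np.mean(normal)) / np.sqrt(pooled_var)


# Made-up speed values (knots) for the two groups, for illustration only.
normal_sog = np.array([10.0, 12.0, 11.0, 9.0])
anomalous_sog = np.array([15.0, 18.0, 16.0, 14.0])

pct = percent_difference(normal_sog, anomalous_sog)
d = cohens_d(normal_sog, anomalous_sog)
```

If the project used a different definition (for example medians, or a rank-based effect size), the numbers reported above would not be reproduced by this sketch.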

🚀 Quick Start

Installation

pip install numpy pandas scikit-learn joblib

Usage Example

import joblib
import numpy as np

# Load the recommended model
model_data = joblib.load('isolationforest_model.joblib')
model = model_data['model']
scaler = model_data['scaler']

# Prepare your AIS data with these 12 features
# (columns must match the order used in training):
# ['sog', 'cog', 'heading', 'width', 'length', 'draught',
#  'navigationalstatus_encoded', 'shiptype_encoded',
#  'speed_category', 'size_category', 'course_diff', 'aspect_ratio']
# `your_ais_features` below is a placeholder for that data.

# Make predictions
features_scaled = scaler.transform(your_ais_features)
anomaly_scores = model.decision_function(features_scaled)  # lower = more anomalous
predictions = model.predict(features_scaled)               # -1 = anomaly, 1 = normal

# Anomalies have prediction == -1 and anomaly_score < 0
anomalous_vessels = features_scaled[predictions == -1]     # scaled feature rows

🌍 Environmental Impact

This system contributes to:

  • Marine pollution prevention through early oil spill detection
  • Maritime safety via suspicious vessel identification
  • Environmental protection of sensitive marine areas
  • Regulatory compliance for maritime authorities

📁 Model Files

Available in both formats for maximum compatibility:

Pickle Format (.pkl) - Recommended for Hugging Face:

  • isolationforest_model.pkl (1.1MB) ⭐
  • localoutlierfactor_model.pkl (104MB)
  • oneclasssvm_model.pkl (3.9MB)
  • ellipticenvelope_model.pkl (3.6MB)
  • dbscan_model.pkl (39MB)

Joblib Format (.joblib) - For sklearn compatibility:

  • All models also available as .joblib files

Features

Data Processing

  • Handles missing values in AIS data
  • Encodes categorical variables (navigational status, ship type)
  • Creates derived features:
    • Speed categories (stationary, slow, normal, fast)
    • Vessel size categories
    • Course difference (COG vs Heading)
    • Aspect ratio (length/width)
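
The derived features listed above could be computed along the following lines. This is a minimal sketch, not the repository's `preprocess_data` implementation: the function name `add_derived_features` and the bin edges for the speed and size categories are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd


def add_derived_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with four illustrative derived columns."""
    out = df.copy()

    # Speed category: 0=stationary, 1=slow, 2=normal, 3=fast.
    # The knot thresholds are assumed, not taken from the project.
    out["speed_category"] = pd.cut(
        out["sog"],
        bins=[-np.inf, 0.5, 5.0, 15.0, np.inf],
        labels=False,
    )

    # Size category from vessel length in meters (assumed thresholds).
    out["size_category"] = pd.cut(
        out["length"],
        bins=[-np.inf, 50.0, 150.0, np.inf],
        labels=False,
    )

    # Smallest angle between course over ground and heading, in [0, 180].
    raw = (out["cog"] - out["heading"]).abs() % 360.0
    out["course_diff"] = np.minimum(raw, 360.0 - raw)

    # Aspect ratio; a zero width becomes NaN instead of dividing by zero.
    out["aspect_ratio"] = out["length"] / out["width"].replace(0, np.nan)
    return out


sample = pd.DataFrame(
    {
        "sog": [0.0, 12.0, 22.5],
        "cog": [350.0, 90.0, 180.0],
        "heading": [10.0, 95.0, 180.0],
        "length": [30.0, 200.0, 120.0],
        "width": [6.0, 32.0, 0.0],
    }
)
features = add_derived_features(sample)
```

The course difference wraps around 360 degrees, so a course of 350 and a heading of 10 differ by 20 degrees rather than 340.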

Anomaly Detection

  • Uses Isolation Forest algorithm (unsupervised learning)
  • Configurable contamination parameter (default: 10% expected anomalies)
  • Provides anomaly scores and binary predictions
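
To show how the contamination setting, the anomaly scores, and the binary predictions relate, here is a small sketch on synthetic data. The random array stands in for preprocessed AIS features and is an assumption of the example, not project data.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
# Synthetic stand-in for AIS features: 1,000 rows, 4 columns.
X = rng.normal(size=(1000, 4))
X[:20] += 6.0  # shift a few rows far from the bulk

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

model = IsolationForest(contamination=0.1, random_state=42)
model.fit(X_scaled)

scores = model.decision_function(X_scaled)  # lower = more anomalous
predictions = model.predict(X_scaled)       # -1 = anomaly, 1 = normal

anomaly_rate = float(np.mean(predictions == -1))
print(f"Flagged {anomaly_rate:.1%} of training rows")
```

Because the decision threshold is derived from the training scores, the flagged share of the training data lands at roughly the configured contamination. An anomaly rate close to 10% therefore reflects the configuration rather than a measured detection accuracy.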

Analysis & Visualization

  • Statistical comparison between normal and anomalous vessels
  • Ship type analysis
  • Multiple visualization plots:
    • Anomaly score distributions
    • Speed vs vessel length scatter plots
    • Ship type anomaly rates
    • Course vs heading patterns
    • Vessel dimensions analysis

Installation

  1. Make sure you have Python 3.7+ installed
  2. Install the required packages from the project directory (adjust the path to your own checkout):
cd path/to/ais_isolation_forest
pip install -r requirements.txt

Usage

Training the Model

Run the main script to train the Isolation Forest model:

python ais_anomaly_detection.py

This will:

  • Load the AIS data from /Users/lakshmikotaru/Downloads/ais_data.csv (a path on the author's machine; change it to the location of your own file)
  • Preprocess the data and create features
  • Train the Isolation Forest model
  • Generate analysis and visualizations
  • Save the trained model and results

Using the Trained Model

Use the prediction script to detect anomalies in new data:

# Use with new data file
python predict_anomalies.py path/to/new_ais_data.csv

# Or run without arguments to use original data as example
python predict_anomalies.py

Model Configuration

You can adjust the model parameters in ais_anomaly_detection.py:

detector = AISAnomalyDetector(
    contamination=0.1,    # Expected fraction of anomalies (10%)
    random_state=42       # For reproducible results
)

Data Format

The AIS data should be a CSV file with the following columns:

  • mmsi: Maritime Mobile Service Identity
  • navigationalstatus: Current navigation status
  • sog: Speed Over Ground (knots)
  • cog: Course Over Ground (degrees)
  • heading: Vessel heading (degrees)
  • shiptype: Type of vessel
  • width: Vessel width (meters)
  • length: Vessel length (meters)
  • draught: Vessel draught (meters)
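
A quick check that an input file has the columns listed above can save a failed run later. The sketch below is one possible way to do it; the helper name `load_ais_csv` and the example row are hypothetical and not part of the project's scripts.

```python
import io

import pandas as pd

REQUIRED_COLUMNS = [
    "mmsi", "navigationalstatus", "sog", "cog", "heading",
    "shiptype", "width", "length", "draught",
]


def load_ais_csv(source) -> pd.DataFrame:
    """Read an AIS CSV and fail early if a required column is missing."""
    df = pd.read_csv(source)
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")
    return df[REQUIRED_COLUMNS]


# In-memory example with one made-up row; real input would be a file path.
example_csv = io.StringIO(
    "mmsi,navigationalstatus,sog,cog,heading,shiptype,width,length,draught\n"
    "219000001,Under way using engine,11.2,87.5,88,Cargo,32,199,10.4\n"
)
ais = load_ais_csv(example_csv)
```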

Output Files

ais_isolation_forest_model.joblib

The trained model file that can be loaded for future predictions.

detected_anomalies.csv

Detailed information about all detected anomalies, including:

  • Original vessel data
  • Anomaly scores
  • Binary anomaly flags

anomaly_analysis_plots.png

Comprehensive visualization showing:

  • Anomaly score distributions
  • Feature comparisons between normal and anomalous vessels
  • Ship type analysis
  • Various scatter plots and distributions

Interpretation

Anomaly Scores

  • Lower (more negative) scores indicate higher anomaly likelihood
  • Scores are relative to the training data distribution
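
Since lower scores mean stronger anomalies, a common follow-up is to rank vessels by score and review the most negative ones first. The sketch below assumes the scores come from `decision_function`; the helper name `rank_anomalies`, the MMSI values, and the scores are invented for illustration.

```python
import numpy as np
import pandas as pd


def rank_anomalies(df: pd.DataFrame, scores: np.ndarray, top_n: int = 3) -> pd.DataFrame:
    """Attach scores and flags, then return the top_n most anomalous rows."""
    ranked = df.copy()
    ranked["anomaly_score"] = scores
    # decision_function scores below zero correspond to predict() == -1.
    ranked["is_anomaly"] = ranked["anomaly_score"] < 0
    # Most negative score first.
    return ranked.sort_values("anomaly_score").head(top_n)


vessels = pd.DataFrame({"mmsi": [111, 222, 333, 444], "sog": [0.1, 12.0, 35.0, 9.5]})
scores = np.array([-0.02, 0.11, -0.15, 0.08])  # made-up scores for illustration
top = rank_anomalies(vessels, scores, top_n=2)
```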

Common Anomaly Types

The model may detect:

  • Vessels with unusual speed patterns
  • Ships with inconsistent course/heading relationships
  • Vessels with atypical dimensions for their type
  • Unusual combinations of vessel characteristics

Example Output

============================================================
AIS DATA ANOMALY DETECTION USING ISOLATION FOREST
============================================================
Loading data from /Users/lakshmikotaru/Downloads/ais_data.csv...
Loaded 358351 records with 9 columns

Training Isolation Forest model...
Model training completed!
Number of anomalies detected: 35835 out of 358351 samples
Anomaly rate: 10.00%

ANOMALY ANALYSIS SUMMARY
============================================================
Normal samples: 322516 (90.0%)
Anomalous samples: 35835 (10.0%)

Customization

Adding New Features

To add new derived features, modify the preprocess_data method in the AISAnomalyDetector class.

Changing Model Parameters

Adjust the IsolationForest parameters in the __init__ method:

  • n_estimators: Number of trees in the forest
  • contamination: Expected proportion of anomalies
  • max_samples: Number of samples to draw for each tree
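
As a rough illustration of how these parameters are passed and how `contamination` changes the number of flagged rows, the sketch below fits scikit-learn's `IsolationForest` directly on synthetic data. It does not use the `AISAnomalyDetector` class, and the parameter values are arbitrary examples.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 3))  # synthetic data, not AIS records

flagged = {}
for contamination in (0.01, 0.05, 0.10):
    model = IsolationForest(
        n_estimators=100,             # number of trees in the forest
        contamination=contamination,  # expected proportion of anomalies
        max_samples=256,              # rows drawn to build each tree
        random_state=42,
    )
    predictions = model.fit_predict(X)  # -1 = anomaly, 1 = normal
    flagged[contamination] = int(np.sum(predictions == -1))
```

The count of flagged rows grows with `contamination` even though the data is unchanged, which is why the value should come from domain knowledge rather than be tuned to produce a desired number of alerts.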

Visualization

Modify the visualize_results method to add new plots or change existing ones.

Notes

  • The model is unsupervised, so it learns patterns without labeled anomalies
  • Results should be validated by domain experts
  • The contamination parameter significantly affects the number of detected anomalies
  • Missing values are handled automatically during preprocessing

Troubleshooting

  1. Import errors: Make sure all requirements are installed
  2. File not found: Check that the AIS data file path is correct
  3. Memory issues: For very large datasets, consider processing in chunks
  4. Plotting issues: Ensure matplotlib backend is properly configured
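
For the memory issue in item 3, one option is to apply an already fitted model to the CSV in chunks. The sketch below uses the `chunksize` argument of `pandas.read_csv`; the helper name `predict_in_chunks`, the three-column feature subset, and the synthetic data are assumptions, and the scaling step is omitted for brevity.

```python
import io

import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

FEATURES = ["sog", "cog", "heading"]  # subset of columns, for brevity


def predict_in_chunks(source, model, chunksize=1000):
    """Yield one array of predictions per CSV chunk to bound memory use."""
    for chunk in pd.read_csv(source, chunksize=chunksize):
        yield model.predict(chunk[FEATURES].to_numpy())


# Synthetic demo data standing in for a large AIS file.
rng = np.random.default_rng(1)
demo = pd.DataFrame(rng.uniform(0, 360, size=(2500, 3)), columns=FEATURES)
model = IsolationForest(random_state=42).fit(demo[FEATURES].to_numpy())

buffer = io.StringIO(demo.to_csv(index=False))
predictions = np.concatenate(list(predict_in_chunks(buffer, model)))
```

This only bounds memory at prediction time. Training still needs data in memory, although a random sample of the file may be enough because each tree is built from a subsample of rows.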

Contact

Generated for AIS Anomaly Detection Project - 2025-01-11
