AIS Maritime Anomaly Detection Models
A comprehensive suite of unsupervised machine learning models for detecting anomalous vessel behavior in AIS (Automatic Identification System) data, specifically designed for oil spill detection and maritime safety applications.
Model Overview
This repository contains 5 trained unsupervised anomaly detection models optimized for maritime AIS data analysis:
| Model | Type | Accuracy | Anomaly Rate | Best For | Size |
|---|---|---|---|---|---|
| IsolationForest | Ensemble | 100% contamination match | 10.0% | Recommended - Best overall | 1.1MB |
| LocalOutlierFactor | Density-based | 100% contamination match | 10.0% | Local anomaly detection | 104MB |
| OneClassSVM | SVM-based | 100% contamination match | 10.0% | Non-linear patterns | 3.9MB |
| EllipticEnvelope | Statistical | 100% contamination match | 10.0% | Gaussian distributed data | 3.6MB |
| DBSCAN | Clustering | N/A (no contamination parameter) | 2.1% | Cluster-based anomalies | 39MB |
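DBSCAN has no contamination setting, which is why its anomaly rate in the table is not pinned to 10%. The following is a minimal sketch of how a cluster-based anomaly rate can arise, assuming the points that scikit-learn's DBSCAN labels as noise (-1) are the ones treated as anomalies. The `eps` and `min_samples` values and the toy data are illustrative assumptions, not the settings of the trained model.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Toy data: two tight clusters plus one isolated point (illustrative only)
rng = np.random.default_rng(42)
cluster_a = rng.normal(loc=0.0, scale=0.1, size=(50, 2))
cluster_b = rng.normal(loc=5.0, scale=0.1, size=(50, 2))
outlier = np.array([[20.0, 20.0]])
X = np.vstack([cluster_a, cluster_b, outlier])

# eps and min_samples are assumed values, not the repository's settings
labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)

# DBSCAN gives the label -1 to points that belong to no cluster
is_anomaly = labels == -1
anomaly_rate = is_anomaly.mean()
```

Because the rate depends on data density rather than on a preset fraction, it will generally differ from the rate of the contamination-based models.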
Oil Spill Detection Pipeline
These models are designed as Step 2A in a comprehensive oil spill detection system:
AIS Stream → Anomaly Detection → Sentinel-1 SAR Analysis → Oil Spill Alert
Key Anomaly Patterns Detected:
- Stationary vessels during potential oil transfer operations
- Unusual speed patterns (too fast/slow for vessel type)
- Deep-draught vessels in unexpected locations
- Course/heading inconsistencies indicating suspicious navigation
- Loitering behavior in sensitive maritime areas
Performance Metrics
IsolationForest (Recommended Model):
- Contamination Accuracy: 100.00% (the flagged fraction matches the configured 10% contamination)
- Score Separation: 0.154 (excellent discrimination)
- Silhouette Score: 0.415 (good clustering quality)
- Statistical Significance: 6/6 features significant
- Processing Speed: ~30 seconds for 358K records
Feature Discrimination Power:
- Speed (SOG): 51.6% difference between normal/anomalous
- Draught: 31.2% difference (deep vessels suspicious)
- Width: 9.5% difference
- Course Difference: High effect size (0.927)
Quick Start
Installation
pip install numpy pandas scikit-learn joblib
Usage Example
import joblib
import numpy as np
# Load the recommended model
model_data = joblib.load('isolationforest_model.joblib')
model = model_data['model']
scaler = model_data['scaler']
# Prepare your AIS data as your_ais_features, a 2-D array with one column per feature:
# ['sog', 'cog', 'heading', 'width', 'length', 'draught',
# 'navigationalstatus_encoded', 'shiptype_encoded',
# 'speed_category', 'size_category', 'course_diff', 'aspect_ratio']
# Make predictions
features_scaled = scaler.transform(your_ais_features)
anomaly_scores = model.decision_function(features_scaled)
predictions = model.predict(features_scaled)
# Anomalies have prediction = -1, anomaly_score < 0
anomalous_vessels = features_scaled[predictions == -1]
Environmental Impact
This system contributes to:
- Marine pollution prevention through early oil spill detection
- Maritime safety via suspicious vessel identification
- Environmental protection of sensitive marine areas
- Regulatory compliance for maritime authorities
Model Files
Available in both formats for maximum compatibility:
Pickle Format (.pkl) - Recommended for Hugging Face:
- isolationforest_model.pkl (1.1MB), recommended
- localoutlierfactor_model.pkl (104MB)
- oneclasssvm_model.pkl (3.9MB)
- ellipticenvelope_model.pkl (3.6MB)
- dbscan_model.pkl (39MB)
Joblib Format (.joblib) - For sklearn compatibility:
- All models also available as .joblib files
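Since the files ship in two formats, a small loader can pick the right library from the file extension. This is a sketch only: `load_model_bundle` is a hypothetical helper, and it assumes the .pkl files hold the same dictionary (with 'model' and 'scaler' keys) that the Quick Start reads from the .joblib file.

```python
import pickle
import tempfile
from pathlib import Path

import joblib


def load_model_bundle(path):
    """Load a saved bundle from a .pkl or .joblib file (hypothetical helper)."""
    path = Path(path)
    if path.suffix == ".pkl":
        with path.open("rb") as fh:
            return pickle.load(fh)
    if path.suffix == ".joblib":
        return joblib.load(path)
    raise ValueError(f"Unsupported model file extension: {path.suffix}")


# Round-trip demo with a stand-in bundle (real bundles hold fitted objects)
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo_model.pkl"
    with demo_path.open("wb") as fh:
        pickle.dump({"model": "fitted-model", "scaler": "fitted-scaler"}, fh)
    bundle = load_model_bundle(demo_path)
```

Both pickle and joblib can execute arbitrary code while loading, so only load model files from a source you trust.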
Features
Data Processing
- Handles missing values in AIS data
- Encodes categorical variables (navigational status, ship type)
- Creates derived features:
  - Speed categories (stationary, slow, normal, fast)
  - Vessel size categories
  - Course difference (COG vs Heading)
  - Aspect ratio (length/width)
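The derived features listed above could be computed along the following lines. This is a sketch, not the repository's preprocessing code: the function name `add_derived_features`, the speed bin edges, and the wrap-around handling of the course difference are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def add_derived_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add speed category, course difference and aspect ratio columns."""
    out = df.copy()
    # Speed category codes 0..3 (stationary, slow, normal, fast).
    # The bin edges in knots are assumed values.
    out["speed_category"] = pd.cut(
        out["sog"],
        bins=[-np.inf, 0.5, 5.0, 15.0, np.inf],
        labels=False,
    )
    # Smallest angle between COG and heading, in the range 0..180 degrees,
    # so that 350 vs 10 counts as 20 degrees rather than 340.
    raw = (out["cog"] - out["heading"]).abs() % 360
    out["course_diff"] = np.minimum(raw, 360 - raw)
    # Aspect ratio = length / width; a zero width yields NaN, not infinity
    out["aspect_ratio"] = out["length"] / out["width"].replace(0, np.nan)
    return out


# Small illustrative frame
demo = pd.DataFrame({
    "sog": [0.0, 3.0, 12.0, 20.0],
    "cog": [350.0, 10.0, 90.0, 180.0],
    "heading": [10.0, 350.0, 90.0, 0.0],
    "length": [100.0, 200.0, 50.0, 30.0],
    "width": [20.0, 25.0, 0.0, 10.0],
})
out = add_derived_features(demo)
```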
Anomaly Detection
- Uses Isolation Forest algorithm (unsupervised learning)
- Configurable contamination parameter (default: 10% expected anomalies)
- Provides anomaly scores and binary predictions
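As a rough illustration of these three points, the sketch below fits scikit-learn's IsolationForest with a 10% contamination setting and reads back both the continuous scores and the binary predictions. The synthetic data stands in for a preprocessed AIS feature matrix, and the use of StandardScaler is an assumption; the repository's actual scaler type is not stated here.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a preprocessed AIS feature matrix (illustrative only)
rng = np.random.default_rng(42)
X = rng.normal(size=(2000, 6))

# Scale the features (StandardScaler is an assumed choice), then fit
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
model = IsolationForest(contamination=0.1, random_state=42)
model.fit(X_scaled)

# Continuous scores (negative = anomalous) and binary labels (-1 = anomaly)
anomaly_scores = model.decision_function(X_scaled)
predictions = model.predict(X_scaled)
anomaly_rate = (predictions == -1).mean()
```

On the training data the flagged fraction lands at roughly the contamination value, because the decision threshold is placed at that quantile of the training scores.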
Analysis & Visualization
- Statistical comparison between normal and anomalous vessels
- Ship type analysis
- Multiple visualization plots:
  - Anomaly score distributions
  - Speed vs vessel length scatter plots
  - Ship type anomaly rates
  - Course vs heading patterns
  - Vessel dimensions analysis
Installation
- Make sure you have Python 3.7+ installed
- Install required packages:
cd path/to/ais_isolation_forest
pip install -r requirements.txt
Usage
Training the Model
Run the main script to train the Isolation Forest model:
python ais_anomaly_detection.py
This will:
- Load the AIS data from /Users/lakshmikotaru/Downloads/ais_data.csv
- Preprocess the data and create features
- Train the Isolation Forest model
- Generate analysis and visualizations
- Save the trained model and results
Using the Trained Model
Use the prediction script to detect anomalies in new data:
# Use with new data file
python predict_anomalies.py path/to/new_ais_data.csv
# Or run without arguments to use original data as example
python predict_anomalies.py
Model Configuration
You can adjust the model parameters in ais_anomaly_detection.py:
detector = AISAnomalyDetector(
contamination=0.1, # Expected fraction of anomalies (10%)
random_state=42 # For reproducible results
)
Data Format
The AIS data should be a CSV file with the following columns:
- mmsi: Maritime Mobile Service Identity
- navigationalstatus: Current navigation status
- sog: Speed Over Ground (knots)
- cog: Course Over Ground (degrees)
- heading: Vessel heading (degrees)
- shiptype: Type of vessel
- width: Vessel width (meters)
- length: Vessel length (meters)
- draught: Vessel draught (meters)
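Before training or predicting, it can help to confirm that a file actually has these columns. The check below is a sketch using only the standard library; `missing_columns` is a hypothetical helper, and it assumes the header names match the list above exactly.

```python
import csv
import io

REQUIRED_COLUMNS = [
    "mmsi", "navigationalstatus", "sog", "cog", "heading",
    "shiptype", "width", "length", "draught",
]


def missing_columns(csv_file):
    """Return the required AIS columns that are absent from the CSV header."""
    header = next(csv.reader(csv_file), [])
    present = {name.strip() for name in header}
    return [col for col in REQUIRED_COLUMNS if col not in present]


# Demo with an in-memory file whose header lacks 'draught'
sample = io.StringIO(
    "mmsi,navigationalstatus,sog,cog,heading,shiptype,width,length\n"
)
missing = missing_columns(sample)
```

For a file on disk, pass an open file handle, for example `open("path/to/new_ais_data.csv", newline="")`.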
Output Files
ais_isolation_forest_model.joblib
The trained model file that can be loaded for future predictions.
detected_anomalies.csv
Detailed information about all detected anomalies, including:
- Original vessel data
- Anomaly scores
- Binary anomaly flags
anomaly_analysis_plots.png
Comprehensive visualization showing:
- Anomaly score distributions
- Feature comparisons between normal and anomalous vessels
- Ship type analysis
- Various scatter plots and distributions
Interpretation
Anomaly Scores
- Lower (more negative) scores indicate higher anomaly likelihood
- Scores are relative to the training data distribution
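Because the scores are relative rather than absolute, ranking records by score is often more useful than comparing a single score against a fixed number. A minimal sketch, with `most_anomalous` as a hypothetical helper and made-up score values:

```python
import numpy as np


def most_anomalous(scores, k=5):
    """Return indices of the k lowest (most anomalous) scores, worst first."""
    scores = np.asarray(scores)
    return np.argsort(scores)[:k]


# Illustrative decision_function-style scores: negative means anomalous
scores = np.array([0.12, -0.30, 0.05, -0.02, 0.20])
top = most_anomalous(scores, k=2)
```

Here `top` holds the positions of the two records with the lowest scores, which can then be used to index the original vessel data for review.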
Common Anomaly Types
The model may detect:
- Vessels with unusual speed patterns
- Ships with inconsistent course/heading relationships
- Vessels with atypical dimensions for their type
- Unusual combinations of vessel characteristics
Example Output
============================================================
AIS DATA ANOMALY DETECTION USING ISOLATION FOREST
============================================================
Loading data from /Users/lakshmikotaru/Downloads/ais_data.csv...
Loaded 358351 records with 9 columns
Training Isolation Forest model...
Model training completed!
Number of anomalies detected: 35835 out of 358351 samples
Anomaly rate: 10.00%
ANOMALY ANALYSIS SUMMARY
============================================================
Normal samples: 322516 (90.0%)
Anomalous samples: 35835 (10.0%)
Customization
Adding New Features
To add new derived features, modify the preprocess_data method in the AISAnomalyDetector class.
Changing Model Parameters
Adjust the IsolationForest parameters in the __init__ method:
- n_estimators: Number of trees in the forest
- contamination: Expected proportion of anomalies
- max_samples: Number of samples to draw for each tree
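For reference, this is how the three parameters look when passed directly to scikit-learn's IsolationForest. The values are illustrative, not recommendations, and the synthetic data is a stand-in; how AISAnomalyDetector forwards them internally is not shown here.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic stand-in for a scaled feature matrix (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))

model = IsolationForest(
    n_estimators=200,    # more trees: steadier scores, slower training
    contamination=0.05,  # expect about 5% of records to be anomalies
    max_samples=256,     # rows drawn to build each tree
    random_state=42,     # reproducible results
)
model.fit(X)

n_trees = len(model.estimators_)
flagged_fraction = (model.predict(X) == -1).mean()
```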
Visualization
Modify the visualize_results method to add new plots or change existing ones.
Notes
- The model is unsupervised, so it learns patterns without labeled anomalies
- Results should be validated by domain experts
- The contamination parameter significantly affects the number of detected anomalies
- Missing values are handled automatically during preprocessing
Troubleshooting
- Import errors: Make sure all requirements are installed
- File not found: Check that the AIS data file path is correct
- Memory issues: For very large datasets, consider processing in chunks
- Plotting issues: Ensure matplotlib backend is properly configured
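For the memory issue above, one option is to score a large CSV in chunks with an already fitted model, keeping only the flagged rows. This is a sketch under stated assumptions: `flag_anomalies_in_chunks` is a hypothetical helper, the two-column synthetic data is a stand-in, and the approach covers prediction only, not training.

```python
import io

import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler


def flag_anomalies_in_chunks(csv_source, model, scaler, feature_cols,
                             chunksize=50_000):
    """Score a CSV chunk by chunk and return only the rows flagged as anomalies."""
    flagged = []
    for chunk in pd.read_csv(csv_source, chunksize=chunksize):
        X = scaler.transform(chunk[feature_cols].to_numpy())
        flagged.append(chunk[model.predict(X) == -1])
    return pd.concat(flagged, ignore_index=True)


# Demo on synthetic data with one obviously extreme row
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(500, 2)), columns=["sog", "draught"])
df.loc[0] = [50.0, 50.0]
scaler = StandardScaler().fit(df.to_numpy())
model = IsolationForest(contamination=0.1, random_state=42)
model.fit(scaler.transform(df.to_numpy()))

csv_text = df.to_csv(index=False)
anomalies = flag_anomalies_in_chunks(
    io.StringIO(csv_text), model, scaler, ["sog", "draught"], chunksize=100
)
```

Only one chunk is held in memory at a time, so peak usage is governed by `chunksize` rather than by the file size.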
Contact
Generated for AIS Anomaly Detection Project - 2025-01-11