---
tags:
- sklearn
- random-forest
- travel-prediction
- classification
library_name: sklearn
---

# Travel Mode Prediction Model

This model predicts the preferred travel mode (Walk, Bike, Car, Bus, Train, or Flight) from trip characteristics such as distance, budget, and time constraints.

## Model Details

- **Model Type**: RandomForestClassifier
- **Library**: scikit-learn
- **Input Features**:
  - start_location: Starting location (categorical)
  - end_location: Destination location (categorical)
  - total_distance_km: Distance in kilometers (numerical)
  - season: Travel season (categorical)
  - day_type: Weekday/Weekend (categorical)
  - traffic_density: Traffic density (0-1, numerical)
  - user_budget: Budget in rupees (numerical)
  - user_time_constraint_hr: Time constraint in hours (numerical)

## Usage

```python
import joblib
import numpy as np
import pandas as pd

# Load the trained model and the preprocessing pipeline
model = joblib.load('model.pkl')
pipeline = joblib.load('pipeline.pkl')

# Example input
input_data = {
    'start_location': 'Delhi',
    'end_location': 'Mumbai',
    'total_distance_km': 1400,
    'season': 'Summer',
    'day_type': 'Weekday',
    'traffic_density': 0.5,
    'user_budget': 5000,
    'user_time_constraint_hr': 24
}

# Preprocessing logic (simplified - see app.py for full implementation)
features = pipeline['features']
categorical_cols = ['start_location', 'end_location', 'season', 'day_type']

input_row = []
for feature in features:
    value = input_data[feature]
    if feature in categorical_cols:
        # Label-encode categorical values; unseen categories fall back to code 0
        encoder = pipeline['label_encoders'][feature]
        value = encoder.transform([value])[0] if value in encoder.classes_ else 0
    input_row.append(value)

# Scale numerical features
numerical_cols = ['total_distance_km', 'traffic_density', 'user_budget', 'user_time_constraint_hr']
numerical_data = {col: [input_row[features.index(col)]] for col in numerical_cols}
df_numerical = pd.DataFrame(numerical_data)
numerical_scaled = pipeline['scaler'].transform(df_numerical)

# Reconstruct the full feature array in the order the model expects.
# Scaled values are looked up by column name, so the result stays correct even
# if `features` lists the numerical columns in a different order than numerical_cols.
input_scaled = []
for i, feature in enumerate(features):
    if feature in numerical_cols:
        input_scaled.append(numerical_scaled[0][numerical_cols.index(feature)])
    else:
        input_scaled.append(input_row[i])
input_scaled = np.array([input_scaled])

# Predict the travel mode and the per-class probabilities
prediction = model.predict(input_scaled)
predicted_mode = pipeline['target_encoder'].inverse_transform(prediction)[0]
probabilities = model.predict_proba(input_scaled)[0]
mode_names = pipeline['target_encoder'].classes_
prob_dict = {mode: float(prob) for mode, prob in zip(mode_names, probabilities)}

print(f"Predicted mode: {predicted_mode}")
print(f"Probabilities: {prob_dict}")
```

## Training Data

The model was trained on a dataset of travel scenarios across Indian cities, covering a range of travel parameters.

## Files

- `model.pkl`: Trained RandomForestClassifier model
- `pipeline.pkl`: Preprocessing pipeline with encoders and scaler

## Performance

The model achieves good accuracy on the test set when predicting travel mode from the given features.
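
To put a number on accuracy for your own data, one option is to compare predicted and true travel modes on a labeled test set. The sketch below uses only the Python standard library. The helper names `accuracy` and `per_class_recall` and the label lists are hypothetical and for illustration only; the values are not results from this model.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot score an empty test set")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def per_class_recall(y_true, y_pred):
    """Recall for each travel mode that appears in y_true."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


# Hypothetical labels for illustration only; NOT outputs of this model.
y_true = ["Car", "Train", "Flight", "Bus", "Train", "Walk"]
y_pred = ["Car", "Train", "Train", "Bus", "Train", "Walk"]

print(f"Accuracy: {accuracy(y_true, y_pred):.3f}")
print(f"Per-class recall: {per_class_recall(y_true, y_pred)}")
```

With the real model, `y_pred` would come from decoding `model.predict(...)` with `pipeline['target_encoder'].inverse_transform(...)` on a labeled test set that you supply. scikit-learn's `accuracy_score` and `classification_report` in `sklearn.metrics` compute the same quantities, and per-class scores can reveal weak classes that overall accuracy hides, such as rarely chosen modes.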