---
sdk: streamlit
sdk_version: 1.51.0
license: apache-2.0
---
# 💳 Credit Card Fraud Detection Dashboard

[![Streamlit](https://img.shields.io/badge/Powered%20by-Streamlit-FF4B4B)](https://streamlit.io/)\
[![Data License: CC BY-NC 4.0](https://img.shields.io/badge/Data%20License-CC%20BY--NC%204.0-lightgrey.svg)](DATA_LICENSE)\
[![Made with ❤️ by Tarek Masryo](https://img.shields.io/badge/Made%20by-Tarek%20Masryo-blue)](https://github.com/tarekmasryo)


---


## 📌 Overview

Interactive dashboard built with **Streamlit, Plotly, and Scikit-learn** for real-time **fraud detection analysis**.  
It demonstrates a **business-aware ML pipeline** on the classic **Credit Card Fraud Dataset** (284,807 transactions, of which only 492, or ≈ 0.17%, are frauds).  

- 🔍 Upload your own transaction CSV or use the built-in dataset  
- ⚖️ Custom decision thresholds with cost-sensitive analysis  
- 📊 Confusion matrix, ROC/PR curves, and cost–threshold visualization  
- 💡 Permutation feature importance for interpretability  
- 🧾 Segmented performance profiling (by amount, time of day, etc.)

---

## 📊 Dashboard Preview

### Data Overview  
![Data](assets/data_overview.png)

### Prediction Engine  
![Prediction](assets/prediction_engine.png)

### Model Metrics  
![Metrics](assets/model_metrics.png)

### Model Insights  
![Insights](assets/model_insights.png)

### Data Quality & Segments  
![Segments](assets/data_quality.png)

---

## 🔑 Features

- **Models**: RandomForest & XGBoost (calibrated)  
- **Presets**: Strict / Balanced / Lenient thresholds  
- **Threshold Finder**: auto-select by target Precision/Recall  
- **Cost Analysis**: business-aligned FP vs FN costs  
- **Visuals**: Confusion matrix, ROC, PR, cost vs threshold curves  
- **Insights**: Permutation importance, segmented KPIs  
- **Data Handling**: automatic schema validation + engineered features (`log(Amount)`, business hours, night proxy)
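
As a rough illustration of the engineered features listed above, the sketch below derives them from the dataset's `Time` and `Amount` columns. In the public dataset `Time` is the number of seconds elapsed since the first transaction, so the derived hour is only a proxy for clock time. The function name, the 09:00–17:00 business window, the 22:00–06:00 night window, and the use of `log1p` (so that a zero amount does not produce `-inf`) are assumptions, not necessarily what `app.py` does.

```python
import numpy as np

def engineer_features(time_s, amount):
    """Derive illustrative features from raw Time (seconds) and Amount values."""
    time_s = np.asarray(time_s, dtype=float)
    amount = np.asarray(amount, dtype=float)
    # Hour-of-day proxy: seconds since the first transaction, wrapped to 24 h.
    hour = ((time_s // 3600) % 24).astype(int)
    return {
        "log_amount": np.log1p(amount),                   # log(1 + Amount)
        "hour": hour,
        "is_business_hours": (hour >= 9) & (hour < 17),   # assumed window
        "is_night": (hour >= 22) | (hour < 6),            # assumed window
    }

# Four made-up transactions at 0 h, 10 h, 23 h and 25 h after the first one.
feats = engineer_features([0, 36_000, 82_800, 90_000], [0.0, 99.0, 10.0, 1.0])
print(feats["hour"])               # [ 0 10 23  1]
print(feats["is_business_hours"])  # [False  True False False]
```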

---

## 🚀 Run Locally

Clone the repo and install requirements:

```bash
git clone https://github.com/tarekmasryo/fraud-detection-dashboard.git
cd fraud-detection-dashboard
pip install -r requirements.txt
```

Run the app:

```bash
streamlit run app.py
```

---



## ☁️ Deploy on Hugging Face Spaces

This repository is ready to be deployed as a **Streamlit Space** on [Hugging Face](https://huggingface.co/spaces).  
Make sure to include the following files in your repo:

- `app.py` → main app file  
- `requirements.txt` → Python dependencies  
- `artifacts/` → trained model `.joblib` files and `thresholds.json`  
- `data/creditcard.csv` (optional, for default dataset)  
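
Before pushing to a Space, the file list above can be checked with a small script. This is a hypothetical helper, not part of the repository: the function name `missing_for_space` and the rule "at least one `.joblib` file in `artifacts/`" are assumptions based on the list, and the optional `data/creditcard.csv` is deliberately not checked.

```python
from pathlib import Path

# Files the list above marks as required (paths relative to the repo root).
REQUIRED_FILES = ["app.py", "requirements.txt", "artifacts/thresholds.json"]

def missing_for_space(repo_root="."):
    """Return the required paths that are absent under repo_root."""
    root = Path(repo_root)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    # Assumption: at least one trained model is stored as artifacts/*.joblib.
    if not any((root / "artifacts").glob("*.joblib")):
        missing.append("artifacts/*.joblib")
    return missing

if __name__ == "__main__":
    problems = missing_for_space()
    print("Ready to deploy" if not problems else f"Missing: {problems}")
```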

---


## 📜 License & Attribution

- **Data** → Original [Credit Card Fraud Dataset](https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud)  
  Licensed under **CC BY-NC 4.0**, for research & educational use only.

---

## Related Repositories
- 🔍 [Fraud Detection EDA + Baseline Models](https://github.com/tarekmasryo/creditcard-fraud-detection)