kangourous committed · Commit 6232091 · verified · 1 Parent(s): c8e1833

Update README.md

Files changed (1)
  1. README.md +9 -7
README.md CHANGED
@@ -34,26 +34,26 @@ The model uses the `rfcx/frugalai` dataset:
34
  Most audio samples are 3 seconds long, with a sampling rate of 12,000 Hz. This means each row of the dataset contains 36,000 elements.
35
  - **Resampling**: Audio files with a higher sampling rate are downsampled to 12,000 Hz.
36
  - **Padding**: Audio files shorter than 3 seconds are padded with their reversed signal at the end.
37
- - **Storage**: Raw audio data is stored in a NumPy array of size `(n, 36,000)` with `float16` precision to reduce memory usage without significant precision loss compared to `float32`.
38
 
39
  ---
40
 
41
  ## Model Description: XGBoost
42
 
43
- This is an XGBoost regressor that outputs probabilities. It consists of 3,000 trees.
44
 
45
  ### Input Features
46
 
47
  XGBoost uses the following input features:
48
  - **MFCC**:
49
  - 55 MFCCs are retained.
50
- - Calculated with a window size of 1,024 for `nfft`.
51
  - Mean and standard deviation are taken along the spatial axis (resulting in 110 features).
52
  - **Mel Spectrogram**:
53
- - Calculated with a window size of 1,024 for `nfft` and 55 mel bands.
54
  - Mean and standard deviation along the spatial axis (110 features).
55
  - Standard deviation of the delta coefficients of the spectrogram (55 additional features). This captures the characteristic signature of chainsaw sounds transitioning from idle to full load.
56
- - (See [Exploratory Data Analysis](/spaces/kangourous/submission-audio-task/blob/main/notebooks/EDA.ipynb))
57
 
58
  ### Training Details
59
 
@@ -81,7 +81,8 @@ The CNN uses **Log Mel Spectrograms** as input features:
81
  - Optimizer: Adam.
82
  - Learning rate: `0.001`.
83
  - Loss function: Binary Cross-Entropy Loss.
84
- - Data augmentation: Additional environment-labeled sounds were added to increase dataset noise without modifying the labels.
 
85
 
86
  **Training notebook**: [CNN Training Notebook](/spaces/kangourous/submission-audio-task/blob/main/notebooks/CNN_training.ipynb)
87
 
@@ -111,7 +112,8 @@ Environmental impact is tracked using CodeCarbon, measuring:
111
  This tracking helps establish a baseline for the environmental impact of model deployment and inference.
112
 
113
  ## Limitations
114
- - The model can classify as a Chainsaw another motor sound
 
115
 
116
  ## Ethical Considerations
117
  - Environmental impact is tracked to promote awareness of AI's carbon footprint
 
34
  Most audio samples are 3 seconds long, with a sampling rate of 12,000 Hz. This means each row of the dataset contains 36,000 elements.
35
  - **Resampling**: Audio files with a higher sampling rate are downsampled to 12,000 Hz.
36
  - **Padding**: Audio files shorter than 3 seconds are padded with their reversed signal at the end.
37
+ - **Storage**: Raw audio data is stored in a NumPy array of size `(n, 36000)` with `float16` precision to reduce memory usage without significant precision loss compared to `float32`.
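
The padding and storage steps above can be sketched as follows. This is a minimal illustration and not the repository's code: the helper names `pad_with_reversed` and `build_storage` are hypothetical, and two behaviours are assumptions not stated in the README (mirroring is repeated when one reversed copy is too short, and clips longer than 3 seconds are truncated).

```python
import numpy as np

SAMPLE_RATE = 12_000            # Hz, as stated in the README
TARGET_LEN = 3 * SAMPLE_RATE    # 36,000 samples per row


def pad_with_reversed(signal, target_len=TARGET_LEN):
    """Extend a 1-D clip to target_len by appending its time-reversed copy.

    Assumption: if one reversed copy is not enough, mirroring is repeated;
    clips longer than target_len are truncated.
    """
    out = np.asarray(signal, dtype=np.float32)
    if out.size == 0:
        raise ValueError("cannot pad an empty clip")
    while out.size < target_len:
        out = np.concatenate([out, out[::-1]])
    return out[:target_len]


def build_storage(clips):
    """Stack padded clips into an (n, 36000) float16 array."""
    return np.stack([pad_with_reversed(c) for c in clips]).astype(np.float16)


# Hypothetical clips of 1 s, 2.5 s and 3 s of noise in [-1, 1].
rng = np.random.default_rng(0)
clips = [rng.uniform(-1, 1, n) for n in (12_000, 30_000, 36_000)]
storage = build_storage(clips)
print(storage.shape, storage.dtype, storage.nbytes)
```

Each `float16` sample takes 2 bytes instead of the 4 bytes of `float32`, so the stored array is half the size.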
38
 
39
  ---
40
 
41
  ## Model Description: XGBoost
42
 
43
+ This is an XGBoost regressor that outputs probabilities. It consists of 3000 trees.
44
 
45
  ### Input Features
46
 
47
  XGBoost uses the following input features:
48
  - **MFCC**:
49
  - 55 MFCCs are retained.
50
+ - Calculated with a window size of 1024 for `nfft`.
51
  - Mean and standard deviation are taken along the spatial axis (resulting in 110 features).
52
  - **Mel Spectrogram**:
53
+ - Calculated with a window size of 1024 for `nfft` and 55 mel bands.
54
  - Mean and standard deviation along the spatial axis (110 features).
55
  - Standard deviation of the delta coefficients of the spectrogram (55 additional features). This captures the characteristic signature of chainsaw sounds transitioning from idle to full load.
56
+ (See [Exploratory Data Analysis](/spaces/kangourous/submission-audio-task/blob/main/notebooks/EDA.ipynb))
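
The feature counts listed above add up to 275 values per clip (110 + 110 + 55). The sketch below shows only the aggregation step, under several assumptions: "spatial axis" is taken to mean the frame (time) axis, the MFCC and mel matrices are random placeholders of shape `(55, n_frames)` because the README does not name the extraction library, and the delta is approximated with a first-order difference (`np.diff`), which may differ from the delta used in the notebook. The function name `summarize` is hypothetical.

```python
import numpy as np


def summarize(mfcc, mel):
    """Collapse (55, n_frames) matrices into one 275-element feature vector."""
    # Mean and standard deviation per MFCC coefficient: 55 + 55 = 110.
    mfcc_stats = np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])
    # Mean and standard deviation per mel band: 55 + 55 = 110.
    mel_stats = np.concatenate([mel.mean(axis=1), mel.std(axis=1)])
    # Standard deviation of frame-to-frame changes per mel band: 55.
    # Assumption: a simple first-order difference stands in for the delta.
    mel_delta_std = np.diff(mel, axis=1).std(axis=1)
    return np.concatenate([mfcc_stats, mel_stats, mel_delta_std])


# Placeholder inputs: 55 coefficients/bands over a hypothetical 71 frames.
rng = np.random.default_rng(0)
mfcc = rng.standard_normal((55, 71))
mel = rng.standard_normal((55, 71))
features = summarize(mfcc, mel)
print(features.shape)
```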
57
 
58
  ### Training Details
59
 
 
81
  - Optimizer: Adam.
82
  - Learning rate: `0.001`.
83
  - Loss function: Binary Cross-Entropy Loss.
84
+ - Data augmentation: Mixup was applied to pairs of raw audio clips in which the second clip is always an environment sample (labeled 1) and the mixed clip takes the label of
85
+ the first clip. This adds variance to the environment class and creates harder samples in the chainsaw class.
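
The augmentation described above could look like the following sketch. It is an illustration, not the notebook's implementation: the function name `mixup_with_environment` is hypothetical, and the mixing weight is an assumption (drawn uniformly from `[0.5, 1.0)` so the first clip stays dominant), since the README does not state how the weight is chosen.

```python
import numpy as np


def mixup_with_environment(x, y, rng, env_label=1, lam_low=0.5, lam_high=1.0):
    """Mix each raw clip with a randomly chosen environment clip.

    x: (n, n_samples) float array of raw audio.
    y: (n,) integer labels; env_label marks environment clips.
    Returns the mixed clips and the unchanged labels of the first clips.
    """
    env_idx = np.flatnonzero(y == env_label)
    if env_idx.size == 0:
        raise ValueError("no environment clips available for mixing")
    # The second element of every pair is always an environment clip.
    partners = rng.choice(env_idx, size=len(x))
    # Assumption: one mixing weight per pair, biased toward the first clip.
    lam = rng.uniform(lam_low, lam_high, size=(len(x), 1)).astype(x.dtype)
    mixed = lam * x + (1.0 - lam) * x[partners]
    # The mixed clip keeps the label of the first clip.
    return mixed, y.copy()


# Hypothetical batch: 8 short clips, label 1 = environment, 0 = chainsaw.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 100)).astype(np.float32)
y = np.array([0, 1, 0, 1, 1, 0, 1, 1])
mixed, labels = mixup_with_environment(x, y, rng)
print(mixed.shape, labels)
```

Because the labels are not interpolated, a chainsaw clip mixed with background noise is still labeled chainsaw, which is what makes those samples harder.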
86
 
87
  **Training notebook**: [CNN Training Notebook](/spaces/kangourous/submission-audio-task/blob/main/notebooks/CNN_training.ipynb)
88
 
 
112
  This tracking helps establish a baseline for the environmental impact of model deployment and inference.
113
 
114
  ## Limitations
115
+ - The model can confuse motor sounds (for example, chainsaw vs. motocross).
116
+ - This model is optimized to run on specific hardware (Nvidia GPU).
117
 
118
  ## Ethical Considerations
119
  - Environmental impact is tracked to promote awareness of AI's carbon footprint