Update README.md
@@ -34,26 +34,26 @@ The model uses the `rfcx/frugalai` dataset:
Most audio samples are 3 seconds long, with a sampling rate of 12,000 Hz. This means each row of the dataset contains 36,000 elements.

- **Resampling**: Audio files with a higher sampling rate are downsampled to 12,000 Hz.
- **Padding**: Audio files shorter than 3 seconds are padded with their reversed signal at the end.
- **Storage**: Raw audio data is stored in a NumPy array of size `(n, 36000)` with `float16` precision to reduce memory usage without significant precision loss compared to `float32`.
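The preprocessing code itself isn't shown in this excerpt; a minimal sketch of the three steps above, assuming `scipy.signal.resample_poly` for the resampling step (the notebooks may use a different resampler):

```python
import numpy as np
from scipy.signal import resample_poly

TARGET_SR = 12_000          # target sampling rate (Hz)
TARGET_LEN = 3 * TARGET_SR  # 3 s -> 36,000 samples per row

def preprocess(audio: np.ndarray, sr: int) -> np.ndarray:
    """Resample to 12 kHz, pad short clips with their reversed signal, cast to float16."""
    if sr != TARGET_SR:
        # Downsample by the rational factor TARGET_SR / sr.
        audio = resample_poly(audio, TARGET_SR, sr)
    while len(audio) < TARGET_LEN:
        # Append the reversed signal until the clip reaches 3 s.
        audio = np.concatenate([audio, audio[::-1][: TARGET_LEN - len(audio)]])
    return audio[:TARGET_LEN].astype(np.float16)

# Rows are stacked into the (n, 36000) float16 array described above.
dataset = np.stack([
    preprocess(np.random.randn(72_000), 24_000),  # 3 s at 24 kHz -> downsampled
    preprocess(np.random.randn(24_000), 12_000),  # 2 s at 12 kHz -> padded
])
```

Reversed-signal padding keeps the waveform continuous at the clip boundary, unlike the hard discontinuity zero-padding would introduce.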
---
## Model Description: XGBoost
This is an XGBoost regressor that outputs probabilities. It consists of 3000 trees.
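The excerpt states only the tree count; a hedged sketch of how such a model might be configured with the `xgboost` package — every parameter except `n_estimators=3000` is an illustrative assumption:

```python
# Only n_estimators is stated in this README; the other entries are assumptions.
xgb_params = {
    "n_estimators": 3000,            # 3000 boosted trees (from the README)
    "objective": "binary:logistic",  # lets a regressor emit probabilities in [0, 1]
    "tree_method": "hist",           # common fast default, not confirmed by the source
}
# model = xgboost.XGBRegressor(**xgb_params)  # fit on the 275-dim feature vectors below
# probabilities = model.predict(features)     # values in [0, 1]
```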
### Input Features
XGBoost uses the following input features:
- **MFCC**:
  - 55 MFCCs are retained.
  - Calculated with a window size of 1024 for `nfft`.
  - Mean and standard deviation are taken along the time axis (resulting in 110 features).
- **Mel Spectrogram**:
  - Calculated with a window size of 1024 for `nfft` and 55 mel bands.
  - Mean and standard deviation along the time axis (110 features).
  - Standard deviation of the delta coefficients of the spectrogram (55 additional features). This captures the characteristic signature of chainsaw sounds transitioning from idle to full load.

(See the [Exploratory Data Analysis](/spaces/kangourous/submission-audio-task/blob/main/notebooks/EDA.ipynb) notebook.)
### Training Details

@@ -81,7 +81,8 @@ The CNN uses **Log Mel Spectrograms** as input features:

- Optimizer: Adam.
- Learning rate: `0.001`.
- Loss function: Binary Cross-Entropy Loss.
- Data augmentation: Mixup on pairs of raw audio clips, such that the second element is always an environment sample (labeled 1) and the mixed audio takes the label of the first element. This puts more variance into the environment class and harder samples into the chainsaw class.
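A minimal numpy sketch of this mixup scheme; only the pairing and labeling rules come from the text, while the Beta-distributed mixing coefficient and its dominance constraint are assumptions (the notebook may use a fixed ratio):

```python
import numpy as np

rng = np.random.default_rng(0)

def mixup(first: np.ndarray, env: np.ndarray, first_label: int,
          alpha: float = 0.2) -> tuple[np.ndarray, int]:
    """Blend a clip with an environment clip (label 1); keep the first clip's label."""
    lam = rng.beta(alpha, alpha)  # mixing coefficient (assumed Beta-distributed)
    lam = max(lam, 1.0 - lam)     # keep the labeled clip dominant (assumption)
    return lam * first + (1.0 - lam) * env, first_label

# A chainsaw clip (label 0) mixed with background noise stays labeled chainsaw,
# yielding harder chainsaw samples; env/env pairs add variance to the env class.
mixed, label = mixup(rng.standard_normal(36_000), rng.standard_normal(36_000),
                     first_label=0)
```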
**Training notebook**: [CNN Training Notebook](/spaces/kangourous/submission-audio-task/blob/main/notebooks/CNN_training.ipynb)

@@ -111,7 +112,8 @@ Environmental impact is tracked using CodeCarbon, measuring:

This tracking helps establish a baseline for the environmental impact of model deployment and inference.
## Limitations

- The model can misclassify other motor sounds (for example, a motocross engine mistaken for a chainsaw).
- The model is optimized to run on specific hardware (an NVIDIA GPU).
## Ethical Considerations

- Environmental impact is tracked to promote awareness of AI's carbon footprint.