SajilAwale committed on
Commit aedbadf (verified) · 1 parent: fbc5a30

Update README.md

Files changed (1):
1. README.md +10 -6
README.md CHANGED
@@ -3,17 +3,17 @@ license: apache-2.0
  language:
  - en
  base_model:
- - nasa-impact/nasa-smd-ibm-v0.1
+ - nasa-impact/indus-sde-v0.2
  library_name: transformers
  ---
 
  # Science Keyword Classification model
 
- We have fine-tuned [INDUS Model](https://huggingface.co/nasa-impact/nasa-smd-ibm-v0.1) for classifying scientific keywords from NASA's Common Metadata Repository (CMR). The project aims to improve the accessibility and organization of Earth observation metadata by predicting associated keywords in an Extreme Multi-Label Classification setting.
+ We have fine-tuned [INDUS-SDE Model](https://huggingface.co/nasa-impact/indus-sde-v0.2) for classifying scientific keywords from NASA's Common Metadata Repository (CMR). The project aims to improve the accessibility and organization of Earth observation metadata by predicting associated keywords in an Extreme Multi-Label Classification setting.
 
  ## Model Overview
 
- - **Base Model:** INDUS, fine-tuned for multi-label classification.
+ - **Base Model:** INDUS-SDE, fine-tuned for multi-label classification.
  - **Loss Function:** The model uses focal loss instead of traditional cross-entropy to address label imbalance by focusing on difficult-to-classify examples.
  - **Dataset:** NASA's CMR metadata, filtered to remove duplicates and irrelevant labels, resulting in a dataset of 42,474 records and 3,240 labels. You can find the [dataset here](https://huggingface.co/datasets/nasa-impact/science-keyword-classification-dataset)
 
@@ -42,14 +42,18 @@ print(predicted_labels)
  1. **Baseline (alpha-1.0.1):** Used cross-entropy loss.
  2. **Experiment 2 (alpha-1.1.1):** Focal loss with γ = 4.
  3. **Experiment 3 (alpha-1.1.2):** Focal loss with γ = 2.
- 4. **Final (alpha-1.2.1):** Focal loss (γ = 2) with stratified splitting.
+ 4. **Experiment 4 (alpha-1.2.1):** Focal loss (γ = 2) with stratified splitting.
+ 5. 4. **Experiment 5 (INDUS-SDE-GKR):** Focal loss (γ = 2) with stratified splitting with INDUS-SDE base model.
 
  ## Results
 
- The model with focal loss and stratified sampling (alpha-1.2.1) outperformed all other configurations and previous models in terms of precision, recall, F1 score, and Jaccard similarity. The weighted metrics at various threshold for the model can be found below.
+ The INDUS-SDE-GKR model outperformed all other configurations, including the previous best alpha-1.2.1. By leveraging domain-specific pre-training on the SDE dataset and a larger context window (1024 tokens), INDUS-SDE achieved a Mean Reciprocal Rank (MRR) of 0.791, compared to 0.782 for alpha-1.2.1 and 0.744 for the ModernBERT-SDE baseline.
+
  ![image/png](https://cdn-uploads.huggingface.co/production/uploads/63f0e7de9cf89c9ed1bf92a2/CvhGPUzA2vJua3uu9H3tA.png)
 
- Please find accompanying [technical writeup here](https://drive.google.com/file/d/1g4l5tLjeNUu3z8fcVMKuIXs7hLifVfSu/view?usp=sharing).
+ ![indus-sde-gkr-mrr-comp](https://cdn-uploads.huggingface.co/production/uploads/63f0e7de9cf89c9ed1bf92a2/1mZcxlKdTh0SewEDBWpFx.png)
+
+ Please find accompanying [technical writeup here](https://github.com/NASA-IMPACT/science-keywords-classification/blob/develop/documents/Science_Keyword_Classification.pdf).
  ## References
 
  - RoBERTa: [arXiv](https://arxiv.org/abs/1907.11692)
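The model card says the classifier is trained with focal loss instead of cross-entropy, with γ = 4 and γ = 2 tried across experiments. Focal loss down-weights labels the model already predicts confidently. The sketch below shows the standard sigmoid focal loss for a single multi-label example, in plain Python. It assumes the form introduced by Lin et al. (2017) without the optional α class-balancing factor. The function names are hypothetical, and the repository's actual training code may differ, for example in framework, α weighting, or batch reduction.

```python
import math
from typing import Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def binary_focal_loss(logits: Sequence[float], targets: Sequence[int],
                      gamma: float = 2.0, eps: float = 1e-12) -> float:
    """Mean sigmoid focal loss over the labels of one example (hypothetical helper).

    Each label is an independent binary decision, as in multi-label
    classification. No alpha balancing term is applied (an assumption).
    With gamma = 0 this reduces to plain binary cross-entropy.
    """
    if not logits or len(logits) != len(targets):
        raise ValueError("logits and targets must be non-empty and equal length")
    total = 0.0
    for logit, y in zip(logits, targets):
        p = sigmoid(logit)
        # p_t is the probability the model assigns to the correct outcome.
        p_t = p if y == 1 else 1.0 - p
        # The (1 - p_t) ** gamma factor shrinks the loss of easy, confident labels.
        total += -((1.0 - p_t) ** gamma) * math.log(max(p_t, eps))
    return total / len(logits)


if __name__ == "__main__":
    # An "easy" positive label (high logit) and a "hard" one (low logit).
    for logit in (4.0, -4.0):
        bce = binary_focal_loss([logit], [1], gamma=0.0)
        focal = binary_focal_loss([logit], [1], gamma=2.0)
        print(f"logit={logit:+.1f}  cross-entropy={bce:.5f}  focal(gamma=2)={focal:.5f}")
```

Running this shows the effect. For the easy label, the focal term cuts the loss by several orders of magnitude. For the hard label, the loss stays close to the cross-entropy value. Most of the gradient therefore comes from the difficult, often rare, keywords.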
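The updated Results section compares models by Mean Reciprocal Rank: 0.791 for INDUS-SDE-GKR, 0.782 for alpha-1.2.1, and 0.744 for ModernBERT-SDE. The README does not say how MRR is computed in this multi-label setting. The sketch below therefore assumes the standard definition:

1. Rank all keywords for a record by predicted score.
2. Take 1/rank of the first keyword that is actually correct for that record.
3. Average this value over all records.

The function names and the toy scores are hypothetical.

```python
from typing import Collection, Sequence


def reciprocal_rank(scores: Sequence[float], relevant: Collection[int]) -> float:
    """Return 1/rank of the best-ranked relevant label, or 0.0 if none is relevant."""
    # Sort label indices by descending score; rank 1 is the top prediction.
    ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    for rank, label in enumerate(ranking, start=1):
        if label in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(all_scores: Sequence[Sequence[float]],
                         all_relevant: Sequence[Collection[int]]) -> float:
    """Standard MRR: the reciprocal rank averaged over records (assumed definition)."""
    if not all_scores or len(all_scores) != len(all_relevant):
        raise ValueError("need the same, non-zero number of score lists and label sets")
    total = sum(reciprocal_rank(s, r) for s, r in zip(all_scores, all_relevant))
    return total / len(all_scores)


if __name__ == "__main__":
    # Two toy records over three keywords; the correct keywords are 2 and 1.
    scores = [[0.9, 0.1, 0.5], [0.2, 0.8, 0.3]]
    relevant = [{2}, {1}]
    print(mean_reciprocal_rank(scores, relevant))  # → 0.75
```

In the toy example, the first record's correct keyword is ranked second (1/2) and the second record's is ranked first (1/1), so MRR = 0.75. Under this definition, a gain from 0.782 to 0.791 means the first correct keyword sits, on average, slightly closer to the top of the ranked list.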