ferds003 committed on
Commit af6f32a · 1 Parent(s): b4bcc31

emphasize something on data curation

Files changed (1)
  1. app.py +1 -1
app.py CHANGED
@@ -256,7 +256,7 @@ with DemoTAB:
 
  with DataCurationTAB:
  st.markdown("""
- Data cleaning and pre-processing is necessary as we are considering three datasets with different contexts. Below is a summary of the data treatment and insights done to make the versions of the dataset.
+ Data cleaning and pre-processing is necessary as we are considering three datasets with different contexts. Below is a summary of the data treatment and insights done to make the versions of the dataset. <mark>We avoided the use of the UCL SMS repository for this project as this does not capture the filipino context.</mark>
 
  - For [Dataset 1](https://www.kaggle.com/datasets/scottleechua/ph-spam-marketing-sms-w-timestamps):
  - drop any null values; drop any full redactions done in `text` column through regex. Drops 74% of the dataset as text sms data is salient to the project.