CS410 Final Project -- Amazon Review Summarization and Sentiment Analysis
Dataset
The dataset is available here. We use four categories: All_Beauty, Digital_Music, Handmade_Product, and Health_and_Personal_Care.
Workflow
Data preprocessing -> Sentiment classification (reviews are grouped into positive and negative sets before proceeding) -> Fine-tuning the summarization model on training data -> Evaluating the summarization model on test data.
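The grouping step in the workflow can be sketched as follows. This is a minimal illustration, not the code in src: `group_by_sentiment` and `toy_classifier` are hypothetical names, the whitespace cleanup stands in for whatever preprocessing the project actually does, and the keyword rule is only a placeholder for the DistilBERT classifier.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def group_by_sentiment(
    reviews: Iterable[str],
    classify: Callable[[str], str],
) -> Dict[str, List[str]]:
    """Group review texts under the label returned by `classify`."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for text in reviews:
        cleaned = " ".join(text.split())  # minimal preprocessing: collapse whitespace
        if not cleaned:
            continue  # skip empty reviews
        groups[classify(cleaned)].append(cleaned)
    return dict(groups)


def toy_classifier(text: str) -> str:
    """Hypothetical stand-in for the real classifier (keyword rule only)."""
    negative_cues = ("broke", "refund", "terrible", "waste")
    lowered = text.lower()
    return "negative" if any(cue in lowered for cue in negative_cues) else "positive"


if __name__ == "__main__":
    sample = ["Lovely scent,  lasts all day.", "Bottle broke in a week.", "   "]
    print(group_by_sentiment(sample, toy_classifier))
```

Passing the classifier in as a function keeps the grouping logic independent of the model, so the placeholder can be swapped for a real model call without touching the loop.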
Models
- Sentiment classification uses pre-trained DistilBERT.
- Review summarization uses facebook/bart-large-cnn, fine-tuned separately on each category of the review dataset.
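As a rough sketch of how the two models could be loaded with the Hugging Face `transformers` pipeline API: the summarizer ID comes from this README, while the DistilBERT checkpoint below is an assumption (a public SST-2 fine-tune), since the exact checkpoint is not named here. `build_pipelines` and `demo` are hypothetical helper names, and the generation lengths are illustrative values only.

```python
from typing import Dict

# The summarizer base model is named in this README; the classifier
# checkpoint is an ASSUMPTION (a public DistilBERT fine-tuned on SST-2).
MODEL_IDS: Dict[str, str] = {
    "sentiment-analysis": "distilbert-base-uncased-finetuned-sst-2-english",
    "summarization": "facebook/bart-large-cnn",
}


def build_pipelines() -> dict:
    """Load one pipeline per task (downloads weights on first use)."""
    # Imported lazily so this module can be inspected without transformers installed.
    from transformers import pipeline

    return {task: pipeline(task, model=model_id) for task, model_id in MODEL_IDS.items()}


def demo(review: str) -> None:
    """Classify one review, then summarize it with the base (not fine-tuned) BART."""
    pipes = build_pipelines()
    label = pipes["sentiment-analysis"](review)[0]["label"]
    summary = pipes["summarization"](
        review, max_length=30, min_length=5, do_sample=False
    )[0]["summary_text"]
    print(label, "|", summary)
```

Calling `demo("...")` with a review text requires `transformers` and a backend such as PyTorch. To use a fine-tuned checkpoint instead of the base model, the `"summarization"` entry would point at the corresponding directory under checkpoints.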
Layout
- The checkpoints folder contains the fine-tuned models for each dataset category.
- The src folder contains the source code.
- The docs folder records experiment results.
Usage
Run sentiment classification
python src/classification.py [category]
Run fine-tuning
python src/finetune.py [category]
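To run both steps for every category, the commands can be generated in a loop. The sketch below only builds and prints them. It assumes the scripts accept the category names exactly as written in the Dataset section, which this README does not confirm, and `build_commands` is a hypothetical helper.

```python
import sys
from typing import List, Sequence

# Category names as listed in the Dataset section; whether the scripts
# accept exactly these strings is an ASSUMPTION.
CATEGORIES = ("All_Beauty", "Digital_Music", "Handmade_Product", "Health_and_Personal_Care")


def build_commands(script: str, categories: Sequence[str] = CATEGORIES) -> List[List[str]]:
    """Return one argv list per category for the given script."""
    return [[sys.executable, script, category] for category in categories]


if __name__ == "__main__":
    for cmd in build_commands("src/classification.py") + build_commands("src/finetune.py"):
        # Printing only; pass cmd to subprocess.run(cmd, check=True) to execute it.
        print(" ".join(cmd))
```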
To run summarization, first obtain an Anthropic Claude API key and export it:
export ANTHROPIC_API_KEY='your-api-key-here'
Then run:
python src/summarization.py
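Since the summarization step depends on the exported key, a small pre-flight check can catch a missing or placeholder value before the script starts. This is a sketch under assumptions: `require_api_key` is a hypothetical helper, and it is not known whether src/summarization.py performs a similar check itself.

```python
import os
from typing import Mapping


def require_api_key(env: Mapping[str, str] = os.environ, name: str = "ANTHROPIC_API_KEY") -> str:
    """Return the key from `env`, or raise if it is missing or still the placeholder."""
    key = env.get(name, "").strip()
    if not key or key == "your-api-key-here":
        raise RuntimeError(f"{name} is not set; export it before running src/summarization.py")
    return key


if __name__ == "__main__":
    try:
        require_api_key()
        print("API key found; ready to run src/summarization.py")
    except RuntimeError as err:
        print(f"error: {err}")
```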
Model tree for yunqili4/cs410-final-project
Base model
facebook/bart-large-cnn