Spaces:

shegga
/

SentimentAnalysisForNMTTNT

Runtime error

shegga commited on 23 days ago

Commit

96462ff

1 Parent(s): ab92e28

🔧 Fix Hugging Face Space configuration - Move files to root

- Move README.md with proper YAML config to root directory
- Move app.py and requirements.txt to root for Spaces deployment
- Ensure all required files are in repository root
- Fix missing configuration error

🤖 Generated with Claude Code
Co-Authored-By: Claude <[email protected]>

Files changed (2) hide show

README.md +105 -372
requirements.txt +27 -0

README.md CHANGED Viewed

@@ -1,409 +1,132 @@
 # 🎭 Vietnamese Sentiment Analysis
-A comprehensive Vietnamese sentiment analysis system built with transformer models, featuring training, testing, demo, and web interface capabilities with advanced memory management.
 ## 🚀 Features
-- **🤖 Transformer-based Model**: Fine-tuned Vietnamese sentiment analysis using Visobert
-- **🌐 Interactive Web Interface**: Real-time sentiment analysis via Gradio with memory optimization
-- **📊 Comprehensive Testing**: Model evaluation with confusion matrix and classification metrics
-- **⚡ Memory Efficient**: Built-in memory management, batch processing limits, and quantization support
-- **🎯 Easy to Use**: Simple command-line interface and web UI
-- **📈 Performance Monitoring**: Real-time memory usage tracking and optimization
-## 📁 Project Structure
-```
-SentimentAnalysis/
-├── README.md                          # 📚 This file
-├── requirements.txt                   # 📦 Python dependencies
-├── .gitignore                         # 🚫 Git ignore rules
-│
-├── py/                                # 🐍 Core Python modules
-│   ├── __init__.py                   # Package initialization
-│   ├── fine_tune_sentiment.py        # 🔧 Core fine-tuning utilities
-│   ├── test_model.py                 # 🧪 Model testing and evaluation
-│   ├── demo.py                      # 💻 Demo functionality
-│   └── gradio_app.py                # 🌐 Web interface (memory-optimized)
-│
-├── main.py                            # 🚀 Main entry point (all commands)
-├── train.py                           # 🏋️ Training script
-├── test.py                            # 🧪 Testing script
-├── demo.py                            # 💻 Interactive demo
-└── web.py                             # 🌐 Web interface launcher
-│
-├── vietnamese_sentiment_finetuned/   # 🤖 Trained model (auto-generated)
-├── confusion_matrix.png             # 📊 Evaluation visualization (auto-generated)
-├── training_history.png             # 📈 Training progress (auto-generated)
-├── pdf/                             # 📄 Documentation folder
-├── venv/                            # 🐍 Virtual environment
-├── .git/                            # 📝 Git repository
-└── .claude/                         # 🤖 Claude configuration
-```
-## 🛠️ Installation
-1. **Clone and Setup Environment**
-```bash
-cd SentimentAnalysis
-python -m venv venv
-source venv/bin/activate  # On Windows: venv\Scripts\activate
-```
-2. **Install Dependencies**
-```bash
-pip install -r requirements.txt
-```
 ## 🎯 Usage
-### Quick Start Options
-#### **Option 1: Use Individual Scripts**
-```bash
-# Train the model
-python train.py
-# Test the model
-python test.py
-# Run interactive demo
-python demo.py
-# Launch web interface
-python web.py
-```
-#### **Option 2: Use Main Entry Point**
-```bash
-# Train with custom settings
-python main.py train --batch-size 32 --epochs 5
-# Test the model
-python main.py test --model-path ./vietnamese_sentiment_finetuned
-# Run interactive demo
-python main.py demo
-# Launch web interface with memory options
-python main.py web --quantize --max-batch-size 20 --port 8080
-```
-### 1. Training the Model
-```bash
-# Basic training
-python train.py
-# Custom batch size and epochs
-python train.py 32 5
-# Using main script
-python main.py train --batch-size 32 --epochs 5 --learning-rate 1e-5
-```
-### 2. Testing the Model
-```bash
-# Basic testing
-python test.py
-# Test with custom model path
-python test.py /path/to/custom/model
-# Using main script
-python main.py test --model-path ./vietnamese_sentiment_finetuned
-```
-### 3. Interactive Demo
-```bash
-# Run demo
-python demo.py
-# Using main script
-python main.py demo
-```
-### 4. Web Interface
-```bash
-# Standard usage (memory-efficient defaults)
-python web.py
-# High memory efficiency (quantization + small batches)
-python web.py --quantize --max-batch-size 5 --max-memory 2048
-# Large batch processing
-python web.py --max-batch-size 20 --max-memory 8192
-# Custom server configuration
-python web.py --port 8080 --host 0.0.0.0 --quantize
-# Using main script
-python main.py web --quantize --max-batch-size 20 --port 8080
-```
-## 🌐 Web Interface Features
-The Gradio web interface provides:
-### 📝 Single Text Analysis
-- Real-time sentiment prediction
-- Confidence scores with visual charts
-- Memory usage monitoring
-- Example texts for quick testing
-### 📊 Batch Analysis
-- Process multiple texts at once
-- Memory-efficient batch processing
-- Automatic batch size limits
-- Batch summary with sentiment distribution
-### 🛡️ Memory Management
-- **Automatic Cleanup**: Memory cleaned after each prediction
-- **Batch Limits**: Configurable maximum texts per batch
-- **Memory Monitoring**: Real-time memory usage tracking
-- **GPU Optimization**: CUDA cache clearing when available
-- **Quantization**: Optional model quantization for CPU (~4x memory reduction)
-### ℹ️ Model Information
-- Detailed model specifications
-- Performance metrics
-- Memory management settings
-- Usage tips and troubleshooting
-## 🔧 Command Line Options
-### Individual Scripts
-#### `train.py`
-```bash
-python train.py [batch_size] [epochs]
-```
-#### `test.py`
-```bash
-python test.py [model_path]
-```
-#### `demo.py`
-```bash
-python demo.py
-```
-#### `web.py`
-```bash
-python web.py [--max-batch-size SIZE] [--quantize] [--max-memory MB] [--port PORT] [--host HOST]
-```
-### Main Entry Point (`main.py`)
-#### Training Command
-```bash
-python main.py train [--batch-size SIZE] [--epochs NUM] [--learning-rate RATE]
-```
-#### Testing Command
-```bash
-python main.py test [--model-path PATH]
-```
-#### Demo Command
-```bash
-python main.py demo
-```
-#### Web Interface Command
-```bash
-python main.py web [--max-batch-size SIZE] [--quantize] [--max-memory MB] [--port PORT] [--host HOST]
-```
-**Memory Management Options:**
-- `--max-batch-size`: Maximum batch size for memory efficiency (default: 10)
-- `--quantize`: Enable model quantization for memory efficiency (CPU only)
-- `--max-memory`: Maximum memory usage in MB (default: 4096)
-- `--port`: Port to run the interface on (default: 7862)
-- `--host`: Host to bind the interface to (default: 127.0.0.1)
 ## 📊 Model Details
-- **Base Model**: 5CD-AI/Vietnamese-Sentiment-visobert
-- **Dataset**: uitnlp/vietnamese_students_feedback
-- **Labels**: Negative, Neutral, Positive
 - **Language**: Vietnamese
-- **Architecture**: Transformer-based sequence classification
 - **Max Sequence Length**: 512 tokens
-## 📈 Performance Metrics
-- **Accuracy**: 85-90% (on validation set)
-- **Processing Speed**: ~100ms per text
-- **Memory Usage**: Configurable (default 4GB limit)
-- **Batch Processing**: Up to 20 texts (configurable)
-## 🛡️ Memory Management
-The system includes comprehensive memory management:
-### Automatic Features
-- Memory cleanup after each prediction
-- GPU cache clearing for CUDA
 - Garbage collection management
-- Memory monitoring before/after operations
-### User Controls
-- Configurable batch size limits
-- Memory limit enforcement
-- Manual memory cleanup button
-- Real-time memory usage display
-### Optimization Options
-- Dynamic quantization (CPU only)
-- Batch processing optimization
-- Memory-efficient inference
-## 🔍 Troubleshooting
-### Memory Issues
-- Enable quantization: `python gradio_app.py --quantize`
-- Reduce batch size: `python gradio_app.py --max-batch-size 5`
-- Lower memory limit: `python gradio_app.py --max-memory 2048`
-- Use manual cleanup: Click "Memory Cleanup" button in web interface
-### Model Loading Issues
-- Ensure model is trained: `python run_training.py`
-- Check model directory: `ls -la vietnamese_sentiment_finetuned/`
-- Verify dependencies: `pip install -r requirements.txt`
-### Performance Optimization
-- Use GPU if available (CUDA)
-- Enable quantization for CPU inference
-- Monitor memory usage in web interface
-- Adjust batch size based on available memory
 ## 📄 Requirements
 See `requirements.txt` for complete dependency list:
-```
-torch>=2.0.0
-transformers>=4.21.0
-datasets>=2.0.0
-gradio>=4.0.0
-pandas>=1.5.0
-numpy>=1.21.0
-scikit-learn>=1.1.0
-matplotlib>=3.5.0
-seaborn>=0.11.0
-psutil>=5.9.0
-```
-## 🎯 Example Usage
-### Command Line Demo
-```python
-from py.demo import SentimentDemo
-demo = SentimentDemo()
-demo.load_model()
-demo.interactive_demo()
-```
-### Web Interface
-1. Train model: `python train.py`
-2. Launch interface: `python web.py`
-3. Open browser to `http://127.0.0.1:7862`
-4. Enter Vietnamese text for analysis
-### Batch Processing
-```python
-from py.gradio_app import SentimentGradioApp
-app = SentimentGradioApp(max_batch_size=20)
-app.load_model()
-texts = ["Tuyệt vời!", "Bình thường", "Rất tệ"]
-results, summary = app.batch_predict(texts)
-```
-### Model Testing
-```python
-from py.test_model import SentimentTester
-tester = SentimentTester(model_path="./vietnamese_sentiment_finetuned")
-tester.load_model()
-sentiment, confidence = tester.predict_sentiment("Giảng viên dạy rất hay!")
-```
-### Fine-Tuning
-```python
-from py.fine_tune_sentiment import SentimentFineTuner
-fine_tuner = SentimentFineTuner(
-    model_name="5CD-AI/Vietnamese-Sentiment-visobert",
-    dataset_name="uitnlp/vietnamese_students_feedback"
-)
-train_result, eval_results = fine_tuner.run_fine_tuning(
-    output_dir="./my_model",
-    learning_rate=2e-5,
-    batch_size=16,
-    num_epochs=3
-)
-```
-## 📝 Model Loading Examples
-### Loading the Fine-Tuned Model
-```python
-from transformers import AutoTokenizer, AutoModelForSequenceClassification
-tokenizer = AutoTokenizer.from_pretrained("./vietnamese_sentiment_finetuned")
-model = AutoModelForSequenceClassification.from_pretrained("./vietnamese_sentiment_finetuned")
-```
-### Making Predictions
-```python
-import torch
-def predict_sentiment(text):
-    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
-    with torch.no_grad():
-        outputs = model(**inputs)
-        predictions = torch.softmax(outputs.logits, dim=-1)
-        predicted_class = torch.argmax(predictions, dim=-1).item()
-    sentiment_labels = ["Negative", "Neutral", "Positive"]
-    return sentiment_labels[predicted_class], predictions[0][predicted_class].item()
-# Example
-text = "Giảng viên dạy rất hay và tâm huyết."
-sentiment, confidence = predict_sentiment(text)
-print(f"Sentiment: {sentiment}, Confidence: {confidence:.3f}")
-```
-## 📊 Dataset Information
-The UIT-VSFC corpus contains over 16,000 Vietnamese student feedback sentences with:
-- **Sentiment Classification**: Positive, Neutral, Negative
-- **Topic Classification**: Various educational topics
-- **Inter-annotator agreement**: >91% for sentiment, >71% for topics
-- **Original F1-score**: ~88% for sentiment (Maximum Entropy baseline)
-## 🔧 Hardware Requirements
-- **Minimum**: 8GB RAM, CPU
-- **Recommended**: GPU with 8GB+ VRAM for faster training
-- **Storage**: ~2GB for model and datasets
-## 📝 License
-This project uses open-source components for educational and research purposes. Please check individual licenses for:
-- 5CD-AI/Vietnamese-Sentiment-visobert
-- uitnlp/vietnamese_students_feedback
-## 🤝 Contributing
-Feel free to submit issues and enhancement requests!
-## 📄 Citation
-If you use this work or the dataset, please cite:
 ```bibtex
 @InProceedings{8573337,
@@ -418,8 +141,18 @@ If you use this work or the dataset, please cite:
 }
 ```
----
-**Quick Start**: `python train.py && python web.py`
-**Alternative**: `python main.py train && python main.py web`

+---
+title: Vietnamese Sentiment Analysis
+emoji: 🎭
+colorFrom: green
+colorTo: blue
+sdk: gradio
+sdk_version: 4.44.0
+app_file: app.py
+pinned: false
+---
 # 🎭 Vietnamese Sentiment Analysis
+A Vietnamese sentiment analysis web interface built with Gradio and transformer models, optimized for Hugging Face Spaces deployment.
 ## 🚀 Features
+- **🤖 Transformer-based Model**: Uses 5CD-AI/Vietnamese-Sentiment-visobert from Hugging Face Hub
+- **🌐 Interactive Web Interface**: Real-time sentiment analysis via Gradio
+- **⚡ Memory Efficient**: Built-in memory management and batch processing limits
+- **📊 Visual Analysis**: Confidence scores with interactive charts
+- **📝 Batch Processing**: Analyze multiple texts at once
+- **🛡️ Memory Management**: Real-time memory monitoring and cleanup
 ## 🎯 Usage
+### Single Text Analysis
+1. Enter Vietnamese text in the input field
+2. Click "Analyze Sentiment"
+3. View the sentiment prediction with confidence scores
+4. See probability distribution in the chart
+### Batch Analysis
+1. Switch to "Batch Analysis" tab
+2. Enter multiple Vietnamese texts (one per line)
+3. Click "Analyze All" to process all texts
+4. View comprehensive batch summary with sentiment distribution
+### Memory Management
+- Monitor real-time memory usage
+- Use "Memory Cleanup" button if needed
+- Automatic cleanup after each prediction
+- Maximum 10 texts per batch for efficiency
 ## 📊 Model Details
+- **Model**: 5CD-AI/Vietnamese-Sentiment-visobert
+- **Architecture**: Transformer-based (XLM-RoBERTa)
 - **Language**: Vietnamese
+- **Labels**: Negative, Neutral, Positive
 - **Max Sequence Length**: 512 tokens
+- **Device**: Automatic CUDA/CPU detection
+## 💡 Example Usage
+Try these example Vietnamese texts:
+- "Giảng viên dạy rất hay và tâm huyết." (Positive)
+- "Môn học này quá khó và nhàm chán." (Negative)
+- "Lớp học ổn định, không có gì đặc biệt." (Neutral)
+## 🛠️ Technical Features
+### Memory Optimization
+- Automatic GPU cache clearing
 - Garbage collection management
+- Memory usage monitoring
+- Batch size limits
+- Real-time memory tracking
+### Performance
+- ~100ms processing time per text
+- Supports up to 512 token sequences
+- Efficient batch processing
+- Memory limit: 8GB (Hugging Face Spaces)
+## 📋 Model Performance
+The model provides:
+- **Sentiment Classification**: Positive, Neutral, Negative
+- **Confidence Scores**: Probability distribution across classes
+- **Real-time Processing**: Fast inference on CPU/GPU
+- **Batch Analysis**: Efficient processing of multiple texts
+## 🔧 Deployment
+This Space is configured for Hugging Face Spaces with:
+- **SDK**: Gradio 4.44.0
+- **Hardware**: CPU (with CUDA support if available)
+- **Memory**: 8GB limit with optimization
+- **Model Loading**: Direct from Hugging Face Hub
 ## 📄 Requirements
 See `requirements.txt` for complete dependency list:
+- torch>=2.0.0
+- transformers>=4.21.0
+- gradio>=4.44.0
+- pandas, numpy, scikit-learn
+- psutil for memory monitoring
+## 🎯 Use Cases
+- **Education**: Analyze student feedback
+- **Customer Service**: Analyze customer reviews
+- **Social Media**: Monitor sentiment in posts
+- **Research**: Vietnamese text analysis
+- **Business**: Customer sentiment tracking
+## 🔍 Troubleshooting
+### Memory Issues
+- Use "Memory Cleanup" button
+- Reduce batch size
+- Refresh the page if needed
+### Model Loading
+- Model loads automatically from Hugging Face Hub
+- No local training required
+- Automatic fallback to CPU if GPU unavailable
+### Performance Tips
+- Clear, grammatically correct Vietnamese text works best
+- Longer texts (20-200 words) provide better context
+- Use batch processing for multiple texts
+## 📝 Citation
+If you use this model or Space, please cite the original model:
 ```bibtex
 @InProceedings{8573337,
 }
 ```
+## 🤝 Contributing
+Feel free to:
+- Submit issues and feedback
+- Suggest improvements
+- Report bugs
+- Request new features
+## 📄 License
+This Space uses open-source components under MIT license.
+---
+**Try it now!** Enter some Vietnamese text above to see the sentiment analysis in action. 🎭

requirements.txt ADDED Viewed

	@@ -0,0 +1,27 @@

+# Core dependencies for Hugging Face Spaces
+torch>=2.0.0
+transformers>=4.21.0
+datasets>=2.0.0
+gradio>=4.44.0
+# Data processing
+pandas>=1.5.0
+numpy>=1.21.0
+scikit-learn>=1.1.0
+# Visualization
+matplotlib>=3.5.0
+seaborn>=0.11.0
+# Memory monitoring
+psutil>=5.9.0
+# System monitoring
+accelerate>=0.21.0
+safetensors>=0.3.1
+# Additional dependencies
+sentencepiece>=0.1.96
+protobuf>=3.20.0
+tokenizers>=0.13.3
+huggingface-hub>=0.16.4