Fix Hugging Face Space configuration - Move files to root
- Move README.md with proper YAML config to root directory
- Move app.py and requirements.txt to root for Spaces deployment
- Ensure all required files are in repository root
- Fix missing configuration error
Generated with Claude Code
Co-Authored-By: Claude <[email protected]>
- README.md +105 -372
- requirements.txt +27 -0
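The commit message says the Space failed because its configuration and entry files were not in the repository root. A minimal sketch of a pre-push check for that layout follows. The function names, the `REQUIRED_KEYS` set, and the flat `key: value` parser are illustrative assumptions, not part of this commit or of any Hugging Face tooling.

```python
from pathlib import Path

# Assumption: the keys this check cares about; the README in this commit sets all three.
REQUIRED_KEYS = {"title", "sdk", "app_file"}


def parse_front_matter(readme_text: str) -> dict[str, str]:
    """Return flat key/value pairs from a leading '---' delimited block."""
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    config: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return config
        key, sep, value = line.partition(":")
        if sep:
            config[key.strip()] = value.strip()
    return {}  # no closing delimiter, so treat the block as absent


def check_space_root(root: Path) -> list[str]:
    """List layout problems that would keep a Space from starting."""
    readme = root / "README.md"
    if not readme.is_file():
        return ["README.md is missing from the repository root"]
    problems = []
    config = parse_front_matter(readme.read_text(encoding="utf-8"))
    missing = sorted(REQUIRED_KEYS - config.keys())
    if missing:
        problems.append(f"README.md front matter lacks: {', '.join(missing)}")
    app_file = config.get("app_file", "app.py")
    if not (root / app_file).is_file():
        problems.append(f"{app_file} is missing from the repository root")
    if not (root / "requirements.txt").is_file():
        problems.append("requirements.txt is missing from the repository root")
    return problems
```

Calling `check_space_root(Path("."))` from the repository root returns an empty list when README.md, the app file, and requirements.txt are all in place.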
README.md
CHANGED
|
@@ -1,409 +1,132 @@
|
| 1 |
# Vietnamese Sentiment Analysis
|
| 2 |
|
| 3 |
-
A
|
| 4 |
|
| 5 |
## Features
|
| 6 |
|
| 7 |
-
- **Transformer-based Model**:
|
| 8 |
-
- **Interactive Web Interface**: Real-time sentiment analysis via Gradio
|
| 9 |
-
-
|
| 10 |
-
-
|
| 11 |
-
-
|
| 12 |
-
-
|
| 13 |
-
|
| 14 |
-
## Project Structure
|
| 15 |
-
|
| 16 |
-
```
|
| 17 |
-
SentimentAnalysis/
|
| 18 |
-
├── README.md  # This file
|
| 19 |
-
├── requirements.txt  # Python dependencies
|
| 20 |
-
├── .gitignore  # Git ignore rules
|
| 21 |
-
│
|
| 22 |
-
├── py/  # Core Python modules
|
| 23 |
-
│   ├── __init__.py  # Package initialization
|
| 24 |
-
│   ├── fine_tune_sentiment.py  # Core fine-tuning utilities
|
| 25 |
-
│   ├── test_model.py  # Model testing and evaluation
|
| 26 |
-
│   ├── demo.py  # Demo functionality
|
| 27 |
-
│   └── gradio_app.py  # Web interface (memory-optimized)
|
| 28 |
-
│
|
| 29 |
-
├── main.py  # Main entry point (all commands)
|
| 30 |
-
├── train.py  # Training script
|
| 31 |
-
├── test.py  # Testing script
|
| 32 |
-
├── demo.py  # Interactive demo
|
| 33 |
-
├── web.py  # Web interface launcher
|
| 34 |
-
│
|
| 35 |
-
├── vietnamese_sentiment_finetuned/  # Trained model (auto-generated)
|
| 36 |
-
├── confusion_matrix.png  # Evaluation visualization (auto-generated)
|
| 37 |
-
├── training_history.png  # Training progress (auto-generated)
|
| 38 |
-
├── pdf/  # Documentation folder
|
| 39 |
-
├── venv/  # Virtual environment
|
| 40 |
-
├── .git/  # Git repository
|
| 41 |
-
└── .claude/  # Claude configuration
|
| 42 |
-
```
|
| 43 |
-
|
| 44 |
-
## Installation
|
| 45 |
-
|
| 46 |
-
1. **Clone and Setup Environment**
|
| 47 |
-
```bash
|
| 48 |
-
cd SentimentAnalysis
|
| 49 |
-
python -m venv venv
|
| 50 |
-
source venv/bin/activate # On Windows: venv\Scripts\activate
|
| 51 |
-
```
|
| 52 |
-
|
| 53 |
-
2. **Install Dependencies**
|
| 54 |
-
```bash
|
| 55 |
-
pip install -r requirements.txt
|
| 56 |
-
```
|
| 57 |
|
| 58 |
## Usage
|
| 59 |
|
| 60 |
-
###
|
| 61 |
-
|
| 62 |
-
|
| 63 |
-
|
| 64 |
-
|
| 65 |
-
python train.py
|
| 66 |
-
|
| 67 |
-
# Test the model
|
| 68 |
-
python test.py
|
| 69 |
-
|
| 70 |
-
# Run interactive demo
|
| 71 |
-
python demo.py
|
| 72 |
-
|
| 73 |
-
# Launch web interface
|
| 74 |
-
python web.py
|
| 75 |
-
```
|
| 76 |
-
|
| 77 |
-
#### **Option 2: Use Main Entry Point**
|
| 78 |
-
```bash
|
| 79 |
-
# Train with custom settings
|
| 80 |
-
python main.py train --batch-size 32 --epochs 5
|
| 81 |
-
|
| 82 |
-
# Test the model
|
| 83 |
-
python main.py test --model-path ./vietnamese_sentiment_finetuned
|
| 84 |
-
|
| 85 |
-
# Run interactive demo
|
| 86 |
-
python main.py demo
|
| 87 |
-
|
| 88 |
-
# Launch web interface with memory options
|
| 89 |
-
python main.py web --quantize --max-batch-size 20 --port 8080
|
| 90 |
-
```
|
| 91 |
-
|
| 92 |
-
### 1. Training the Model
|
| 93 |
-
```bash
|
| 94 |
-
# Basic training
|
| 95 |
-
python train.py
|
| 96 |
-
|
| 97 |
-
# Custom batch size and epochs
|
| 98 |
-
python train.py 32 5
|
| 99 |
-
|
| 100 |
-
# Using main script
|
| 101 |
-
python main.py train --batch-size 32 --epochs 5 --learning-rate 1e-5
|
| 102 |
-
```
|
| 103 |
-
|
| 104 |
-
### 2. Testing the Model
|
| 105 |
-
```bash
|
| 106 |
-
# Basic testing
|
| 107 |
-
python test.py
|
| 108 |
-
|
| 109 |
-
# Test with custom model path
|
| 110 |
-
python test.py /path/to/custom/model
|
| 111 |
-
|
| 112 |
-
# Using main script
|
| 113 |
-
python main.py test --model-path ./vietnamese_sentiment_finetuned
|
| 114 |
-
```
|
| 115 |
-
|
| 116 |
-
### 3. Interactive Demo
|
| 117 |
-
```bash
|
| 118 |
-
# Run demo
|
| 119 |
-
python demo.py
|
| 120 |
-
|
| 121 |
-
# Using main script
|
| 122 |
-
python main.py demo
|
| 123 |
-
```
|
| 124 |
-
|
| 125 |
-
### 4. Web Interface
|
| 126 |
-
```bash
|
| 127 |
-
# Standard usage (memory-efficient defaults)
|
| 128 |
-
python web.py
|
| 129 |
-
|
| 130 |
-
# High memory efficiency (quantization + small batches)
|
| 131 |
-
python web.py --quantize --max-batch-size 5 --max-memory 2048
|
| 132 |
-
|
| 133 |
-
# Large batch processing
|
| 134 |
-
python web.py --max-batch-size 20 --max-memory 8192
|
| 135 |
-
|
| 136 |
-
# Custom server configuration
|
| 137 |
-
python web.py --port 8080 --host 0.0.0.0 --quantize
|
| 138 |
-
|
| 139 |
-
# Using main script
|
| 140 |
-
python main.py web --quantize --max-batch-size 20 --port 8080
|
| 141 |
-
```
|
| 142 |
-
|
| 143 |
-
## Web Interface Features
|
| 144 |
-
|
| 145 |
-
The Gradio web interface provides:
|
| 146 |
-
|
| 147 |
-
### Single Text Analysis
|
| 148 |
-
- Real-time sentiment prediction
|
| 149 |
-
- Confidence scores with visual charts
|
| 150 |
-
- Memory usage monitoring
|
| 151 |
-
- Example texts for quick testing
|
| 152 |
-
|
| 153 |
-
### Batch Analysis
|
| 154 |
-
- Process multiple texts at once
|
| 155 |
-
- Memory-efficient batch processing
|
| 156 |
-
- Automatic batch size limits
|
| 157 |
-
- Batch summary with sentiment distribution
|
| 158 |
-
|
| 159 |
-
### Memory Management
|
| 160 |
-
- **Automatic Cleanup**: Memory cleaned after each prediction
|
| 161 |
-
- **Batch Limits**: Configurable maximum texts per batch
|
| 162 |
-
- **Memory Monitoring**: Real-time memory usage tracking
|
| 163 |
-
- **GPU Optimization**: CUDA cache clearing when available
|
| 164 |
-
- **Quantization**: Optional model quantization for CPU (~4x memory reduction)
|
| 165 |
-
|
| 166 |
-
### Model Information
|
| 167 |
-
- Detailed model specifications
|
| 168 |
-
- Performance metrics
|
| 169 |
-
- Memory management settings
|
| 170 |
-
- Usage tips and troubleshooting
|
| 171 |
-
|
| 172 |
-
## Command Line Options
|
| 173 |
-
|
| 174 |
-
### Individual Scripts
|
| 175 |
-
|
| 176 |
-
#### `train.py`
|
| 177 |
-
```bash
|
| 178 |
-
python train.py [batch_size] [epochs]
|
| 179 |
-
```
|
| 180 |
-
|
| 181 |
-
#### `test.py`
|
| 182 |
-
```bash
|
| 183 |
-
python test.py [model_path]
|
| 184 |
-
```
|
| 185 |
-
|
| 186 |
-
#### `demo.py`
|
| 187 |
-
```bash
|
| 188 |
-
python demo.py
|
| 189 |
-
```
|
| 190 |
-
|
| 191 |
-
#### `web.py`
|
| 192 |
-
```bash
|
| 193 |
-
python web.py [--max-batch-size SIZE] [--quantize] [--max-memory MB] [--port PORT] [--host HOST]
|
| 194 |
-
```
|
| 195 |
-
|
| 196 |
-
### Main Entry Point (`main.py`)
|
| 197 |
|
| 198 |
-
|
| 199 |
-
|
| 200 |
-
|
| 201 |
-
|
| 202 |
-
|
| 203 |
-
#### Testing Command
|
| 204 |
-
```bash
|
| 205 |
-
python main.py test [--model-path PATH]
|
| 206 |
-
```
|
| 207 |
-
|
| 208 |
-
#### Demo Command
|
| 209 |
-
```bash
|
| 210 |
-
python main.py demo
|
| 211 |
-
```
|
| 212 |
-
|
| 213 |
-
#### Web Interface Command
|
| 214 |
-
```bash
|
| 215 |
-
python main.py web [--max-batch-size SIZE] [--quantize] [--max-memory MB] [--port PORT] [--host HOST]
|
| 216 |
-
```
|
| 217 |
|
| 218 |
-
|
| 219 |
-
-
|
| 220 |
-
-
|
| 221 |
-
-
|
| 222 |
-
-
|
| 223 |
-
- `--host`: Host to bind the interface to (default: 127.0.0.1)
|
| 224 |
|
| 225 |
## Model Details
|
| 226 |
|
| 227 |
-
- **
|
| 228 |
-
- **
|
| 229 |
-
- **Labels**: Negative, Neutral, Positive
|
| 230 |
- **Language**: Vietnamese
|
| 231 |
-
- **
|
| 232 |
- **Max Sequence Length**: 512 tokens
|
|
|
|
| 233 |
|
| 234 |
-
##
|
| 235 |
|
| 236 |
-
|
| 237 |
-
- **Processing Speed**: ~100ms per text
|
| 238 |
-
- **Memory Usage**: Configurable (default 4GB limit)
|
| 239 |
-
- **Batch Processing**: Up to 20 texts (configurable)
|
| 240 |
|
| 241 |
-
|
| 242 |
|
| 243 |
-
|
| 244 |
|
| 245 |
-
###
|
| 246 |
-
-
|
| 247 |
-
- GPU cache clearing for CUDA
|
| 248 |
- Garbage collection management
|
| 249 |
-
- Memory monitoring
|
| 250 |
|
| 251 |
-
###
|
| 252 |
-
-
|
| 253 |
-
-
|
| 254 |
-
-
|
| 255 |
-
-
|
| 256 |
|
| 257 |
-
|
| 258 |
-
- Dynamic quantization (CPU only)
|
| 259 |
-
- Batch processing optimization
|
| 260 |
-
- Memory-efficient inference
|
| 261 |
|
| 262 |
-
|
| 263 |
|
| 264 |
-
|
| 265 |
-
|
| 266 |
-
|
| 267 |
-
-
|
| 268 |
-
-
|
| 269 |
-
|
| 270 |
-
|
| 271 |
-
- Ensure model is trained: `python run_training.py`
|
| 272 |
-
- Check model directory: `ls -la vietnamese_sentiment_finetuned/`
|
| 273 |
-
- Verify dependencies: `pip install -r requirements.txt`
|
| 274 |
-
|
| 275 |
-
### Performance Optimization
|
| 276 |
-
- Use GPU if available (CUDA)
|
| 277 |
-
- Enable quantization for CPU inference
|
| 278 |
-
- Monitor memory usage in web interface
|
| 279 |
-
- Adjust batch size based on available memory
|
| 280 |
|
| 281 |
## Requirements
|
| 282 |
|
| 283 |
See `requirements.txt` for complete dependency list:
|
| 284 |
|
| 285 |
-
|
| 286 |
-
torch>=2.0.0
|
| 287 |
-
transformers>=4.21.0
|
| 288 |
-
datasets>=2.0.0
|
| 289 |
-
gradio>=4.0.0
|
| 290 |
-
pandas>=1.5.0
|
| 291 |
-
numpy>=1.21.0
|
| 292 |
-
scikit-learn>=1.1.0
|
| 293 |
-
matplotlib>=3.5.0
|
| 294 |
-
seaborn>=0.11.0
|
| 295 |
-
psutil>=5.9.0
|
| 296 |
-
```
|
| 297 |
-
|
| 298 |
-
## Example Usage
|
| 299 |
-
|
| 300 |
-
### Command Line Demo
|
| 301 |
-
```python
|
| 302 |
-
from py.demo import SentimentDemo
|
| 303 |
-
|
| 304 |
-
demo = SentimentDemo()
|
| 305 |
-
demo.load_model()
|
| 306 |
-
demo.interactive_demo()
|
| 307 |
-
```
|
| 308 |
|
| 309 |
-
|
| 310 |
-
|
| 311 |
-
|
| 312 |
-
|
| 313 |
-
|
| 314 |
-
|
| 315 |
-
### Batch Processing
|
| 316 |
-
```python
|
| 317 |
-
from py.gradio_app import SentimentGradioApp
|
| 318 |
-
|
| 319 |
-
app = SentimentGradioApp(max_batch_size=20)
|
| 320 |
-
app.load_model()
|
| 321 |
-
texts = ["Tuyệt vời!", "Bình thường", "Rất tệ"]
|
| 322 |
-
results, summary = app.batch_predict(texts)
|
| 323 |
-
```
|
| 324 |
-
|
| 325 |
-
### Model Testing
|
| 326 |
-
```python
|
| 327 |
-
from py.test_model import SentimentTester
|
| 328 |
-
|
| 329 |
-
tester = SentimentTester(model_path="./vietnamese_sentiment_finetuned")
|
| 330 |
-
tester.load_model()
|
| 331 |
-
sentiment, confidence = tester.predict_sentiment("Giảng viên dạy rất hay!")
|
| 332 |
-
```
|
| 333 |
|
| 334 |
-
|
| 335 |
-
```python
|
| 336 |
-
from py.fine_tune_sentiment import SentimentFineTuner
|
| 337 |
-
|
| 338 |
-
fine_tuner = SentimentFineTuner(
|
| 339 |
-
    model_name="5CD-AI/Vietnamese-Sentiment-visobert",
|
| 340 |
-
    dataset_name="uitnlp/vietnamese_students_feedback"
|
| 341 |
-
)
|
| 342 |
-
train_result, eval_results = fine_tuner.run_fine_tuning(
|
| 343 |
-
    output_dir="./my_model",
|
| 344 |
-
    learning_rate=2e-5,
|
| 345 |
-
    batch_size=16,
|
| 346 |
-
    num_epochs=3
|
| 347 |
-
)
|
| 348 |
-
```
|
| 349 |
-
|
| 350 |
-
## Model Loading Examples
|
| 351 |
-
|
| 352 |
-
### Loading the Fine-Tuned Model
|
| 353 |
-
```python
|
| 354 |
-
from transformers import AutoTokenizer, AutoModelForSequenceClassification
|
| 355 |
-
|
| 356 |
-
tokenizer = AutoTokenizer.from_pretrained("./vietnamese_sentiment_finetuned")
|
| 357 |
-
model = AutoModelForSequenceClassification.from_pretrained("./vietnamese_sentiment_finetuned")
|
| 358 |
-
```
|
| 359 |
-
|
| 360 |
-
### Making Predictions
|
| 361 |
-
```python
|
| 362 |
-
import torch
|
| 363 |
-
|
| 364 |
-
def predict_sentiment(text):
|
| 365 |
-
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
|
| 366 |
-
    with torch.no_grad():
|
| 367 |
-
        outputs = model(**inputs)
|
| 368 |
-
        predictions = torch.softmax(outputs.logits, dim=-1)
|
| 369 |
-
        predicted_class = torch.argmax(predictions, dim=-1).item()
|
| 370 |
-
|
| 371 |
-
    sentiment_labels = ["Negative", "Neutral", "Positive"]
|
| 372 |
-
    return sentiment_labels[predicted_class], predictions[0][predicted_class].item()
|
| 373 |
-
|
| 374 |
-
# Example
|
| 375 |
-
text = "Giảng viên dạy rất hay và tâm huyết."
|
| 376 |
-
sentiment, confidence = predict_sentiment(text)
|
| 377 |
-
print(f"Sentiment: {sentiment}, Confidence: {confidence:.3f}")
|
| 378 |
-
```
|
| 379 |
-
|
| 380 |
-
## Dataset Information
|
| 381 |
-
|
| 382 |
-
The UIT-VSFC corpus contains over 16,000 Vietnamese student feedback sentences with:
|
| 383 |
-
- **Sentiment Classification**: Positive, Neutral, Negative
|
| 384 |
-
- **Topic Classification**: Various educational topics
|
| 385 |
-
- **Inter-annotator agreement**: >91% for sentiment, >71% for topics
|
| 386 |
-
- **Original F1-score**: ~88% for sentiment (Maximum Entropy baseline)
|
| 387 |
-
|
| 388 |
-
## Hardware Requirements
|
| 389 |
-
|
| 390 |
-
- **Minimum**: 8GB RAM, CPU
|
| 391 |
-
- **Recommended**: GPU with 8GB+ VRAM for faster training
|
| 392 |
-
- **Storage**: ~2GB for model and datasets
|
| 393 |
-
|
| 394 |
-
## License
|
| 395 |
|
| 396 |
-
|
| 397 |
-
-
|
| 398 |
-
-
|
|
|
|
| 399 |
|
| 400 |
-
|
| 401 |
|
| 402 |
-
|
| 403 |
|
| 404 |
-
##
|
| 405 |
|
| 406 |
-
If you use this
|
| 407 |
|
| 408 |
```bibtex
|
| 409 |
@InProceedings{8573337,
|
|
@@ -418,8 +141,18 @@ If you use this work or the dataset, please cite:
|
|
| 418 |
}
|
| 419 |
```
|
| 420 |
|
| 421 |
-
|
| 422 |
|
| 423 |
-
|
| 424 |
|
| 425 |
-
**
|
|
|
|
| 1 |
+
---
|
| 2 |
+
title: Vietnamese Sentiment Analysis
|
| 3 |
+
emoji: π
|
| 4 |
+
colorFrom: green
|
| 5 |
+
colorTo: blue
|
| 6 |
+
sdk: gradio
|
| 7 |
+
sdk_version: 4.44.0
|
| 8 |
+
app_file: app.py
|
| 9 |
+
pinned: false
|
| 10 |
+
---
|
| 11 |
+
|
| 12 |
# Vietnamese Sentiment Analysis
|
| 13 |
|
| 14 |
+
A Vietnamese sentiment analysis web interface built with Gradio and transformer models, optimized for Hugging Face Spaces deployment.
|
| 15 |
|
| 16 |
## Features
|
| 17 |
|
| 18 |
+
- **Transformer-based Model**: Uses 5CD-AI/Vietnamese-Sentiment-visobert from Hugging Face Hub
|
| 19 |
+
- **Interactive Web Interface**: Real-time sentiment analysis via Gradio
|
| 20 |
+
- **Memory Efficient**: Built-in memory management and batch processing limits
|
| 21 |
+
- **Visual Analysis**: Confidence scores with interactive charts
|
| 22 |
+
- **Batch Processing**: Analyze multiple texts at once
|
| 23 |
+
- **Memory Management**: Real-time memory monitoring and cleanup
|
| 24 |
|
| 25 |
## Usage
|
| 26 |
|
| 27 |
+
### Single Text Analysis
|
| 28 |
+
1. Enter Vietnamese text in the input field
|
| 29 |
+
2. Click "Analyze Sentiment"
|
| 30 |
+
3. View the sentiment prediction with confidence scores
|
| 31 |
+
4. See probability distribution in the chart
|
| 32 |
|
| 33 |
+
### Batch Analysis
|
| 34 |
+
1. Switch to "Batch Analysis" tab
|
| 35 |
+
2. Enter multiple Vietnamese texts (one per line)
|
| 36 |
+
3. Click "Analyze All" to process all texts
|
| 37 |
+
4. View comprehensive batch summary with sentiment distribution
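The batch steps above can be sketched as a pair of helpers: one splits the one-per-line input, the other turns predicted labels into a distribution. Both function names and the `max_texts` default are assumptions made for illustration; the actual `app.py` is not shown in this diff.

```python
from collections import Counter


def split_batch_input(raw_text: str, max_texts: int = 10) -> list[str]:
    """Split textarea input into non-empty, stripped lines, capped at max_texts."""
    texts = [line.strip() for line in raw_text.splitlines() if line.strip()]
    return texts[:max_texts]


def sentiment_distribution(labels: list[str]) -> dict[str, float]:
    """Return the share of each predicted label; empty input gives an empty dict."""
    total = len(labels)
    if total == 0:
        return {}
    counts = Counter(labels)
    return {label: count / total for label, count in counts.items()}
```

Blank lines are dropped before the cap is applied, so padding in the input does not use up batch slots.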
|
| 38 |
|
| 39 |
+
### Memory Management
|
| 40 |
+
- Monitor real-time memory usage
|
| 41 |
+
- Use "Memory Cleanup" button if needed
|
| 42 |
+
- Automatic cleanup after each prediction
|
| 43 |
+
- Maximum 10 texts per batch for efficiency
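A cleanup step like the one listed above could look roughly like this. It is a sketch under the assumption that cleanup means a garbage-collection pass plus clearing the CUDA cache when PyTorch and a GPU are present; `cleanup_memory` is a hypothetical name, and the Space's real implementation is not part of this diff.

```python
import gc


def cleanup_memory() -> bool:
    """Run garbage collection; clear the CUDA cache if PyTorch sees a GPU.

    Returns True only when the CUDA cache was actually cleared.
    """
    gc.collect()
    try:
        import torch  # optional here so the sketch also runs without PyTorch
    except ImportError:
        return False
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
        return True
    return False
```

Calling it after each prediction releases cached GPU blocks, though it does not free tensors that are still referenced.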
|
|
|
|
| 44 |
|
| 45 |
## Model Details
|
| 46 |
|
| 47 |
+
- **Model**: 5CD-AI/Vietnamese-Sentiment-visobert
|
| 48 |
+
- **Architecture**: Transformer-based (XLM-RoBERTa)
|
|
|
|
| 49 |
- **Language**: Vietnamese
|
| 50 |
+
- **Labels**: Negative, Neutral, Positive
|
| 51 |
- **Max Sequence Length**: 512 tokens
|
| 52 |
+
- **Device**: Automatic CUDA/CPU detection
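To make the label mapping concrete, here is a dependency-free sketch of turning three raw logits into a label and a confidence score. The label order is taken from the removed README's prediction example and is an assumption about the hosted model; in practice the order would be read from the model's `config.id2label`.

```python
import math

# Assumption: index order copied from the earlier README example, not verified against the model.
LABELS = ["Negative", "Neutral", "Positive"]


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax over a flat list of logits."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def decode_prediction(logits: list[float]) -> tuple[str, float]:
    """Return the most probable label and its probability."""
    probabilities = softmax(logits)
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return LABELS[best], probabilities[best]
```

The returned probability is what the README calls the confidence score; the full `softmax` output is the distribution shown in the chart.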
|
| 53 |
|
| 54 |
+
## Example Usage
|
| 55 |
|
| 56 |
+
Try these example Vietnamese texts:
|
| 57 |
|
| 58 |
+
- "Giảng viên dạy rất hay và tâm huyết." (Positive)
|
| 59 |
+
- "Môn học này quá khó và nhàm chán." (Negative)
|
| 60 |
+
- "Lớp học ổn định, không có gì đặc biệt." (Neutral)
|
| 61 |
|
| 62 |
+
## Technical Features
|
| 63 |
|
| 64 |
+
### Memory Optimization
|
| 65 |
+
- Automatic GPU cache clearing
|
|
|
|
| 66 |
- Garbage collection management
|
| 67 |
+
- Memory usage monitoring
|
| 68 |
+
- Batch size limits
|
| 69 |
+
- Real-time memory tracking
|
| 70 |
|
| 71 |
+
### Performance
|
| 72 |
+
- ~100ms processing time per text
|
| 73 |
+
- Supports up to 512 token sequences
|
| 74 |
+
- Efficient batch processing
|
| 75 |
+
- Memory limit: 8GB (Hugging Face Spaces)
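The per-text latency figure above depends on hardware, so it is worth measuring rather than assuming. A minimal timing sketch follows; `average_latency_ms` is a hypothetical helper, and `predict` stands for any callable that takes one text.

```python
import time
from typing import Callable, Sequence


def average_latency_ms(predict: Callable[[str], object], texts: Sequence[str]) -> float:
    """Return mean wall-clock milliseconds spent in predict() per text."""
    if not texts:
        raise ValueError("texts must not be empty")
    start = time.perf_counter()
    for text in texts:
        predict(text)
    elapsed = time.perf_counter() - start
    return elapsed * 1000 / len(texts)
```

Running one warm-up prediction before timing keeps one-time model loading from skewing the average.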
|
| 76 |
|
| 77 |
+
## Model Performance
|
| 78 |
|
| 79 |
+
The model provides:
|
| 80 |
+
- **Sentiment Classification**: Positive, Neutral, Negative
|
| 81 |
+
- **Confidence Scores**: Probability distribution across classes
|
| 82 |
+
- **Real-time Processing**: Fast inference on CPU/GPU
|
| 83 |
+
- **Batch Analysis**: Efficient processing of multiple texts
|
| 84 |
|
| 85 |
+
## Deployment
|
| 86 |
+
|
| 87 |
+
This Space is configured for Hugging Face Spaces with:
|
| 88 |
+
- **SDK**: Gradio 4.44.0
|
| 89 |
+
- **Hardware**: CPU (with CUDA support if available)
|
| 90 |
+
- **Memory**: 8GB limit with optimization
|
| 91 |
+
- **Model Loading**: Direct from Hugging Face Hub
|
| 92 |
|
| 93 |
## Requirements
|
| 94 |
|
| 95 |
See `requirements.txt` for complete dependency list:
|
| 96 |
+
- torch>=2.0.0
|
| 97 |
+
- transformers>=4.21.0
|
| 98 |
+
- gradio>=4.44.0
|
| 99 |
+
- pandas, numpy, scikit-learn
|
| 100 |
+
- psutil for memory monitoring
|
| 101 |
|
| 102 |
+
## Use Cases
|
| 103 |
|
| 104 |
+
- **Education**: Analyze student feedback
|
| 105 |
+
- **Customer Service**: Analyze customer reviews
|
| 106 |
+
- **Social Media**: Monitor sentiment in posts
|
| 107 |
+
- **Research**: Vietnamese text analysis
|
| 108 |
+
- **Business**: Customer sentiment tracking
|
| 109 |
|
| 110 |
+
## Troubleshooting
|
| 111 |
|
| 112 |
+
### Memory Issues
|
| 113 |
+
- Use "Memory Cleanup" button
|
| 114 |
+
- Reduce batch size
|
| 115 |
+
- Refresh the page if needed
|
| 116 |
|
| 117 |
+
### Model Loading
|
| 118 |
+
- Model loads automatically from Hugging Face Hub
|
| 119 |
+
- No local training required
|
| 120 |
+
- Automatic fallback to CPU if GPU unavailable
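The CPU fallback mentioned above usually comes down to a single check. A sketch, assuming PyTorch is the backend; `pick_device` is a hypothetical name, and the import is guarded only so the snippet also runs where PyTorch is absent.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"
```

The returned string can be passed to `model.to(...)` so the same code path runs on either kind of hardware.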
|
| 121 |
|
| 122 |
+
### Performance Tips
|
| 123 |
+
- Clear, grammatically correct Vietnamese text works best
|
| 124 |
+
- Longer texts (20-200 words) provide better context
|
| 125 |
+
- Use batch processing for multiple texts
|
| 126 |
|
| 127 |
+
## Citation
|
| 128 |
|
| 129 |
+
If you use this model or Space, please cite the original model:
|
| 130 |
|
| 131 |
```bibtex
|
| 132 |
@InProceedings{8573337,
|
|
|
|
| 141 |
}
|
| 142 |
```
|
| 143 |
|
| 144 |
+
## Contributing
|
| 145 |
+
|
| 146 |
+
Feel free to:
|
| 147 |
+
- Submit issues and feedback
|
| 148 |
+
- Suggest improvements
|
| 149 |
+
- Report bugs
|
| 150 |
+
- Request new features
|
| 151 |
|
| 152 |
+
## License
|
| 153 |
+
|
| 154 |
+
This Space uses open-source components under MIT license.
|
| 155 |
+
|
| 156 |
+
---
|
| 157 |
|
| 158 |
+
**Try it now!** Enter some Vietnamese text above to see the sentiment analysis in action.
|
requirements.txt
ADDED
|
@@ -0,0 +1,27 @@
|
| 1 |
+
# Core dependencies for Hugging Face Spaces
|
| 2 |
+
torch>=2.0.0
|
| 3 |
+
transformers>=4.21.0
|
| 4 |
+
datasets>=2.0.0
|
| 5 |
+
gradio>=4.44.0
|
| 6 |
+
|
| 7 |
+
# Data processing
|
| 8 |
+
pandas>=1.5.0
|
| 9 |
+
numpy>=1.21.0
|
| 10 |
+
scikit-learn>=1.1.0
|
| 11 |
+
|
| 12 |
+
# Visualization
|
| 13 |
+
matplotlib>=3.5.0
|
| 14 |
+
seaborn>=0.11.0
|
| 15 |
+
|
| 16 |
+
# Memory monitoring
|
| 17 |
+
psutil>=5.9.0
|
| 18 |
+
|
| 19 |
+
# System monitoring
|
| 20 |
+
accelerate>=0.21.0
|
| 21 |
+
safetensors>=0.3.1
|
| 22 |
+
|
| 23 |
+
# Additional dependencies
|
| 24 |
+
sentencepiece>=0.1.96
|
| 25 |
+
protobuf>=3.20.0
|
| 26 |
+
tokenizers>=0.13.3
|
| 27 |
+
huggingface-hub>=0.16.4
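Since every pin in this file uses the `>=` form, a small script can read the minimums and compare them with installed versions. The sketch below is deliberately naive: the helper names are hypothetical, only `>=` specifiers are understood, and non-numeric version parts are ignored. A real check would more likely rely on the `packaging` library.

```python
from typing import Optional


def parse_requirements(text: str) -> dict[str, Optional[str]]:
    """Map package name to its '>=' minimum version, or None when unpinned."""
    minimums: dict[str, Optional[str]] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        name, sep, version = line.partition(">=")
        minimums[name.strip()] = version.strip() if sep else None
    return minimums


def version_tuple(version: str) -> tuple[int, ...]:
    """Convert '4.44.0' to (4, 44, 0), skipping non-numeric parts."""
    return tuple(int(part) for part in version.split(".") if part.isdigit())


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare versions numerically, so 4.9.0 is older than 4.44.0."""
    return version_tuple(installed) >= version_tuple(minimum)
```

Comparing integer tuples rather than strings matters for pins such as `gradio>=4.44.0`, where a plain string comparison would rank `4.9.0` above `4.44.0`.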
|