Instructions to use LaBackDoor/trafficgpt with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use LaBackDoor/trafficgpt with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="LaBackDoor/trafficgpt")

# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("LaBackDoor/trafficgpt", dtype="auto")
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use LaBackDoor/trafficgpt with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "LaBackDoor/trafficgpt"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "LaBackDoor/trafficgpt",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
- SGLang
How to use LaBackDoor/trafficgpt with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "LaBackDoor/trafficgpt" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "LaBackDoor/trafficgpt",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
Use Docker images
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "LaBackDoor/trafficgpt" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "LaBackDoor/trafficgpt",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
- Docker Model Runner
How to use LaBackDoor/trafficgpt with Docker Model Runner:
```shell
docker model run hf.co/LaBackDoor/trafficgpt
```
TrafficGPT: Breaking the Token Barrier for Efficient Long Traffic Analysis and Generation
TrafficGPT is a deep-learning foundation model designed to tackle complex challenges in network traffic analysis and generation. By leveraging generative pre-training with a linear attention mechanism, it expands the effective token window from the traditional 512-token limit to 12,032 tokens.
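The linear-attention idea behind the longer token window can be sketched as follows: replacing `softmax(QKᵀ)V` with `φ(Q)(φ(K)ᵀV)` lets the `(d × d)` product `φ(K)ᵀV` be computed once, so the cost grows linearly in sequence length rather than quadratically. A minimal NumPy sketch, using the common `elu(x) + 1` feature map from the linear-attention literature (the paper's exact formulation may differ):

```python
import numpy as np

def linear_attention(Q, K, V, eps=1e-6):
    """O(N) attention via a positive kernel feature map.
    phi(x) = elu(x) + 1 keeps all attention weights positive."""
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))
    Qp, Kp = phi(Q), phi(K)               # (N, d)
    KV = Kp.T @ V                         # (d, d), computed once for all queries
    Z = Qp @ Kp.sum(axis=0)               # (N,) per-query normalizer
    return (Qp @ KV) / (Z[:, None] + eps)

N, d = 1024, 64
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((N, d)) * 0.1 for _ in range(3))
out = linear_attention(Q, K, V)
print(out.shape)  # (1024, 64)
```

Because the `N × N` attention matrix is never materialized, memory and compute scale with `N` instead of `N²`, which is what makes a 12,032-token context tractable.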
Model Details
- Developed by: Jian Qu, Xiaobo Ma, and Jianfeng Li (Xi'an Jiaotong University).
- Model Type: Generative Pre-trained Transformer with Linear Attention.
- Architecture: 24 layers, 12 attention heads, hidden dimension of 512.
- Key Innovations:
  - Reversible Tokenization: Bijective mapping between PCAP files and token lists, enabling direct traffic reconstruction.
  - Linear Complexity: Reduces self-attention complexity from $O(N^2)$ to $O(N)$.
  - Reversible Network: Reformer-style reversible layers that optimize memory usage during training.
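The key property of reversible tokenization is that encoding loses no information, so generated token lists can be decoded straight back into packet bytes. A toy sketch of that round-trip property, with one token per byte plus an end-of-packet marker (the model's actual vocabulary and header handling are more involved):

```python
EOP = 256  # end-of-packet token, outside the 0-255 byte range

def packet_to_tokens(raw: bytes) -> list[int]:
    """Toy bijective tokenizer: one token per byte, then EOP."""
    return list(raw) + [EOP]

def tokens_to_packet(tokens: list[int]) -> bytes:
    """Inverse mapping: strip EOP and reassemble the raw bytes."""
    assert tokens[-1] == EOP
    return bytes(tokens[:-1])

pkt = bytes.fromhex("4500003c1c4640004006b1e6")  # leading bytes of an IPv4 header
tokens = packet_to_tokens(pkt)
assert tokens_to_packet(tokens) == pkt  # lossless round trip
```

Because the mapping is bijective, any token sequence the model emits corresponds to exactly one byte sequence, which is what allows generated output to be written back out as a PCAP.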
Intended Use
- Traffic Classification: High-accuracy identification of encrypted flows, VPN traffic, and IoT device communications.
- Traffic Generation: Creating realistic, protocol-compliant PCAP files for network simulation and security testing.
- Protocol Reverse Engineering: Learning robust representations of unknown or complex network protocols.
Training Data
The model was pre-trained on 189 GB of raw network traffic across five major datasets:
- ISCX-Tor2016: Tor network traffic characterization.
- USTCTFC2016: Malware and software identification traffic.
- ISCXVPN2016: Encrypted VPN vs. non-VPN flows.
- DoHBrw2020: DNS-over-HTTPS tunnel detection.
- CICIoT2022: Multidimensional IoT profiling data.
Training Details & Hyperparameters
While the original TrafficGPT research utilized a 99:1 train-test split (99% for pre-training, 1% for testing), this open-source version employs a standard 80:20 split.
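If you want to reproduce the 80:20 split when fine-tuning, a simple flow-level shuffle-and-cut is enough; this is an illustrative helper, not the repository's actual preprocessing code:

```python
import random

def split_flows(flow_ids, train_frac=0.8, seed=0):
    """Shuffle flow identifiers and cut them 80:20 into train/test.
    Splitting by flow (not by packet) avoids leaking packets from
    the same flow into both sides."""
    ids = list(flow_ids)
    random.Random(seed).shuffle(ids)
    cut = int(len(ids) * train_frac)
    return ids[:cut], ids[cut:]

train, test = split_flows(range(1000))
print(len(train), len(test))  # 800 200
```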
Evaluation Results
Classification Performance (Macro F1-Score)
TrafficGPT (12k) consistently outperforms existing state-of-the-art models.
| Dataset | Metric | TrafficGPT (12k) |
|---|---|---|
| ISCX-VPN-App | Macro F1 | 1.0000 |
| USTC-TFC | Macro F1 | 0.9877 |
| Cross-Platform (iOS) | Macro F1 | 0.9863 |
| Cross-Platform (Android) | Macro F1 | 0.9498 |
Generation Quality
Measured using Jensen-Shannon Divergence (JSD), where lower values indicate closer similarity to real traffic.
- Packet Header JSD (Avg): 0.1605.
- Flow Feature JSD (Avg): 0.2396.
- Discriminator Realism: F1-Score of 0.6683.
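For reference, JSD between two discrete distributions (e.g. empirical histograms of header-field values from real vs. generated traffic) can be computed as below; with base-2 logarithms the result lies in [0, 1], where 0 means identical distributions:

```python
import math

def jsd(p, q):
    """Jensen-Shannon divergence (base-2) between two discrete
    distributions over the same support; symmetric and bounded in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    def kl(a, b):
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Hypothetical value histograms for one header field, real vs. generated
real = [0.7, 0.2, 0.1]
gen  = [0.6, 0.25, 0.15]
print(round(jsd(real, gen), 4))
```

The reported averages (0.1605 for packet headers, 0.2396 for flow features) are aggregates of such per-feature divergences.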
Limitations
- Networking Interpretation: Because address, port, and checksum fields are removed during tokenization, the model does not learn IP/port associations. While this ensures the model learns protocol features rather than metadata, it limits the model's utility in scenarios where port-to-protocol mapping is vital for networking interpretation.
- Protocol Anomalies: May occasionally generate malformed packets in complex encrypted protocols (e.g., TLS Client Hello).
- Inter-flow Correlation: Currently focuses on individual TCP/UDP flows and does not yet capture complex correlations between multiple distinct flows.
- Computational Cost: While linear in complexity, training on 3k tokens still requires significant memory and step-time optimization.
Citation
If you use TrafficGPT in your research, please cite:
@article{qu2024trafficgpt,
title={TrafficGPT: Breaking the Token Barrier for Efficient Long Traffic Analysis and Generation},
author={Qu, Jian and Ma, Xiaobo and Li, Jianfeng},
journal={arXiv preprint arXiv:2403.05822},
year={2024}
}