Primus: A Pioneering Collection of Open-Source Datasets for Cybersecurity LLM Training
Paper • 2502.11191
Llama-Primus-Nemotron-70B 🏄 is obtained by continued pretraining of Llama-3.1-Nemotron-70B-Instruct on over 10B tokens of cybersecurity text.
Note Llama-Primus-Nemotron-70B-Instruct 🏌️ achieves an 18.18% improvement in aggregate scores across several public cybersecurity benchmarks while maintaining the same general instruction-following capabilities, as measured by Arena Hard.
Note Includes 7.63B tokens of cybersecurity texts filtered from Nemotron-CC.