Transformer Model Improvements
Sigma: Differential Rescaling of Query, Key and Value for Efficient Language Models
Paper • 2501.13629 • Published Jan 23 • 48
finance
MIGA: Mixture-of-Experts with Group Aggregation for Stock Market Prediction
Paper • 2410.02241 • Published Oct 3, 2024 • 8