Common Pile v0.1 Collection • All resources related to Common Pile v0.1, an 8TB dataset of public domain and openly licensed text • 4 items • Updated Jun 6
INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge Paper • 2411.19799 • Published Nov 29, 2024
Parity-Aware Byte-Pair Encoding: Improving Cross-lingual Fairness in Tokenization Paper • 2508.04796 • Published Aug 6
From Citations to Criticality: Predicting Legal Decision Influence in the Multilingual Swiss Jurisprudence Paper • 2410.13460 • Published Oct 17, 2024
Unlocking Legal Knowledge: A Multilingual Dataset for Judicial Summarization in Switzerland Paper • 2410.13456 • Published Oct 17, 2024