Catherine Arnett
catherinearnett
95 followers · 31 following
https://catherinearnett.github.io/
linguist_cat
catherinearnett
catherinearnett.bsky.social
AI & ML interests
multilingual NLP, tokenization
Recent Activity
authored a paper 6 days ago: Common Corpus: The Largest Collection of Ethical Data for LLM Pre-Training
authored a paper 6 days ago: Explaining and Mitigating Crosslingual Tokenizer Inequities
authored a paper 6 days ago: Global PIQA: Evaluating Physical Commonsense Reasoning Across 100+ Languages and Cultures
catherinearnett's datasets (2)
catherinearnett/montok • Updated Sep 19 • 1.74k
catherinearnett/morphscore • Viewer • Updated Jul 10 • 5.09M • 237 • 3