
Catherine Arnett

catherinearnett
https://catherinearnett.github.io/
  • linguist_cat
  • catherinearnett
  • catherinearnett.bsky.social

AI & ML interests

multilingual NLP, tokenization

Recent Activity

authored a paper 6 days ago
Common Corpus: The Largest Collection of Ethical Data for LLM Pre-Training
authored a paper 6 days ago
Explaining and Mitigating Crosslingual Tokenizer Inequities
authored a paper 6 days ago
Global PIQA: Evaluating Physical Commonsense Reasoning Across 100+ Languages and Cultures

Organizations

  • Blog-explorers
  • Language and Cognition Lab (UCSD)

catherinearnett's datasets (2)

catherinearnett/montok

Updated Sep 19 • 1.74k

catherinearnett/morphscore

Viewer • Updated Jul 10 • 5.09M • 237 • 3