Catherine Arnett
catherinearnett
108 followers · 37 following
https://catherinearnett.github.io/
linguist_cat
catherinearnett
catherinearnett.bsky.social
AI & ML interests
multilingual NLP, tokenization
Recent Activity
Updated a dataset 28 days ago: catherinearnett/bilingual-tokenizer-training-data
Published a dataset 29 days ago: catherinearnett/bilingual-tokenizer-training-data
Liked a dataset about 1 month ago: commoncrawl/CommonLID
catherinearnett's datasets (4)
catherinearnett/bilingual-tokenizer-training-data · Viewer · Updated 28 days ago · 30.7M · 281
catherinearnett/montok · Updated Sep 19, 2025 · 8.05k · 3
catherinearnett/morphscore · Viewer · Updated Jul 10, 2025 · 5.09M · 406 · 4
catherinearnett/monolingual-tokenizer-data · Viewer · Updated May 15, 2025 · 139M · 231 · 1