Pierre Le Jeune

pierlj

AI & ML interests

None yet

Recent Activity

upvoted an article about 1 month ago

Article

LLM vulnerability scanner for dynamic & multi-turn Red Teaming

By

and 2 others •

Sep 25

• 2

published an article about 1 month ago

Article

LLM vulnerability scanner for dynamic & multi-turn Red Teaming

By

and 2 others •

Sep 25

• 2

New activity in giskardai/realharm 3 months ago

RealHarm

#2 opened 3 months ago by

commented on LLMs recognise bias but also reproduce harmful stereotypes: an analysis of bias in leading LLMs 4 months ago

Thank you for sharing this @oopere, it looks super interesting. I would be happy to read more about this; don't hesitate to reach out if you publish a preprint or a report on it.

This line of work reminds me of Anthropic's series on interpretability. In particular, they also found that high-level features are spread across multiple layers (see this article). They don't study biases specifically, but it makes sense that "bias features" would also be spread over multiple layers.

upvoted an article 4 months ago

Article

LLMs recognise bias but also reproduce harmful stereotypes: an analysis of bias in leading LLMs

By

and 3 others •

Jul 2

• 16

updated a dataset 6 months ago

giskardai/phare

Viewer • Updated Jun 6 • 3.31k • 128 • 11

upvoted a paper 6 months ago

Phare: A Safety Probe for Large Language Models

Paper • 2505.11365 • Published May 16 • 7

commented a paper 6 months ago

Phare: A Safety Probe for Large Language Models

Paper • 2505.11365 • Published May 16 • 7

published an article 6 months ago

Article

Good answers are not necessarily factual answers: an analysis of hallucination in leading LLMs

By

and 1 other •

May 7

• 41

New activity in giskardai/phare 6 months ago

Parquet Upload

#4 opened 6 months ago by

liked a dataset 7 months ago

giskardai/phare

Viewer • Updated Jun 6 • 3.31k • 128 • 11

upvoted 2 papers 7 months ago

DUMP: Automated Distribution-Level Curriculum Learning for RL-based LLM Post-training

Paper • 2504.09710 • Published Apr 13 • 19

RealHarm: A Collection of Real-World Language Model Application Failures

Paper • 2504.10277 • Published Apr 14 • 10

commented a paper 7 months ago

RealHarm: A Collection of Real-World Language Model Application Failures

Paper • 2504.10277 • Published Apr 14 • 10

authored a paper 7 months ago

RealHarm: A Collection of Real-World Language Model Application Failures

Paper • 2504.10277 • Published Apr 14 • 10

updated a dataset 7 months ago

giskardai/realharm

Viewer • Updated Apr 16 • 136 • 188 • 11

published a dataset 7 months ago

giskardai/realharm

Viewer • Updated Apr 16 • 136 • 188 • 11

updated a dataset 7 months ago

giskardai/phare

Viewer • Updated Jun 6 • 3.31k • 128 • 11

New activity in giskardai/phare 7 months ago

The technical report link in the readme gives a 404

#2 opened 7 months ago by

updated a dataset 7 months ago

giskardai/phare

Viewer • Updated Jun 6 • 3.31k • 128 • 11