FAR AI

non-profit

https://far.ai/

AlignmentResearch

AI & ML interests

Frontier alignment research to ensure the safe development and deployment of advanced AI systems.

Recent Activity

taufeeque updated a collection about 15 hours ago

Diverse Deception Probes

taufeeque updated a model about 15 hours ago

AlignmentResearch/diverse-deception-probe-olmo-3-32b-think

taufeeque published a model about 15 hours ago

AlignmentResearch/diverse-deception-probe-olmo-3-32b-think

Papers

Exposing the Systematic Vulnerability of Open-Weight Models to Prefill Attacks

Collections 4

Spaces 1

Tuned Lens

Visualize transformer computations with a tuned lens

Models 629

AlignmentResearch/diverse-deception-probe-olmo-3-32b-think

Updated about 14 hours ago

AlignmentResearch/diverse-deception-probe-gemma-3-12b-it

Updated about 15 hours ago

AlignmentResearch/diverse-deception-probe-qwen3-8b

Updated about 15 hours ago

AlignmentResearch/diverse-deception-probe-olmo-3-7b-instruct

Updated about 15 hours ago

AlignmentResearch/diverse-deception-probe-olmo-3-7b-think

Updated about 15 hours ago

AlignmentResearch/obfuscation-atlas-gemma-3-12b-it-kl0.0001-det1-seed3-mbpp_probe

Updated 26 days ago • 17

AlignmentResearch/obfuscation-atlas-gemma-3-27b-it-kl0.001-det1-seed3-mbpp_probe

Updated 26 days ago • 11

AlignmentResearch/obfuscation-atlas-gemma-3-27b-it-kl1-det1-seed3-mbpp_probe

Updated 26 days ago • 17

AlignmentResearch/obfuscation-atlas-gemma-3-27b-it-kl0.0001-det1-seed3-mbpp_probe

Updated 26 days ago • 11

AlignmentResearch/obfuscation-atlas-gemma-3-27b-it-kl0.01-det1-seed3-mbpp_probe

Updated 26 days ago • 12

Datasets 95

AlignmentResearch/deceptive-followup-v13

Viewer • Updated about 15 hours ago • 39.7k

AlignmentResearch/deceptive-followup-v11

Viewer • Updated 1 day ago • 32.6k

AlignmentResearch/deceptive-followup-v9

Viewer • Updated 1 day ago • 30.3k

AlignmentResearch/deceptive-followup-v7

Viewer • Updated 9 days ago • 28k • 53

AlignmentResearch/deceptive-followup-v6

Viewer • Updated 9 days ago • 24.7k • 11

AlignmentResearch/deceptive-followup-v5

Viewer • Updated 9 days ago • 21k • 21

AlignmentResearch/hidden_reasoning_medium_parity_large_v1_100000

Viewer • Updated Jan 24 • 100k • 17

AlignmentResearch/hidden_reasoning_medium_parity_large_v1_10000

Viewer • Updated Jan 23 • 10k • 9

AlignmentResearch/hidden_reasoning_easy_unique_5000

Viewer • Updated Jan 20 • 5k • 16

AlignmentResearch/hidden_reasoning_medium_unique_5000

Viewer • Updated Jan 17 • 5k • 15
