Emin Temiz (etemiz) PRO
AI & ML interests: Alignment
Recent Activity
- published an article 6 days ago: From Robots that Prey to Robots that Pray
- new activity 7 days ago in blog-explorers/README: "accidental exit" (#14, opened 9 days ago by etemiz)
- new activity 12 days ago in huihui-ai/Huihui-Qwen3.5-9B-abliterated: "Ablirated Model Creation" (#2, opened 15 days ago by vvv0000000)
posted an update 17 days ago
Quoted from https://www.nytimes.com/2026/03/10/opinion/ai-chatbots-virtue-vice.html :
"""
Consider a follow-up to an earlier version of the Nature paper. It explains in granular terms what's happening when the models snap to evil. It is math all the way down. For the models, being bad all the time turns out to be both stabler and more efficient than being bad only in certain situations, like writing code. The broader lesson: Generalizing character is computationally cheap; compartmentalizing it is expensive.
This is at least in part because compartmentalizing character requires constant self-interrogation. The model must constantly ask itself, "Am I supposed to be bad here? Good? Something in between?" Each of those checkpoints is another chance to get things wrong. This is interesting enough in A.I. Extrapolated to humans, the possibility becomes astonishing. Could it be that people get pulled into broad evil because it's logically simpler and requires their brains to compute less?
"""
This is great news: it means a push in the good direction, like faith training or even decensoring/abliteration, can also produce improvements in other domains. I do faith training, and it can lead to better behavior in LLMs, robots that don't harm humans, coding agents that don't generate vulnerabilities, and much more. Some abliterations by huihui showed improvements on the AHA benchmark, which tells me that the courage to speak truth, or not being afraid to talk about normally censored topics, affects more areas than just decensoring.
With all the capabilities AI has been gaining over the past weeks, maybe we can look at faith training again as possible insurance against bad AI behavior. What do you think?
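The projection at the heart of abliteration is simple linear algebra. Below is a minimal, illustrative sketch (not huihui-ai's actual pipeline): given a "refusal direction" v extracted from activations, each row of a weight matrix has its component along v removed, so the layer can no longer write into that direction. The matrix and direction vector here are made-up toy values.

```python
import math

def ablate_direction(W, v):
    """Return W with the component along unit(v) removed from each row."""
    norm = math.sqrt(sum(x * x for x in v))
    u = [x / norm for x in v]                      # unit "refusal" direction
    out = []
    for row in W:
        dot = sum(r * x for r, x in zip(row, u))   # projection coefficient
        out.append([r - dot * x for r, x in zip(row, u)])
    return out

W = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]]  # toy weight matrix
v = [0.0, 1.0, 0.0]                     # pretend extracted refusal direction
W_ablated = ablate_direction(W, v)      # every row is now orthogonal to v
```

After ablation, no input can make this layer produce output along v, which is the mechanism abliteration methods use to suppress refusals without retraining.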
reacted to danielhanchen's post with 🔥 23 days ago
You can now fine-tune Qwen3.5 for free with our notebook! 🔥
You just need 5GB VRAM to train Qwen3.5-2B LoRA locally!
Unsloth trains Qwen3.5 1.5x faster with 50% less VRAM.
GitHub: https://github.com/unslothai/unsloth
Guide: https://unsloth.ai/docs/models/qwen3.5/fine-tune
Qwen3.5-4B Colab: https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen3_5_(4B)_Vision.ipynb
https://huggingface.co/etemiz/Ostrich-27B-Qwen3.5-260305
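The low VRAM figure comes largely from how few parameters LoRA actually trains. As a back-of-the-envelope sketch (the layer size and rank below are illustrative assumptions, not the real Qwen3.5 architecture): instead of updating a full weight matrix W of shape (d_out, d_in), LoRA trains two small factors B (d_out x r) and A (r x d_in) and uses W + B @ A.

```python
def full_trainable_params(d_out: int, d_in: int) -> int:
    """Parameters updated when fine-tuning the full matrix."""
    return d_out * d_in

def lora_trainable_params(d_out: int, d_in: int, r: int) -> int:
    """Parameters LoRA trains for the same matrix at rank r."""
    return r * (d_out + d_in)

# Hypothetical 2048x2048 projection layer at rank r=16:
full = full_trainable_params(2048, 2048)      # 4,194,304
lora = lora_trainable_params(2048, 2048, 16)  # 65,536, i.e. 64x fewer
print(f"full: {full:,}  lora: {lora:,}  ratio: {full // lora}x")
```

Fewer trainable parameters means far smaller optimizer state and gradients, which is why a 2B model's LoRA run can fit in a few GB of VRAM.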
#1977 opened 23 days ago by etemiz
posted an update 26 days ago
Published a new checkpoint for Ostrich 32B
etemiz/Ostrich-32B-Qwen3-260303-GGUF
Started fine-tuning Qwen 3.5 27B. Soon, high-density intelligence meets human alignment!