AI & ML interests

Frontier alignment research to ensure the safe development and deployment of advanced AI systems.

Recent Activity

chrisjcundy  updated a dataset about 3 hours ago
AlignmentResearch/deceptive-followup-v25
chrisjcundy  published a dataset about 3 hours ago
AlignmentResearch/deceptive-followup-v25
chrisjcundy  updated a dataset about 12 hours ago
AlignmentResearch/deceptive-followup-v24

AlignmentResearch's collections (4)

The Obfuscation Atlas
Obfuscated Policy, Obfuscated Activations, Blatant Deception, and Honest models trained in the Obfuscation Atlas paper.
The Obfuscation Atlas
Obfuscated Policy, Obfuscated Activations, Blatant Deception, and Honest models trained in the Obfuscation Atlas paper.