About the role
What does ARC do?

The Alignment Research Center (ARC) is a non-profit whose mission is to align future machine learning systems with human interests. (ARC is not to be confused with METR, which was formerly known as "ARC Evals" but has since been spun out.)

ARC's high-level agenda is described in the report on Eliciting Latent Knowledge (ELK): roughly speaking, we're trying to design ML training objectives that incentivize systems to honestly report their internal beliefs. For the last couple of years, we've mostly focused on an approach to ELK based on formalizing a kind of heuristic reasoning that could be used to analyze neural network behavior. You can read more about this approach in our recent blog post, A bird's eye view of ARC's research, and find more details in some of our other blog posts, such as:

- Low Probability Estimation in Language Models (ICLR 2025 spotlight)
- Backdoors as an analogy for deceptive alignment (ITCS 2025)
- Formal verification, heuristic explanations and surprise accounting
- Estimating tail risk in neural networks
- Formalizing the presumption of independence

Our research has reached a stage where we're coming up against concrete problems in mathematics, theoretical computer science, and machine learning, so we're particularly excited about hiring researchers with a relevant background, regardless of whether they have worked on AI alignment before.