About the role
We are seeking a motivated intern with a background in computer science, artificial intelligence, applied mathematics, computational linguistics, or a related field to apply state-of-the-art reinforcement learning techniques to language model alignment.
In this role, you will have the freedom to explore multiple research directions: developing sophisticated reward models that capture human values by decomposing complex value alignment into learnable components; creating synthetic training data through policy-guided rejection sampling; and implementing reinforcement learning (RL) for alignment.
About the company
IBM is a computer and information technology company. A career in IBM Consulting embraces long-term relationships, meaningful impact, and close collaboration with clients across the globe.