AI Alignment and RLHF: What we've accomplished, what we've learned, and what's missing!
Dr. Anca Dragan
Associate Professor EECS Department, UC Berkeley, Berkeley, CA
Cerent Engineering Science Complex, Salazar Hall 2009A
4:00 PM
Abstract: For a while now, I've been thinking about how robots, and AI agents more broadly, can optimize for what we as end users actually want. I'll take this opportunity to reflect on what we've been able to accomplish in this area, as well as what's missing.
(RLHF = Reinforcement Learning from Human Feedback)
Bio: Prof. Anca Dragan is an Associate Professor in the EECS Department at UC Berkeley. Her goal is to enable robots to work with, around, and in support of people. She runs the InterACT Lab, which focuses on algorithms for human-robot interaction -- algorithms that move beyond the robot's function in isolation and generate robot behavior that coordinates well with people and is aligned with what they actually want the robot to do. The lab's work spans applications from assistive arms to quadrotors to autonomous cars, and draws on optimal control, game theory, reinforcement learning, Bayesian inference, and cognitive science. She also helped found, and serves on the steering committee of, the Berkeley AI Research (BAIR) Lab, and is a co-PI of the Center for Human-Compatible AI. She has been honored with the Sloan Fellowship, MIT TR35, the Okawa Award, an NSF CAREER Award, and the PECASE Award.