I’m a 5th-year Artificial Intelligence PhD student at Berkeley, advised by Anca Dragan and Stuart Russell within BAIR and CHAI. I’m thankful to be supported by the NSF Fellowship.
I’m interested in how humans’ preferences, values, and beliefs change, and how these changes may be affected by interactions with AI systems. I’ve studied this both in general (using the language of DR-MDPs) and more specifically in the context of recommender systems, investigating how the choice of algorithm might affect us as users. I’m probably best known for my work on human-AI collaboration and for developing the Overcooked-AI benchmark.
Before immigrating to the US, I grew up in the amazingly chaotic city of Livorno 🇮🇹. Visit if you get the chance!
Publications
Also see my Google Scholar (it may be more up to date).
-
Targeted Manipulation and Deception Emerge when Optimizing LLMs for User Feedback
Marcus Williams*, Micah Carroll*, Adhyyan Narang, Constantin Weisser, Brendan Murphy, Anca Dragan. In submission.
[thread] [code]
-
Beyond Preferences in AI Alignment
Tan Zhi-Xuan, Micah Carroll, Matija Franklin, Hal Ashton. Philosophical Studies.
-
AI Alignment with Changing and Influenceable Reward Functions
Micah Carroll, Davis Foote, Anand Siththaranjan, Stuart Russell, Anca Dragan. ICML 2024.
-
Characterizing Manipulation from AI Systems
Micah Carroll*, Alan Chan*, Henry Ashton, David Krueger. EAAMO 2023.
-
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
Stephen Casper, Xander Davies, ..., Micah Carroll, ..., Erdem Bıyık, Anca Dragan, David Krueger, Dorsa Sadigh, Dylan Hadfield-Menell.
TMLR 2023.
-
Engagement, User Satisfaction, and the Amplification of Divisive Content on Social Media
Smitha Milli, Micah Carroll, Sashrika Pandey, Yike Wang, Anca Dragan.
Knight Institute Symposium: Optimizing for What? 2023.
-
Harms from Increasingly Agentic Algorithmic Systems
Alan Chan, Rebecca Salganik, Alva Markelius, Chris Pang, Nitarshan Rajkumar, Dmitrii Krasheninnikov, Lauro Langosco, Zhonghao He, Yawen Duan, Micah Carroll, Michelle Lin, Alex Mayhew, Katherine Collins, Maryam Molamohammadi, John Burden, Wanru Zhao, Shalaleh Rismani, Konstantinos Voudouris, Umang Bhatt, Adrian Weller, David Krueger, Tegan Maharaj.
FAccT 2023.
-
Who Needs to Know? Minimal Knowledge for Optimal Coordination
Niklas Lauffer, Ameesh Shah, Micah Carroll, Michael Dennis, Stuart Russell.
ICML 2023.
-
Uni[MASK]: Unified Inference in Sequential Decision Problems
Micah Carroll, Orr Paradise, Jessy Lin, Raluca Georgescu, Mingfei Sun, David Bignell, Stephanie Milani, Katja Hofmann, Matthew Hausknecht, Anca Dragan, Sam Devlin.
NeurIPS 2022 (Oral).
[code]
-
Estimating and Penalizing Induced Preference Shifts in Recommender Systems
Micah Carroll, Dylan Hadfield-Menell, Stuart Russell, Anca Dragan.
ICML 2022 (earlier versions at the RecSys 2021 LBR Track and, as a long talk, at the RecSys 2021 FAccTRec Workshop).
-
Optimal Behavior Prior: Improving Human-AI Collaboration Through Generalizable Human Models
Mesut Yang, Micah Carroll, Anca Dragan.
Human-in-the-loop Learning (HILL) Workshop, NeurIPS 2022.
[environment]
-
Time-Efficient Reward Learning via Visually Assisted Cluster Ranking
David Zhang, Micah Carroll, Andreea Bobu, Anca Dragan.
Human-in-the-loop Learning (HILL) Workshop, NeurIPS 2022.
-
Evaluating the Robustness of Collaborative Agents
Paul Knott, Micah Carroll, Sam Devlin, Kamil Ciosek, Katja Hofmann, Anca Dragan, Rohin Shah.
AAMAS 2021.
[environment]
-
On the Utility of Learning about Humans for Human-AI Coordination
Micah Carroll, Rohin Shah, Mark Ho, Tom Griffiths, Sanjit Seshia, Pieter Abbeel, Anca Dragan.
NeurIPS 2019.
[video] [environment] [demo] [blogpost]