Mechanistic interpretability research is really gaining critical mass in New England - we had about 20 posters and lots of fun!
Overview
New England Mechanistic Interpretability (NEMI) is a one-day workshop for academic and industry researchers working on mechanistic interpretability, held on August 19th, 2024, in the Raytheon Amphitheater (Room 240), Egan Research Center, Northeastern University, 120 Forsyth St, Boston, MA.
NEMI 2024 Schedule
Time Event
9:00am-10:00am Breakfast
10:00am-10:15am Welcome Remarks (Max Tegmark)
10:15am-10:20am Program Overview (Koyena Pal)
10:20am-12:15pm Morning Session
10:20-10:35 Opening Keynote: Martin Wattenberg
10:35-10:50 The Platonic Representation Hypothesis: Brian Cheung
10:50-11:05 NNsight: A Transparent API for blackbox AI: Jaden Fiotto Kaufman
11:05-11:15 Coffee Break
11:15-11:30 Instruction Drift: Kenneth Li
11:30-11:45 Compositional Capabilities of Autoregressive Transformers: A Study on Synthetic, Interpretable Tasks: Hidenori Tanaka
11:45-12:15 Panel Session I: How mechanistic interpretability can help keep AI safe and beneficial, with Angie Boggust, David Krueger & Dylan Hadfield-Menell; Max Tegmark (Moderator)
12:15pm-2:00pm Lunch / Presenters Round Tables
2:00pm-3:00pm Poster Session
3:00pm-4:55pm Afternoon Session
3:00-3:15 Opening Keynote: Sarah Schwettmann
3:15-3:30 Multilevel Interpretability of Artificial Neural Networks: Leveraging Framework and Methods from Neuroscience: Zhonghao He
3:30-3:45 Closing Keynote: Sam Marks
3:45-4:00 Group Photo
4:00-4:10 Coffee Break
4:10-4:25 Summative Talk: David Bau
4:25-4:55 Panel Session II: Mechanistic interpretability: state of play and promising directions, with Martin Wattenberg, Byron Wallace, & Yonatan Belinkov; David Bau (Moderator)
4:55-5:00 Closing Remarks (David Bau)
Posters
Xiaochen Li (Brown University): Preference Tuning For Toxicity Mitigation Generalizes Across Languages
Hadas Orgad (Technion): Diffusion Lens: Interpreting Text Encoders in Text-to-Image Pipelines
Sonja Johnson-Yu (Harvard University): Understanding biological active sensing behaviors by interpreting learned artificial agent policies
Oliver Daniels (UMass Amherst): Hypothesis Testing Edge Attribution Patching
Nikhil Prakash (Northeastern University): How do Language Models Bind Human Beliefs?
Kenneth Li (Harvard University): Measuring and Controlling Instruction (In)Stability in Language Model Dialogs
Josh Engels (MIT): Not All Language Model Features Are Linear
Shashata Sawmya and Linghao Kong (MIT): Neuronal Disentanglement and Sparse Expansion
Eric Todd (Northeastern University): Showing vs. Telling in LLMs
Sumedh Hindupur (Harvard University): Designing an interpretable neural network layer
Satpreet H Singh (Harvard Medical School): Emergent behaviour and neural dynamics in artificial agents tracking odour plumes
Xu Pan (Harvard University): Dissecting Query-Key Interaction in Vision Transformers
Arnab Sen Sharma (Northeastern University): Locating and Editing Factual Associations in Mamba
Yongyi Yang (NTT Research): Understanding the Concept Learning Dynamics
Binxu Wang (Kempner Institute, Harvard University): Raise one and infer three: Do generative models generalize on abstract rules for reasoning?
Eric Bigelow (Harvard University): In-Context Learning Dynamics as a Window Into the Mind of an LLM
David Baek (MIT): Generalization from Starvation: Representations in LLM Knowledge Graph Learning
Bhavya Vasudeva (University of Southern California): Towards a Control Theory of Language Models: Understanding When Instructions Override Priors
Shivam Raval (Harvard University): Sparse autoencoders find highly visualizable features in toy datasets
Core Francisco Park (Harvard University / NTT): Emergence of In-context learning beyond Bayesian retrieval: A mechanistic study
Tal Haklay (Technion): Automating position-aware circuit discovery
New England Mechanistic Interpretability Workshop Series
New England Mechanistic Interpretability (NEMI) is a workshop for academic and industry researchers based in New England who work on mechanistic interpretability.
Dates (all deadlines in AoE)
Registration deadline: August 2, 2024
Poster/talk abstract submission deadline: August 9, 2024
Notification of acceptance: August 12, 2024
Event date: August 19, 2024
Location
Raytheon Amphitheater (Room 240), Egan Research Center, Northeastern University, 120 Forsyth St, Boston, MA
Call for Submissions
We invite submissions for the NEMI 2024 workshop, a one-day event dedicated to exploring the latest developments in mechanistic interpretability research. We welcome submissions on all aspects of interpretability. A subset of submissions will be selected for oral presentations; the rest will be presented as posters. We encourage submissions from rising researchers enrolled in graduate programs at universities in the New England region.
Submissions may describe work in progress or already published work. Extended abstracts should not exceed 500 words.
Senior Organizing Committee
David Bau, Northeastern University
Max Tegmark, MIT
Student Organizing Committee
Koyena Pal, Northeastern University
Kenneth Li, Harvard University
Eric Michaud, MIT
Jannik Brinkmann, University of Mannheim
If you have any questions, please contact the organizing team at thebaulab@gmail.com.