
New England Mechanistic Interpretability Workshop

Mechanistic interpretability research is really gaining critical mass in New England - we had about 20 posters and lots of fun!

Overview

New England Mechanistic Interpretability (NEMI) is a one-day workshop for academic and industry researchers working on mechanistic interpretability, held on August 19th, 2024 in the Raytheon Amphitheater (Room 240), Egan Research Center, Northeastern University, 120 Forsyth St, Boston, MA.

NEMI 2024 Schedule

Time Event

9:00am-10:00 Breakfast

10:00am-10:15 Welcome Remarks (Max Tegmark)

10:15am-10:20 Program Overview (Koyena Pal)

10:20am-12:15 Morning Session

10:20-10:35 Opening Keynote: Martin Wattenberg

10:35-10:50 The Platonic Representation Hypothesis: Brian Cheung

10:50-11:05 NNsight: A Transparent API for blackbox AI: Jaden Fiotto-Kaufman

11:05-11:15 Coffee Break

11:15-11:30 Instruction Drift: Kenneth Li

11:30-11:45 Compositional Capabilities of Autoregressive Transformers: A Study on Synthetic, Interpretable Tasks: Hidenori Tanaka

11:45-12:15 Panel Session I: How mechanistic interpretability can help keep AI safe and beneficial, with Angie Boggust, David Krueger & Dylan Hadfield-Menell; Max Tegmark (Moderator)

12:15-2:00 Lunch / Presenters Round Tables

2:00pm-3:00 Poster Session

3:00pm-4:55 Afternoon Session

3:00-3:15 Opening Keynote: Sarah Schwettmann

3:15-3:30 Multilevel Interpretability of Artificial Neural Networks: Leveraging Framework and Methods from Neuroscience: Zhonghao He

3:30-3:45 Closing Keynote: Sam Marks

3:45-4:00 Group Photo

4:00-4:10 Coffee Break

4:10-4:25 Summative Talk: David Bau

4:25-4:55 Panel Session II: Mechanistic interpretability: state of play and promising directions, with Martin Wattenberg, Byron Wallace & Yonatan Belinkov; David Bau (Moderator)

4:55-5:00 Closing Remarks (David Bau)

Posters

Xiaochen Li (Brown University): Preference Tuning For Toxicity Mitigation Generalizes Across Languages

Hadas Orgad (Technion): Diffusion Lens: Interpreting Text Encoders in Text-to-Image Pipelines

Sonja Johnson-Yu (Harvard University): Understanding biological active sensing behaviors by interpreting learned artificial agent policies

Oliver Daniels (UMass Amherst): Hypothesis Testing Edge Attribution Patching

Nikhil Prakash (Northeastern University): How do Language Models Bind Human Beliefs?

Kenneth Li (Harvard University): Measuring and Controlling Instruction (In)Stability in Language Model Dialogs

Josh Engels (MIT): Not All Language Model Features Are Linear

Shashata Sawmya and Linghao Kong (MIT): Neuronal Disentanglement and Sparse Expansion

Eric Todd (Northeastern University): Showing vs. Telling in LLMs

Sumedh Hindupur (Harvard University): Designing an interpretable neural network layer

Satpreet H Singh (Harvard Medical School): Emergent behaviour and neural dynamics in artificial agents tracking odour plumes

Xu Pan (Harvard University): Dissecting Query-Key Interaction in Vision Transformers

Arnab Sen Sharma (Northeastern University): Locating and Editing Factual Associations in Mamba

Yongyi Yang (NTT Research): Understanding the Concept Learning Dynamics

Binxu Wang (Kempner Institute, Harvard University): Raise one and infer three: Does generative models generalize on abstract rules for reasoning?

Eric Bigelow (Harvard University): In-Context Learning Dynamics as a Window Into the Mind of an LLM

David Baek (MIT): Generalization from Starvation: Representations in LLM Knowledge Graph Learning

Bhavya Vasudeva (University of Southern California): Towards a Control Theory of Language Models: Understanding When Instructions Override Priors

Shivam Raval (Harvard University): Sparse autoencoders find highly visualizable features in toy datasets

Core Francisco Park (Harvard University / NTT): Emergence of In-context learning beyond Bayesian retrieval: A mechanistic study

Tal Haklay (Technion): Automating position-aware circuit discovery

New England Mechanistic Interpretability Workshop Series

New England Mechanistic Interpretability (NEMI) is a workshop for academic and industry researchers working on mechanistic interpretability who live and work in New England.

Dates (all deadlines in AoE)

  • Registration deadline: August 2, 2024

  • Poster/Talk abstract submission deadline: August 9, 2024

  • Notification deadline: August 12, 2024

  • Event date: August 19, 2024

Location

Raytheon Amphitheater (Room 240), Egan Research Center, Northeastern University, 120 Forsyth St, Boston, MA

Call for Submissions

We invite submissions for the NEMI 2024 workshop, a one-day event dedicated to exploring the latest developments in mechanistic interpretability research. We welcome submissions on all aspects of interpretability; some will be selected for oral presentations and the rest will be presented as posters. We encourage submissions from rising researchers enrolled in graduate programs at universities in the New England region.

Submissions may present work in progress or already published work. The extended abstract should not exceed 500 words.

Senior Organizing Committee

David Bau, Northeastern University
Max Tegmark, MIT

Student Organizing Committee

Koyena Pal, Northeastern University
Kenneth Li, Harvard University
Eric Michaud, MIT
Jannik Brinkmann, University of Mannheim

If you have any questions, please contact the organizing team at thebaulab@gmail.com.
