2023 MIT Mechanistic Interpretability Conference


  • Massachusetts Institute of Technology, 54 Vassar Street, Cambridge, MA 02139, United States

Schedule

Saturday, May 6

0800: Breakfast
0900: Overview & LLM MI

  • Max Tegmark (MIT): Welcome (video)

  • Dylan Hadfield-Menell (MIT): Interpretability overview (video)

  • Chris Olah (Anthropic): The SOTA of LLM Mechanistic Interpretability (video)

Coffee break

  • Mor Geva (Google DeepMind): Interpreting LLMs in embedding space (video)

  • David Bau (Northeastern): How LLMs remember facts (video)

Panel with Chris Olah, David Bau & Mor Geva, moderated by Neel Nanda: “What do & don’t we understand about LLMs?”
Lightning intros
1200: Lunch
1300: More LLM MI

  • Jacob Andreas & Evan Hernandez (MIT): How LLMs model people’s beliefs (video)

  • Ekin Akyürek (MIT): How LLMs can do linear regression at runtime (video)

  • Eric Michaud (MIT): Understanding LLM scaling in terms of computational quanta (video)

  • János Kramár (Google DeepMind): Compiling any algorithm into a transformer (video)

1420: Group photo
1430: Poster Session
1530: MI beyond LLMs

  • Tony Wang (MIT): How a human beat AlphaGo (video)

  • Ellie Pavlick (Brown): Neural network subroutines (video)

  • Ziming Liu (MIT): MI of knowledge representations, symmetry & modularity (video)

  • Sharon Li (Wisconsin): How unique are knowledge representations? (video)

  • Buck Shlegeris (Redwood): Formalism for thinking about MI (video)

  • Martin Wattenberg (Harvard): Learned world models and what they’re good for (video)

Panel with Ila Fiete (MIT), Tommy Poggio (MIT), Gabriel Kreiman (Harvard): MI inspiration from neuroscience, physics & math (video)
1800-2100: Dinner Cruise, scintillating conversation

Sunday, May 7

0800: Breakfast
0900: Morning session: MI for AI safety

  • Panel with Victoria Krakovna (Google DeepMind/FLI), Connor Leahy (Conjecture), Sharon Li (Wisconsin), Anthony Aguirre (FLI): AGI safety (video)

  • Neel Nanda (Google DeepMind): How MI can help AI safety (video)

  • Connor Leahy (Conjecture): MI for AGI safety (video)

Coffee break

  • Steve Omohundro: Provably safe AGI (video)

  • Silviu Marian Udrescu (MIT): Symbolic regression (video)

  • Marin Soljačić (MIT): Symbolic regression & applications

1200: Lunch
1300: Lightning talks (video)
1400-1800: Project incubation

  • Neel Nanda (Google DeepMind), part II: Whirlwind tour of MI open problems (video)

  • Panel with Neel Nanda (Google DeepMind), Steve Omohundro & Martin Wattenberg (Harvard), moderated by Chris Olah (Anthropic): Promising MI research directions (video)

  • All group leaders looking for collaborators stand up & introduce themselves, lightning style

Coffee break
1515: Project incubator unconference, block I: Breakouts across tables in the atrium, one MI research direction per table. In parallel, Wes Gurnee (MIT) & Neel Nanda run an MI tutorial hackathon in Singleton Auditorium for whoever wants to get their feet wet.
1615: Project incubator unconference, block II
1715: Report-back from breakouts, closing remarks (video)
1800: Conference dinner, mingling, scintillating conversation
