Projects
Public research write-ups, presented in full on-site. Each is a self-contained deep dive with its working, figures, and the parts that did not work.
Does a grokked transformer compose group elements through their irreducible representations or through cosets? Two groups with identical character tables, built to tell the accounts apart, give a surprising answer: the clearest difference between them is not in the trained weights at all, but in how hard each group is to learn.