Pure-mathematics background, working on AI safety.
Selected work
-
Does a grokked transformer compose group elements through their irreducible representations or through cosets? A pair of groups with identical character tables, constructed to tell the two accounts apart, gives a surprising answer: the clearest difference between them lies not in the trained weights at all, but in how hard each group is to learn.