Once I believed that the paper "Progress Measures for Grokking via Mechanistic Interpretability" had found the network grokking onto the Schönhage-Strassen algorithm on the task of modular multiplication, not on modular addition around the circle,
a Schelling point for those who seek one.