Sorry, let me expand on this a little.
In the talk, I heard Mixture of Experts explained as a technique that could reduce the compute requirements of large models by breaking them down into smaller expert sub-networks with sparse connections between them, so only part of the model runs for any given input.
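For concreteness, here's a minimal sketch of what that description usually maps to in practice: a small gating network routes each token to one expert, so only a fraction of the total parameters is active per input. This is a generic top-1-gated MoE layer in PyTorch, not anything from the talk; the layer sizes and routing scheme are just illustrative assumptions.

```python
# Minimal sketch of a top-1 gated Mixture-of-Experts layer (illustrative only).
# Each token is routed to a single expert, so roughly 1/num_experts of the
# expert parameters are exercised per token.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoELayer(nn.Module):
    def __init__(self, d_model: int, d_hidden: int, num_experts: int):
        super().__init__()
        self.gate = nn.Linear(d_model, num_experts)  # router / gating network
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.ReLU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model)
        gate_probs = F.softmax(self.gate(x), dim=-1)   # (num_tokens, num_experts)
        top_prob, top_idx = gate_probs.max(dim=-1)     # pick one expert per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = top_idx == e
            if mask.any():
                # Only tokens routed to this expert pass through it.
                out[mask] = top_prob[mask].unsqueeze(-1) * expert(x[mask])
        return out


layer = MoELayer(d_model=64, d_hidden=256, num_experts=8)
tokens = torch.randn(10, 64)
print(layer(tokens).shape)  # torch.Size([10, 64])
```

The "sparse connections" part is the routing step: each token only touches the gate plus one expert, even though the layer as a whole holds many experts' worth of parameters.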
---
RT @lorenpmc
@cedar_xr @BasedBeffJezos Pointers to?
https://twitter.com/lorenpmc/status/1636598290727976960
I tried my 1am best to find something that compares MoEs with other modularization techniques, but nothing came up. Here's a 2019 review on modularization techniques, though.
I would love to know if I'm missing some major context on MoEs that distinguishes them from other modularization techniques and makes them especially deserving of discussion compared to the rest.