Sorry, let me expand on this a little.
In the talk I heard Mixture of Experts (MoE) explained as a way to reduce the compute requirements of large models by breaking them into smaller expert networks with sparse connections in between, so each input only activates a few of them.
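To make "smaller models with sparse connections in between" concrete, here's a minimal sketch of what I understood: a top-k gated MoE layer in PyTorch. All the names (`TopKMoE`, `num_experts`, `top_k`), the sizes, and the naive dispatch loop are mine for illustration, not from the talk or any particular library.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Sketch of a sparsely-gated MoE layer: many small experts, few used per token."""

    def __init__(self, d_model=512, d_hidden=2048, num_experts=8, top_k=2):
        super().__init__()
        # Each "expert" is a small feed-forward block; the layer as a whole is
        # the union of all of them, but each token only passes through `top_k`.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.ReLU(), nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )
        # The gate provides the "sparse connections in between": it scores
        # every expert per token, and only the top-k scores are kept.
        self.gate = nn.Linear(d_model, num_experts)
        self.top_k = top_k

    def forward(self, x):  # x: (num_tokens, d_model)
        scores = self.gate(x)                           # (num_tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)  # keep only the top-k experts per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        # Naive dispatch: route each token's slice to its selected experts.
        # (Real implementations batch tokens per expert for efficiency.)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out
```

The compute saving is that each token only touches `top_k` of the `num_experts` feed-forward blocks, so parameter count grows with the number of experts while per-token FLOPs stay roughly constant.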
---
RT @lorenpmc
@cedar_xr @BasedBeffJezos Pointers to?
https://twitter.com/lorenpmc/status/1636598290727976960
I tried my 1am best to find something that compares MoEs with other modularization techniques, but nothing came up. But here's a 2019 review of modularization techniques.
This is something I typically hear discussed in the context of modularity / sparsity (different names for the same idea, depending on whether you talk to the abstract people or the nitty-gritty people).
And my impression is that modularization methods are a very big family of techniques with widely varied performance.