Sorry, let me expand on this a little.
In the talk I heard Mixture of Experts (MoE) explained as a way to reduce the compute requirements of large models by breaking them into smaller expert networks with sparse connections in between, so each input only activates a few of them.
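To make "smaller models with sparse connections in between" concrete, here's a minimal sketch of what I understood: a top-k gated MoE layer in PyTorch. All the names (`TopKMoE`, `num_experts`, `top_k`), the sizes, and the naive dispatch loop are mine for illustration, not from the talk or any particular library.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Sketch of a sparsely-gated MoE layer: many small experts, few used per token."""

    def __init__(self, d_model=512, d_hidden=2048, num_experts=8, top_k=2):
        super().__init__()
        # Each "expert" is a small feed-forward block; the layer as a whole is
        # the union of all of them, but each token only passes through `top_k`.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.ReLU(), nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )
        # The gate provides the "sparse connections in between": it scores
        # every expert per token, and only the top-k scores are kept.
        self.gate = nn.Linear(d_model, num_experts)
        self.top_k = top_k

    def forward(self, x):  # x: (num_tokens, d_model)
        scores = self.gate(x)                           # (num_tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)  # keep only the top-k experts per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        # Naive dispatch: route each token's slice to its selected experts.
        # (Real implementations batch tokens per expert for efficiency.)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out
```

The compute saving is that each token only touches `top_k` of the `num_experts` feed-forward blocks, so parameter count grows with the number of experts while per-token FLOPs stay roughly constant.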
---
RT @lorenpmc
@cedar_xr @BasedBeffJezos Pointers to?
https://twitter.com/lorenpmc/status/1636598290727976960
I tried my 1am best to find something that compares MoEs with other modularization techniques, but nothing came up. But here's a 2019 review of modularization techniques.
This is something I typically hear discussed in the context of modularity / sparsity (different names for the same idea, depending on whether you talk to the abstract people or the nitty-gritty people).
And my impression is that modularization methods are a very big family of techniques with widely varied performance.