This is something I typically hear in the context of modularity / sparsity (different names depending on whether you speak to the abstract people or the nitty gritty people).
And my impression is that these techniques form a very big family, with widely varying performance.
I tried my 1am best to find something that compares MoEs with other modularization techniques, but nothing came up. But here's a 2019 review on modularization techniques.
Some of them may perform better than the intuitive-sounding "partition the input space and use subnetworks that could be seen as different models for each sub-space" approach.
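
To make the "partition the input space" idea concrete, here is a rough sketch of a hard-routed (top-1) mixture of experts in plain Python. Everything in it is my own illustrative assumption, not taken from the review or any particular paper: the `HardMoE` class, the helper names, the random linear gate, and the linear experts. Real MoE layers usually use learned gates, often soft or top-k routing, and load balancing, none of which is shown here.

```python
import random

def make_linear(n_in, n_out, rng):
    """Random weight matrix with n_out rows and n_in columns (hypothetical helper)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]

def matvec(w, x):
    """Multiply matrix w (list of rows) by vector x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

class HardMoE:
    """Toy top-1 mixture of experts: the gate picks one expert per input."""

    def __init__(self, n_in, n_out, n_experts, seed=0):
        rng = random.Random(seed)
        # Assumption: an untrained linear gate with one score per expert.
        self.gate = make_linear(n_in, n_experts, rng)
        # Each expert is its own small linear model.
        self.experts = [make_linear(n_in, n_out, rng) for _ in range(n_experts)]

    def route(self, x):
        # The argmax over gate scores partitions the input space into regions,
        # one region per expert.
        scores = matvec(self.gate, x)
        return max(range(len(scores)), key=scores.__getitem__)

    def forward(self, x):
        # Only the selected expert runs, which is where the sparsity comes from.
        k = self.route(x)
        return k, matvec(self.experts[k], x)

moe = HardMoE(n_in=2, n_out=3, n_experts=4)
expert_index, output = moe.forward([0.5, -1.0])
```

The point is just that each input only ever touches one subnetwork, so each expert can be read as a separate model for its own region of the input space.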