What are the scaling laws for sigmoid vs. ReLU activation functions?