RT @DanHendrycks
To find the limits of Transformers, we collected 12,500 math problems. While a three-time IMO gold medalist got 90%, GPT-3 models got ~5%, with accuracy increasing slowly.
If trends continue, ML models will remain far from achieving mathematical reasoning.