Overparameterization and Stability: New Theory Explains Why Bigger Models Generalize Better in Classification
New research provides a theoretical bridge connecting model overparameterization, stability, and generalization for discontinuous classifiers, a class that includes modern neural networks. By introducing a quantifiable measure of class stability—defined as the expected distance to a decision boundary—researchers have derived a generalization bound that improves as stability increases. A key corollary is a "law of robustness" for classification, extending prior theory and demonstrating that any model that perfectly fits training data must be unstable unless it is substantially overparameterized, with the number of parameters p significantly exceeding the number of data points n.
Quantifying Stability as a Measure of Robustness
The core of the work, detailed in the preprint arXiv:2603.02806v1, moves beyond traditional smoothness assumptions to analyze discontinuous functions. The authors establish that for finite function classes, generalization performance can be bounded inversely by class stability. This stability is not a norm-based measure of parameter magnitude but a geometric property in the input domain: the expected margin or distance from data points to the classifier's decision boundary. This framing directly interprets stability as a quantifiable form of robustness to input perturbations.
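To make the notion concrete, here is a minimal illustrative sketch (not code from the paper): for a linear classifier sign(w·x + b), the distance from a point x to the decision boundary has the closed form |w·x + b| / ||w||, and class stability is its average over the data.

```python
# Illustrative sketch: class stability as the expected distance from
# sample points to a classifier's decision boundary. For a linear
# classifier sign(w.x + b) this distance has a closed form.
import math

def margin_linear(w, b, x):
    """Distance from point x to the hyperplane w.x + b = 0."""
    dot = sum(wi * xi for wi, xi in zip(w, x)) + b
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(dot) / norm

def class_stability(w, b, points):
    """Empirical estimate of E[dist(x, decision boundary)]."""
    return sum(margin_linear(w, b, x) for x in points) / len(points)

# Toy example: boundary x1 + x2 = 0 in 2D.
w, b = [1.0, 1.0], 0.0
pts = [(1.0, 1.0), (-2.0, 0.0), (0.5, -0.5)]
print(round(class_stability(w, b, pts), 4))  # prints 0.9428
```

A larger average margin means typical inputs sit farther from the boundary, so larger input perturbations are needed to change a prediction, which is exactly the robustness reading of stability.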
This analysis leads to a significant theoretical extension. The researchers derive a classification analogue of the law of robustness proven by Bubeck and Sellke for regression. Their corollary states that a model with only p ≈ n parameters that perfectly interpolates n training points must necessarily exhibit low stability. Achieving high stability—and, by the bound, better generalization—therefore demands substantial overparameterization, p >> n.
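For context, the regression-setting result being extended can be stated roughly as follows (paraphrased from Bubeck and Sellke; the constants and precise conditions are omitted here): any p-parameter model f that fits n noisy d-dimensional samples below the noise level must be non-smooth, in the sense that

```latex
% Bubeck--Sellke law of robustness (regression setting), stated informally:
\mathrm{Lip}(f) \;\gtrsim\; \sqrt{\frac{nd}{p}}
```

so smoothness (a small Lipschitz constant) is only possible when p is much larger than n. The new corollary plays the analogous role for classification, with input-space stability in place of the Lipschitz constant.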
Extending Theory to Infinite Function Classes
The theoretical framework is further expanded to parameterized infinite function classes, which more closely resemble practical deep learning models. To handle this complexity, the authors analyze a stronger robustness measure derived from the margin in the output space, which they term normalized co-stability. This measure offers a tractable way to apply stability analysis to complex, high-dimensional models, reinforcing the central finding that overparameterization is a prerequisite for building stable, robust classifiers.
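The paper's exact definition of normalized co-stability is not reproduced here; as an illustrative proxy, one can compute an output-space margin—the gap between the correct-class score and the best competing score—and divide by a model-dependent normalizer (for instance a Lipschitz or weight-norm bound, an assumption of this sketch):

```python
# Illustrative proxy for an output-space margin measure (the paper's
# precise "normalized co-stability" is not reproduced here).
def output_margin(scores, label):
    """Gap between the true-class score and the runner-up score."""
    rival = max(s for j, s in enumerate(scores) if j != label)
    return scores[label] - rival

def normalized_margin(batch, norm_const):
    """Average output margin over (scores, label) pairs, scaled by a
    model-dependent normalizer norm_const (assumed, e.g. a Lipschitz
    or weight-norm bound) so values are comparable across models."""
    margins = [output_margin(s, y) for s, y in batch]
    return sum(margins) / (len(margins) * norm_const)

batch = [([2.0, -1.0, 0.5], 0), ([0.2, 1.5, 1.0], 1)]
print(normalized_margin(batch, norm_const=2.0))  # prints 0.5
```

The appeal of an output-space measure is tractability: scores are directly observable, whereas input-space distances to a high-dimensional decision boundary must be estimated.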
"The relationship between overparameterization, stability, and generalization remains incompletely understood in the setting of discontinuous classifiers," the authors note, directly addressing the gap their work aims to fill. By moving from smooth to discontinuous functions, the theory becomes more directly applicable to the ReLU-based architectures that dominate contemporary AI.
Empirical Validation and Practical Implications
Theoretical claims are supported by experimental evidence. The study finds that as model size increases, measured stability consistently rises, and this increase strongly correlates with improved test performance. Crucially, the experiments show that traditional norm-based complexity measures, like weight magnitudes, remain largely uninformative in predicting generalization, highlighting the unique explanatory power of the new stability metric.
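One way such a stability measurement could be carried out in practice (an assumption of this sketch, not the paper's protocol) is to treat the model as a black box and, for each point, probe along random directions for the smallest perturbation radius at which the prediction flips:

```python
# Sketch: estimating distance-to-boundary for a black-box (possibly
# discontinuous) classifier by probing along random directions. This
# estimation procedure is an assumption of this sketch, not the
# paper's experimental protocol.
import random

def distance_to_boundary(predict, x, n_dirs=64, r_max=10.0, steps=200, seed=0):
    rng = random.Random(seed)
    base = predict(x)
    best = r_max  # if no flip is found, report the search radius
    for _ in range(n_dirs):
        d = [rng.gauss(0.0, 1.0) for _ in x]
        norm = sum(v * v for v in d) ** 0.5
        d = [v / norm for v in d]  # unit direction
        for k in range(1, steps + 1):
            r = r_max * k / steps
            if predict([xi + r * di for xi, di in zip(x, d)]) != base:
                best = min(best, r)  # smallest flip radius seen so far
                break
    return best

# Toy classifier: sign of the first coordinate (discontinuous at x1 = 0).
predict = lambda x: 1 if x[0] >= 0 else -1
# The true distance from [3, 0] to the boundary is 3, so the estimate
# (which can only overshoot) is at least 3.
print(distance_to_boundary(predict, [3.0, 0.0]) >= 3.0)  # prints True
```

Averaging such estimates over a test set gives an empirical stability number that can be tracked against model size and test accuracy, alongside the norm-based measures the study found uninformative.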
This research provides a compelling explanation for the empirical success of overparameterized models. It suggests that the drive for larger models is not merely about capacity to memorize, but about achieving the geometric property of stability in the input space, which the theory in turn links to robust generalization.
Why This Matters: Key Takeaways
- New Generalization Bound: A novel theoretical bound links better generalization directly to higher "class stability," a measurable form of robustness defined by decision boundaries.
- Law of Robustness for Classification: The work extends foundational theory, proving that interpolating models must be unstable unless they are highly overparameterized (p >> n).
- Beyond Smoothness: The theory applies to discontinuous classifiers (like neural networks), making it more relevant to modern AI than prior work requiring smooth functions.
- Empirical Confirmation: Experiments validate that stability increases with model size and correlates with test accuracy, while traditional norm-based measures fail to do so.
- Practical Design Insight: The findings provide a theoretical rationale for building large models, framing overparameterization as a pathway to inherent stability and robustness, not just memorization.