The Price of Robustness: Stable Classifiers Need Overparameterization

A theoretical study establishes that discontinuous classifiers require substantial overparameterization to achieve robustness. The research introduces 'class stability' as a quantifiable measure and extends the law of robustness to non-smooth functions, proving that models with roughly as many parameters as data points (p ≈ n) must be unstable. The findings provide mathematical justification for the large model sizes used in modern machine learning, with empirical validation showing that stability increases with model size.


Overparameterization and Robustness: New Theory Links Model Size to Stability in Discontinuous Classifiers

A new theoretical study provides a crucial bridge in understanding how overparameterization relates to stability and generalization, specifically for the challenging case of discontinuous classifiers. Published on arXiv (2603.02806v1), the research establishes a novel generalization bound that improves as a model's class stability—defined as the expected distance to the decision boundary—increases. A key corollary extends the so-called "law of robustness" beyond smooth functions, demonstrating that any model perfectly fitting (interpolating) *n* data points with roughly *n* parameters must be unstable, thereby proving that substantial overparameterization is a prerequisite for achieving high robustness.

Quantifying Robustness Through Class Stability

The core of the work introduces class stability as a quantifiable and theoretically tractable measure of a model's robustness in the input domain. For finite function classes, the authors derive a generalization bound in which the error scales inversely with this stability measure, formally linking a larger expected margin to better generalization performance. This provides a fresh analytical lens, moving beyond traditional norm-based complexity measures, which the paper finds to be largely uninformative for this problem class.
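The paper's own estimator is not reproduced here, but the quantity itself — the expected distance from an input to the decision boundary — can be approximated directly. A minimal sketch, assuming stability is estimated by Monte-Carlo search for the nearest prediction flip along random directions (`flip_radius` and `class_stability` are illustrative names, not the paper's API):

```python
import numpy as np

def flip_radius(f, x, direction, r_max=10.0, tol=1e-4):
    """Smallest radius along `direction` at which f's prediction flips
    (bisection; returns r_max if no flip occurs within r_max)."""
    y0 = f(x)
    if f(x + r_max * direction) == y0:
        return r_max
    lo, hi = 0.0, r_max
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(x + mid * direction) == y0:
            lo = mid
        else:
            hi = mid
    return hi

def class_stability(f, X, n_dirs=32, seed=0):
    """Monte-Carlo estimate of E_x[dist(x, decision boundary)]:
    for each point, the minimum flip radius over random unit directions."""
    rng = np.random.default_rng(seed)
    dists = []
    for x in X:
        dirs = rng.normal(size=(n_dirs, x.shape[0]))
        dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
        dists.append(min(flip_radius(f, x, d) for d in dirs))
    return float(np.mean(dists))

# Toy classifier: predict by the sign of the first coordinate, so the
# true distance to the boundary is simply |x_0|.
f = lambda x: int(x[0] > 0.0)
X = np.array([[1.0, 0.0], [2.0, -1.0], [-0.5, 3.0]])
print(class_stability(f, X))  # close to mean(|x_0|) = 7/6, from above
```

The random-direction search only upper-bounds each point's true boundary distance, so the estimate is slightly optimistic about instability; more directions tighten it.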

To extend the theory to parameterized infinite function classes, the analysis introduces a stronger, related measure termed normalized co-stability. This concept is derived from the margin observed in the model's output space (codomain). The analysis of this measure yields analogous results, reinforcing the fundamental trade-offs between parameter count, interpolation, and robustness.
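The paper's exact definition of normalized co-stability is not restated here, but the underlying idea — a codomain margin scaled so that it lower-bounds input-space distance to the boundary — follows a standard construction: divide the output-space margin by a Lipschitz constant of the score function. A hedged sketch under that assumption:

```python
import numpy as np

def output_margin(scores, y):
    """Output-space (codomain) margin: score of the labeled class minus
    the best competing class score, per sample."""
    s = scores.copy()
    idx = np.arange(len(y))
    true = s[idx, y]
    s[idx, y] = -np.inf          # mask the true class
    return true - s.max(axis=1)  # can be negative if misclassified

def normalized_co_stability(scores, y, lipschitz):
    """Proxy measure: output margin divided by a Lipschitz constant of
    the scorer, which lower-bounds input-space distance to the boundary."""
    return output_margin(scores, y) / lipschitz

scores = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.3,  0.2]])
y = np.array([0, 1])
print(normalized_co_stability(scores, y, lipschitz=2.0))
# margins [1.5, 0.1] scaled by 1/2 -> [0.75, 0.05]
```

The normalization is what makes the codomain margin comparable across models: a scorer can inflate raw output margins arbitrarily by rescaling, but not the Lipschitz-normalized quantity.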

Extending the Law of Robustness to Non-Smooth Functions

A significant theoretical advancement presented is the extension of the law of robustness, initially formulated for smooth functions by Bubeck and Sellke, to encompass discontinuous functions. The corollary rigorously shows that in an interpolating regime where the number of parameters *p* is approximately equal to the number of data points *n*, high stability is impossible. This mathematically confirms that achieving robust, stable decision boundaries requires moving into a significantly overparameterized regime where *p >> n*, offering a theoretical justification for the large model sizes common in modern machine learning.
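For context, the original smooth-function result of Bubeck and Sellke can be stated informally as follows (a paraphrase, not the new paper's statement):

```latex
% Law of robustness (Bubeck--Sellke, informal): any $f$ in a
% $p$-parameter class that interpolates $n$ noisy $d$-dimensional
% samples below the noise level must satisfy
\mathrm{Lip}(f) \;\gtrsim\; \sqrt{\frac{nd}{p}}
% Consequences: at $p \approx n$ this forces $\mathrm{Lip}(f) \gtrsim
% \sqrt{d}$ (instability in high dimension), while an $O(1)$ Lipschitz
% constant requires $p \gtrsim nd$.
```

The new corollary transports this trade-off to discontinuous classifiers by replacing the Lipschitz constant with the class-stability measure, so the same p ≈ n regime is shown to preclude stability even without smoothness.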

Empirical Validation and Practical Implications

The theoretical findings are supported by experimental evidence. The research demonstrates that in practice, model stability consistently increases with model size (overparameterization). Furthermore, this measured stability shows a clear positive correlation with improved test performance, validating it as a meaningful predictor of generalization. In contrast, experiments confirm that traditional norm-based capacity measures fail to capture this relationship, highlighting the unique explanatory power of the new stability-based framework.
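A toy illustration (not from the paper) of why weight norms can be uninformative about input-space stability: for a sign-based linear classifier, rescaling the weights changes the norm arbitrarily while leaving both the decision rule and the geometric margin untouched.

```python
import numpy as np

# Two linear classifiers implementing the SAME decision rule:
# sign(w.x) is invariant to positive rescaling of w.
w_small = np.array([1.0, 1.0])
w_large = 100.0 * w_small

pred = lambda w, x: int(w @ x > 0)
# Input-space distance from x to the hyperplane w.x = 0.
margin_dist = lambda w, x: abs(w @ x) / np.linalg.norm(w)

x = np.array([2.0, 0.0])
print(pred(w_small, x), pred(w_large, x))              # same prediction
print(np.linalg.norm(w_small), np.linalg.norm(w_large))  # norms differ 100x
print(margin_dist(w_small, x), margin_dist(w_large, x))  # same geometric margin
```

The weight norm varies by two orders of magnitude across functionally identical classifiers, while the stability-relevant quantity (distance to the boundary) does not; this is the kind of failure mode the paper attributes to norm-based capacity measures.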

Why This Matters: Key Takeaways

  • New Generalization Theory: Provides a stability-dependent generalization bound for discontinuous classifiers, filling a notable gap in learning theory.
  • Law of Robustness Extended: Proves that significant overparameterization is mathematically necessary for achieving stable, robust classifiers, even when functions are not smooth.
  • Practical Metric: Introduces class stability and normalized co-stability as empirically validated measures that correlate with test accuracy, unlike traditional norm-based metrics.
  • Justifies Large Models: Offers a theoretical foundation for the observed benefit of very large parameter counts, linking them directly to improved robustness and generalization.
