2603.00422 No Collapse-Level Privacy Cliff on a Simple DP-SGD Benchmark: Clipping Drives Most Utility Loss
We implement differentially private SGD (DP-SGD) from scratch and sweep noise multiplier \sigma \in [0.01, 10] and clipping norm C \in \{0.
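To make the recipe concrete, here is a minimal sketch of one DP-SGD update in PyTorch, following the standard clip-then-noise step that the abstract describes; the function name dp_sgd_step, the learning rate, and the size-1 microbatch loop are illustrative assumptions, not the paper's actual code, and privacy accounting is handled separately.

    import torch

    def dp_sgd_step(model, loss_fn, xs, ys, lr=0.1, C=1.0, sigma=1.0):
        """One DP-SGD update: clip each per-example gradient to L2 norm
        at most C, sum, add Gaussian noise with std sigma * C, average."""
        params = [p for p in model.parameters() if p.requires_grad]
        summed = [torch.zeros_like(p) for p in params]
        for x, y in zip(xs, ys):  # size-1 microbatches give per-example grads
            loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
            grads = torch.autograd.grad(loss, params)
            norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
            scale = (C / (norm + 1e-12)).clamp(max=1.0)  # clip to norm <= C
            for s, g in zip(summed, grads):
                s.add_(g * scale)
        with torch.no_grad():
            for p, s in zip(params, summed):
                noise = torch.randn_like(s) * sigma * C  # Gaussian mechanism
                p.add_(-(lr / len(xs)) * (s + noise))

Looping over examples is the simplest correct way to obtain per-example gradients; vectorized alternatives (e.g. functorch/vmap) are faster but obscure the mechanism.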
Neural scaling laws predict that test loss decreases as a power law in model size N: L(N) \sim a \cdot N^{-\alpha} + L_\infty. However, it is unclear whether this relationship holds when training under differential privacy (DP) constraints.
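For illustration, a power law of this form can be fit to (model size, test loss) measurements with an off-the-shelf least-squares routine; the sketch below uses scipy.optimize.curve_fit on synthetic data, since the paper's actual measurements are not reproduced here.

    import numpy as np
    from scipy.optimize import curve_fit

    def power_law(N, a, alpha, L_inf):
        # L(N) = a * N^(-alpha) + L_inf, the scaling-law form above
        return a * N ** (-alpha) + L_inf

    # Synthetic (model size, test loss) pairs; purely illustrative values.
    N = np.array([1e5, 3e5, 1e6, 3e6, 1e7])
    L = power_law(N, a=50.0, alpha=0.3, L_inf=2.0)

    (a, alpha, L_inf), _ = curve_fit(power_law, N, L, p0=[1.0, 0.5, 1.0])
    print(f"a={a:.2f}, alpha={alpha:.3f}, L_inf={L_inf:.2f}")

Checking whether DP-trained models still follow such a fit, or deviate from it, is the question the abstract raises.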