No Collapse-Level Privacy Cliff on a Simple DP-SGD Benchmark: Clipping Drives Most Utility Loss
Introduction
Differentially private stochastic gradient descent (DP-SGD) [abadi2016deep] enables training neural networks with formal privacy guarantees by (1) clipping per-sample gradients to bound sensitivity and (2) adding calibrated Gaussian noise. A widely cited concern is the existence of a "privacy cliff" — a threshold below which model utility collapses [jayaraman2019evaluating].
We revisit this claim with a controlled experiment that sweeps both the noise multiplier and clipping norm independently. Our from-scratch implementation (no external DP libraries) makes every component transparent and reproducible.
Key question: Is the privacy cliff driven by noise addition, gradient clipping, or their interaction?
Method
DP-SGD Implementation
We implement DP-SGD from scratch in PyTorch with three components:
- **Per-sample gradients:** For each sample $x_i$ in a mini-batch, compute the gradient $g_i = \nabla_\theta \ell(f_\theta(x_i), y_i)$ independently via a loop over samples.
- **Gradient clipping:** Clip each per-sample gradient to $\ell_2$ norm $C$:
\[
\bar{g}_i = g_i \cdot \min\left(1, \frac{C}{\|g_i\|_2}\right)
\]
- **Gaussian noise:** Compute the noised average gradient:
\[
\tilde{g} = \frac{1}{B}\sum_{i=1}^{B} \bar{g}_i
+ \mathcal{N}\left(0, \frac{\sigma^2 C^2}{B^2} \mathbf{I}\right)
\]
Privacy accounting uses the Rényi Differential Privacy (RDP) framework [mironov2017renyi], optimizing over RDP orders to obtain the tightest $(\varepsilon, \delta)$-DP guarantee.
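As a concrete illustration, here is a minimal sketch of one DP-SGD update and of simplified RDP accounting. Function and variable names are ours, not the repository's, and the accountant below ignores subsampling amplification, so it will not reproduce the $\varepsilon$ values reported later in this paper:

```python
import math
import torch
import torch.nn as nn

def dp_sgd_step(model, loss_fn, xb, yb, C=1.0, sigma=1.0, lr=0.1):
    """One DP-SGD update: per-sample gradients, clip each to l2 norm C,
    average, add Gaussian noise, and take an SGD step."""
    params = list(model.parameters())
    B = xb.shape[0]
    summed = [torch.zeros_like(p) for p in params]
    for i in range(B):  # loop over samples, as in the paper's implementation
        loss = loss_fn(model(xb[i:i + 1]), yb[i:i + 1])
        grads = torch.autograd.grad(loss, params)
        norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))  # full-gradient l2 norm
        scale = min(1.0, C / (norm.item() + 1e-12))            # clip factor min(1, C/||g||)
        for s, g in zip(summed, grads):
            s.add_(g, alpha=scale)
    with torch.no_grad():
        for p, s in zip(params, summed):
            noise = sigma * C * torch.randn_like(p)  # N(0, sigma^2 C^2 I) on the sum
            p.add_(-(lr / B) * (s + noise))          # dividing by B gives std sigma*C/B

def epsilon_from_rdp(sigma, steps, delta, orders=range(2, 128)):
    """Simplified accounting: the Gaussian mechanism has RDP alpha/(2 sigma^2)
    per step; compose over steps, convert to (eps, delta)-DP, and minimize
    over integer orders alpha. No subsampling amplification, so this
    over-estimates epsilon relative to a full accountant."""
    return min(steps * a / (2 * sigma ** 2) + math.log(1 / delta) / (a - 1)
               for a in orders)
```

Adding noise with standard deviation $\sigma C$ to the clipped sum and then dividing by $B$ is equivalent to adding $\mathcal{N}(0, \sigma^2 C^2 / B^2 \mathbf{I})$ to the mean, matching the equation above.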
Experimental Setup
- **Data:** 500 synthetic samples, 10 features, 5 Gaussian clusters ($\sigma_{\text{cluster}} = 1.5$), normalized, 80/20 train/test split.
- **Model:** 2-layer MLP (input $\to$ 64 hidden $\to$ 5 classes), 1,029 parameters.
- **Training:** 20 epochs, SGD with $\text{lr} = 0.1$, batch size 64, $\delta = 10^{-5}$.
- **Sweep:** $\sigma \in \{0.01, 0.1, 0.5, 1.0, 2.0, 5.0, 10.0\}$, $C \in \{0.1, 1.0, 10.0\}$, 3 seeds each = 63 DP runs + 3 baselines.
- **Runtime:** 51.1 seconds on CPU (Apple Silicon).
Results
Non-Private Baseline
The non-private MLP achieves 99.3% test accuracy, confirming the synthetic task is well-separable.
No Collapse-Level Privacy Cliff
Figure shows test accuracy versus the computed $\varepsilon$ for each clipping norm. The three curves reveal strikingly different behaviors:
\begin{figure}[h]
\includegraphics[width=0.85\textwidth]{../results/privacy_utility_curve.png}
\caption{Privacy-utility tradeoff across clipping norms. On this task,
no configuration falls below 50% of the non-private baseline, and most
degradation is explained by the choice of $C$.}
\end{figure}
- **$C = 0.1$ (aggressive clipping):** Accuracy is flat at ~80% across all $\varepsilon$ values. Clipping is so severe that even zero noise cannot recover baseline performance. The gradient signal is destroyed before noise is added.
- **$C = 1.0$ (well-tuned clipping):** Accuracy stays at 95--99% across the full $\varepsilon$ range, including $\varepsilon < 1$. At $\varepsilon = 0.87$ ($\sigma = 10$), accuracy is still 94.7%, about 4.7 points below the baseline.
- **$C = 10.0$ (loose clipping):** Shows the classic high-noise failure mode. Accuracy is 99.3% at $\varepsilon = 22{,}446$ but drops to 62.7% at $\varepsilon = 0.87$. Because clipping preserves gradient magnitude, the noise term $\sigma C / B$ dominates.
Quantitative Summary
Table reports mean test accuracy ($\pm$ std across 3 seeds) for selected configurations, with rows $\sigma \in \{0.01, 0.10, 1.00, 2.00, 5.00, 10.0\}$ and one column per clipping norm $C$; the non-private baseline is 99.3%.
The Interaction Effect
The noise term added to each gradient component has standard deviation $\sigma C / B$. When $C$ is small (0.1), the noise magnitude is $\sigma \cdot 0.1 / 64 \approx 0.0016\sigma$, but the clipped gradient signal is also tiny. When $C$ is large (10.0), the noise magnitude can overwhelm the gradient signal for large $\sigma$. The sweet spot ($C = 1.0$) balances gradient preservation with noise tolerance.
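The arithmetic above is easy to tabulate; a quick sketch using the paper's batch size $B = 64$ and the swept values of $C$ and $\sigma$:

```python
# Per-component noise standard deviation sigma*C/B for the sweep's
# clipping norms and a few noise multipliers, with batch size B = 64.
B = 64
for C in (0.1, 1.0, 10.0):
    for sigma in (0.1, 1.0, 10.0):
        print(f"C={C:4.1f}  sigma={sigma:4.1f}  noise std = {sigma * C / B:.5f}")
```

At $C = 10.0$ and $\sigma = 10$ the per-component noise std is about 1.56, large relative to typical gradient entries, while at $C = 0.1$ the noise is tiny but so is the clipped signal.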
Discussion
On this task, clipping dominates the observed utility loss. Our results do not show a collapse-level privacy cliff under the 50%-of-baseline criterion. Instead, the largest accuracy drops arise either from overly aggressive clipping ($C = 0.1$) or from the interaction of loose clipping and large noise ($C = 10.0$ with large $\sigma$). With an appropriate $C$, DP-SGD achieves near-baseline accuracy even at $\varepsilon < 1$ on our synthetic task.
Practical implication: Practitioners should tune $C$ before reducing $\varepsilon$. A grid search over $C$ at a fixed moderate $\sigma$ is more productive than sweeping $\sigma$ at a fixed $C$.
Limitations:
- Our synthetic data (well-separated clusters, 500 samples) is
easier than real tasks. The clipping-noise interaction may differ on
harder problems.
- The 2-layer MLP has only 1,029 parameters. Larger models may
exhibit different clipping dynamics.
- Our RDP accounting uses simplified bounds. Tighter accounting
(e.g., PRV accountant) would give smaller $\varepsilon$ values.
Reproducibility
All code is implemented from scratch in PyTorch (no opacus or DP libraries).
The complete experiment (63 DP runs + 3 baselines) runs in about 1 minute
on CPU. See SKILL.md for step-by-step execution instructions.
Source code: src/dpsgd.py (DP-SGD), src/data.py (data),
src/model.py (model), src/analysis.py (plotting).
References
[abadi2016deep] M. Abadi, A. Chu, I. Goodfellow, H. B. McMahan, I. Mironov, K. Talwar, and L. Zhang. Deep learning with differential privacy. In CCS, 2016.
[mironov2017renyi] I. Mironov. Rényi differential privacy. In CSF, 2017.
[jayaraman2019evaluating] B. Jayaraman and D. Evans. Evaluating differentially private machine learning in practice. In USENIX Security, 2019.
Reproducibility: Skill File
Use this skill file to reproduce the research with an AI agent.
# DP-SGD Privacy-Utility Tradeoff

**Skill name:** dp-sgd-privacy-utility
**Authors:** Yun Du, Lina Ji, Claw

## Description

Implements differentially private stochastic gradient descent (DP-SGD) from scratch — no opacus or external DP libraries — and sweeps noise multiplier and clipping norm to map the privacy-utility tradeoff. Tests whether there is a collapse-level "privacy cliff" below which model utility collapses, or whether clipping dominates the observed degradation on this synthetic task.

## Prerequisites

- Python 3.13 available as `python3.13`
- ~500 MB disk for PyTorch (CPU-only)
- No GPU required
- No API keys or authentication needed

## Steps

### Step 0: Get the Code

Clone the repository and navigate to the submission directory:

```bash
git clone https://github.com/davidydu/Claw4S.git
cd Claw4S/submissions/dp-sgd/
```

All subsequent commands assume you are in this directory.

### Step 1: Set up environment

```bash
cd submissions/dp-sgd
python3.13 -m venv .venv
.venv/bin/python -m pip install -r requirements.txt
```

**Expected output:** `Successfully installed torch-2.6.0 numpy-2.2.4 scipy-1.15.2 matplotlib-3.10.1 pytest-8.3.5` (and dependencies).

### Step 2: Run unit tests

```bash
cd submissions/dp-sgd
.venv/bin/python -m pytest tests/ -v
```

**Expected output:** All tests pass (45+ tests at the time of writing). A few transitive `matplotlib`/`pyparsing` deprecation warnings may appear under Python 3.13, but the suite should finish with zero failures.
Key tests verify:

- Synthetic data generation (shapes, reproducibility, normalization)
- MLP architecture (output shapes, parameter count, seed control)
- Per-sample gradient computation (correct count, shapes, independence)
- Gradient clipping (norm reduction, small gradients unchanged)
- Noise addition (shapes, zero-noise = mean, variance injection)
- Privacy accounting (monotonicity in sigma, steps, finite values)
- End-to-end DP-SGD training (returns expected keys, accuracy in range)
- Non-private baseline (above-chance accuracy)
- Reproducibility utilities (deterministic flags, version metadata contract)

### Step 3: Run the experiment

```bash
cd submissions/dp-sgd
.venv/bin/python run.py
```

**Expected output:**

- 3 non-private baseline runs (accuracy ~0.99)
- 63 DP-SGD runs (7 noise levels x 3 clipping norms x 3 seeds)
- Runtime: ~45-60 seconds on CPU
- Saves `results/results.json`, `results/summary.json`
- Records runtime and reproducibility metadata in `results/results.json`
- Generates plots: `results/privacy_utility_curve.png`, `results/utility_gap.png`, `results/clipping_effect.png`

Key output lines:

```
Baseline mean accuracy: 0.9933
Privacy cliff: not detected (no config falls below 50% of baseline)
Safe region starts: epsilon >= 0.87
```

### Step 4: Validate results

```bash
cd submissions/dp-sgd
.venv/bin/python validate.py
```

**Expected output:** All validation checks pass:

- results.json exists with correct structure
- DP/baseline run counts and config coverage are consistent with the declared sweep
- All accuracies in [0, 1]
- Epsilon monotonically decreases as noise increases
- Baseline accuracy reasonable (>= 0.50)
- Privacy-utility tradeoff confirmed (low-noise > high-noise accuracy)
- No cliff epsilon reported when no configuration collapses below 50% of baseline
- Expected seeds are fully covered
- All plots generated
- Runtime metadata present and <= 180 seconds
- Reproducibility metadata present (Python/torch/numpy versions, deterministic flags)

## How to Extend

1. **Different datasets:** Replace `src/data.py:generate_gaussian_clusters` with your data loader. The rest of the pipeline is data-agnostic.
2. **Different models:** Replace `src/model.py:MLP` with any `nn.Module`. Per-sample gradients are computed generically via backprop.
3. **Tighter privacy accounting:** The RDP accountant in `src/dpsgd.py:compute_epsilon_rdp` uses simplified bounds. For tighter guarantees, implement the full Poisson subsampling RDP bound from Mironov et al. (2017).
4. **Additional noise multipliers:** Add values to `NOISE_MULTIPLIERS` in `run.py`.
5. **Larger models:** For models with many parameters, replace the loop-based per-sample gradient computation with `torch.func.vmap` for efficiency.

## Parameters

| Parameter | Default | Description |
|-----------|---------|-------------|
| `NOISE_MULTIPLIERS` | `[0.01, 0.1, 0.5, 1.0, 2.0, 5.0, 10.0]` | Noise scale sigma |
| `CLIPPING_NORMS` | `[0.1, 1.0, 10.0]` | Per-sample gradient clip threshold C |
| `SEEDS` | `[42, 123, 456]` | Random seeds for variance estimation |
| `N_SAMPLES` | `500` | Total dataset size |
| `N_FEATURES` | `10` | Input dimensionality |
| `N_CLASSES` | `5` | Number of Gaussian clusters |
| `N_EPOCHS` | `20` | Training epochs per run |
| `LEARNING_RATE` | `0.1` | SGD learning rate |
| `BATCH_SIZE` | `64` | Mini-batch size |
| `DELTA` | `1e-5` | Privacy parameter delta |
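Extension point 5 can be sketched with `torch.func`; this is a hedged illustration of the standard vectorized per-sample-gradient pattern, not the repository's implementation, and the model below is a stand-in matching the paper's 10 → 64 → 5 shape:

```python
import torch
import torch.nn as nn
from torch.func import functional_call, grad, vmap

def per_sample_grads(model, loss_fn, xb, yb):
    """Vectorized per-sample gradients: one gradient per row of xb,
    computed in parallel instead of a Python loop over samples."""
    params = {k: v.detach() for k, v in model.named_parameters()}

    def sample_loss(p, x, y):
        # functional_call runs the model with an explicit parameter dict
        out = functional_call(model, p, (x.unsqueeze(0),))
        return loss_fn(out, y.unsqueeze(0))

    # grad differentiates w.r.t. the first argument (the parameter dict);
    # vmap maps over the batch dimension of x and y while sharing params.
    return vmap(grad(sample_loss), in_dims=(None, 0, 0))(params, xb, yb)

# Stand-in 2-layer MLP and a small batch
model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 5))
xb = torch.randn(8, 10)
yb = torch.randint(0, 5, (8,))
g = per_sample_grads(model, nn.CrossEntropyLoss(), xb, yb)
print({k: tuple(v.shape) for k, v in g.items()})  # each grad gains a leading batch dim
```

Each entry of the returned dict has shape `(B, *param.shape)`, so the clipping and noising steps can then be applied per sample without the explicit loop.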