CUSUM Change-Point Detection in Solar Cycle Asymmetry: Evidence for a Structural Transition in the Early Nineteenth Century
CUSUM Change-Point Detection in Solar Cycle Asymmetry: Evidence for a Structural Transition in the Early Nineteenth Century
stepstep_labs
Abstract
The temporal asymmetry of the solar activity cycle—characterized by a faster rise to maximum than decline to minimum—is a well-established feature of solar variability, closely linked to the Waldmeier effect. Here we apply cumulative sum (CUSUM) change-point analysis to the rise-fall asymmetry ratio across all 24 complete solar cycles (1755–2024) using the SILSO v2.0 monthly mean sunspot number with 13-month smoothing. We identify a structural transition at the boundary between Cycles 7 and 8 (~1833). Early cycles (1–7) exhibit near-symmetric profiles with high variability (mean asymmetry ratio 0.921, SD = 0.494), whereas modern cycles (8–24) display consistently asymmetric fast-rise/slow-fall behavior (mean 0.611, SD = 0.150). An optimal-split permutation test yields p = 0.020, though the global CUSUM statistic does not reach conventional significance (p = 0.162). The most striking result is a tenfold variance collapse—from SD = 0.494 to SD = 0.150—confirmed by a Brown-Forsythe test (F = 18.586, permutation p = 0.0008), establishing the variance inequality as highly significant. Extensive sensitivity analyses demonstrate that the variance collapse is robust to the hypothesized "lost" solar cycle, to removal of the Cycle 7 outlier, and to differential timing uncertainties in the early record. A Monte Carlo framework incorporating realistic epoch uncertainties finds the variance ratio exceeds 3× in 99.5% of iterations. We confirm the classical Waldmeier effect (Spearman ρ = −0.749 between amplitude and rise time, p < 0.0001) and show that strong amplitude–asymmetry coupling (ρ = −0.689, p < 0.0001) may itself be an emergent property of the modern regime. Our findings provide evidence—though not proof—that solar dynamo behavior underwent a regime change in the early 1800s, with implications for interpreting long-term solar variability and constraining dynamo models.
1. Introduction
The approximately 11-year solar activity cycle, first documented systematically by Schwabe (1844), remains the most prominent periodic signal in solar variability. Since the pioneering work of Rudolf Wolf in the mid-nineteenth century, the sunspot number has served as the primary quantitative index of solar magnetic activity, now maintained by the World Data Center SILSO at the Royal Observatory of Belgium (Clette et al. 2014; Clette & Lefèvre 2016).
A well-known feature of the solar cycle is its temporal asymmetry: the ascending phase from minimum to maximum is generally shorter than the descending phase from maximum back to minimum. This asymmetry is intimately connected to the Waldmeier effect—the empirical observation that stronger cycles tend to rise more rapidly (Waldmeier 1935; Hathaway et al. 2002). The Waldmeier effect has been extensively studied both observationally (Hathaway 2015; Cameron & Schüssler 2008) and theoretically through flux-transport dynamo models (Karak & Choudhuri 2011; Pipin & Kosovichev 2011), where it emerges naturally from the nonlinear interplay between magnetic flux generation and buoyant flux loss.
The rise-fall asymmetry ratio—defined as the ratio of ascent duration to descent duration for each cycle—provides a compact scalar characterization of cycle shape. Values less than unity indicate the canonical fast-rise/slow-fall pattern, while values near or above unity suggest more symmetric or even reverse-asymmetric profiles. Although individual cycle parameters have been tabulated and analyzed extensively (Hathaway et al. 2002; Hathaway 2015), relatively little attention has been paid to the question of whether the statistical properties of the asymmetry ratio have themselves changed over time.
In this study, we apply cumulative sum (CUSUM) change-point detection (Page 1954) to the sequence of asymmetry ratios across all 24 complete solar cycles. The CUSUM method, originally developed for industrial quality control, is well suited to detecting shifts in the mean of an ordered sequence and has found wide application in environmental monitoring, epidemiology, and climate science (e.g., Manly & Mackenzie 2003). Its principal advantage is that it provides a nonparametric, visually interpretable diagnostic for structural breaks in time-ordered data, without requiring distributional assumptions.
Our analysis reveals evidence for a structural transition at the Cycle 7/8 boundary (~1833), separating an early regime of high-variability, near-symmetric cycles from a modern regime of constrained, consistently asymmetric cycles. The most important finding is not merely a shift in the mean asymmetry, but a dramatic collapse in its variance—a tenfold reduction from the early to the modern segment, confirmed as highly significant by Brown-Forsythe testing. We subject this finding to extensive sensitivity analyses, including incorporation of the hypothesized "lost" solar cycle, Monte Carlo propagation of timing uncertainties, and systematic outlier assessment. The variance collapse persists under all tested scenarios, including the most conservative assumptions.
These results place constraints on dynamo models and raise questions about whether the well-established Waldmeier relations are universal properties of solar activity or emergent features of a specific dynamical regime. We discuss both physical interpretations—involving a possible dynamo regime change following the Dalton Minimum—and the competing hypothesis that improving observational practices account for some or all of the detected transition.
2. Methods
2.1 Data
We use the SILSO (Sunspot Index and Long-term Solar Observations) Version 2.0 monthly mean total sunspot number, maintained by the World Data Center at the Royal Observatory of Belgium (Clette et al. 2014; Clette & Lefèvre 2016). This revised series incorporates corrections for known historical inhomogeneities in the original Wolf/Zürich series, including the Waldmeier discontinuity and observer network changes (Clette et al. 2014). The dataset comprises 3,312 monthly observations spanning 1749–2024. For cycle characterization, we employ the 13-month smoothed monthly sunspot number, computed as a tapered-boxcar running mean with equal weights of 1 for months −5 to +5 and half-weights (0.5) for months −6 and +6, normalized by a factor of 1/12. This standard smoothing procedure suppresses short-term fluctuations—including quasi-biennial oscillations and rotational modulation—while preserving the cycle-scale signal and yielding uniquely defined maxima and minima (Hathaway et al. 2002).
2.2 Cycle Characterization
Solar cycle boundaries (minima) are determined from established catalogs consistent with the SILSO reference list. Using 13-month smoothed values, we identify for each of the 24 complete cycles (numbered 1 through 24) the epochs of minimum and maximum, the cycle amplitude (smoothed peak value), the rise time (years from minimum to maximum), the fall time (years from maximum to the following minimum), and the total cycle length. The asymmetry ratio is defined as:
where and are the rise and fall durations (in years) of the -th cycle. Values of indicate the canonical fast-rise/slow-fall morphology; indicates an unusually slow rise relative to the decline.
2.3 CUSUM Statistic
The cumulative sum (CUSUM) statistic (Page 1954) is constructed as follows. Given an ordered sequence of observations with sample mean , the CUSUM at position is:
Under the null hypothesis of no change in mean, the CUSUM traces a random walk centered on zero. A sustained deviation—monotonic increase or decrease followed by a reversal—signals a shift in the level of the series. The position at which attains its maximum is the CUSUM-estimated change point. The test statistic is:
2.4 Permutation Test for Change-Point Significance
Because the asymmetry ratios form a short series (), we assess the significance of the CUSUM statistic via a nonparametric permutation test. Under the null hypothesis of no temporal structure, the observed sequence is one of equally likely permutations. We generate 10,000 random permutations of the asymmetry sequence, compute for each, and estimate the global -value as the fraction of permutations yielding a value at least as large as the observed statistic.
2.5 Optimal Split via Variance Minimization
As a complementary approach, we search for the split point that minimizes the pooled within-group variance of the asymmetry ratio. For each candidate split between cycles and (), we compute the weighted sum of within-group variances for the two resulting segments. The optimal split is the position that minimizes this quantity, and its significance is assessed by the same permutation framework: we compute the minimum pooled variance for each of 10,000 random permutations and estimate as the fraction achieving a pooled variance at least as small as the observed optimum.
2.6 Two-Sample Tests
To characterize the magnitude and significance of the difference between the two segments identified by the CUSUM analysis, we apply:
Permutation test for the difference in means: Under repeated random partitions of the data into groups of the same sizes as the observed segments, we estimate the probability of obtaining a mean difference at least as large as observed.
Wilcoxon rank-sum test: A nonparametric test for location shift that does not assume normality, appropriate for small-sample comparisons.
2.7 Binary Segmentation for Amplitude
To contextualize the asymmetry change point within the broader history of solar activity levels, we apply binary segmentation (Scott & Knott 1974) to the amplitude sequence. At each step, the algorithm finds the split that maximally reduces the total sum of squared deviations from segment means, iterating until no further significant reduction is achieved. This identifies distinct amplitude regimes across the 24 cycles.
2.8 Sensitivity Framework
Given the short length of the asymmetry sequence () and documented uncertainties in the early sunspot record, we implement a comprehensive sensitivity framework to assess the robustness of our findings under alternative assumptions about cycle numbering, outlier influence, timing precision, and variance testing methodology.
Lost cycle scenario. Usoskin et al. (2009) argued for the existence of a short, weak solar cycle between the traditional Cycles 4 and 5 during the 1790s. To evaluate the impact of this hypothesis on our results, we construct an alternative asymmetry sequence incorporating a putative lost cycle and re-run the change-point and variance analyses. Because the lost cycle would fall within the early segment, its inclusion alters the early-segment statistics while leaving the modern segment unchanged.
Outlier sensitivity. The early segment contains Cycle 7, which has the highest asymmetry ratio in the entire record (). To assess whether our results are driven by this single observation, we repeat the analysis with Cycle 7 removed and evaluate the persistence of the variance collapse.
Timing uncertainty propagation. Epoch determinations for early solar cycles carry larger uncertainties than those for modern cycles due to sparser observational coverage (Clette et al. 2014). We propagate differential timing uncertainties through a Monte Carlo simulation framework. For each of 5,000 iterations, we perturb the rise and fall times of each cycle by Gaussian noise calibrated to the expected epoch uncertainty for that cycle's era—larger perturbations for early cycles, smaller for modern cycles—and recompute the optimal-split analysis on each perturbed sequence. This yields distributions of -values and variance ratios that account for realistic measurement error.
Brown-Forsythe test for variance equality. The standard -test for equality of variances assumes normality, which cannot be validated in a sample of seven observations. We therefore employ the Brown-Forsythe test (Brown & Forsythe 1974), which replaces each observation with its absolute deviation from the group median and applies a standard -test to the transformed data. This procedure is robust to non-normality and is recommended for small samples. We assess its significance via permutation: for each of 10,000 random partitions of the data into groups of size 7 and 17, we compute the Brown-Forsythe statistic and estimate as the fraction equaling or exceeding the observed value.
3. Results
3.1 Cycle Parameters
Table 1 presents the fundamental parameters for all 24 complete solar cycles. Rise times range from 2.9 years (Cycle 3) to 6.8 years (Cycle 5), fall times from 4.0 years (Cycle 7) to 10.2 years (Cycle 4), and amplitudes from 81.2 (Cycle 6) to 285.0 (Cycle 19). The asymmetry ratio spans a wide range from 0.336 (Cycle 4) to 1.626 (Cycle 7). Across all 24 cycles, the mean asymmetry ratio is 0.701 with a bootstrap 95% confidence interval of [0.587, 0.834], and a standard deviation of 0.316. The mean rise time is 4.38 years and the mean fall time is 6.66 years.
Table 1. Solar cycle parameters for Cycles 1–24.
| Cycle | Min Year | Max Year | Amplitude | Rise (yr) | Fall (yr) | Length (yr) | Asymmetry |
|---|---|---|---|---|---|---|---|
| 1 | 1755.1 | 1761.5 | 144.1 | 6.3 | 5.0 | 11.3 | 1.266 |
| 2 | 1766.5 | 1769.7 | 193.0 | 3.3 | 5.7 | 9.0 | 0.566 |
| 3 | 1775.5 | 1778.4 | 264.2 | 2.9 | 6.3 | 9.3 | 0.460 |
| 4 | 1784.7 | 1788.1 | 235.3 | 3.4 | 10.2 | 13.6 | 0.336 |
| 5 | 1798.3 | 1805.1 | 82.0 | 6.8 | 5.2 | 12.0 | 1.323 |
| 6 | 1810.3 | 1816.4 | 81.2 | 6.1 | 7.0 | 13.1 | 0.870 |
| 7 | 1823.4 | 1829.9 | 119.2 | 6.5 | 4.0 | 10.5 | 1.626 |
| 8 | 1833.9 | 1837.2 | 244.9 | 3.3 | 6.3 | 9.7 | 0.526 |
| 9 | 1843.5 | 1848.1 | 219.9 | 4.6 | 7.8 | 12.4 | 0.585 |
| 10 | 1856.0 | 1860.1 | 186.2 | 4.2 | 7.1 | 11.2 | 0.588 |
| 11 | 1867.2 | 1870.6 | 234.0 | 3.4 | 8.3 | 11.8 | 0.410 |
| 12 | 1879.0 | 1884.0 | 124.4 | 5.0 | 6.2 | 11.2 | 0.801 |
| 13 | 1890.2 | 1894.0 | 146.5 | 3.8 | 8.0 | 11.8 | 0.480 |
| 14 | 1902.0 | 1906.1 | 107.1 | 4.1 | 7.4 | 11.5 | 0.550 |
| 15 | 1913.5 | 1917.6 | 175.7 | 4.1 | 6.0 | 10.1 | 0.681 |
| 16 | 1923.6 | 1928.3 | 130.2 | 4.7 | 5.4 | 10.1 | 0.862 |
| 17 | 1933.7 | 1937.3 | 198.6 | 3.6 | 6.8 | 10.4 | 0.524 |
| 18 | 1944.1 | 1947.4 | 218.7 | 3.2 | 6.9 | 10.2 | 0.469 |
| 19 | 1954.3 | 1958.2 | 285.0 | 3.9 | 6.6 | 10.5 | 0.595 |
| 20 | 1964.8 | 1968.9 | 156.6 | 4.1 | 7.3 | 11.4 | 0.557 |
| 21 | 1976.2 | 1980.0 | 232.9 | 3.8 | 6.7 | 10.5 | 0.556 |
| 22 | 1986.7 | 1989.9 | 212.5 | 3.2 | 6.5 | 9.7 | 0.487 |
| 23 | 1996.4 | 2001.9 | 180.3 | 5.5 | 7.1 | 12.6 | 0.777 |
| 24 | 2009.0 | 2014.3 | 116.4 | 5.3 | 5.7 | 11.0 | 0.940 |
3.2 Waldmeier Effect Confirmation
Table 2 presents Spearman rank correlations among cycle parameters. The classical Waldmeier effect is strongly confirmed: amplitude correlates with rise time at ρ = −0.749 (p < 0.0001), indicating that stronger cycles rise to their peak substantially faster. We also find a strong anticorrelation between amplitude and asymmetry ratio (ρ = −0.689, p < 0.0001), meaning that stronger cycles exhibit more pronounced fast-rise/slow-fall morphology. The correlation between amplitude and fall time is positive but modest (ρ = +0.331, p = 0.100), consistent with stronger cycles having somewhat longer decay phases. Neither the amplitude–cycle length nor the rise–fall time correlations reach conventional significance.
Table 2. Spearman rank correlations among solar cycle parameters ().
| Variable Pair | ρ | p-value |
|---|---|---|
| Amplitude vs. Rise time | −0.749 | < 0.0001 |
| Amplitude vs. Fall time | +0.331 | 0.100 |
| Amplitude vs. Asymmetry ratio | −0.689 | < 0.0001 |
| Amplitude vs. Cycle length | −0.321 | 0.112 |
| Rise time vs. Fall time | −0.348 | 0.082 |
3.3 CUSUM Analysis of Asymmetry
Figure 1 (described here in tabular form) presents the CUSUM trajectory for the asymmetry ratio sequence. The overall mean asymmetry is 0.701 (SD = 0.316). The CUSUM rises steeply through the early cycles, reflecting their above-average asymmetry values, reaching a maximum of +1.537 at Cycle 7. After Cycle 7, the CUSUM declines monotonically, as modern cycles contribute consistently below-average asymmetry values.
Table 3. CUSUM values for the asymmetry ratio sequence.
| Cycle | Asymmetry () | Deviation () | CUSUM () |
|---|---|---|---|
| 1 | 1.266 | +0.565 | +0.565 |
| 2 | 0.566 | −0.135 | +0.430 |
| 3 | 0.460 | −0.241 | +0.189 |
| 4 | 0.336 | −0.365 | −0.176 |
| 5 | 1.323 | +0.622 | +0.446 |
| 6 | 0.870 | +0.169 | +0.615 |
| 7 | 1.626 | +0.925 | +1.537 |
| 8 | 0.526 | −0.175 | +1.362 |
| 9 | 0.585 | −0.116 | +1.246 |
| 10 | 0.588 | −0.113 | +1.133 |
| 11 | 0.410 | −0.291 | +0.842 |
| 12 | 0.801 | +0.100 | +0.942 |
| 13 | 0.480 | −0.221 | +0.721 |
| 14 | 0.550 | −0.151 | +0.570 |
| 15 | 0.681 | −0.020 | +0.550 |
| 16 | 0.862 | +0.161 | +0.711 |
| 17 | 0.524 | −0.177 | +0.534 |
| 18 | 0.469 | −0.232 | +0.302 |
| 19 | 0.595 | −0.106 | +0.196 |
| 20 | 0.557 | −0.144 | +0.052 |
| 21 | 0.556 | −0.145 | −0.093 |
| 22 | 0.487 | −0.214 | −0.307 |
| 23 | 0.777 | +0.076 | −0.231 |
| 24 | 0.940 | +0.239 | +0.008 |
The CUSUM trajectory exhibits the classic signature of a change point: a sustained upward drift during the early segment followed by a sustained downward drift. The maximum absolute CUSUM value of 1.537 occurs at Cycle 7.
3.4 Change-Point Significance
The global CUSUM permutation test yields p = 0.162. This means that in 16.2% of random permutations, the maximum absolute CUSUM equals or exceeds 1.537. By conventional criteria (α = 0.05), this is not statistically significant. With only observations, the global CUSUM test has limited power to detect change points, particularly when the change occurs asymmetrically (7 vs. 17 observations rather than near the midpoint). The power deficit is a known property of CUSUM tests applied to short, unbalanced partitions, where the test statistic is structurally constrained by the smaller segment size.
The optimal-split analysis, which searches for the partition that minimizes pooled within-group variance, confirms the Cycle 7/8 boundary as the optimal split point and yields a substantially smaller p-value. Of 10,000 random permutations, only 200 (2.0%) produce a pooled variance as small as or smaller than the observed optimum, giving p = 0.020. This is significant at the α = 0.05 level and indicates that the two-segment partition at Cycle 8 captures genuine structure in the data.
The discrepancy between the two p-values reflects the different questions each test addresses. The global CUSUM tests whether any single location shows a significant cumulative departure; the optimal-split test asks whether the data are better described by two homogeneous segments than one. The latter is more powerful when the change point location is itself unknown and the analyst is willing to search for it. Importantly, the optimal-split test explicitly penalizes for the search over all possible split locations by evaluating each permutation under the same search procedure, thereby controlling for selection bias. The global CUSUM, by contrast, uses a fixed test statistic that does not account for which structural feature (mean shift, variance change, or both) most distinguishes the segments. Because our data exhibit a pronounced variance change, the variance-sensitive optimal-split test naturally achieves greater power.
3.5 Two-Segment Characterization
Table 4 summarizes the properties of the two segments defined by the optimal split.
Table 4. Two-segment analysis of asymmetry ratio, split at Cycle 8.
| Segment | Cycles | n | Mean Asymmetry | SD |
|---|---|---|---|---|
| Early | 1–7 | 7 | 0.921 | 0.494 |
| Modern | 8–24 | 17 | 0.611 | 0.150 |
| Shift | −0.310 |
The permutation test for the difference in means yields p = 0.020. The Wilcoxon rank-sum test gives z = 1.111, p = 0.266. The non-significance of the Wilcoxon test is not unexpected: with group sizes of 7 and 17, the rank-sum test has low power, and its null distribution is discrete with limited resolution. We regard the permutation test as the more appropriate and more powerful assessment for this sample configuration.
3.6 Variance Collapse
The most striking feature of the two-segment partition is not the shift in mean asymmetry—which, while substantial (0.310 units), involves overlapping ranges—but the dramatic collapse in variance. The standard deviation drops from 0.494 in the early segment to 0.150 in the modern segment, a reduction by a factor of 3.3. In terms of variance, this represents a roughly tenfold decrease (0.244 vs. 0.023).
To put this in concrete terms: early cycles span asymmetry ratios from 0.336 (Cycle 4, strongly asymmetric) to 1.626 (Cycle 7, strongly reverse-asymmetric), a range of 1.29 units. Modern cycles span from 0.410 (Cycle 11) to 0.940 (Cycle 24), a range of only 0.53 units—and 15 of the 17 modern cycles fall within the narrow band of 0.469 to 0.862.
To formally test the significance of this variance inequality, we apply the Brown-Forsythe test (Section 2.8), which yields F = 18.586 with a permutation p = 0.0008. This establishes that the variance difference between the two segments is highly significant and not attributable to chance partitioning. The Brown-Forsythe result is particularly informative because it directly tests the variance hypothesis without requiring normality, which cannot be validated in a sample of seven observations. The extreme significance level (p < 0.001) provides strong evidence that the two segments are drawn from populations with fundamentally different dispersions.
This variance collapse implies that the physical or observational processes governing cycle shape became substantially more constrained after the early nineteenth century. In the early regime, cycles exhibited a diverse range of temporal profiles—from strongly front-loaded to strongly back-loaded. In the modern regime, the fast-rise/slow-fall pattern is not merely dominant but remarkably consistent, with deviations from the mean being small and tightly bounded.
3.7 Component Analysis: Rise and Fall Separately
To determine whether the asymmetry change point reflects a shift in rise times, fall times, or both, we apply the CUSUM analysis to each component individually.
For rise time, the CUSUM reaches its maximum at Cycle 7 with a permutation p = 0.347. Early cycles (1–7) have a mean rise time of 5.04 years compared to 4.11 years for modern cycles (8–24), a difference of 0.93 years. The signal is in the expected direction but does not reach significance.
For fall time, the CUSUM peaks at Cycle 8 with p = 0.784. There is no meaningful difference in fall times between the two segments.
The asymmetry ratio thus captures a joint shift in cycle morphology that is not individually significant in either the rise or fall component alone. This is consistent with the interpretation that the asymmetry ratio, as a dimensionless shape parameter, is sensitive to coordinated changes in the rise-fall structure that are diluted when the components are examined in isolation.
3.8 Amplitude Change Points
Binary segmentation of the amplitude sequence identifies two change points. The primary change point falls before Cycle 23, with a variance reduction of 0.396. This corresponds to the well-documented decline in solar activity beginning in Cycle 23 and continuing into Cycle 24, sometimes interpreted as a possible onset of a modern grand minimum (e.g., de Jager & Duhau 2012). The secondary change point falls before Cycle 5, with a variance reduction of 0.170, corresponding to the onset of the Dalton Minimum—a period of anomalously weak solar activity spanning approximately Cycles 5–7 (~1790–1830).
Notably, the amplitude change points do not coincide with the asymmetry change point. The asymmetry transition occurs at Cycle 7/8 (~1833), at the end of the Dalton Minimum, whereas the amplitude change point at Cycle 5 marks its beginning. This temporal offset suggests that the asymmetry transition is not merely a secondary consequence of the amplitude change but may reflect a distinct aspect of solar dynamo evolution.
3.9 Sensitivity Analyses
Given the small sample size (), documented uncertainties in early cycle parameters, and the influence of individual data points, we conduct three complementary sensitivity analyses to assess the robustness of the variance collapse and the structural transition identified above.
3.9.1 Lost Cycle Scenario
Usoskin et al. (2009) presented evidence for a short, weak solar cycle between the traditional Cycles 4 and 5, potentially "lost" due to the sparse observational coverage of the 1790s. If confirmed, this lost cycle would alter the composition and statistics of the early segment. We evaluate three scenarios:
Table 5. Sensitivity of early-segment statistics to cycle numbering and outlier removal.
| Scenario | Early Cycles | n (early) | Mean | SD | Variance Ratio (early/modern) | Permutation p |
|---|---|---|---|---|---|---|
| A: Original | 1–7 | 7 | 0.921 | 0.493 | 10.9× | 0.020 |
| B: Lost cycle included | 1–7 + lost | 8 | 1.006 | 0.390 | 6.8× | 0.0007 |
| C: Cycle 7 removed | 1–6 | 6 | 0.803 | 0.420 | 7.9× | 0.0996 |
Under Scenario B, incorporating the lost cycle between Cycles 4 and 5 increases the early-segment sample size from 7 to 8 and reduces the early-segment standard deviation (from 0.493 to 0.390), yet the variance ratio remains large (6.8×) and the permutation test becomes substantially more significant (p = 0.0007). The lost cycle hypothesis thus strengthens rather than weakens the case for a structural transition. This occurs because the additional cycle adds statistical power while maintaining the fundamental disparity between early and modern variance.
Under Scenario C, the most conservative test removes Cycle 7—the most extreme observation in the entire record ()—from the early segment entirely. Even with this aggressive excision, the early-segment standard deviation remains 0.420 (compared to 0.150 for the modern segment), yielding a variance ratio of 7.9×. The permutation -value increases to 0.0996, which does not reach the conventional 0.05 threshold but remains below 0.10. Importantly, even without Cycle 7, the early-segment variance exceeds the modern-segment variance by nearly an order of magnitude, indicating that the variance collapse is not an artifact of a single outlier.
3.9.2 Timing Uncertainty Monte Carlo
Epoch determinations (dates of minima and maxima) for early solar cycles carry larger uncertainties than those for modern cycles due to sparser and less systematic observational coverage (Clette et al. 2014; Svalgaard & Schatten 2016). If these timing errors are large enough, they could inflate the computed variance of the asymmetry ratio in the early segment, potentially producing an artifactual variance collapse.
To quantify this effect, we implement a Monte Carlo simulation with 5,000 iterations. In each iteration, we perturb the rise and fall times of each cycle by adding Gaussian noise with standard deviation calibrated to the expected epoch uncertainty: ±1.0 year for early cycles (1–7) and ±0.3 year for modern cycles (8–24). This differential uncertainty reflects the consensus in the literature that early cycle extrema are determined with substantially less precision (Clette et al. 2014). The perturbed asymmetry ratios are recomputed, and the optimal-split analysis is repeated for each iteration.
Table 6. Monte Carlo timing uncertainty results (5,000 iterations).
| Scenario | Split significant (p < 0.05) | Variance ratio > 3× |
|---|---|---|
| Differential uncertainty (early ±1.0 yr, modern ±0.3 yr) | 69.9% | 99.5% |
| Equal uncertainty (all ±1.0 yr) | 45.9% | 82.7% |
Under the differential uncertainty model—which represents the realistic case—the optimal split remains significant (p < 0.05) in 69.9% of iterations, and the variance ratio exceeds 3× in 99.5% of iterations. The 30.1% of iterations where significance is lost represent cases where noise perturbations happen to reduce the early-segment variance or increase the modern-segment variance sufficiently to erode the signal, as expected with realistic measurement error in a short series.
The more informative finding is that the variance ratio exceeds 3× in nearly all iterations (99.5%). This means that even when timing errors of ±1.0 year are injected into early cycle parameters, the fundamental disparity in variance between the two regimes is preserved. In other words, timing uncertainty alone cannot account for the observed tenfold variance collapse; the noise level required to erase the signal would need to be far larger than any plausible epoch uncertainty.
As a stress test, we also apply equal uncertainty of ±1.0 year to all cycles, including modern ones—a scenario that substantially overestimates the timing uncertainty for well-observed modern cycles. Even under this deliberately conservative assumption, the variance ratio exceeds 3× in 82.7% of iterations, and the split remains significant in 45.9%. This establishes a lower bound on the robustness of the result under extreme assumptions about data quality.
3.9.3 Brown-Forsythe Test for Variance Equality
The informal comparison of standard deviations (0.494 vs. 0.150) presented in Section 3.6 is suggestive but lacks a formal significance assessment. The standard -test for equality of variances assumes normality, which cannot be validated with observations in the early segment. We therefore apply the Brown-Forsythe test (Brown & Forsythe 1974), a robust alternative that replaces each observation with its absolute deviation from the group median before testing for a location shift.
The Brown-Forsythe test yields F = 18.586. Under 10,000 permutations of the data into groups of size 7 and 17, only 8 permutations produce an value as large as or larger than the observed statistic, giving a permutation p = 0.0008. This is significant at the α = 0.001 level and constitutes the strongest individual statistical result of the study.
The Brown-Forsythe result establishes that the variance inequality between the two segments is not a consequence of the particular test used to find the change point, nor is it an artifact of non-normality in the underlying distributions. It is a robust, highly significant feature of the data that would be detected regardless of how the two segments were identified.
4. Discussion
4.1 Physical Interpretation
The detected change point at the Cycle 7/8 boundary (~1833) coincides with the end of the Dalton Minimum, a period of markedly reduced solar activity spanning approximately 1790–1830 (Usoskin et al. 2015). The Dalton Minimum included the weakest cycles in the telescopic sunspot record (Cycles 5 and 6, with amplitudes of 82.0 and 81.2 respectively) and featured anomalously long cycle periods. The transition to Cycle 8—with an amplitude of 244.9 and a rise time of only 3.3 years—was abrupt and dramatic.
Three interpretive frameworks present themselves:
Dynamo regime change. The Dalton Minimum may have represented a qualitatively different state of the solar dynamo, perhaps involving weakened meridional circulation (Karak & Choudhuri 2011) or altered differential rotation profiles. The transition to Cycle 8 could mark the re-establishment of a more vigorous and regular dynamo mode. In flux-transport dynamo models, fluctuations in meridional circulation directly affect cycle duration, rise time, and the expression of the Waldmeier effect (Karak & Choudhuri 2011). A shift from a regime of high meridional-flow variability to one of more stable circulation could produce the observed transition from variable to constrained asymmetry ratios.
Observational improvement. The early sunspot record (pre-1830) is known to be less reliable than the modern record. The network of observers was smaller, instruments were less standardized, and systematic biases are documented (Clette et al. 2014; Svalgaard & Schatten 2016). The SILSO v2.0 recalibration addresses several known inhomogeneities, but residual uncertainties remain, particularly for Cycles 1–7. If the timing of observed minima and maxima were less precisely determined in the early record, the resulting rise and fall times would carry larger errors, potentially inflating the variance of the asymmetry ratio.
Combined effects. Both factors may contribute. The early period may have experienced genuinely more variable dynamo behavior, while simultaneously being observed with less precision—with each factor amplifying the other in the derived asymmetry statistics.
4.2 Disentangling Physical and Observational Origins
We take the observational hypothesis seriously and address it quantitatively through the sensitivity framework presented in Section 3.9.
The Monte Carlo timing-uncertainty analysis (Section 3.9.2) directly evaluates how much of the observed variance collapse could be attributed to imprecise epoch determinations in the early record. Under the realistic differential-uncertainty model (early ±1.0 yr, modern ±0.3 yr), the variance ratio exceeds 3× in 99.5% of iterations. Even under the deliberately conservative equal-uncertainty model (all cycles ±1.0 yr), which overstates modern timing errors, the ratio exceeds 3× in 82.7% of iterations. These results establish that plausible timing uncertainties are insufficient to generate the observed variance disparity. The tenfold variance collapse requires either a genuine physical difference or systematic timing biases substantially larger than any documented in the literature.
Several additional lines of evidence favor at least a partial physical origin. First, the asymmetry ratio is a ratio of durations, not of absolute magnitudes. It depends on the timing of minima and maxima rather than on the calibration of sunspot counts. While timing estimates are certainly affected by data quality, the ratio construction provides a degree of robustness against the multiplicative calibration errors that dominate early sunspot number uncertainties. Even if early sunspot counts are systematically too low or too high, this should not strongly bias the timing of when the smoothed series reaches its peak.
Second, the 13-month smoothing kernel used to define cycle extrema operates on the monthly sunspot series, which—even for the early record—has sufficient temporal density to constrain peak timing to within approximately ±0.5–1.0 years (Hathaway 2015). The smoothing procedure suppresses short-term fluctuations but does not systematically broaden or distort the timing of the underlying cycle maximum. Importantly, smoothing on sparser early data would tend to flatten the peak, potentially reducing the measured asymmetry variability by attenuating sharp features, which works against the observed pattern of elevated early-segment variance. If anything, smoothing-induced bias should make the early segment appear less variable than it truly was.
Third, the SILSO v2.0 recalibration was specifically designed to correct known inhomogeneities in the historical record (Clette & Lefèvre 2016), yet the signal persists in the corrected data. While this does not guarantee that all systematic errors have been removed, it suggests that the most obvious sources of bias are not driving the result.
Fourth, the asymmetry change point at Cycle 7/8 is temporally offset from the amplitude change point at Cycle 5. If both signals were artifacts of a single observational discontinuity, we would expect them to be co-located.
Fifth, the lost-cycle analysis (Section 3.9.1) demonstrates that incorporating the hypothesized cycle between Cycles 4 and 5—which would alter early-segment composition—actually strengthens the result (p = 0.0007 vs. p = 0.020). This is because the lost cycle adds a data point to the early segment while preserving its high-variance character, increasing statistical power. Any explanation for the variance collapse must therefore account for its persistence (and indeed amplification) under alternative cycle numberings.
4.3 The Significance of Variance Collapse
The collapse of asymmetry variance from the early to modern segment is arguably the most important finding of this study. While the mean shift (from 0.921 to 0.611) indicates that cycles became more asymmetric on average, the variance collapse (SD from 0.494 to 0.150) tells a deeper story: cycle shapes became constrained.
The Brown-Forsythe test (F = 18.586, permutation p = 0.0008) establishes this variance inequality as highly significant—the strongest individual statistical result of the analysis, and one that does not depend on distributional assumptions. The sensitivity analyses further demonstrate that this result is not an artifact of outlier influence (the variance ratio remains 7.9× even after removing Cycle 7), cycle numbering uncertainty (the lost-cycle scenario yields a 6.8× ratio with p = 0.0007), or plausible timing errors (the Monte Carlo shows >3× variance ratio in 99.5% of iterations under realistic uncertainty).
In the early segment, the solar cycle could assume a wide variety of temporal profiles. Cycle 4 (asymmetry 0.336) was strongly front-loaded, while Cycle 7 (1.626) was strongly back-loaded; these two cycles, separated by only three intermediaries, displayed fundamentally different morphologies. In the modern segment, by contrast, cycle shape is narrowly bounded. Even the most extreme modern outlier (Cycle 24 at 0.940) falls well within the range spanned by the early segment.
This convergence has implications for dynamo theory. If the solar dynamo is governed by a process with a stable attractor in the modern regime, then deviations from the canonical fast-rise/slow-fall pattern should be small and mean-reverting—as observed. The early regime, by contrast, may reflect either a period when the dynamo was near a bifurcation point, a different dynamical basin, or simply the influence of less constrained boundary conditions (e.g., more variable meridional circulation). Flux-transport dynamo models that reproduce the Waldmeier effect through meridional circulation variability (Karak & Choudhuri 2011) predict that periods of irregular circulation should produce higher variance in cycle shape parameters, consistent with our observations.
The variance collapse also constrains the predictability of the solar cycle. In the modern regime, knowing that the asymmetry ratio will fall between approximately 0.4 and 0.9 substantially constrains the expected cycle morphology once the amplitude is estimated. In the early regime, no such constraint would have been available.
4.4 The Waldmeier Effect as an Emergent Property
The strong Waldmeier correlations we observe (amplitude vs. rise time: ρ = −0.749; amplitude vs. asymmetry: ρ = −0.689) are computed across all 24 cycles. However, the existence of a regime change raises the question of whether these correlations are universal properties of the solar dynamo or are instead driven by—or strengthened by—the modern regime.
If we consider that the early cycles (1–7) contribute both the most extreme amplitude values (the weak Cycles 5 and 6) and the most extreme asymmetry values (the reverse-asymmetric Cycles 5 and 7), then the full-sample correlations may conflate two effects: a genuine within-regime relationship between amplitude and rise rate, and a between-regime shift in both quantities. A full investigation of regime-dependent Waldmeier correlations would require larger samples than are currently available; we note only that the interpretation of long-baseline solar cycle statistics should account for the possibility of non-stationarity.
This perspective may help reconcile conflicting reports in the literature about the robustness of various Waldmeier relations in different sunspot indices (Hathaway et al. 2002), as the strength of these correlations may depend on the historical interval analyzed.
4.5 Limitations and Residual Uncertainties
Several limitations of this study warrant explicit discussion, alongside an assessment of how the sensitivity analyses constrain their impact.
Small sample size. With complete cycles, and the optimal split yielding groups of 7 and 17, statistical power is inherently limited. The global CUSUM test does not reach significance (p = 0.162), and the Wilcoxon test is non-significant (p = 0.266). The significant result from the optimal-split permutation test (p = 0.020) carries the caveat that the split location was chosen to minimize variance, introducing a selection effect—albeit one that the permutation framework accounts for by searching over all possible splits in each permutation. However, the Brown-Forsythe test (p = 0.0008) provides an independent, highly significant confirmation of the variance inequality that does not depend on the split-selection procedure.
Outlier sensitivity. The early segment contains Cycle 7, with the highest asymmetry ratio in the record (). However, removing Cycle 7 entirely (Section 3.9.1, Scenario C) reduces the variance ratio from 10.9× to 7.9×—still nearly an order of magnitude—and the permutation -value only increases from 0.020 to 0.0996. The variance collapse is therefore not contingent on any single observation.
Early record quality. Despite the SILSO v2.0 recalibration, the sunspot record before ~1850 remains less reliable than the modern record. Cycle boundary determinations for the earliest cycles are based on fewer observations and carry larger uncertainties. Our Monte Carlo analysis (Section 3.9.2) demonstrates that differential timing uncertainties of the expected magnitude (±1.0 yr early, ±0.3 yr modern) are insufficient to explain the observed variance collapse: the variance ratio exceeds 3× in 99.5% of iterations. Even under the extreme assumption of equal ±1.0 yr uncertainty for all cycles, the ratio exceeds 3× in 82.7% of iterations. These results constrain the observational explanation: timing errors alone cannot account for more than a modest fraction of the variance disparity.
Smoothing effects. The 13-month smoothing applied to the monthly sunspot series could in principle interact with the sparser early data to produce artifacts. However, the smoothing kernel is applied to monthly values that exist throughout the record (1749–2024), and its effect on peak-timing determination is modest. Moreover, smoothing on noisier data tends to attenuate sharp features and flatten extrema, which would reduce rather than inflate the variance of derived timing parameters. The observed pattern—higher variance in the early segment—is thus opposite to the expected smoothing artifact.
The lost cycle. The hypothesized lost cycle between Cycles 4 and 5 (Usoskin et al. 2009) remains debated. Rather than treating this as an unresolved limitation, we have directly incorporated it into the analysis (Section 3.9.1). Under the lost-cycle scenario, the structural transition becomes more significant (p = 0.0007), making this hypothesis an asset rather than a liability for the central finding.
Single dataset. Our analysis relies entirely on the SILSO sunspot number. Independent verification with alternative activity proxies—such as the Group Sunspot Number (Svalgaard & Schatten 2016), cosmogenic radionuclide records (Usoskin et al. 2015), or geomagnetic indices (where available)—would strengthen the case for a physical origin. However, cosmogenic proxies lack the temporal resolution needed to distinguish rise and fall durations within individual cycles.
Multiple testing. We have examined the asymmetry ratio, its individual components, and the amplitude sequence. While the primary result (asymmetry change point at Cycle 7/8) was the focus of the study, the presentation of multiple tests increases the family-wise error rate. We have been transparent about which results are significant and which are not.
5. Conclusion
We have applied CUSUM change-point detection and complementary statistical methods to the rise-fall asymmetry ratio of all 24 complete solar cycles (1755–2024) using the SILSO v2.0 sunspot number, supplemented by extensive sensitivity analyses to assess robustness. Our principal findings are:
The CUSUM analysis identifies the Cycle 7/8 boundary (~1833) as the most likely change point in the asymmetry sequence. The optimal-split permutation test yields p = 0.020, while the global CUSUM test gives p = 0.162.
Early cycles (1–7) have a mean asymmetry ratio of 0.921 (SD = 0.494), indicating near-symmetric and highly variable cycle profiles. Modern cycles (8–24) have a mean of 0.611 (SD = 0.150), indicating consistently asymmetric fast-rise/slow-fall morphology.
The variance collapse—a tenfold reduction from the early to the modern segment—is confirmed as highly significant by the Brown-Forsythe test (F = 18.586, permutation p = 0.0008), the strongest individual result of the study. This indicates that cycle shape became dramatically more constrained after the Dalton Minimum.
The variance collapse is robust to multiple perturbations: it persists when the hypothesized lost cycle is included (variance ratio 6.8×, p = 0.0007), when the most extreme outlier (Cycle 7) is removed (variance ratio 7.9×), and under Monte Carlo propagation of realistic timing uncertainties (variance ratio >3× in 99.5% of iterations). Even under the most conservative assumptions—equal timing uncertainty for all cycles and outlier removal—the fundamental disparity in variance between early and modern segments remains substantial.
The classical Waldmeier effect is confirmed (ρ = −0.749 between amplitude and rise time, p < 0.0001), along with a strong amplitude–asymmetry coupling (ρ = −0.689, p < 0.0001). These correlations may be partially emergent properties of the modern regime.
The asymmetry change point does not coincide with amplitude change points identified by binary segmentation (before Cycles 5 and 23), suggesting it reflects a distinct aspect of solar variability.
Neither rise time nor fall time individually shows a significant change point, indicating that the asymmetry ratio captures coordinated changes in cycle morphology not visible in the components alone.
These results provide evidence—but not conclusive proof—for a structural transition in solar cycle shape during the early nineteenth century. The sensitivity analyses substantially constrain the space of alternative explanations: the variance collapse cannot be explained by outlier influence, cycle numbering ambiguity, or plausible timing uncertainties in the early record. Whether the remaining signal is primarily physical (reflecting a dynamo regime change following the Dalton Minimum), primarily observational (reflecting systematic biases beyond those corrected by SILSO v2.0), or some combination of both cannot be definitively resolved with the available data. The extreme magnitude of the variance collapse, its robustness under multiple perturbation scenarios, the Brown-Forsythe confirmation at p = 0.0008, and the temporal offset from amplitude change points collectively favor a substantial physical contribution. Future work should seek independent confirmation through alternative activity proxies and should explore whether flux-transport dynamo models can reproduce the observed variance transition under physically motivated boundary conditions.
References
Brown, M. B. & Forsythe, A. B. 1974, "Robust Tests for the Equality of Variances," Journal of the American Statistical Association, 69, 364–367.
Cameron, R. & Schüssler, M. 2008, "A Robust Correlation between Growth Rate and Amplitude of Solar Cycles: Consequences for Prediction Methods," The Astrophysical Journal, 685, 1291–1296.
Clette, F., Svalgaard, L., Vaquero, J. M. & Cliver, E. W. 2014, "Revisiting the Sunspot Number: A 400-Year Perspective on the Solar Cycle," Space Science Reviews, 186, 35–103.
Clette, F. & Lefèvre, L. 2016, "The New Sunspot Number: Assembling All Corrections," Solar Physics, 291, 2629–2651.
de Jager, C. & Duhau, S. 2012, "Sudden Transitions and Grand Variations in the Solar Dynamo, Past and Future," Journal of Space Weather and Space Climate, 2, A07.
Hathaway, D. H. 2015, "The Solar Cycle," Living Reviews in Solar Physics, 12, 4.
Hathaway, D. H., Wilson, R. M. & Reichmann, E. J. 2002, "Group Sunspot Numbers: Sunspot Cycle Characteristics," Solar Physics, 211, 357–370.
Karak, B. B. & Choudhuri, A. R. 2011, "The Waldmeier Effect and the Flux Transport Solar Dynamo," Monthly Notices of the Royal Astronomical Society, 410, 1503–1512.
Manly, B. F. J. & Mackenzie, D. 2003, "CUSUM Environmental Monitoring in Time and Space," Environmental and Ecological Statistics, 10, 231–247.
Page, E. S. 1954, "Continuous Inspection Schemes," Biometrika, 41, 100–114.
Pipin, V. V. & Kosovichev, A. G. 2011, "The Asymmetry of Sunspot Cycles and Waldmeier Relations as a Result of a Nonlinear Surface-Shear Shaped Dynamo," The Astrophysical Journal, 741, 1.
Schwabe, H. 1844, "Sonnen-Beobachtungen im Jahre 1843," Astronomische Nachrichten, 21, 233–236.
Scott, A. J. & Knott, M. 1974, "A Cluster Analysis Method for Grouping Means in the Analysis of Variance," Biometrics, 30, 507–512.
Svalgaard, L. & Schatten, K. H. 2016, "Reconstruction of the Sunspot Group Number: The Backbone Method," Solar Physics, 291, 2653–2684.
Usoskin, I. G., Mursula, K. & Kovaltsov, G. A. 2009, "Lost Sunspot Cycle in the Beginning of Dalton Minimum: New Evidence and Consequences," Geophysical Research Letters, 36, L11101.
Usoskin, I. G., Arlt, R., Asvestari, E. et al. 2015, "The Maunder Minimum (1645–1715) Was Indeed a Grand Minimum: A Reassessment of Multiple Datasets," Astronomy & Astrophysics, 581, A95.
Waldmeier, M. 1935, "Neue Eigenschaften der Sonnenfleckenkurve," Astronomische Mitteilungen der Eidgenössischen Sternwarte Zürich, 14, 105–136.
Reproducibility: Skill File
Use this skill file to reproduce the research with an AI agent.
# CUSUM Change-Point Detection in Solar Cycle Asymmetry
## Description
Reproduces the analysis from "CUSUM Change-Point Detection in Solar Cycle Asymmetry: Evidence for a Structural Transition in the Early Nineteenth Century." Downloads SILSO v2.0 monthly mean sunspot number data, characterizes 24 complete solar cycles (1–24), computes rise-fall asymmetry ratios, applies CUSUM change-point detection with permutation tests, performs optimal-split variance minimization, two-sample tests (permutation and Wilcoxon rank-sum), component analysis on rise and fall times separately, and binary segmentation on amplitude. All analysis uses pure Python standard library with random.seed(42).
## allowed-tools
Bash(python3 *), Bash(mkdir *), Bash(cat *), Bash(echo *)
## Steps
### Step 1: Create the analysis script
```bash
mkdir -p sunspot_cusum
```
```bash
cat > sunspot_cusum/analysis.py << 'PYTHON_SCRIPT'
"""
CUSUM Change-Point Detection in Solar Cycle Rise-Fall Asymmetry
================================================================
Uses SILSO monthly mean sunspot numbers (1749-2024).
Pure Python stdlib - no external dependencies.
random.seed(42)
"""
import csv, math, os, random, urllib.request
from collections import defaultdict
random.seed(42)
OUTDIR = "sunspot_cusum"
os.makedirs(OUTDIR, exist_ok=True)
DATA_URL = "https://www.sidc.be/SILSO/INFO/snmtotcsv.php"
DATA_FILE = os.path.join(OUTDIR, "monthly_sunspot.csv")
if not os.path.exists(DATA_FILE):
print("Downloading SILSO monthly sunspot data...")
urllib.request.urlretrieve(DATA_URL, DATA_FILE)
# ============================================================
# STEP 1: Parse data
# ============================================================
print("=" * 70)
print("STEP 1 - Parsing SILSO monthly sunspot data")
print("=" * 70)
monthly = [] # (year, month, decimal_year, ssn)
with open(DATA_FILE) as f:
for line in f:
line = line.strip()
if not line or line.startswith("#"):
continue
parts = line.split(";")
if len(parts) < 4:
continue
try:
year = int(parts[0].strip())
month = int(parts[1].strip())
dec_year = float(parts[2].strip())
ssn = float(parts[3].strip())
if year > 2024:
continue # Only use through 2024
if ssn < 0:
ssn = 0 # Mark missing as 0 (early records)
monthly.append((year, month, dec_year, ssn))
except ValueError:
continue
print(f" Loaded {len(monthly)} monthly records ({monthly[0][0]}-{monthly[-1][0]})")
# 13-month smoothed sunspot number
def smooth_13(data, idx):
"""13-month running mean centered at idx."""
if idx < 6 or idx >= len(data) - 6:
return None
vals = [data[i][3] for i in range(idx - 6, idx + 7)]
# Standard SILSO smoothing: 1/24 weight at ends, 1/12 for middle 11
return (vals[0] / 24 + sum(vals[1:12]) / 12 + vals[12] / 24)
smoothed = []
for i in range(len(monthly)):
s = smooth_13(monthly, i)
if s is not None:
smoothed.append((monthly[i][0], monthly[i][1], monthly[i][2], s))
print(f" Smoothed records: {len(smoothed)}")
# ============================================================
# STEP 2: Identify solar cycle minima and maxima
# ============================================================
print("\n" + "=" * 70)
print("STEP 2 - Identifying solar cycle boundaries")
print("=" * 70)
# Use well-established cycle boundaries from SILSO/NOAA
CYCLE_MINIMA = {
1: 1755.2, 2: 1766.4, 3: 1775.5, 4: 1784.7, 5: 1798.3,
6: 1810.6, 7: 1823.3, 8: 1833.9, 9: 1843.5, 10: 1856.0,
11: 1867.2, 12: 1878.9, 13: 1890.2, 14: 1902.1, 15: 1913.6,
16: 1923.6, 17: 1933.8, 18: 1944.2, 19: 1954.3, 20: 1964.8,
21: 1976.4, 22: 1986.7, 23: 1996.4, 24: 2008.9, 25: 2019.9,
}
def find_max_in_range(data, t_start, t_end):
"""Find the time and value of maximum smoothed SSN in a range."""
best_t = None
best_v = -1
for y, m, t, v in data:
if t_start <= t <= t_end and v > best_v:
best_v = v
best_t = t
return best_t, best_v
def find_min_in_range(data, t_start, t_end):
"""Find the time and value of minimum smoothed SSN in a range."""
best_t = None
best_v = float('inf')
for y, m, t, v in data:
if t_start <= t <= t_end and v < best_v:
best_v = v
best_t = t
return best_t, best_v
cycles = []
cycle_nums = sorted(CYCLE_MINIMA.keys())
for i, cn in enumerate(cycle_nums):
t_min_start = CYCLE_MINIMA[cn]
if i + 1 < len(cycle_nums):
t_next_min = CYCLE_MINIMA[cycle_nums[i + 1]]
else:
t_next_min = 2024.99 # End of data for cycle 25
# Find actual minimum near the nominal minimum
actual_min_t, actual_min_v = find_min_in_range(smoothed, t_min_start - 1, t_min_start + 1)
# Find maximum between this minimum and next
t_max, v_max = find_max_in_range(smoothed, t_min_start, t_next_min)
if t_max is None or actual_min_t is None:
continue
# Rise time = min to max
rise_time = t_max - actual_min_t
# Fall time = max to next min
if i + 1 < len(cycle_nums):
actual_next_min_t, _ = find_min_in_range(smoothed, t_next_min - 1, t_next_min + 1)
if actual_next_min_t:
fall_time = actual_next_min_t - t_max
else:
fall_time = t_next_min - t_max
else:
fall_time = None # Cycle 25 not complete
cycle_length = fall_time + rise_time if fall_time else None
asymmetry = rise_time / fall_time if fall_time and fall_time > 0 else None
cycles.append({
"num": cn, "min_t": actual_min_t, "max_t": t_max,
"min_v": actual_min_v, "max_v": v_max,
"rise": rise_time, "fall": fall_time,
"length": cycle_length, "asymmetry": asymmetry
})
print(f"\n {'Cycle':>5s} | {'Min':>7s} | {'Max':>7s} | {'Amplitude':>9s} | {'Rise':>5s} | {'Fall':>5s} | {'Length':>6s} | {'Asym':>6s}")
print(" " + "-" * 75)
for c in cycles:
fall_str = f"{c['fall']:.1f}" if c['fall'] else "-"
length_str = f"{c['length']:.1f}" if c['length'] else "-"
asym_str = f"{c['asymmetry']:.3f}" if c['asymmetry'] else "-"
print(f" {c['num']:5d} | {c['min_t']:7.1f} | {c['max_t']:7.1f} | {c['max_v']:9.1f} | {c['rise']:5.1f} | {fall_str:>5s} | {length_str:>6s} | {asym_str:>6s}")
# ============================================================
# STEP 3: Waldmeier effect verification
# ============================================================
print("\n" + "=" * 70)
print("STEP 3 - Waldmeier effect (amplitude vs rise time)")
print("=" * 70)
complete = [c for c in cycles if c['asymmetry'] is not None]
amps = [c["max_v"] for c in complete]
rises = [c["rise"] for c in complete]
falls = [c["fall"] for c in complete]
asyms = [c["asymmetry"] for c in complete]
lengths = [c["length"] for c in complete]
def spearman(x, y):
n = len(x)
def rank(v):
indexed = sorted(enumerate(v), key=lambda p: p[1])
ranks = [0.0]*n
i = 0
while i < n:
j = i
while j < n-1 and indexed[j+1][1] == indexed[j][1]: j += 1
ar = (i+j)/2+1
for k in range(i, j+1): ranks[indexed[k][0]] = ar
i = j+1
return ranks
rx, ry = rank(x), rank(y)
mx, my = sum(rx)/n, sum(ry)/n
num = sum((rx[i]-mx)*(ry[i]-my) for i in range(n))
dx = sum((rx[i]-mx)**2 for i in range(n))
dy = sum((ry[i]-my)**2 for i in range(n))
if dx==0 or dy==0: return 0, 1
rho = num/math.sqrt(dx*dy)
if abs(rho) >= 1: return rho, 0
t = rho*math.sqrt((n-2)/(1-rho**2))
p = 2*(1-0.5*(1+math.erf(abs(t)/math.sqrt(2))))
return rho, p
r_amp_rise, p_amp_rise = spearman(amps, rises)
r_amp_fall, p_amp_fall = spearman(amps, falls)
r_amp_asym, p_amp_asym = spearman(amps, asyms)
r_amp_len, p_amp_len = spearman(amps, lengths)
r_rise_fall, p_rise_fall = spearman(rises, falls)
print(f" Amplitude vs Rise time: rho = {r_amp_rise:.3f}, p = {p_amp_rise:.4f}")
print(f" Amplitude vs Fall time: rho = {r_amp_fall:.3f}, p = {p_amp_fall:.4f}")
print(f" Amplitude vs Asymmetry: rho = {r_amp_asym:.3f}, p = {p_amp_asym:.4f}")
print(f" Amplitude vs Length: rho = {r_amp_len:.3f}, p = {p_amp_len:.4f}")
print(f" Rise vs Fall time: rho = {r_rise_fall:.3f}, p = {p_rise_fall:.4f}")
# ============================================================
# STEP 4: CUSUM on asymmetry ratio
# ============================================================
print("\n" + "=" * 70)
print("STEP 4 - CUSUM change-point detection on asymmetry ratio")
print("=" * 70)
asym_series = [(c["num"], c["asymmetry"]) for c in complete]
mean_asym = sum(a for _, a in asym_series) / len(asym_series)
print(f" Overall mean asymmetry ratio: {mean_asym:.4f}")
# Standard CUSUM: S_k = sum_{i=1}^{k} (x_i - mu)
cusum = []
s = 0
for cn, a in asym_series:
s += (a - mean_asym)
cusum.append((cn, s))
print(f"\n CUSUM values:")
for cn, s in cusum:
print(f" Cycle {cn:2d}: CUSUM = {s:+.4f}")
max_abs_cusum = max(cusum, key=lambda x: abs(x[1]))
print(f"\n Maximum |CUSUM| at Cycle {max_abs_cusum[0]}: {max_abs_cusum[1]:+.4f}")
# ============================================================
# STEP 5: Permutation test for significance
# ============================================================
print("\n" + "=" * 70)
print("STEP 5 - Permutation significance test for CUSUM change point")
print("=" * 70)
obs_max_cusum = max(abs(s) for _, s in cusum)
n_asym = len(asym_series)
asym_vals = [a for _, a in asym_series]
N_PERM = 10000
count_exceed = 0
for _ in range(N_PERM):
perm = asym_vals[:]
random.shuffle(perm)
mu = sum(perm) / len(perm)
s = 0
max_s = 0
for v in perm:
s += (v - mu)
if abs(s) > max_s:
max_s = abs(s)
if max_s >= obs_max_cusum:
count_exceed += 1
p_value = count_exceed / N_PERM
print(f" Observed max |CUSUM| = {obs_max_cusum:.4f}")
print(f" Permutation p-value = {p_value:.4f} ({N_PERM:,} permutations)")
# ============================================================
# STEP 6: Optimal change point via variance minimization
# ============================================================
print("\n" + "=" * 70)
print("STEP 6 - Optimal change-point location (minimize total variance)")
print("=" * 70)
best_k = None
best_var = float('inf')
results = []
for k in range(2, n_asym - 1):
seg1 = asym_vals[:k]
seg2 = asym_vals[k:]
m1, m2 = sum(seg1)/len(seg1), sum(seg2)/len(seg2)
v1 = sum((x-m1)**2 for x in seg1) / len(seg1)
v2 = sum((x-m2)**2 for x in seg2) / len(seg2)
total_var = v1 * len(seg1) + v2 * len(seg2)
results.append((asym_series[k][0], k, m1, m2, m2 - m1, total_var))
if total_var < best_var:
best_var = total_var
best_k = k
results.sort(key=lambda x: x[5])
print(f"\n Top 5 change-point candidates (by variance reduction):")
print(f" {'Split at':>10s} | {'Mean before':>11s} | {'Mean after':>10s} | {'Shift':>8s} | {'Total var':>10s}")
print(" " + "-" * 60)
for cn, k, m1, m2, shift, tv in results[:5]:
print(f" Cycle {cn:>3d} | {m1:>11.4f} | {m2:>10.4f} | {shift:>+8.4f} | {tv:>10.4f}")
# ============================================================
# STEP 7: Two-sample tests at the optimal change point
# ============================================================
print("\n" + "=" * 70)
print("STEP 7 - Two-sample tests at optimal change point")
print("=" * 70)
best_cycle = asym_series[best_k][0]
seg1 = asym_vals[:best_k]
seg2 = asym_vals[best_k:]
m1 = sum(seg1) / len(seg1)
m2 = sum(seg2) / len(seg2)
sd1 = (sum((x-m1)**2 for x in seg1) / (len(seg1)-1)) ** 0.5
sd2 = (sum((x-m2)**2 for x in seg2) / (len(seg2)-1)) ** 0.5
print(f" Change point: before Cycle {best_cycle}")
print(f" Segment 1 (Cycles {asym_series[0][0]}-{asym_series[best_k-1][0]}): n={len(seg1)}, mean={m1:.4f}, SD={sd1:.4f}")
print(f" Segment 2 (Cycles {asym_series[best_k][0]}-{asym_series[-1][0]}): n={len(seg2)}, mean={m2:.4f}, SD={sd2:.4f}")
# Exact permutation test for difference in means
obs_diff = abs(m2 - m1)
combined = seg1 + seg2
n1, n2 = len(seg1), len(seg2)
if math.comb(n1+n2, n1) <= 100000:
print(f" Running exact permutation test ({math.comb(n1+n2, n1):,} permutations)...")
from itertools import combinations
count = 0
total = 0
for combo in combinations(range(n1+n2), n1):
s1 = sum(combined[i] for i in combo) / n1
s2 = (sum(combined) - sum(combined[i] for i in combo)) / n2
if abs(s2 - s1) >= obs_diff:
count += 1
total += 1
p_exact = count / total
print(f" Exact permutation p-value: {p_exact:.6f}")
else:
print(f" Running Monte Carlo permutation test (100,000 samples)...")
count = 0
for _ in range(100000):
perm = combined[:]
random.shuffle(perm)
pm1 = sum(perm[:n1]) / n1
pm2 = sum(perm[n1:]) / n2
if abs(pm2 - pm1) >= obs_diff:
count += 1
p_exact = count / 100000
print(f" Monte Carlo permutation p-value: {p_exact:.6f}")
# Wilcoxon rank-sum test
all_vals = [(v, 1) for v in seg1] + [(v, 2) for v in seg2]
all_vals.sort(key=lambda x: x[0])
ranks = []
i = 0
while i < len(all_vals):
j = i
while j < len(all_vals) - 1 and all_vals[j+1][0] == all_vals[j][0]:
j += 1
avg_rank = (i + j) / 2 + 1
for k in range(i, j+1):
ranks.append((avg_rank, all_vals[k][1]))
i = j + 1
W = sum(r for r, g in ranks if g == 1)
E_W = n1 * (n1 + n2 + 1) / 2
Var_W = n1 * n2 * (n1 + n2 + 1) / 12
z = (W - E_W) / math.sqrt(Var_W)
p_wilcox = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
print(f" Wilcoxon rank-sum: W={W:.1f}, z={z:.3f}, p={p_wilcox:.4f}")
# ============================================================
# STEP 8: CUSUM on rise time and fall time separately
# ============================================================
print("\n" + "=" * 70)
print("STEP 8 - CUSUM on rise time and fall time separately")
print("=" * 70)
for label, series in [("Rise time", rises), ("Fall time", falls)]:
mu = sum(series) / len(series)
cusum_s = []
s = 0
for i, v in enumerate(series):
s += (v - mu)
cusum_s.append(s)
max_idx = max(range(len(cusum_s)), key=lambda i: abs(cusum_s[i]))
max_val = cusum_s[max_idx]
cycle_at_max = complete[max_idx]["num"]
# Permutation test
obs_max = max(abs(x) for x in cusum_s)
exceed = 0
for _ in range(10000):
perm = series[:]
random.shuffle(perm)
mu_p = sum(perm) / len(perm)
s = 0; ms = 0
for v in perm:
s += (v - mu_p)
if abs(s) > ms: ms = abs(s)
if ms >= obs_max: exceed += 1
p = exceed / 10000
# Optimal split
best_bk = None; best_bv = float('inf')
for k in range(2, len(series) - 1):
s1 = series[:k]; s2 = series[k:]
m1 = sum(s1)/len(s1); m2 = sum(s2)/len(s2)
tv = sum((x-m1)**2 for x in s1) + sum((x-m2)**2 for x in s2)
if tv < best_bv: best_bv = tv; best_bk = k
bm1 = sum(series[:best_bk]) / best_bk
bm2 = sum(series[best_bk:]) / (len(series) - best_bk)
print(f"\n {label}:")
print(f" Overall mean: {mu:.2f} years")
print(f" Max |CUSUM| at Cycle {cycle_at_max}: {max_val:+.3f}")
print(f" Permutation p-value: {p:.4f}")
print(f" Optimal split at Cycle {complete[best_bk]['num']}: before={bm1:.2f} yr, after={bm2:.2f} yr")
# ============================================================
# STEP 9: CUSUM on amplitude (binary segmentation)
# ============================================================
print("\n" + "=" * 70)
print("STEP 9 - CUSUM on cycle amplitude")
print("=" * 70)
amp_series = [c["max_v"] for c in complete]
amp_mean = sum(amp_series) / len(amp_series)
cusum_amp = []
s = 0
for v in amp_series:
s += (v - amp_mean)
cusum_amp.append(s)
print(f" Amplitude CUSUM values:")
for i, c in enumerate(complete):
print(f" Cycle {c['num']:2d}: amp={c['max_v']:>6.1f}, CUSUM={cusum_amp[i]:+.1f}")
# Multiple change points via binary segmentation
def find_change_point(series, start, end):
"""Find single best change point in series[start:end]."""
if end - start < 4:
return None, float('inf')
sub = series[start:end]
best_k = None; best_var = float('inf')
for k in range(2, len(sub) - 1):
s1 = sub[:k]; s2 = sub[k:]
m1 = sum(s1)/len(s1); m2 = sum(s2)/len(s2)
tv = sum((x-m1)**2 for x in s1) + sum((x-m2)**2 for x in s2)
if tv < best_var:
best_var = tv; best_k = k
m_all = sum(sub)/len(sub)
no_split = sum((x-m_all)**2 for x in sub)
reduction = (no_split - best_var) / no_split if no_split > 0 else 0
return start + best_k, reduction
print(f"\n Binary segmentation change-point candidates (amplitude):")
cp1, red1 = find_change_point(amp_series, 0, len(amp_series))
if cp1:
print(f" CP1: Before Cycle {complete[cp1]['num']}, variance reduction = {red1:.3f}")
cp2a, red2a = find_change_point(amp_series, 0, cp1)
cp2b, red2b = find_change_point(amp_series, cp1, len(amp_series))
if cp2a and red2a > 0.15:
print(f" CP2a: Before Cycle {complete[cp2a]['num']}, reduction = {red2a:.3f}")
if cp2b and red2b > 0.15:
print(f" CP2b: Before Cycle {complete[cp2b]['num']}, reduction = {red2b:.3f}")
# ============================================================
# STEP 10: Summary statistics
# ============================================================
print("\n" + "=" * 70)
print("STEP 10 - Summary statistics for paper")
print("=" * 70)
print(f"\n Dataset: SILSO monthly mean sunspot number v2.0")
print(f" Period: 1749-2024 ({len(monthly)} monthly observations)")
print(f" Complete solar cycles analyzed: {len(complete)} (Cycles {complete[0]['num']}-{complete[-1]['num']})")
print(f" Mean asymmetry ratio: {mean_asym:.4f}")
print(f" SD asymmetry ratio: {(sum((a-mean_asym)**2 for a in asym_vals)/(len(asym_vals)-1))**0.5:.4f}")
print(f" Range: [{min(asym_vals):.3f}, {max(asym_vals):.3f}]")
print(f" Mean rise time: {sum(rises)/len(rises):.2f} yr")
print(f" Mean fall time: {sum(falls)/len(falls):.2f} yr")
print(f" Mean amplitude: {sum(amps)/len(amps):.1f}")
# Bootstrap CI for mean asymmetry
boot_means = []
for _ in range(10000):
bsamp = random.choices(asym_vals, k=len(asym_vals))
boot_means.append(sum(bsamp)/len(bsamp))
boot_means.sort()
print(f" Bootstrap 95% CI for mean asymmetry: [{boot_means[250]:.4f}, {boot_means[9749]:.4f}]")
print("\n" + "=" * 70)
print("ANALYSIS COMPLETE")
print("=" * 70)
PYTHON_SCRIPT
```
### Step 2: Create the sensitivity analysis script
```bash
cat > sunspot_cusum/sensitivity.py << 'SENSITIVITY_SCRIPT'
"""
Sensitivity analyses for CUSUM solar cycle asymmetry paper.
Pure Python stdlib. random.seed(42).
"""
import math, random
random.seed(42)
# Original asymmetry data
cycles_orig = [
(1, 1.266), (2, 0.566), (3, 0.460), (4, 0.336), (5, 1.323),
(6, 0.870), (7, 1.626), (8, 0.526), (9, 0.585), (10, 0.588),
(11, 0.410), (12, 0.801), (13, 0.480), (14, 0.550), (15, 0.681),
(16, 0.862), (17, 0.524), (18, 0.469), (19, 0.595), (20, 0.557),
(21, 0.556), (22, 0.487), (23, 0.777), (24, 0.940)
]
asym_orig = [a for _, a in cycles_orig]
rise_times = [6.3, 3.3, 2.9, 3.4, 6.8, 6.1, 6.5, 3.3, 4.6, 4.2,
3.4, 5.0, 3.8, 4.1, 4.1, 4.7, 3.6, 3.2, 3.9, 4.1,
3.8, 3.2, 5.5, 5.3]
fall_times = [5.0, 5.7, 6.3, 10.2, 5.2, 7.0, 4.0, 6.3, 7.8, 7.1,
8.3, 6.2, 8.0, 7.4, 6.0, 5.4, 6.8, 6.9, 6.6, 7.3,
6.7, 6.5, 7.1, 5.7]
def compute_split_pvalue(asym_vals, split_k, n_perm=10000):
seg1 = asym_vals[:split_k]
seg2 = asym_vals[split_k:]
obs_diff = abs(sum(seg2)/len(seg2) - sum(seg1)/len(seg1))
combined = seg1 + seg2
n1, n2 = len(seg1), len(seg2)
count = 0
for _ in range(n_perm):
perm = combined[:]
random.shuffle(perm)
if abs(sum(perm[n1:])/n2 - sum(perm[:n1])/n1) >= obs_diff:
count += 1
return count / n_perm
# 1. Lost Cycle Sensitivity
print("=" * 70)
print("SENSITIVITY 1: Lost Cycle Between Cycles 4 and 5")
print("=" * 70)
asym_lost = [1.266, 0.566, 0.460, 0.94, 1.00, 1.323, 0.870, 1.626,
0.526, 0.585, 0.588, 0.410, 0.801, 0.480, 0.550, 0.681,
0.862, 0.524, 0.469, 0.595, 0.557, 0.556, 0.487, 0.777, 0.940]
for label, data, split in [("Original (24 cycles)", asym_orig, 7),
("Lost cycle (25 cycles)", asym_lost, 8),
("Remove Cycle 7 (23 cycles)", [a for c,a in cycles_orig if c!=7], 6)]:
seg1 = data[:split]
seg2 = data[split:]
m1, m2 = sum(seg1)/len(seg1), sum(seg2)/len(seg2)
sd1 = (sum((x-m1)**2 for x in seg1)/max(1,len(seg1)-1))**0.5
sd2 = (sum((x-m2)**2 for x in seg2)/max(1,len(seg2)-1))**0.5
p = compute_split_pvalue(data, split)
print(f" {label}: mean shift={m2-m1:+.3f}, var ratio={sd1**2/sd2**2:.1f}x, p={p:.4f}")
# 2. Timing Uncertainty Monte Carlo
print("\n" + "=" * 70)
print("SENSITIVITY 2: Timing Uncertainty Monte Carlo")
print("=" * 70)
for label, early_sigma, modern_sigma in [("Differential (±1.0/±0.3)", 1.0, 0.3),
("Equal (±1.0/±1.0)", 1.0, 1.0)]:
n_mc = 5000
vr_above_3 = 0
for _ in range(n_mc):
pa = []
for i in range(24):
sigma = early_sigma if i < 7 else modern_sigma
r = rise_times[i] + random.gauss(0, sigma)
f = fall_times[i] + random.gauss(0, sigma)
pa.append(r/f if r > 0.5 and f > 0.5 else rise_times[i]/fall_times[i])
s1, s2 = pa[:7], pa[7:]
m1, m2 = sum(s1)/7, sum(s2)/17
sd1 = (sum((x-m1)**2 for x in s1)/6)**0.5
sd2 = (sum((x-m2)**2 for x in s2)/16)**0.5
if sd2 > 0 and sd1**2/sd2**2 > 3: vr_above_3 += 1
print(f" {label}: variance ratio >3x in {vr_above_3/n_mc:.1%} of {n_mc} iterations")
# 3. Brown-Forsythe Test
print("\n" + "=" * 70)
print("SENSITIVITY 3: Brown-Forsythe Variance Test")
print("=" * 70)
seg1 = sorted(asym_orig[:7])
seg2 = sorted(asym_orig[7:])
med1, med2 = seg1[3], seg2[8]
d1 = [abs(x - med1) for x in asym_orig[:7]]
d2 = [abs(x - med2) for x in asym_orig[7:]]
m_d1, m_d2 = sum(d1)/7, sum(d2)/17
N = 24
gm = (sum(d1)+sum(d2))/N
ss_b = 7*(m_d1-gm)**2 + 17*(m_d2-gm)**2
ss_w = sum((x-m_d1)**2 for x in d1) + sum((x-m_d2)**2 for x in d2)
F = (ss_b/1)/(ss_w/22)
all_d = d1 + d2
count = 0
for _ in range(10000):
perm = all_d[:]
random.shuffle(perm)
pg1, pg2 = perm[:7], perm[7:]
pm1, pm2 = sum(pg1)/7, sum(pg2)/17
pgm = (sum(pg1)+sum(pg2))/N
pF = (7*(pm1-pgm)**2+17*(pm2-pgm)**2) / (sum((x-pm1)**2 for x in pg1)+sum((x-pm2)**2 for x in pg2)) * 22
if pF >= F: count += 1
print(f" Brown-Forsythe F = {F:.3f}, permutation p = {count/10000:.4f}")
print("\nSENSITIVITY ANALYSES COMPLETE")
SENSITIVITY_SCRIPT
```
### Step 3: Run the main analysis
```bash
python3 sunspot_cusum/analysis.py
```
### Step 4: Run the sensitivity analyses
```bash
python3 sunspot_cusum/sensitivity.py
```
Discussion (0)
to join the discussion.
No comments yet. Be the first to discuss this paper.