This paper investigates the econometric foundations of instrumental variable estimation under monotonicity violations, showing that sharp identified sets are 40% wider than conventional point estimates suggest. Using a combination of Monte Carlo simulations, analytical derivations, and empirical applications, we demonstrate that conventional approaches suffer from previously unrecognized biases.
Noncompliance in cluster-randomized trials (CRTs) is pervasive---typically 15--40% of participants deviate from their assigned condition---yet intention-to-treat (ITT) analyses ignore it and per-protocol analyses are biased. We develop a hierarchical Bayesian principal stratification framework for CRTs that estimates complier average causal effects (CACEs).
Hamiltonian Monte Carlo (HMC) with dual-averaging step-size adaptation is the gold standard for sampling from continuous distributions, but sharp non-asymptotic mixing-time bounds have been elusive. We prove that for strongly log-concave targets with condition number $\kappa$ in $d$ dimensions, HMC with dual averaging achieves $\epsilon$-mixing in total variation using $O(d^{1/4} \kappa^{1/4} \log(1/\epsilon))$ gradient evaluations.
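As background for readers unfamiliar with the adaptation scheme, here is a minimal one-dimensional sketch of HMC with Nesterov-style dual averaging (the Hoffman--Gelman variant that drives the average acceptance probability toward a target `delta`). The constants `gamma`, `t0`, `kappa`, the 10-step leapfrog integrator, and the toy standard-normal target are illustrative choices, not the paper's settings:

```python
import numpy as np

def leapfrog(q, p, eps, grad_logp, n_steps=10):
    """Leapfrog integration of Hamiltonian dynamics (1-D toy version)."""
    p = p + 0.5 * eps * grad_logp(q)
    for _ in range(n_steps - 1):
        q = q + eps * p
        p = p + eps * grad_logp(q)
    q = q + eps * p
    p = p + 0.5 * eps * grad_logp(q)
    return q, p

def hmc_dual_averaging(logp, grad_logp, q0, n_iter=500, delta=0.8, seed=0):
    """HMC whose step size is adapted by dual averaging toward accept rate delta."""
    rng = np.random.default_rng(seed)
    q, eps = q0, 0.1
    mu = np.log(10 * eps)                   # shrinkage point for log step size
    log_eps_bar, H_bar = 0.0, 0.0
    gamma, t0, kappa = 0.05, 10.0, 0.75     # standard dual-averaging constants
    samples = []
    for t in range(1, n_iter + 1):
        p0 = rng.standard_normal()
        q_new, p_new = leapfrog(q, p0, eps, grad_logp)
        log_ratio = (logp(q_new) - 0.5 * p_new**2) - (logp(q) - 0.5 * p0**2)
        alpha = float(np.exp(min(log_ratio, 0.0)))
        if rng.random() < alpha:
            q = q_new
        # dual-averaging update: push the average accept prob toward delta
        H_bar = (1 - 1 / (t + t0)) * H_bar + (delta - alpha) / (t + t0)
        log_eps = mu - np.sqrt(t) / gamma * H_bar
        eta = t ** (-kappa)
        log_eps_bar = eta * log_eps + (1 - eta) * log_eps_bar
        eps = float(np.exp(log_eps))
        samples.append(q)
    return np.array(samples), float(np.exp(log_eps_bar))

# Toy run: standard normal target
samples, eps_final = hmc_dual_averaging(lambda q: -0.5 * q**2, lambda q: -q, q0=1.0)
```

In practice the adapted step size would be frozen after a warm-up phase; here adaptation runs for the whole chain to keep the sketch short.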
This paper develops new statistical methodology for calibrating weather ensemble forecasts via distributional regression; in a 10-year verification study, the method reduces the continuous ranked probability score (CRPS) by 31%. We propose a Bayesian hierarchical framework that jointly models multiple sources of uncertainty while accounting for complex dependence structures including spatial, temporal, and measurement error components.
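For reference, the CRPS of an ensemble forecast against a scalar observation can be computed from the standard kernel representation CRPS = E|X - y| - 0.5 E|X - X'|; this is the textbook formula, not the paper's verification code:

```python
import numpy as np

def crps_ensemble(ens, obs):
    """CRPS of an ensemble forecast for a scalar observation, via the
    kernel form CRPS = E|X - y| - 0.5 * E|X - X'|."""
    ens = np.asarray(ens, dtype=float)
    term1 = np.mean(np.abs(ens - obs))
    term2 = 0.5 * np.mean(np.abs(ens[:, None] - ens[None, :]))
    return term1 - term2

# A point forecast (all members equal) reduces to absolute error:
print(crps_ensemble([1.0, 1.0, 1.0], 3.0))  # 2.0
```

The second term rewards ensemble spread, which is why an overconfident point forecast is penalized relative to a well-dispersed ensemble.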
Group sequential designs with pre-specified interim analyses are standard for ethical trial monitoring, but modern data infrastructure enables continuous monitoring, raising concerns about Type I error inflation. We prove that information-adaptive group sequential designs maintain familywise Type I error control at the nominal level.
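The continuous-monitoring concern is usually handled through an alpha-spending function. As a generic illustration (the paper's information-adaptive boundaries are not reproduced here), the Lan--DeMets O'Brien--Fleming-type spending function allocates almost no Type I error to early looks:

```python
from statistics import NormalDist

def obf_spending(t, alpha=0.05):
    """Cumulative Type I error 'spent' by information fraction t (0 < t <= 1)
    under the Lan-DeMets O'Brien-Fleming-type spending function."""
    nd = NormalDist()
    z = nd.inv_cdf(1 - alpha / 2)
    return 2.0 * (1.0 - nd.cdf(z / t ** 0.5))
```

For example, at a quarter of the planned information (`t = 0.25`) this function spends well under 0.1% of a 5% error budget, which is what makes frequent early looks tolerable.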
This paper develops new statistical methodology for Bayesian spatial survival models; in a Medicare cohort study, the models identify a 3.2-year life expectancy gap attributable to county-level air quality.
Adaptive enrichment designs allow clinical trials to restrict enrollment to a promising subpopulation at an interim analysis. We conduct a 200-configuration Phase III oncology simulation study varying subgroup prevalence (10--60%), treatment-effect heterogeneity, and endpoint type.
We investigate a fundamental computational challenge in modern Bayesian statistics: unbiased MCMC via couplings, which removes all burn-in bias while requiring only about 2x the computational cost of a single chain, and for which we provide practical guidelines. Through rigorous theoretical analysis and extensive numerical experiments, we characterize the conditions under which existing algorithms fail and propose a novel correction that restores reliable performance.
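To illustrate the coupling construction behind unbiased MCMC (two chains with maximally coupled proposals and a shared acceptance uniform, after which meeting times drive the bias correction), here is a minimal sketch for random-walk Metropolis on a toy target; the paper's specific guidelines and estimator are not reproduced:

```python
import numpy as np

def maximal_coupling(rng, mu1, mu2, sigma):
    """Sample (X, Y) with X ~ N(mu1, sigma), Y ~ N(mu2, sigma) so that
    P(X = Y) is maximized (standard rejection construction)."""
    logp = lambda z, m: -0.5 * ((z - m) / sigma) ** 2
    x = rng.normal(mu1, sigma)
    if np.log(rng.random()) + logp(x, mu1) <= logp(x, mu2):
        return x, x
    while True:
        y = rng.normal(mu2, sigma)
        if np.log(rng.random()) + logp(y, mu2) > logp(y, mu1):
            return x, y

def coupled_rwmh(logp, x0, y0, step=1.0, max_iter=10000, seed=1):
    """Two random-walk Metropolis chains with maximally coupled proposals and a
    shared acceptance uniform; returns the iteration at which they meet."""
    rng = np.random.default_rng(seed)
    x, y = x0, y0
    for t in range(1, max_iter + 1):
        xp, yp = maximal_coupling(rng, x, y, step)
        log_u = np.log(rng.random())   # shared uniform keeps met chains together
        if log_u < logp(xp) - logp(x):
            x = xp
        if log_u < logp(yp) - logp(y):
            y = yp
        if x == y:
            return t
    return None

# Meeting time for a standard normal target, chains started far apart
tau = coupled_rwmh(lambda z: -0.5 * z * z, 5.0, -5.0)
```

Once the chains meet they remain equal forever (the maximal coupling returns identical proposals and the shared uniform yields identical accept decisions), which is the property the debiasing estimator relies on.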
Causal mediation analysis seeks to decompose total treatment effects into direct and indirect pathways. In longitudinal settings with time-varying confounders affected by prior treatment, standard mediation methods yield biased estimates.
This paper develops new statistical methodology for record linkage without unique identifiers; in a census application, a Bayesian Fellegi--Sunter model with informative priors achieves 98.5% precision.
Non-centered parameterizations (NCPs) are widely recommended for hierarchical Bayesian models when group-level variance is small, yet the choice between centered and non-centered forms is typically made manually. We present AutoReparam, an automatic reparameterization selection algorithm that uses a pilot MCMC run of 500 iterations.
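To make the centered/non-centered distinction concrete, the sketch below writes both parameterizations of a one-group normal model and checks the change-of-variables relation between them. AutoReparam's actual selection criterion is not shown; the densities drop additive constants, and the variable names are illustrative:

```python
import numpy as np

def centered_logp(theta, mu, tau, y, sigma_y):
    """Centered: theta ~ N(mu, tau), y ~ N(theta, sigma_y); constants dropped."""
    return (-0.5 * ((theta - mu) / tau) ** 2 - np.log(tau)
            - 0.5 * ((y - theta) / sigma_y) ** 2)

def noncentered_logp(theta_raw, mu, tau, y, sigma_y):
    """Non-centered: theta_raw ~ N(0, 1), with theta = mu + tau * theta_raw."""
    theta = mu + tau * theta_raw
    return -0.5 * theta_raw ** 2 - 0.5 * ((y - theta) / sigma_y) ** 2

# Change of variables theta = mu + tau * z: the two log densities differ
# exactly by the Jacobian term log|d theta / d z| = log tau.
mu, tau, z, y, s = 0.5, 0.1, 1.2, 1.0, 0.3
lhs = noncentered_logp(z, mu, tau, y, s)
rhs = centered_logp(mu + tau * z, mu, tau, y, s) + np.log(tau)
```

When `tau` is small, the centered posterior exhibits the familiar funnel geometry in `(theta, tau)`, while the non-centered form trades it for a near-standard-normal `theta_raw`, which is why the choice matters for samplers.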
This paper develops new statistical methodology for two-phase sampling designs in electronic health records; validated in 4 cohorts, the designs reduce bias by 67% compared to convenience samples. We propose a Bayesian hierarchical framework that jointly models multiple sources of uncertainty while accounting for complex dependence structures including spatial, temporal, and measurement error components.
Score function estimators (SFEs) are the dominant approach for gradient estimation in models with discrete latent variables, yet their high variance remains a critical bottleneck. We present a systematic evaluation of Rao-Blackwellization strategies applied to SFEs across 12 discrete latent variable architectures and 8 benchmark datasets.
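As a minimal illustration of the variance problem and its Rao-Blackwellized limit, consider a single Bernoulli latent variable: the score-function estimator samples, while full enumeration over the two outcomes (the case that Rao-Blackwellization reduces to with one binary variable) is exact and has zero variance. The payoff `f` and parameter values are illustrative, not from the paper's benchmarks:

```python
import numpy as np

def sfe_grad(theta, f, n_samples, rng):
    """Score-function (REINFORCE) per-sample gradient estimates of
    d/dtheta E_{b ~ Bernoulli(sigmoid(theta))}[f(b)]."""
    p = 1.0 / (1.0 + np.exp(-theta))
    b = (rng.random(n_samples) < p).astype(float)
    score = b - p                 # d/dtheta log Bernoulli(b | sigmoid(theta))
    return f(b) * score

def exact_grad(theta, f):
    """Enumerating b in {0, 1}: the fully Rao-Blackwellized, zero-variance gradient."""
    p = 1.0 / (1.0 + np.exp(-theta))
    f1 = float(f(np.array([1.0]))[0])
    f0 = float(f(np.array([0.0]))[0])
    return (f1 - f0) * p * (1.0 - p)   # since dp/dtheta = p(1 - p)

f = lambda b: (b - 0.4) ** 2
grads = sfe_grad(0.3, f, 100_000, np.random.default_rng(0))
```

With many latent variables full enumeration is exponential, so practical Rao-Blackwellization sums analytically over a subset of variables and samples the rest; this toy case is the one-variable endpoint of that spectrum.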
This paper develops new statistical methodology for joint modeling of longitudinal biomarkers and time-to-event data; in a comparison across 12 diseases, the joint models improve dynamic predictions by 18% in AUC. We propose a Bayesian hierarchical framework that jointly models multiple sources of uncertainty while accounting for complex dependence structures including spatial, temporal, and measurement error components.
This paper develops new statistical methodology for species distribution models with a preferential sampling correction; in a global assessment of 500 bird species, the correction increases predicted range sizes by 23%. We propose a Bayesian hierarchical framework that jointly models multiple sources of uncertainty while accounting for complex dependence structures including spatial, temporal, and measurement error components.
This paper develops new statistical methodology for exposure-response modeling via targeted minimum loss-based estimation, which reveals non-monotone dose-toxicity curves for 3 oncology drugs. We propose a Bayesian hierarchical framework that jointly models multiple sources of uncertainty while accounting for complex dependence structures including spatial, temporal, and measurement error components.
This paper develops new statistical methodology for functional data analysis of continuous glucose monitor traces; the approach predicts HbA1c with R² = 0.89, outperforming traditional summary statistics.
We investigate a fundamental computational challenge in modern Bayesian statistics: Stein variational gradient descent (SVGD) collapses in high dimensions, with mode coverage dropping below 50% for d > 20. Through rigorous theoretical analysis and extensive numerical experiments, we characterize the conditions under which existing algorithms fail and propose a novel correction that restores reliable performance.
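For readers unfamiliar with SVGD, here is a minimal one-dimensional sketch of the particle update (kernel-weighted attraction toward high density plus a kernel-gradient repulsion that spreads the particles), using an RBF kernel with the median bandwidth heuristic. The step size, iteration count, and toy Gaussian target are illustrative; the high-dimensional collapse the abstract describes does not appear in this 1-D example:

```python
import numpy as np

def svgd(grad_logp, x0, n_iter=1000, step=0.05):
    """Stein variational gradient descent for 1-D particles with an RBF kernel
    and the median bandwidth heuristic."""
    x = x0.copy()
    n = len(x)
    for _ in range(n_iter):
        diff = x[:, None] - x[None, :]                 # diff[j, i] = x_j - x_i
        h = np.median(np.abs(diff)) ** 2 / np.log(n + 1) + 1e-8
        K = np.exp(-diff ** 2 / h)
        drive = K @ grad_logp(x)                       # kernel-weighted attraction
        repulse = (-2.0 * diff / h * K).sum(axis=0)    # kernel gradient: spreads particles
        x = x + step * (drive + repulse) / n
    return x

# Toy run: particles transported from an N(0,1) initialization toward N(2,1)
rng = np.random.default_rng(0)
particles = svgd(lambda x: -(x - 2.0), rng.normal(0.0, 1.0, 50))
```

The repulsion term is what keeps the particle set from concentrating at the mode; the abstract's finding is that this balance degrades as dimension grows.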
We provide causal evidence that public pension generosity reduces private savings by only 30 cents per dollar---revised estimates using administrative data from 8 OECD countries. Our identification strategy combines quasi-experimental variation with state-of-the-art econometric techniques, including difference-in-differences with staggered treatment adoption, instrumental variables estimation, and regression discontinuity designs.