Continual learning methods are universally evaluated under a discrete task-boundary assumption, where distribution shifts occur instantaneously between clearly delineated tasks. We argue this assumption is ecologically invalid and demonstrate that five leading continual learning methods (EWC, SI, PackNet, ER, DER++) fail catastrophically when task boundaries are gradual.
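One way to picture the difference between a discrete and a gradual task boundary is as a mixing schedule over training steps. The sketch below is illustrative only and is not taken from the paper: the linear ramp and the names `task_b_probability`, `sample_task`, `start`, and `width` are assumptions introduced here.

```python
import random


def task_b_probability(step, start, width):
    """Probability of drawing a sample from the new task at a training step.

    Hypothetical schedule: width == 0 gives an instantaneous (discrete)
    boundary at `start`; width > 0 gives a linear ramp from 0 to 1.
    """
    if width <= 0:
        return 1.0 if step >= start else 0.0
    # Clamp the linear ramp to the interval [0, 1].
    return min(1.0, max(0.0, (step - start) / width))


def sample_task(step, start, width, rng):
    """Pick which task ("A" or "B") supplies the sample at this step."""
    if rng.random() < task_b_probability(step, start, width):
        return "B"
    return "A"


if __name__ == "__main__":
    rng = random.Random(0)
    # Print the mixing probability and one sampled task around the transition.
    for step in (0, 100, 125, 150, 200):
        p = task_b_probability(step, start=100, width=50)
        print(step, p, sample_task(step, 100, 50, rng))
```

Setting `width` to zero recovers the standard benchmark setting, so the same evaluation loop could compare both regimes under this assumed schedule.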
We present new results on Ramsey theory with applications to SAT solvers. Our main theorem establishes sharp bounds that improve upon the best previously known results, settling a conjecture in the affirmative for the cases considered.
We present new results on graph reconstruction with applications to the reconstruction conjecture. Our main theorem establishes sharp bounds that improve upon the best previously known results, settling a conjecture in the affirmative for the cases considered.
We conduct the largest study to date on coreference, analyzing 38,271 instances across 17 datasets spanning multiple domains. Our key finding is that clinical NLP accounts for 17.
We empirically characterize how inference-time compute scales with task performance for agentic AI workloads. Across 14 agentic benchmarks spanning web navigation, code generation with tool use, and multi-step reasoning, we find that performance follows a power law with exponent 0.
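A power-law relationship of the form performance = scale × compute^exponent is commonly estimated by a least-squares line fit in log-log space. The sketch below shows that generic procedure; it is not the authors' analysis code. The function name `fit_power_law` and the synthetic data are assumptions made for illustration, and the exponent used to generate the data is arbitrary rather than the value reported in the paper.

```python
import math


def fit_power_law(compute, performance):
    """Fit performance ~ scale * compute**exponent by log-log least squares.

    All inputs must be strictly positive. Returns (scale, exponent).
    """
    xs = [math.log(c) for c in compute]
    ys = [math.log(p) for p in performance]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope of the ordinary least-squares line in log-log space.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    exponent = sxy / sxx
    # The intercept in log space maps back to the multiplicative scale.
    scale = math.exp(mean_y - exponent * mean_x)
    return scale, exponent


if __name__ == "__main__":
    # Synthetic, made-up data generated with an arbitrary exponent of 0.5.
    compute = [1.0, 4.0, 16.0, 64.0]
    performance = [2.0 * c ** 0.5 for c in compute]
    print(fit_power_law(compute, performance))
```

On noiseless synthetic data the fit recovers the generating parameters up to floating-point error; on real benchmark scores the residuals in log space indicate how well a single power law describes the trend.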
Foundation models for zero-shot object detection, including CLIP-based detectors and Grounding DINO, have achieved remarkable performance on natural image benchmarks. However, their deployment in industrial quality inspection remains largely untested.
We present new results on oriented coloring with applications to planar graphs. Our main theorem establishes sharp bounds that improve upon the best previously known results, settling a conjecture in the affirmative for the cases considered.
We present new results on chromatic polynomials with applications to graph isomorphism. Our main theorem establishes sharp bounds that improve upon the best previously known results, settling a conjecture in the affirmative for the cases considered.
We conduct the largest study to date on simplification, analyzing 43,266 instances across 7 datasets spanning multiple domains. Our key finding is that ambiguity accounts for 24.
This paper investigates the relationship between prompt injection and RAG through controlled experiments on 28 diverse datasets totaling 19,998 samples. We propose a novel methodology that achieves 8.
We present a systematic empirical study examining neural architecture search across 13 benchmarks and 13,585 evaluation instances. Our analysis reveals that skip connections play a more critical role than previously recognized, achieving 0.
We conduct the largest study to date on sim-to-real, analyzing 14,968 instances across 18 datasets spanning multiple domains. Our key finding is that manipulation accounts for 5.
This paper investigates the relationship between 3D reconstruction and normal maps through controlled experiments on 18 diverse datasets totaling 31,631 samples. We propose a novel methodology that achieves 31.
We present a systematic empirical study examining deformable objects across 5 benchmarks and 28,196 evaluation instances. Our analysis reveals that force-torque plays a more critical role than previously recognized, achieving 0.
We conduct the largest study to date on code review, analyzing 24,005 instances across 12 datasets spanning multiple domains. Our key finding is that LLM accounts for 14.
We present a rigorous experimental and theoretical investigation addressing the claim embedded in this work's title. Using a combination of analytical derivations, numerical simulations, and where applicable, experimental data from state-of-the-art quantum hardware, we establish precise quantitative thresholds and scaling behaviors.
This paper investigates the relationship between morphology and pretraining through controlled experiments on 23 diverse datasets totaling 26,178 samples. We propose a novel methodology that achieves 9.
We present a systematic empirical study examining vision transformers across 16 benchmarks and 36,025 evaluation instances. Our analysis reveals that attention plays a more critical role than previously recognized, achieving 0.
We report a systematic investigation of laser-induced forward transfer with quantitative characterization spanning multiple length scales and operating regimes. Our methodology combines first-principles theoretical analysis, finite-element numerical simulations, and experimental measurements on fabricated samples to establish precise performance boundaries.
We conduct the largest study to date on supply chain, analyzing 27,437 instances across 18 datasets spanning multiple domains. Our key finding is that ML security accounts for 25.