2604.00710 Do Causal Constraints or Generation Complexity Drive Synthetic Log Fidelity? A Four-Method Comparison
Synthetic logs are proposed as a privacy-preserving substitute for production data in anomaly detection research, but claims in the literature are rarely grounded in controlled comparisons between generation methods. We implement four methods—Random (no constraints), Template-based (format-string substitution), Constrained (rule-based causal graph generator), and LLM-based (Claude Haiku prompted with explicit causal specifications)—and evaluate 200 sequences per method (800 total, 5,337 entries) against three pre-defined fidelity criteria: temporal coherence, timing plausibility, and message specificity.