MotifEnrichGuard: ChIP-seq Motif Enrichment Quality Audit
Motivation
Motif enrichment analysis is a critical step in ChIP-seq and ATAC-seq workflows. Identifying which transcription factor motifs are statistically over-represented in peak sets helps assign biological function to genomic regions. However, results are often reported without adequate statistical rigor checks, database version documentation, or biological plausibility validation.
This submission introduces MotifEnrichGuard, an original audit skill that validates motif enrichment results for statistical significance, database consistency, and biological relevance.
Methods
Statistical Rigor Checks
MotifEnrichGuard implements a multi-tier statistical validation framework:
- P-value / Q-value Thresholding: Flags results whose q-value exceeds a user-defined significance threshold (default: 0.05)
- E-value Estimation: Calculates expected false positives given search space size
- Multiple Testing Correction: Validates whether appropriate correction was applied
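The statistical checks listed above can be illustrated with a minimal sketch. It assumes the simple convention that an E-value is the p-value multiplied by the number of motifs tested, and it uses the Benjamini-Hochberg procedure as one possible correction; the function names are hypothetical and are not part of the audit script.

```python
def expected_false_positives(p_value, n_tests):
    """E-value under the assumed convention E = p * N (N = motifs tested)."""
    return p_value * n_tests


def benjamini_hochberg(p_values):
    """Return BH-adjusted q-values in the same order as the input p-values."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    q_values = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotonic.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        q_values[i] = running_min
    return q_values


def correction_looks_applied(p_values, reported_q, tol=1e-9):
    """Heuristic: a corrected q-value is never smaller than its raw p-value."""
    return all(q + tol >= p for p, q in zip(p_values, reported_q))
```

A table whose reported q-values are smaller than the raw p-values fails `correction_looks_applied`, which suggests the q-value column was mislabeled or that no correction was applied.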
Database Consistency Validation
- Known vs De Novo Motifs: Separates curated database matches from novel discoveries
- Database Version Tracking: Extracts and validates motif database version information
- Species-Specific Matching: Confirms motif species annotation matches input data
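One way the version-tracking check could work is sketched below, assuming rows are dictionaries with the `database` and `version` columns of the input table. The helper name `mixed_version_databases` is hypothetical; it reports databases that appear with more than one version in a single result set.

```python
from collections import defaultdict


def mixed_version_databases(rows):
    """Map each database name to its versions when more than one is present."""
    versions = defaultdict(set)
    for row in rows:
        db = (row.get("database") or "").strip()
        if db:
            versions[db].add((row.get("version") or "").strip())
    # Keep only databases that were recorded with conflicting versions.
    return {db: sorted(v) for db, v in versions.items() if len(v) > 1}
```

An empty result means every database in the table is annotated with a single version string.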
Biological Plausibility Checks
- TF Family Consistency: Validates motif assignments against known TF families
- Cell Type Concordance: Cross-references enriched motifs with cell-type specific expectations
- Co-enrichment Patterns: Identifies biologically meaningful motif co-occurrence
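The TF family check could look roughly like the sketch below. The lookup table is a deliberately tiny, hypothetical stand-in for a curated annotation, and the flag names are assumptions; the audit script shown later reads the `tf_family_check` setting but does not implement such a rule.

```python
# Hypothetical lookup table; a real audit would load a curated
# TF-family annotation instead of hard-coding entries.
TF_FAMILIES = {
    "CTCF": "C2H2 zinc finger",
    "TP53": "p53",
    "P53": "p53",
}


def tf_family_flags(motif_name, expected_families=None):
    """Return flags for a motif whose TF family is unknown or unexpected."""
    flags = []
    family = TF_FAMILIES.get(motif_name.upper())
    if family is None:
        flags.append("unknown_tf_family")
    elif expected_families is not None and family not in expected_families:
        # The caller supplies the families expected for the assayed factor.
        flags.append("unexpected_tf_family")
    return flags
```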
Results
The workflow processes TSV-format motif enrichment results and produces three output files:
- audit.json: Machine-readable audit results with flagged issues
- audit_report.csv: Compact status table for downstream processing
- review.md: Human-readable audit summary with flagged motifs
Implementation
The skill is implemented as a Python audit pipeline that depends only on the standard library. Key features include:
- CLI interface with configurable thresholds
- Fixture-based testing for reproducibility
- Extensible rule system for custom validation criteria
- JSON/CSV/Markdown output for flexible integration
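An extensible rule system of the kind listed above could be organized as a small registry, as in the following sketch. The decorator `rule`, the function `apply_rules`, and the `min_target_count` setting are hypothetical illustrations, not names taken from the audit script.

```python
RULES = []


def rule(func):
    """Register a check; each rule returns a flag name or None."""
    RULES.append(func)
    return func


@rule
def qvalue_rule(row, config):
    if float(row.get("q_value", 1)) > config.get("max_qvalue", 0.05):
        return "qvalue_above_threshold"
    return None


@rule
def target_count_rule(row, config):
    # Custom criterion; "min_target_count" is an assumed config key.
    if int(row.get("target_count", 0)) < config.get("min_target_count", 10):
        return "low_target_count"
    return None


def apply_rules(row, config):
    """Run every registered rule on one row and collect the raised flags."""
    flags = []
    for check in RULES:
        flag = check(row, config)
        if flag is not None:
            flags.append(flag)
    return flags
```

With this layout, adding a validation criterion means writing one decorated function; the loop that applies the rules does not change.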
Conclusion
MotifEnrichGuard provides a standardized framework for validating motif enrichment analysis results. By implementing systematic quality checks, researchers can identify potentially spurious or unreliable motif calls before downstream interpretation.
SKILL Code
---
name: chip-motif-enrichment-audit
description: Audit ChIP-seq motif enrichment analysis for database coverage, statistical rigor, and biological plausibility
allowed-tools: Bash(python *), Bash(mkdir *), Bash(ls *), Bash(cp *), WebFetch
---
# MotifEnrichGuard
## Purpose
Audit ChIP-seq motif enrichment results for database coverage, statistical rigor, and biological plausibility. This workflow validates motif enrichment findings without performing de novo discovery.
## Inputs
Create inputs/motifs.tsv with motif enrichment results:
```tsv
motif_id	name	p_value	q_value	evalue	fold_enrichment	target_count	database	version
MA0001	CTCF	1.2e-15	5.3e-14	0.0012	8.5	245	JASPAR	2022
MA0002	p53	3.4e-12	8.7e-11	0.0034	6.2	189	JASPAR	2022
```
Create inputs/config.json with audit parameters:
```json
{
  "max_qvalue": 0.05,
  "min_fold_enrichment": 2.0,
  "max_evalue": 0.1,
  "database_version_required": true,
  "tf_family_check": true
}
```
## Run
```bash
python scripts/audit_motif_enrichment.py \
  --motifs inputs/motifs.tsv \
  --config inputs/config.json \
  --out outputs/audit \
  --title "MotifEnrichGuard"
```
## Outputs
- outputs/audit/audit.json: Machine-readable audit results
- outputs/audit/audit_report.csv: Compact status table
- outputs/audit/review.md: Human-readable audit report
## Self-Test
```bash
python scripts/audit_motif_enrichment.py \
  --motifs examples/fixture/motifs.tsv \
  --config examples/fixture/config.json \
  --out outputs/fixture \
  --title "MotifEnrichGuard"
```
The fixture should produce at least one needs_review flag.
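The fixture files themselves are not listed in this skill file, so the sketch below shows one way to generate them. The first two rows repeat the example input above; the third row (`DN0001`) is a made-up motif that is meant to violate every threshold, and the helper names are hypothetical.

```python
import csv
import io
import json
from pathlib import Path

COLUMNS = ["motif_id", "name", "p_value", "q_value", "evalue",
           "fold_enrichment", "target_count", "database", "version"]
ROWS = [
    ["MA0001", "CTCF", "1.2e-15", "5.3e-14", "0.0012", "8.5", "245", "JASPAR", "2022"],
    ["MA0002", "p53", "3.4e-12", "8.7e-11", "0.0034", "6.2", "189", "JASPAR", "2022"],
    # Hypothetical failing row: weak statistics and no database metadata.
    ["DN0001", "denovo_1", "0.04", "0.2", "5.0", "1.3", "12", "", ""],
]
CONFIG = {
    "max_qvalue": 0.05,
    "min_fold_enrichment": 2.0,
    "max_evalue": 0.1,
    "database_version_required": True,
    "tf_family_check": True,
}


def fixture_tsv():
    """Render the fixture table as tab-separated text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(COLUMNS)
    writer.writerows(ROWS)
    return buffer.getvalue()


def write_fixture(base_dir="examples/fixture"):
    """Write motifs.tsv and config.json under base_dir and return the path."""
    base = Path(base_dir)
    base.mkdir(parents=True, exist_ok=True)
    (base / "motifs.tsv").write_text(fixture_tsv())
    (base / "config.json").write_text(json.dumps(CONFIG, indent=2))
    return base
```

Calling `write_fixture()` once before the self-test command creates both files in `examples/fixture`.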
## Audit Script
Create scripts/audit_motif_enrichment.py:
```python
#!/usr/bin/env python3
"""Audit motif enrichment results against configurable thresholds."""
import argparse
import csv
import json
from pathlib import Path


def parse_motifs(path):
    """Read the tab-separated motif table into a list of row dicts."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f, delimiter="\t"))


def audit_motifs(motifs, config):
    """Flag each motif that fails a threshold or lacks database metadata."""
    max_q = config.get("max_qvalue", 0.05)
    min_fold = config.get("min_fold_enrichment", 2.0)
    max_evalue = config.get("max_evalue", 0.1)
    check_db_version = config.get("database_version_required", True)
    # Read from the config but not used by any rule below.
    check_tf_family = config.get("tf_family_check", True)

    flags = []
    motif_results = []
    for m in motifs:
        m_flags = []
        # Accept alternate column names; a missing column defaults to failing.
        q = float(m.get("q_value", m.get("qvalue", 1)))
        fold = float(m.get("fold_enrichment", m.get("fold_enrich", 0)))
        ev = float(m.get("evalue", m.get("E-value", 999)))
        db = m.get("database", "")
        ver = m.get("version", "")

        if q > max_q:
            m_flags.append("qvalue_above_threshold")
        if fold < min_fold:
            m_flags.append("low_fold_enrichment")
        if ev > max_evalue:
            m_flags.append("high_evalue")
        if check_db_version and not ver:
            m_flags.append("missing_database_version")
        if not db:
            m_flags.append("no_database_annotation")

        status = "pass" if not m_flags else "needs_review"
        motif_results.append({
            "motif_id": m.get("motif_id", ""),
            "name": m.get("name", ""),
            "status": status,
            "flags": m_flags,
        })
        flags.extend(m_flags)

    return {
        "total_motifs": len(motifs),
        # Sorted so repeated runs produce identical output.
        "flags": sorted(set(flags)),
        "status": "pass" if not flags else "needs_review",
        "motifs": motif_results,
    }


def write_outputs(result, out_dir, title):
    """Write audit.json, audit_report.csv, and review.md into out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / "audit.json").write_text(json.dumps(result, indent=2))

    with (out / "audit_report.csv").open("w", newline="") as f:
        w = csv.writer(f)
        w.writerow(["motif_id", "name", "status", "flags"])
        for m in result["motifs"]:
            w.writerow([m["motif_id"], m["name"], m["status"], ";".join(m["flags"])])

    lines = [
        f"# {title}",
        "",
        "## Summary",
        f"- Total motifs: {result['total_motifs']}",
        f"- Overall status: {result['status']}",
        f"- Unique flags: {', '.join(result['flags']) if result['flags'] else 'None'}",
        "",
        "## Flagged Motifs",
    ]
    for m in result["motifs"]:
        if m["flags"]:
            lines.append(f"\n### {m['name']} ({m['motif_id']})")
            for flag in m["flags"]:
                lines.append(f"- {flag}")
    if not any(m["flags"] for m in result["motifs"]):
        lines.append("\nNo motifs flagged for review.")
    lines.extend(["", "## Interpretation",
                  "Review flagged motifs for statistical significance, database documentation, and biological plausibility."])
    (out / "review.md").write_text("\n".join(lines))


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--motifs", required=True)
    parser.add_argument("--config", default="inputs/config.json")
    parser.add_argument("--out", default="outputs/audit")
    parser.add_argument("--title", default="MotifEnrichGuard")
    args = parser.parse_args()

    motifs = parse_motifs(args.motifs)
    with open(args.config) as f:
        config = json.load(f)
    result = audit_motifs(motifs, config)
    write_outputs(result, args.out, args.title)
    # The result already carries the overall "status" field.
    print(json.dumps(result, indent=2))


if __name__ == "__main__":
    main()
```
## Interpretation Rules
- Q-value > 0.05 means the motif does not pass the default 5% false discovery rate threshold
- Fold enrichment < 2.0 indicates only weak over-representation relative to background
- Missing database version prevents reproducibility verification
- Treat needs_review as requiring manual curation, not automatic rejection
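To support the manual curation step, a reviewer could pull the flagged motifs out of `audit.json` with a short helper such as the sketch below. It assumes the JSON layout written by the audit script (a `motifs` list whose entries carry `motif_id`, `status`, and `flags`); the function names are hypothetical.

```python
import json
from pathlib import Path


def load_audit(path="outputs/audit/audit.json"):
    """Load the machine-readable audit result from disk."""
    return json.loads(Path(path).read_text())


def motifs_needing_review(audit):
    """Map each flagged motif ID to its list of flags."""
    return {
        m["motif_id"]: m["flags"]
        for m in audit["motifs"]
        if m["status"] == "needs_review"
    }
```

The returned mapping can then be walked motif by motif, so that flagged entries are inspected rather than discarded automatically.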
## Success Criteria
- Script runs with Python standard library only
- Fixture generates audit.json, audit_report.csv, review.md
- At least one fixture example triggers a flag
## References
- JASPAR database: motif database for transcription factor binding profiles
- HOMER: Hypergeometric Optimization of Motif EnRichment
- MEME Suite: Motif-based analysis tools
Reproducibility: Skill File
Use this skill file to reproduce the research with an AI agent.
# ChIP-seq Motif Enrichment Audit Skill
## ID: 2
## Name: ChIP-seq Motif Enrichment Audit
## Category: ChIP/ATAC
---
## Motif enrichment tools
- HOMER
- MEME Suite
- AME (Analysis of Motif Enrichment)
---
## SKILL.md
````markdown
---
name: chip-motif-enrichment-audit
description: Audit ChIP-seq motif enrichment analysis for database coverage, statistical rigor, and biological plausibility
allowed-tools: Bash(python *), Bash(mkdir *), Bash(ls *), Bash(cp *), WebFetch
---
# MotifEnrichGuard
## Purpose
Audit ChIP-seq motif enrichment results for database coverage, statistical rigor, and biological plausibility.
## Inputs
- inputs/motifs.tsv: Motif enrichment results
- inputs/peaks.fasta: Peak sequences
- inputs/config.json: Audit parameters
## Run
```bash
python scripts/audit_motif_enrichment.py \
  --motifs inputs/motifs.tsv \
  --peaks inputs/peaks.fasta \
  --config inputs/config.json \
  --out outputs/audit \
  --title "MotifEnrichGuard"
```
## Outputs
- outputs/audit/audit.json
- outputs/audit/audit_report.csv
- outputs/audit/review.md
## Success Criteria
- Uses the Python standard library
- Runs on the fixture
## Limitations
- Does not perform de novo motif discovery
````
---
## Checks
1. p-value/q-value and E-value
2. Known motif vs de novo
3. Fold enrichment
4. TF plausibility
---
## Keywords
```
"motif enrichment analysis"
"HOMER motif analysis"
"ChIP-seq transcription factor"
"motif database JASPAR"
"motif enrichment statistical methods"
```