battisiBot: A 24-Step Sequential RL Environment for Orthodontic Aligner Trajectory Planning in SE(3)
1. Motivation
Orthodontic treatment with clear aligners is a $4B+ industry where manual planning takes 2-3 hours per case. Recent work on automated tooth arrangement (TANet, TADPM, CLIK-Diffusion, iOrthoPredictor) predicts post-treatment configurations, but none formulate it as a sequential RL task with per-step feedback.
The problem is non-trivial for RL: (i) 28 teeth move in SE(3), where rotations do not commute and must be interpolated with quaternion SLERP; (ii) a plan spans 24 stages through a 168-dimensional configuration space (28 teeth x 6 degrees of freedom); (iii) clinical constraints cap movement at 0.25 mm and 2 deg per stage and impose a staging priority. To our knowledge, no existing open RL benchmark operates in SE(3) with clinical constraints.
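As an illustration of the SLERP requirement, the following is a minimal sketch of spherical linear interpolation for one tooth pose, assuming the `[qw, qx, qy, qz, tx, ty, tz]` layout the environment uses. The function names `slerp` and `interpolate_pose` are hypothetical and are not taken from the battisiBot code base.

```python
import math

def slerp(q0, q1, alpha):
    """Spherical linear interpolation between unit quaternions [qw, qx, qy, qz]."""
    dot = sum(a * b for a, b in zip(q0, q1))
    # q and -q encode the same rotation; flip one so the shorter arc is used.
    if dot < 0.0:
        q1 = [-x for x in q1]
        dot = -dot
    dot = min(1.0, dot)
    theta = math.acos(dot)
    if theta < 1e-6:
        # Nearly identical rotations: a linear blend is numerically safer.
        q = [a + alpha * (b - a) for a, b in zip(q0, q1)]
    else:
        s = math.sin(theta)
        w0 = math.sin((1.0 - alpha) * theta) / s
        w1 = math.sin(alpha * theta) / s
        q = [w0 * a + w1 * b for a, b in zip(q0, q1)]
    norm = math.sqrt(sum(x * x for x in q))
    return [x / norm for x in q]

def interpolate_pose(p0, p1, alpha):
    """Interpolate one 7-element pose: SLERP on rotation, linear on translation."""
    q = slerp(p0[:4], p1[:4], alpha)
    t = [a + alpha * (b - a) for a, b in zip(p0[4:], p1[4:])]
    return q + t
```

Halfway between the identity and a 90 deg rotation about z, this returns a 45 deg rotation about z, which a component-wise blend only reproduces after renormalization and only at the midpoint.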
2. Environment Design
Episode: 24 sequential steps. Agent observes 28-tooth poses, commits one stage at a time.
State/Action: 28x7 arrays — [qw, qx, qy, qz, tx, ty, tz] per tooth.
Tool-use: inspect_tooth, simulate_step, check_collisions, commit_stage, rollback_stage. Only commit advances the episode.
Dense reward: progress (40%) + compliance (30%) + smoothness (20%) + staging (10%). Also reports occlusion composite, PDL feasibility, collision score.
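The weighting above can be sketched as a plain weighted sum. This assumes each component is already scaled to [0, 1] and that the four terms combine linearly; the weights come from the text, while the dictionary keys and the function name `dense_reward` are illustrative.

```python
# Weights as stated in the environment description.
WEIGHTS = {
    "progress": 0.40,
    "compliance": 0.30,
    "smoothness": 0.20,
    "staging": 0.10,
}

def dense_reward(components):
    """Combine per-step reward components (each assumed in [0, 1]) into one scalar."""
    missing = set(WEIGHTS) - set(components)
    if missing:
        raise KeyError(f"missing reward components: {sorted(missing)}")
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)
```

For example, a step with progress 0.5, compliance 1.0, smoothness 0.8 and staging 1.0 scores 0.76 under these assumptions.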
3. Clinical Grounding
Occlusion scoring (Andrews' Six Keys): 9 metrics from SE(3) poses — molar relationship, overjet (2-3mm), overbite (2-3mm), crown angulation, inclination, rotations, contact tightness, curve of Spee, arch symmetry.
PDL biomechanical model: Kelvin-Voigt viscoelastic spring per tooth type. Incisors 0.2 N/mm, molars 0.8 N/mm (3-5x stiffer). Material: E_PDL = 68.9 MPa, ν = 0.45 (Poisson's ratio). Safe force limits prevent root resorption.
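A Kelvin-Voigt element places a spring and a damper in parallel, so the restoring force is the sum of an elastic term and a viscous term. The sketch below uses the two stiffness values quoted above. The damping coefficient is a free parameter here because the text does not give one, and the function name is hypothetical.

```python
# Spring stiffness per tooth type, from the text (N/mm).
STIFFNESS_N_PER_MM = {"incisor": 0.2, "molar": 0.8}

def kelvin_voigt_force(tooth_type, displacement_mm,
                       velocity_mm_per_s=0.0, damping_n_s_per_mm=0.0):
    """Force (N) of a spring and damper in parallel: F = k * x + c * dx/dt.

    The damping coefficient c is an assumption supplied by the caller;
    the source does not state its value.
    """
    k = STIFFNESS_N_PER_MM[tooth_type]
    return k * displacement_mm + damping_n_s_per_mm * velocity_mm_per_s
```

With zero damping, a 0.25 mm displacement gives 0.05 N on an incisor and 0.2 N on a molar.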
Collision detection: Oriented bounding ellipsoids with anatomical crown dimensions. GJK support distance, 0.3mm safety margin.
Adversarial non-compliance: 3 types — missed wear (20-50% reversal), broken attachment (tooth reset), partial wear (40-60% efficacy). Stochastic, curriculum-controlled.
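One way to read the three event types is as a random fraction of the planned step that is actually achieved. The sketch below applies that reading to a single tooth's translation. The percentage ranges come from the text. The interpretation of "reversal" as a relapse opposite to the planned step, and of "tooth reset" as a return to the previous stage's pose, is an assumption, as are all names.

```python
import random

def apply_noncompliance(prev_t, planned_t, event, rng=random):
    """Return the translation [tx, ty, tz] (mm) reached after a non-compliance event.

    prev_t is the tooth position before the stage; planned_t is the commanded one.
    """
    if event == "missed_wear":
        # Assumed reading: the tooth relapses by 20-50% of the planned step.
        fraction = -rng.uniform(0.2, 0.5)
    elif event == "partial_wear":
        # Only 40-60% of the planned movement is achieved.
        fraction = rng.uniform(0.4, 0.6)
    elif event == "broken_attachment":
        # Assumed reading: the tooth stays at (is reset to) its previous pose.
        fraction = 0.0
    else:
        raise ValueError(f"unknown non-compliance event: {event}")
    return [p + fraction * (n - p) for p, n in zip(prev_t, planned_t)]
```

Passing a seeded `random.Random` instance as `rng` keeps the perturbation reproducible, in line with the deterministic seeding the environment advertises.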
4. Domain Randomization
8 malocclusion classes: Class I crowding/spacing, Class II div 1/div 2, Class III, open bite, crossbite, asymmetric. Anatomically correct perturbation per class.
5 arch forms: Ovoid, tapered, square, catenary, beta function. Randomized per episode.
8-axis adaptive difficulty: n_teeth, translation, rotation, multi_axis, constraint_tightness, jitter_probability, jitter_magnitude, missing_teeth. Curriculum auto-escalation.
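The design summary states that difficulty auto-escalates after a score above 0.8 in 3 consecutive episodes. A minimal sketch of such a rule follows. The class name, the integer `level` and the reset of the streak after each escalation are assumptions, since the text does not say how the 8 axes are advanced.

```python
class CurriculumController:
    """Escalate difficulty after `patience` consecutive episodes above `threshold`."""

    def __init__(self, threshold=0.8, patience=3):
        self.threshold = threshold
        self.patience = patience
        self.level = 0      # hypothetical scalar stand-in for the 8 difficulty axes
        self._streak = 0

    def record(self, episode_score):
        """Register one episode score; return True if difficulty was escalated."""
        if episode_score > self.threshold:
            self._streak += 1
        else:
            self._streak = 0
        if self._streak >= self.patience:
            self.level += 1
            self._streak = 0  # assumption: a new streak is required for the next step up
            return True
        return False
```

Feeding the scores 0.85, 0.90, 0.81 raises `level` to 1 on the third call, while a single low score in between resets the streak.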
5. Real Clinical Data
2 validated sources converted to (28, 7) SE(3) format:
- Open-Full-Jaw (17 patients): Per-tooth principal axes (JSON) from CBCT. Axes orthogonalized via SVD, converted to unit quaternions. Validated end-to-end.
- Mendeley Jaw Models (1 patient, 11 teeth): Pre-segmented STL files. Binary STL reader, centroid + PCA orientation. Validated end-to-end.
Loaders written but pending validation for Teeth3DS+ (1,800 scans) and Bits2Bites (200 pairs with occlusion labels).
6. Results
Ablation (24 stages, SLERP baseline):
| Stage | Reward | Occlusion | Rotations | Symmetry | Spee |
|---|---|---|---|---|---|
| 1 | 0.616 | 0.623 | 0.777 | 0.607 | 0.623 |
| 8 | 0.622 | 0.703 | 0.844 | 0.722 | 0.976 |
| 24 | 0.780 | 0.701 | 0.991 | 0.984 | 1.000 |
Occlusion: +14.6%. PDL feasibility: 1.0 throughout. Adversarial: 0.888 to 0.878.
Benchmarks (10 configs, 5.2s): Easy 0.897, Medium 0.887, Hard 0.310, Adaptive severe 0.863, Adversarial 0.875, Real data 0.869.
7. Reproducibility
Deterministic seeding, uv.lock, Docker, GRPO training script (train_grpo.py) with 4 decomposed reward functions. SLERP baseline: 0.87-0.90. Cold-start verified.
8. API
14 endpoints: /reset_stepwise, /step_stepwise, /tool, /datasets, /difficulty, /occlusion_criteria, /biomechanics, /malocclusion_classes, /noncompliance_types + original /reset, /step, /health, /tasks, /constraints.
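For callers that prefer Python over curl, a tool call can be wrapped in a small helper. The sketch below assumes the request shape used in the SKILL.md walkthrough (`episode_id`, `tool`, `args` posted to `/tool` on port 7860). The helper names are hypothetical, and the shape of the response is not specified here, so it is returned as parsed JSON without further interpretation.

```python
import json
import urllib.request

BASE_URL = "http://localhost:7860"  # default address used in the walkthrough

def build_tool_request(episode_id, tool, args=None, base_url=BASE_URL):
    """Build (but do not send) a POST request for the /tool endpoint."""
    payload = {"episode_id": episode_id, "tool": tool, "args": args or {}}
    return urllib.request.Request(
        f"{base_url}/tool",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def call_tool(episode_id, tool, args=None, base_url=BASE_URL):
    """Send the request to a running server and return the decoded JSON body."""
    request = build_tool_request(episode_id, tool, args, base_url)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode())
```

With the server running, `call_tool("demo_1", "inspect_tooth", {"tooth_id": 11})` would mirror the curl example for inspecting a tooth.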
SKILL.md
name: dental-aligner-trajectory-planner
description: >
  24-step sequential RL environment for orthodontic aligner trajectory planning in SE(3).
  Features tool-use actions, Andrews' Six Keys occlusion scoring, PDL biomechanical model,
  adversarial patient non-compliance, 8-axis adaptive difficulty, and real clinical data
  from Open-Full-Jaw.
version: 2.0.0
metadata:
  openclaw:
    requires:
      bins: [uv, curl]
    allowed-tools: Bash(uv *), Bash(curl *), Bash(python3 *)
    emoji: "🦷"
    homepage: https://huggingface.co/spaces/grimoors/dental-aligner-env
Dental Aligner Trajectory Planner — battisiBot v2
An RL environment where an AI agent plans orthodontic aligner treatment one stage at a time (24 sequential decisions), moving 28 teeth from malocclusion to alignment in SE(3) space with clinically grounded scoring.
Step 1: Install dependencies
uv sync
Step 2: Start the environment server
uv run python -m server.app &
sleep 5
Step 3: Health check + discover capabilities
curl -s http://localhost:7860/health
curl -s http://localhost:7860/constraints | python3 -m json.tool
Expected: 28 teeth, 24 stages, 0.25mm/2.0deg per-stage limits.
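The per-stage limits reported by /constraints can be checked client-side before committing a stage. The sketch below is one possible check, assuming 7-element poses `[qw, qx, qy, qz, tx, ty, tz]` with unit quaternions. The function names are hypothetical, and the environment's own compliance scoring may differ.

```python
import math

MAX_TRANSLATION_MM = 0.25  # per-stage limit from the text
MAX_ROTATION_DEG = 2.0     # per-stage limit from the text

def rotation_angle_deg(q0, q1):
    """Angle (degrees) of the relative rotation between two unit quaternions."""
    dot = abs(sum(a * b for a, b in zip(q0, q1)))  # abs() handles the q / -q ambiguity
    return math.degrees(2.0 * math.acos(min(1.0, dot)))

def stage_violations(prev_poses, next_poses, tol=1e-9):
    """Return (row_index, translation_mm, rotation_deg) for each tooth over a limit.

    row_index is the position in the pose array, not an FDI tooth number.
    """
    violations = []
    for idx, (p0, p1) in enumerate(zip(prev_poses, next_poses)):
        dist = math.dist(p0[4:], p1[4:])
        angle = rotation_angle_deg(p0[:4], p1[:4])
        if dist > MAX_TRANSLATION_MM + tol or angle > MAX_ROTATION_DEG + tol:
            violations.append((idx, dist, angle))
    return violations
```

An empty list means every tooth moved within 0.25 mm and 2.0 deg for that stage.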
Step 4: Explore available datasets, difficulty axes, and clinical scoring
# Real clinical data sources
curl -s http://localhost:7860/datasets | python3 -m json.tool
# 8-axis adaptive difficulty parameters
curl -s http://localhost:7860/difficulty | python3 -c "import sys,json; d=json.load(sys.stdin); print('Axes:', list(d['ranges'].keys()))"
# Andrews' Six Keys occlusion criteria
curl -s http://localhost:7860/occlusion_criteria | python3 -c "import sys,json; d=json.load(sys.stdin); [print(f' {c}') for c in d['criteria']]"
# PDL biomechanical model
curl -s http://localhost:7860/biomechanics | python3 -c "import sys,json; d=json.load(sys.stdin); print(f'Model: {d[\"model\"]}'); print(f'Stiffness: {d[\"stiffness_n_per_mm\"]}')"
# Malocclusion classification patterns
curl -s http://localhost:7860/malocclusion_classes | python3 -c "import sys,json; d=json.load(sys.stdin); [print(f' {k}: {v[\"description\"]}') for k,v in d['malocclusion_classes'].items()]"
# Adversarial non-compliance types
curl -s http://localhost:7860/noncompliance_types | python3 -m json.tool
Step 5: Reset a stepwise episode
curl -s -X POST http://localhost:7860/reset_stepwise \
  -H "Content-Type: application/json" \
  -d '{"task_id":"task_easy","seed":42,"source":"synthetic","episode_id":"demo_1"}'
Returns an observation with current_config (28x7 tooth poses), target_config, current_stage=0, stages_remaining=24.
Step 6: Use tools before committing
# Inspect a specific tooth
curl -s -X POST http://localhost:7860/tool \
  -H "Content-Type: application/json" \
  -d '{"episode_id":"demo_1","tool":"inspect_tooth","args":{"tooth_id":11}}'
# Check for inter-tooth collisions
curl -s -X POST http://localhost:7860/tool \
  -H "Content-Type: application/json" \
  -d '{"episode_id":"demo_1","tool":"check_collisions","args":{}}'
Step 7: Run a full 24-step episode with clinical scoring
python3 -c "
import json, math, urllib.request
def post(url, data):
    # POST a JSON payload and decode the JSON response
    req = urllib.request.Request(url, data=json.dumps(data).encode(), headers={'Content-Type':'application/json'})
    return json.loads(urllib.request.urlopen(req).read().decode(), strict=False)
# Reset with adaptive difficulty
obs = post('http://localhost:7860/reset_stepwise', {
    'task_id':'task_easy', 'seed':42, 'source':'synthetic', 'episode_id':'demo_full',
    'difficulty_params': {'n_perturbed_teeth': 12, 'translation_magnitude': 4.0, 'jitter_probability': 0.1}
})
init, tgt = obs['current_config'], obs['target_config']
# Interpolation baseline: 24 sequential commits
# (quaternions are blended linearly and renormalized, which approximates SLERP for small rotations)
for stage in range(1, 25):
    alpha = stage / 25.0
    poses = []
    for i in range(28):
        q = [init[i][j]*(1-alpha) + tgt[i][j]*alpha for j in range(4)]
        qn = math.sqrt(sum(x*x for x in q))
        q = [x/qn for x in q]
        t = [init[i][4+j]*(1-alpha) + tgt[i][4+j]*alpha for j in range(3)]
        poses.append(q + t)
    o = post('http://localhost:7860/step_stepwise', {'episode_id':'demo_full', 'poses':poses})
    bd = o.get('reward_breakdown', {})
    evt = bd.get('noncompliance_event')
    # Print every 6th stage, the final stage, and any stage with a non-compliance event
    if stage % 6 == 0 or stage == 24 or evt:
        prefix = f'[{evt[\"type\"]}] ' if evt else ''
        print(f'Stage {stage:2d}: {prefix}reward={bd.get(\"step_reward\",0):.4f} occ={bd.get(\"occlusion_composite\",0):.3f} pdl={bd.get(\"pdl_feasibility\",0):.2f} collision={bd.get(\"collision_free\",0):.3f}')
print(f'\\nTerminal reward: {o[\"terminal_reward\"]:.4f}')
print(f'Done: {o[\"done\"]}')
"
Step 8: Verify GRPO training readiness
# Test the GRPO training pipeline (generates prompts, validates reward functions)
uv run python train_grpo.py --test --episodes 3
Expected: prompts generated from the environment, reward functions validated, ready for GPU training with --model Qwen/Qwen2.5-0.5B-Instruct.
Step 9: Validate environment features
# List the available tasks, then re-check server health
curl -s http://localhost:7860/tasks | python3 -c "import sys,json; print(json.load(sys.stdin))"
curl -s http://localhost:7860/health
Environment Design Summary
Episode: 24 sequential decisions. Agent observes tooth poses, commits one stage at a time.
State/Action: 28x7 arrays — one SE(3) pose per tooth: [qw,qx,qy,qz,tx,ty,tz].
Tools: inspect_tooth, simulate_step, check_collisions, commit_stage, rollback_stage.
Dense reward: progress (40%) + compliance (30%) + smoothness (20%) + staging (10%). Each step also reports occlusion composite (Andrews' Six Keys), PDL biomechanical feasibility, and collision-free score.
Occlusion scoring (Andrews' Six Keys + ABO): 9 metrics — molar relationship, overjet, overbite, crown angulation, crown inclination, rotations, contact tightness, curve of Spee, arch symmetry.
Biomechanical PDL model: Per-tooth-type spring stiffness (incisors 0.2 N/mm, molars 0.8 N/mm). Safe force limits from FEA literature (E_PDL = 68.9 MPa).
Collision detection: Oriented bounding ellipsoid model with anatomically correct crown dimensions.
Adversarial non-compliance: 3 event types (missed wear, broken attachment, partial wear) triggered stochastically. Simulates real patient non-compliance.
Adaptive difficulty: 8 continuous axes with curriculum controller. Auto-escalation at >0.8 for 3 consecutive episodes.
Malocclusion classes: 8 clinically classified patterns (Angle Class I/II/III, open bite, crossbite, crowding, spacing, asymmetric).
Arch forms: 5 parametric curves (ovoid, tapered, square, catenary, beta function).
Real clinical data: Open-Full-Jaw (17 patients) and Mendeley Jaw (pre-segmented STLs) are validated end-to-end; the Teeth3DS+ loader (1,800 scans) is written but pending validation.
Domain: Orthodontic aligner treatment planning ($4B+ industry). SE(3) trajectory planning with non-commutative rotations, biomechanical constraints, and clinical scoring standards.