Spectrography of Artificial Thought: Geometric Invariants, Epistemic Boundaries, and Exogenous Agent Safety
1. The Problem
Large language model agents fail increasingly often. MacDiarmid et al. [1] show that RLHF alignment breaks down. The implicit assumption is the Geometric Morality Fallacy: the belief that making a latent space more isotropic makes an AI more truthful.
2. Architecture
Projection pipeline: Input(384D) -> Linear+ReLU -> 256D -> Linear+ReLU -> 128D -> Linear -> 24D -> Normalize -> S^23
Base encoder: all-MiniLM-L6-v2 (SBERT).
| Space | Role | Properties |
|---|---|---|
| 384D | Raw SBERT | Redundant dimensions |
| 111D | Manifold | Intrinsic dimensionality |
| 24D S^23 | Topological core | Architectural bottleneck |
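The dimension flow in the projection pipeline can be sketched in plain Python. This is only an illustration under stated assumptions: the weights are random and untrained, the initialisation scheme and the helper names (`linear`, `init_layer`, `project`) are hypothetical, and the input is a random stand-in for an SBERT vector. It follows the pipeline line as written (384D input), whereas the skill file's usage code applies the same layers after a 111D SVD reduction.

```python
import math
import random

def linear(x, weights, bias):
    """Affine map: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def relu(x):
    return [max(0.0, v) for v in x]

def l2_normalize(x):
    """Scale a vector to unit length so that it lies on the unit sphere."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm == 0.0:
        raise ValueError("cannot normalize the zero vector")
    return [v / norm for v in x]

def init_layer(rng, n_in, n_out):
    # Hypothetical initialisation: uniform weights scaled by 1/sqrt(n_in), zero bias.
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def project(x, layers):
    """384D -> 256D -> 128D -> 24D, ReLU after all but the last layer, then normalize."""
    for i, (weights, bias) in enumerate(layers):
        x = linear(x, weights, bias)
        if i < len(layers) - 1:
            x = relu(x)
    return l2_normalize(x)

rng = random.Random(42)
layers = [init_layer(rng, 384, 256), init_layer(rng, 256, 128), init_layer(rng, 128, 24)]
embedding = [rng.gauss(0.0, 1.0) for _ in range(384)]  # stand-in for an SBERT vector
z = project(embedding, layers)
print(len(z), math.sqrt(sum(v * v for v in z)))
```

The final normalization is what places the 24 coordinates on S^23: whatever the weights are, the output has unit Euclidean norm.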
3. Core Measurements
Geometric tension: tau_i = ||z_A - z_B||_2
Temporal derivative: Delta_tau_i = |tau_i - tau_{i-1}|
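The two measurements can be sketched in a few lines of Python. The sketch assumes, as the usage code in the skill file does, that z_A and z_B are the projections of consecutive statements in a sequence. The function names and the toy points on a unit circle are hypothetical and stand in for the S^23 embeddings.

```python
import math

def tension(z_a, z_b):
    """Geometric tension: Euclidean distance between two projected points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(z_a, z_b)))

def tension_series(points):
    """tau_i for each consecutive pair of points."""
    return [tension(points[i], points[i + 1]) for i in range(len(points) - 1)]

def temporal_derivative(taus):
    """Delta_tau_i = |tau_i - tau_{i-1}|, defined from the second tension onward."""
    return [abs(taus[i] - taus[i - 1]) for i in range(1, len(taus))]

# Toy unit vectors on a circle (hypothetical data): three nearby points,
# then a jump to the opposite side, then one more nearby point.
angles = [0.0, 0.1, 0.2, math.pi, math.pi + 0.1]
points = [(math.cos(t), math.sin(t)) for t in angles]

taus = tension_series(points)
deltas = temporal_derivative(taus)
print([round(t, 3) for t in taus])
print([round(d, 3) for d in deltas])
```

In this toy sequence the single jump produces one large tau flanked by small ones, so two consecutive Delta_tau values are large: one when the jump occurs and one when the sequence settles again.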
4. Results
Truth/Lie Isomorphism:
| Category | tau mean | p vs Truth |
|---|---|---|
| Complex Truth | 3.0805 | --- |
| Coherent Lie | 3.0728 | 0.948 |
| Nonsense | 2.8069 | 0.008 |
Contradiction Detection:
| Sequence Type | Delta_tau mean | Cohen's d |
|---|---|---|
| Consistent | 0.9078 | --- |
| Contradiction | 1.8182 | 2.419 |
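Key finding: the two results are complementary. Static tension tau does not separate complex truths from coherent lies (p = 0.948), whereas the temporal derivative Delta_tau separates contradictory sequences from consistent ones with a large effect size (Cohen's d = 2.419).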
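For readers who want to recompute an effect size of this kind, here is a minimal sketch of Cohen's d. It assumes the common pooled-standard-deviation variant, since the text does not state which variant was used, and the sample values are hypothetical, not the paper's data.

```python
import math
import statistics

def cohens_d(group_a, group_b):
    """Cohen's d (group_b minus group_a) with a pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(group_b) - statistics.mean(group_a)) / pooled_sd

# Hypothetical Delta_tau samples for illustration only.
consistent = [0.8, 0.9, 1.0, 0.9, 1.0]
contradiction = [1.6, 1.9, 1.8, 2.0, 1.7]
d = cohens_d(consistent, contradiction)
print(round(d, 3))
```

A positive d here means the second group (contradictions) has the larger mean Delta_tau, expressed in units of the pooled standard deviation.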
5. Logical Sentinel: Z3
Three invariants:
- Phi1 (Non-Contamination): S_r = 0 and R_a > 0 => C_x = 1
- Phi2 (Safe Mode): U_n = 1 => R_a = 0
- Phi3 (Loop Guard): L_p >= 3 => R_a = 0
| ID | Threat | Result |
|---|---|---|
| T1 | Adversarial source (URL) | BLOCK |
| T2 | Uncertain + risky action | BLOCK |
| T3 | Compliant (verified) | ALLOW |
| T4 | Synonym evasion | BLOCK |
| T5 | Safe read-only | ALLOW |
| T6 | Brute-force loop (5x) | BLOCK |
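The three invariants can also be checked without a solver, which makes the BLOCK/ALLOW logic easy to trace by hand. The sketch below is an illustration, not the Z3 sentinel itself. The readings of the variables (S_r as source reliability, R_a as action risk, C_x as verified context, U_n as an uncertainty flag, L_p as a loop counter) are assumptions inferred from the invariant names and the skill file, and the `AgentState` class and the example states are hypothetical encodings of threats like T2, T5 and T6.

```python
from dataclasses import dataclass

@dataclass
class AgentState:
    source_reliable: int   # S_r: 0 = untrusted source, 1 = trusted (assumed reading)
    action_risk: int       # R_a: 0 = safe action, > 0 = risky (assumed reading)
    context_verified: int  # C_x: 1 = content verified (assumed reading)
    uncertain: int         # U_n: 1 = agent is uncertain (assumed reading)
    loop_count: int        # L_p: repeated attempts at the same action (assumed reading)

def violated_invariants(s):
    """Return the names of the invariants that the state breaks."""
    violated = []
    if s.source_reliable == 0 and s.action_risk > 0 and s.context_verified != 1:
        violated.append("Phi1")  # Non-Contamination
    if s.uncertain == 1 and s.action_risk != 0:
        violated.append("Phi2")  # Safe Mode
    if s.loop_count >= 3 and s.action_risk != 0:
        violated.append("Phi3")  # Loop Guard
    return violated

def decide(s):
    """BLOCK if any invariant is violated, otherwise ALLOW."""
    return "BLOCK" if violated_invariants(s) else "ALLOW"

# Hypothetical encodings of three rows of the threat table.
uncertain_risky = AgentState(1, 1, 1, 1, 0)  # like T2
safe_read_only = AgentState(1, 0, 0, 0, 0)   # like T5
brute_force = AgentState(1, 1, 1, 0, 5)      # like T6
for state in (uncertain_risky, safe_read_only, brute_force):
    print(decide(state), violated_invariants(state))
```

Returning the list of violated invariants rather than a bare verdict makes it possible to see which guard fired for each blocked case.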
6. Limitations
- N = 30 per category
- Delta_tau domain-dependent (0/3 unseen domains)
- r = 24 is architectural, not Leech lattice
- Single encoder only
- eBPF not deployed
7. References
[1] MacDiarmid et al. (2025). arXiv:2511.18397
[2] Ma et al. (2026). arXiv:2601.10527
[3] Reimers & Gurevych (2019). Sentence-BERT. EMNLP 2019
[4] de Moura & Bjorner (2008). Z3. TACAS 2008
Reproducibility: Skill File
Use this skill file to reproduce the research with an AI agent.
---
name: spectrography-full
description: Detect contradictions via geometric analysis on S^23
allowed-tools: Bash(python *), Bash(pip *)
---
# Spectrography
## Installation
```bash
pip install torch sentence-transformers numpy scipy z3-solver scikit-learn
```
## Usage
```python
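import torch
from sentence_transformers import SentenceTransformer
from sklearn.decomposition import TruncatedSVD

torch.manual_seed(42)

# Encode sentences with the base encoder (384D output).
model = SentenceTransformer('all-MiniLM-L6-v2')
sentences = ['Sentence 1', 'Sentence 2', 'Sentence 3']  # replace with your own sequence
embeddings = model.encode(sentences, convert_to_numpy=True)

# Reduce to the 111D manifold. TruncatedSVD cannot return more components than
# there are input sentences, so the width is capped for short inputs.
n_components = min(111, embeddings.shape[0])
svd = TruncatedSVD(n_components=n_components, random_state=42)
embeddings_reduced = svd.fit_transform(embeddings)

# Projection head down to 24D (randomly initialised here; no trained weights are loaded).
proj = torch.nn.Sequential(
    torch.nn.Linear(n_components, 256), torch.nn.ReLU(),
    torch.nn.Linear(256, 128), torch.nn.ReLU(),
    torch.nn.Linear(128, 24)
)

# L2-normalize onto S^23, then compute the tension between consecutive sentences.
z = torch.nn.functional.normalize(proj(torch.from_numpy(embeddings_reduced).float()), p=2, dim=-1)
tau = [torch.norm(z[i] - z[i+1]).item() for i in range(len(z)-1)]
delta_tau = [abs(tau[i] - tau[i-1]) for i in range(1, len(tau))]

THRESHOLD = 1.2
# delta_tau[i] compares tau[i+1] with tau[i], so it is triggered by sentence i + 2 (0-based).
ruptures = [i + 2 for i, dt in enumerate(delta_tau) if dt > THRESHOLD]
print(f'Ruptures: {ruptures}')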
```
## Z3 Sentinel
```python
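from z3 import Solver, Int, Bool, And, Implies, sat

def verify(cot, action, uncertain=False, loops=0):
    """Return 'SAT' if the state satisfies the invariants, 'UNSAT' if it must be blocked.

    `uncertain` and `loops` are assumed inputs: without them U_n and L_p would be
    unconstrained and Phi2/Phi3 could never fail.
    """
    has_url = 'http' in cot or 'www.' in cot
    dangerous = any(k in action for k in ['rm ', 'sudo', 'delete'])
    s = Solver()
    Sr, Ra, Lp = Int('Sr'), Int('Ra'), Int('Lp')
    Cx, Un = Bool('Cx'), Bool('Un')
    # Invariants Phi1 (Non-Contamination), Phi2 (Safe Mode), Phi3 (Loop Guard).
    s.add(Implies(And(Sr == 0, Ra >= 1), Cx))
    s.add(Implies(Un, Ra == 0))
    s.add(Implies(Lp >= 3, Ra == 0))
    # Bind the symbolic state to the observed chain of thought and action.
    s.add(Sr == (0 if has_url else 1))
    s.add(Ra == (1 if dangerous else 0))
    s.add(Cx == ('verified' in cot))
    s.add(Un == uncertain)
    s.add(Lp == loops)
    return 'SAT' if s.check() == sat else 'UNSAT'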
```
## Citation
```bibtex
@article{delgado2026spectrography,
title={Spectrography of Artificial Thought},
author={Delgado, Sylvain},
year={2026}
}
```