Sutra: A Programming Language for Vector-Symbolic Computation in Frozen Embedding Spaces
Emma Leonhart — EmmaLeonhart999@gmail.com
Abstract
Frozen general-purpose language-model embedding spaces encode
relational structure as vector arithmetic — a property established
across the knowledge-graph-embedding literature (TransE, RotatE,
the word-analogy line). Taking that as given, this paper presents
the design and implementation of Sutra, a typed, purely
functional programming language whose compile target is a single
tensor-op graph over a frozen LLM embedding substrate. The
contribution is algorithmic: a consolidated set of vector-symbolic
primitives (bind, unbind, bundle, similarity, rotation,
soft-halt RNN cells) that work on natural anisotropic embedding
spaces where the textbook Hadamard-product VSA fails, plus a
compiler that lowers the whole program to one fused tensor-op
graph. Sutra is a working compiler today: parser, type checker,
codegen, runtime; the example corpus is a smoke test of 13
demonstration programs covering hello-world embedding round-trips,
fuzzy dispatch, role-filler records, knowledge graphs, classifier
decision rules, sequence reduction, naive analogy, predicate
lookup, nearest-phrase retrieval, the imperative-reversible
pattern, the do-while adder, the rotation hashmap, the rotation
record, and a tutorial — all executing end-to-end with expected
outputs. The full examples/ directory holds 23 .su files
including legacy and feature demos. We give an honest account of
which parts of the substrate-purity story are shipped and which
remain.
1. Introduction
The discovery that general-purpose language model embeddings
encode relational structure as vector arithmetic — king − man + woman ≈ queen, formalized through TransE, RotatE, and the
broader knowledge-graph embedding literature — established that
there is genuine algebraic content in the geometry of pre-trained
models. Given that algebraic structure exists, two questions
follow:
- Which operations on these embeddings are reliable enough to be used as primitives of a compositional algebra over the embedding space, rather than as one-off lexical facts?
- What is the correct binding operation to compose those primitives into structured representations — i.e. how do we build a working vector-symbolic architecture (VSA) on top of substrates the standard VSA literature was not designed for?
This paper answers both questions in the form of a working programming language, Sutra, whose primitives are exactly these consolidated operations.
The naming: Sutra is the Sanskrit sūtra — thread, rule, aphorism — the form in which Pāṇini's foundational Sanskrit grammar is written.
1.1 Two contributions
This paper presents two contributions:
- Consolidation of the algebraic structure of frozen embedding spaces into canonical primitive forms that can be composed: bind, unbind, bundle, similarity, rotation, soft-halt RNN cells.
- A programming language whose compile target is a single tensor-op graph over those primitives — the algorithms above, realized as a typed, purely functional language with a working compiler and runtime.
Sign-flip binding is not the headline — it is at most a side note explaining why the textbook VSA choice (Hadamard product) fails on anisotropic embeddings. The headline is the consolidation into a working algebra plus the language that operationalizes it.
1.2 Technical contributions
The four core technical contributions of this paper are:
1. **Polynomial fuzzy logic via Lagrange interpolation of Kleene's three-valued truth tables, with functional completeness.** The truth axis encodes three values: T = +1, U = 0, F = −1. The logical connectives are taken from Kleene's strong three-valued logic (Kleene 1952): on the discrete grid {−1, 0, +1}, AND is the minimum of its operands, OR is the maximum, NOT is negation. These operations are correct as stated, but min and max are piecewise-linear and non-differentiable on the diagonal a = b, which breaks gradient flow when the connectives compose with the rest of the tensor-op graph. Sutra resolves this by Lagrange-interpolating each operator's truth table as a polynomial that is exact on the {−1, 0, +1}² grid and C^∞ everywhere else. The closed forms are AND(a, b) = (a + b + ab − a² − b² + a²b²) / 2, OR(a, b) = (a + b − ab + a² + b² − a²b²) / 2, and NOT(x) = −x (already polynomial). By functional completeness of {AND, OR, NOT} for three-valued logic, every other connective (XOR, IMPLIES, NAND, NOR, …) lowers to a composition of these three polynomials. The result is that &&, ||, !, and any derived connective are all polynomial tensor-op-graph fragments — gradient-compatible, branchless, and exact on the discrete-logic regime; the differentiability is what lets fuzzy logic compose with the rest of the substrate-pure runtime. (A minimal verification sketch follows this list.)
2. **Beta reduction to tensor normal form, used as the compiler architecture.** Sutra inverts what conventional compilers do: instead of progressively lowering a high-level program toward machine instructions, the compiler aggressively expands the program — inlining operator definitions, unfolding constants, beta-reducing through bound names — until the residual is a straight-line algebraic expression over the VSA primitives. That residual is then algebraically reduced to tensor normal form: a fused sequence of matmul / element-wise / nonlinear tensor ops with no remaining named bindings or function calls. In the recurrent case the form generalizes to recurrent tensor normal form, where the RNN cell body is itself in tensor normal form and the recurrence is a separate top-level operator.
3. **Tail recursion as the loop primitive, eliminating control flow, with O(1) memory in recursion depth.** Loops are not for/while constructs over a host-side iterator. They are tail-recursive function declarations (do_while, while_loop, iterative_loop, foreach_loop) whose body's return NAME(args) becomes the recurrent step. Each loop compiles to a fixed-T soft-halt RNN cell with substrate-pure halt detection (heaviside step → cumulative monotone halt → soft-mux state freeze). The state vector h_t carries the entire execution context in superposition over a fixed-width vector, so memory overhead is constant in recursion depth: a Sutra program can specify deeper recurrence (a larger T at compile time, set in the manifest, §3.5) without expanding the runtime memory budget. There is no per-iteration stack frame, no growing context, no heap allocation keyed by depth — the loop body updates the same state tensor T times. Halt completion propagates through nested calls to the program's final output: a loop that fails to converge wipes the program's result.
4. **Synthetic-dimension rotation binding as an angular hash map.** The compiler maps a high-dimensional codebook onto a set of reserved synthetic dimensions and uses Haar-random orthogonal rotations (seeded from the role's content hash) to bind keys to slots. This is, to the authors' knowledge, the first use of a high-dimensional rotation pattern as the substrate for a functional hash-map primitive. After binding, the resulting structure participates in the same beta-reduction pass as the rest of the program and is reduced to (recurrent) tensor normal form alongside everything else.
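The verification sketch referenced in contribution 1 is below: a standalone check (not Sutra compiler code) that the closed-form polynomials reproduce Kleene's min/max/negation tables on the discrete grid.

```python
# Standalone check: the Lagrange-interpolated polynomials agree with
# Kleene's strong three-valued connectives on the {-1, 0, +1} grid.
import itertools

def poly_and(a, b):  # exact min on the grid, smooth everywhere else
    return (a + b + a * b - a**2 - b**2 + a**2 * b**2) / 2

def poly_or(a, b):   # exact max on the grid
    return (a + b - a * b + a**2 + b**2 - a**2 * b**2) / 2

def poly_not(x):     # already a polynomial
    return -x

for a, b in itertools.product((-1.0, 0.0, 1.0), repeat=2):
    assert poly_and(a, b) == min(a, b)
    assert poly_or(a, b) == max(a, b)
    assert poly_not(a) == -a
print("polynomial connectives match the Kleene tables on the grid")
```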
These four primitives are integrated into a single working
compiler that lowers .su source to a self-contained PyTorch
module and runs on CPU or CUDA. The compile-time loop unroll
depth T is a per-project configuration field
([project.compile] loop_max_iterations in the project's
atman.toml manifest, §3.5; equivalently, the --loop-T CLI
flag); the default is T=50, and programs that need deeper
recursion compile with a larger T at no runtime cost beyond the
longer emitted graph (the soft-halt cell freezes state once
halt_cum saturates).
In addition to the four technical contributions above, this paper also reports an engineering / execution result:
- End-to-end string I/O through the substrate, via a compile-time codebook + nearest-string decode. Every embedded string in a .su program is embedded once at compile time via the project's configured frozen LLM and stored in an embedded codebook store alongside its label. At runtime, the inverse operation nearest_string(vector) returns the label whose embedding is closest to the queried vector. The frozen LLM is load-bearing for this design: a deterministic, reproducible, dense-enough string-to-vector map is what makes the codebook practical and the inverse decode reliable. Replacing the embedding with the random hypervectors that classical VSA literature assumes would still yield a working algebra but would leave the language with no I/O story — strings would have no canonical mapping to vectors and the substrate would have nowhere to decode labels from. To the authors' knowledge, Sutra is therefore the only HDC implementation that ships a practical end-to-end string-in / string-out path as a built-in compiler concern. Existing HDC libraries (TorchHD and similar) expose the algebra over user-supplied hypervectors but require users to maintain their own string-to-vector mapping and codebook by hand; that boilerplate is what makes most HDC code stay research-tooling-shaped rather than program-shaped. This is not a new theoretical primitive but a working integration: the compiler, the runtime, the embedded codebook, and 13 demonstration programs in the smoke test (with 23 .su files in the examples/ directory) exercise the end-to-end pipeline.
1.3 What this paper is not
This paper is not a survey of VSA binding operations; the contribution is not a new binding scheme in isolation, but the integration of the four primitives in §1.2 into a single typed, purely functional language with a working compiler. The soft-halt RNN cell is straightforward in the abstract; what is not straightforward is making it the loop primitive of a programming language whose entire program lowers to one tensor-op graph through beta reduction. The paper is neither a deep-learning architecture paper nor a pure programming-language theory paper; it is the specific construction that ties the two together.
2. Related Work
2.1 Vector Symbolic Architectures
VSA is a family of algebraic frameworks for computing with high-
dimensional vectors (Kanerva 2009; Plate 1995; Gayler 2003). The
standard VSA development assumes hypervectors drawn from a
controlled random distribution designed for the algebra; bind is
typically Hadamard product or circular convolution. Frozen LLM
embedding spaces are not designed for VSA — they are correlated
and anisotropic — and the textbook bind operations do not transfer
cleanly. Rotation binding (R_role @ filler for a role-seeded
Haar-random orthogonal R_role) does, and is what Sutra uses
today.
The closest software peer in the VSA space is TorchHD (Heddes et al. 2023), a PyTorch library that exposes VSA primitives (bind, bundle, similarity) as tensor operations. Sutra and TorchHD differ on what the user writes and what the compiler does:
- TorchHD is a library. The user writes Python code that calls TorchHD primitives; control flow is host-side Python; there is no source-language layer above the primitives, no compile step, and no algebraic reduction across primitive calls. Each primitive call is a tensor op, but the program itself is a Python function with whatever control flow the user wrote.
- Sutra is a language with a compiler. The user writes .su source which the compiler beta-reduces to tensor normal form (§1.2-2): a single straight-line tensor-op graph with no Python control flow. Loops are tail-recursive function declarations that lower to soft-halt RNN cells; conditionals are differentiable fuzzy interpolations rather than Python if. Hash-map structure is implemented via synthetic-dimension rotation, not via a host-side dictionary.
This is not a "TorchHD is bad" claim; TorchHD is the right tool for using VSA primitives as a library in a Python program. Sutra is the construction that compiles a separate source language to the same primitive set with no host-side residue, which TorchHD is not designed to do.
A second axis on which the two systems differ, and where to the
authors' knowledge Sutra is uniquely positioned within the broader
HDC ecosystem, is string I/O. TorchHD and other HDC libraries
expose the algebra over user-supplied hypervectors: the user
constructs random or hash-derived vectors for whatever they want
to represent, maintains a dict[str, hypervector] mapping by
hand, and decodes by cosine similarity against a manually
assembled codebook tensor. There is no built-in path from external
strings into the substrate or from the substrate back to strings.
Sutra's compile-time codebook (§3.4) closes that loop: every
embedded string in .su source is embedded once at compile time
via the configured frozen LLM (e.g. nomic-embed-text, 768-d) and
stored in the project's .sdb codebook, and the runtime
nearest_string operation is the inverse — given any vector, it
returns the nearest known label. The frozen LLM embedding is
load-bearing for this: it is what gives the compile-time codebook
a deterministic, reproducible, and dense-enough mapping for
nearest-neighbor decode to be practical. Replacing the embedding
with random hypervectors would still yield a working VSA algebra
but would have no I/O story — strings would have no canonical
mapping to vectors and decoding would have nowhere to look up
labels. To the authors' knowledge, Sutra is the only HDC
implementation that ships an end-to-end string-in / string-out
path as a built-in compiler concern rather than as user-supplied
boilerplate.
A side-by-side comparison concretizes the difference. The same role-filler-record task — encode a 3-field record (name, color, shape) as a single bundled vector, then decode the color field — written in both systems:
Sutra (examples/role_filler_record.su, the entire program):
```
vector r_name = basis_vector("role_name");
vector r_color = basis_vector("role_color");
vector r_shape = basis_vector("role_shape");
vector f_alice = basis_vector("filler_alice");
vector f_red = basis_vector("filler_red");
vector f_circle = basis_vector("filler_circle");
// (... three more fillers omitted ...)
map<vector, string> FILLER_NAME = {
f_alice: "alice", f_red: "red", f_circle: "circle",
/* ... */
};
function vector make_record(vector name, vector color, vector shape) {
return bundle(
bind(r_name, name), bind(r_color, color), bind(r_shape, shape)
);
}
function string decode_field(vector record, vector role) {
vector recovered = unbind(role, record);
vector winner = argmax_cosine(recovered,
[f_alice, f_red, f_circle, /* ... */]);
return FILLER_NAME[winner];
}
function string main() {
vector rec = make_record(f_alice, f_red, f_circle);
return decode_field(rec, r_color);
}
```

The compiler reduces this whole program to a fused tensor-op
graph: every basis_vector call is resolved at compile time
(strings embedded into the substrate, stored in the compile-time
codebook); bind and unbind lower to a single matmul each;
argmax_cosine lowers to one cosine-similarity matmul plus an
argmax; the FILLER_NAME map lowers to the substrate-resident
codebook. The runtime decodes by nearest_string against the
embedded codebook — the string "red" comes out without the
program ever leaving the tensor graph at the program-semantics
level.
TorchHD equivalent (experiments/role_filler_record_torchhd.py,
abridged):
```python
import torch, torchhd
torch.manual_seed(42)
# 1. MANUAL hypervector creation. There is no "embed string";
# the user maintains the string-to-vector mapping.
roles = {n: torchhd.random(1, 768, vsa="MAP")
for n in ["name", "color", "shape"]}
fillers = {n: torchhd.random(1, 768, vsa="MAP")
for n in ["alice", "bob", "red", "blue", "circle", "square"]}
# 2. MANUAL codebook tensor for decoding.
filler_names = ["alice", "bob", "red", "blue", "circle", "square"]
codebook = torch.cat([fillers[n] for n in filler_names], dim=0)
# 3. Build the record (Python control flow).
record = torchhd.bundle(
torchhd.bind(roles["name"], fillers["alice"]),
torchhd.bundle(
torchhd.bind(roles["color"], fillers["red"]),
torchhd.bind(roles["shape"], fillers["circle"]),
),
)
# 4. Decode (Python control flow).
recovered = torchhd.bind(record, torchhd.inverse(roles["color"]))
sims = torchhd.cosine_similarity(recovered, codebook)
result = filler_names[int(torch.argmax(sims))]
```

Both programs return "red". The differences are structural:
- The Sutra program contains no Python; the TorchHD program is Python with library calls.
- The Sutra string-to-vector mapping is automatic via basis_vector("filler_alice"); in TorchHD the user constructs hypervectors and maintains a dict[str, hypervector] by hand.
- The Sutra codebook is implicit (the compiler constructs it from the literals in the source); in TorchHD the user stacks vectors into a codebook tensor explicitly.
- The Sutra program lowers to one tensor-op graph; the TorchHD program is a Python function whose control flow stays in Python even after the library calls dispatch to PyTorch.
These are differences in what kind of artifact the user writes, not in which library is faster. The CUDA kernels both systems eventually call into are largely the same — it's the shape of the program before it hits CUDA that differs.
2.2 Comparison to other neuro-symbolic languages
The closest neuro-symbolic-language peer is Scallop (Li et
al. 2023), a Datalog-based language with PyTorch bindings whose
differentiability comes from an extended provenance-semiring
framework over relational queries. Scallop's architectural shape
is a two-stage pipeline: a neural model M_θ extracts discrete
symbols r from raw input, and a Datalog program P performs
logical reasoning over those symbols to produce the output. The
boundary between perception and reasoning is sharp; the symbols
that flow between them are typed relations.
Sutra's shape is different at the same architectural level. There
is no perception-then-reasoning split: the substrate is a
continuous embedding space throughout, and primitives like
bind, unbind, bundle, and similarity operate on vectors
end-to-end. There is no discrete symbolic layer to extract into
or reason over. The whole program — including what would in
Scallop be the logic program — compiles to a single fused
tensor-op graph through beta reduction (§1.2-2). Differentiability
is inherited from the tensor-op graph itself; there are no
provenance semirings because there is no relational layer to
annotate.
The two systems are good at different things. Scallop is the right tool when an application's problem structure is naturally relational — scene-graph queries, knowledge-graph reasoning, combinatorial search over typed entities — and the perception side can be cleanly factored out into a separate neural module. Sutra is the right tool when computation is best expressed as algebra on vectors and the substrate is a frozen LLM embedding space the program reads strings into and decodes strings out of. Neither subsumes the other; they answer different "what kind of program does the user want to write?" questions.
The other named neuro-symbolic peers — DeepProbLog (Manhaeve et al. 2018), Logic Tensor Networks (Serafini & Garcez 2016; Badreddine et al. 2022), and NeurASP (Yang et al. 2020) — share Scallop's perception-then-reasoning shape and differ similarly from Sutra. DeepProbLog grounds neural predicates in a ProbLog proof tree; LTN compiles first-order-logic formulas into differentiable t-norm losses over learned embeddings; NeurASP extends Answer Set Programming with neural predicates. All three treat symbols as a separate stratum from the neural layer.
The HDC-side comparison is sparser. The closest HDC peer with compiler infrastructure is HDCC (Vergés et al. 2023), which translates a description-file DSL into self-contained C for embedded classification. HDCC ships random and level hypervectors only (no LLM substrate), supports no general control flow (no loops, no recursion, no conditionals beyond the encode-then-classify pipeline), and is scoped to classification rather than general-purpose programming. The TorchHD library and OpenHD / HDTorch frameworks similarly do not expose loops as a language primitive — control flow lives in the host Python.
To the authors' knowledge, no published HDC system targets the specific configuration that Sutra occupies: a single tensor-op graph folding the whole program — including string-in / string-out I/O and tail-recursive loops with constant memory overhead in recursion depth (§3.3) — over a frozen externally-trained embedding substrate. The combination of (a) one fused tensor-op graph as the compile target, (b) HDC primitives as the operations, (c) a frozen LLM embedding space as the substrate that doubles as the I/O codebook, and (d) tail-recursive loops compiled to soft-halt RNN cells over a fixed-width state vector is what distinguishes Sutra from each of these peers, not any one of those four properties in isolation.
2.3 Differentiable Programming, AOT Compilation, and Knowledge
Compilation
The closest design ancestors are partial-evaluation systems that specialize programs at compile time (the Futamura projections), differentiable programming systems that treat programs as differentiable functions (JAX), AOT compilation of neural networks (TVM, XLA), and knowledge compilation in symbolic AI (Darwiche & Marquis 2002). Sutra differs from each: TVM/XLA start from a network, not toward one; JAX treats programs as differentiable but does not bake source literals into weights; partial evaluation specializes for compile-time-known values but does not target a neural-network-shaped artifact; knowledge compilation targets Boolean circuits, not continuous embedding spaces. Sutra's combination — fold source literals into the weight structure, compile control flow to RNN cells, run the whole program as one tensor-op graph over a continuous substrate — is the novel position.
3. Consolidation into Canonical Primitives
The central design move: hold the operation interface fixed
(bind, unbind, bundle, similarity, rotate) and find a
binding implementation that works on natural anisotropic embedding
spaces. Standard VSA's Hadamard product fails because correlated
embeddings produce destructive crosstalk under elementwise
multiply. Rotation binding succeeds: each role gets a Haar-random
orthogonal matrix, seeded by a hash of the role-vector content,
and bind(filler, role) = R_role @ filler. Unbind is the matrix
transpose. The rotation acts as a near-orthogonal scrambling that
is invertible by construction.
The compiler emits role rotations as cached matrices, pre-warmed at module init from the codebook so the runtime never pays the QR-construction cost on the hot path. Binding becomes a single matmul against a precomputed matrix — the GPU-friendly shape that fuses with surrounding tensor ops.
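The shape of the construction can be sketched in a few lines of PyTorch. This is an illustration, not the compiler's actual code: the sketch seeds the rotation from the role label for convenience, whereas the compiler seeds from the role-vector content hash and caches the matrices at module init.

```python
import hashlib
import torch

def role_rotation(role_label: str, dim: int = 768) -> torch.Tensor:
    """Haar-random orthogonal matrix, seeded here from the role label."""
    seed = int.from_bytes(hashlib.sha256(role_label.encode()).digest()[:8], "little")
    gen = torch.Generator().manual_seed(seed)
    # QR of a Gaussian matrix yields a Haar-distributed orthogonal Q.
    q, r = torch.linalg.qr(torch.randn(dim, dim, generator=gen, dtype=torch.float64))
    return q * torch.sign(torch.diagonal(r))   # sign fix for a unique sample

def bind(R: torch.Tensor, filler: torch.Tensor) -> torch.Tensor:
    return R @ filler            # binding is a single matmul

def unbind(R: torch.Tensor, bound: torch.Tensor) -> torch.Tensor:
    return R.T @ bound           # the transpose inverts the rotation exactly

R = role_rotation("role_color")
x = torch.randn(768, dtype=torch.float64)
print(float((unbind(R, bind(R, x)) - x).norm()))   # ~1e-13: round-off only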
The role of the LLM substrate in Sutra is to provide a
deterministic I/O mapping: a string in the source program embeds
to a specific 768-d vector via the configured frozen LLM, and at
runtime the inverse nearest_string lookup decodes any vector
back to the closest known label. The substrate is what makes
program input and output expressible as ordinary strings while
the runtime computes in vector space. Sutra does not depend on
any particular semantic property of the embedding beyond the
mapping being stable and the dimensionality being fixed; the
binding, bundling, and similarity primitives operate on the
vectors as opaque dense tensors and are correct under any
substrate that ships the same dimensionality.
3.1 Capacity of rotation binding on a 768-d substrate
Direct measurement of decode accuracy as a function of bundle
width k, on a 200-filler codebook in the same 768-d substrate the
runtime uses (Haar-random orthogonal R_role, 10 trials per k,
all-random fillers — capacity is a property of the rotation
algebra, not the filler distribution):
| k (bundle width) | accuracy | signal cos | noise cos | SNR |
|---|---|---|---|---|
| 2 | 100.0% | +0.7087 | −0.0022 | 322 |
| 4 | 100.0% | +0.5046 | −0.0025 | 199 |
| 8 | 100.0% | +0.3535 | +0.0029 | 120 |
| 12 | 100.0% | +0.2886 | −0.0007 | 438 |
| 16 | 100.0% | +0.2530 | +0.0011 | 222 |
| 24 | 99.6% | +0.2052 | −0.0006 | 360 |
| 32 | 97.2% | +0.1746 | −0.0002 | 974 |
| 48 | 88.3% | +0.1444 | −0.0003 | 431 |
| 64 | 75.0% | +0.1245 | −0.0002 | 633 |
| 96 | 53.9% | +0.1018 | −0.0000 | 3506 |
| 128 | 39.5% | +0.0891 | −0.0002 | 500 |
Reversibility round-trip: mean ‖unbind(R, bind(R, x)) − x‖ = 1.5 × 10⁻¹⁵ across the same trials, i.e. floating-point round-off. Haar-random Q is orthogonal so Qᵀ Q = I; reversibility is exact modulo numerical error.
Interpretation. The signal cosine decays as ≈ 1/√k (the table tracks 1/√k closely: 0.71 at k = 2, 0.25 at k = 16, 0.125 at k = 64), consistent with the standard bundled-k retrieval analysis; the noise cosine has mean ≈ 0 with per-pair fluctuations on the order of 1/√d ≈ 0.036 for d = 768. Decode starts to fail once the per-item signal approaches the largest distractor cosine over the 200-filler codebook, which matches the observed accuracy knee between k = 32 (97.2%) and k = 48 (88.3%). For practical Sutra programs, the bundle width is typically well below this knee — role-filler records have on the order of 1–10 fields, not 100 — so binding-capacity cleanup loss is not the limiting factor in the demonstration corpus. The capacity ceiling is substrate-dimensional, and the language scales with d.
The experiment is experiments/rotation_binding_capacity.py; the
table above is its actual output, not asserted ranges.
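A reduced version of the same measurement (one bundle width, one trial; the script above is the authoritative version) looks roughly like this sketch:

```python
import torch

d, n_fillers, k = 768, 200, 16
torch.manual_seed(0)
fillers = torch.nn.functional.normalize(torch.randn(n_fillers, d), dim=1)

def haar_rotation(dim: int, gen: torch.Generator) -> torch.Tensor:
    q, r = torch.linalg.qr(torch.randn(dim, dim, generator=gen))
    return q * torch.sign(torch.diagonal(r))

gen = torch.Generator().manual_seed(1)
roles = [haar_rotation(d, gen) for _ in range(k)]
picks = torch.randint(0, n_fillers, (k,))

# Bundle k role-bound fillers, then clean up slot 0 by unbinding role 0
# and taking the nearest codebook entry.
bundle = torch.stack([roles[i] @ fillers[picks[i]] for i in range(k)]).sum(0)
recovered = roles[0].T @ bundle
sims = fillers @ torch.nn.functional.normalize(recovered, dim=0)
print("decoded slot 0 correctly:", int(sims.argmax()) == int(picks[0]))
```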
3.2 The extended-state-vector layout
Every value in a Sutra program is a vector with a fixed extended
layout: [semantic | synthetic]. The semantic block holds the
LLM embedding for vector-shaped values; the synthetic block
reserves canonical axes for primitive types and slot machinery:
| Index | Purpose |
|---|---|
| synthetic[0] | AXIS_REAL (real component for int/float/complex) |
| synthetic[1] | AXIS_IMAG (imaginary component for complex) |
| synthetic[2] | AXIS_TRUTH (fuzzy truth scalar, used by bool/comparisons) |
| synthetic[3] | AXIS_CHAR_FLAG (marks char primitives) |
| synthetic[4] | AXIS_LOOP_DONE (substrate-side completion flag) |
| synthetic[5..] | SLOT_BASE — disjoint 2D Givens slots for variable storage |
The uniformity is load-bearing: every value has the same shape, so every operation is one tensor op, and the compiler can treat the whole program as a dataflow graph of tensor operations. There is no type dispatch at the leaves.
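As an illustration of the layout only (axis indices taken from the table above; the synthetic width chosen here is an assumption, not the compiler's actual value):

```python
import torch

SEM_DIM = 768                        # semantic block: frozen-LLM embedding
AXIS_REAL, AXIS_IMAG, AXIS_TRUTH = 0, 1, 2
AXIS_CHAR_FLAG, AXIS_LOOP_DONE = 3, 4
SYN_DIM = 16                         # illustrative synthetic width (assumption)

def make_int(n: float) -> torch.Tensor:
    """Pack an integer: empty semantic block, real component on AXIS_REAL."""
    v = torch.zeros(SEM_DIM + SYN_DIM)
    v[SEM_DIM + AXIS_REAL] = n
    return v

def make_bool(truth: float) -> torch.Tensor:
    """Pack a fuzzy boolean: truth scalar in [-1, +1] on AXIS_TRUTH."""
    v = torch.zeros(SEM_DIM + SYN_DIM)
    v[SEM_DIM + AXIS_TRUTH] = truth
    return v

x = make_int(7)
print(x.shape, float(x[SEM_DIM + AXIS_REAL]))   # torch.Size([784]) 7.0
```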
3.3 First-class loops as RNN cells
Runtime data-dependent loops compile to fixed-T soft-halt cells.
Each tick: snapshot pre-step state, evaluate the halt condition
on the substrate (truth-axis read → heaviside step → cumulative
saturating sum), run the body which uses pass values (or
equivalently return NAME(args) tail recursion) to update state
locals, then a soft-mux freezes state at the pre-step value once
halt saturates. T is a configurable compile-time parameter (default 50);
the soft-halt gating ensures convergence typically occurs in
far fewer steps, with remaining iterations gated to identity
by the saturated halt signal. Optional torch.compile wrapping
unrolls the iteration at trace time.
(The recurrent computational substrate that emerges from this construction is the same shape Siegelmann & Sontag (1992) analyzed when they showed recurrent neural networks with rational weights can compute any Turing-machine-computable function. We mention this for completeness — the result is well-established and assumed for any general-purpose programming language; we do not lean on it as a contribution.)
Each loop returns a halt-cum scalar in [0, 1] indicating
completion confidence. A _program_halt accumulator multiplies
into every loop call's halt-cum and into every function's return
value: a loop that fails to converge wipes program output to
near-zero, providing substrate-pure detection of unconverged
computation.
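A simplified host-language sketch of the tick sequence described above, using a scalar loop counter in place of the full extended state vector (the real cell reads the truth axis and operates on tensors throughout):

```python
import torch

T = 50                       # compile-time unroll depth
state = torch.tensor(9.0)    # loop variable x, starting value
halt_cum = torch.tensor(0.0)

for _ in range(T):                                    # fixed-T unroll
    pre = state
    # Halt when the continue condition x < 11 fails, i.e. x >= 11
    # (scalar stand-in for the truth-axis read -> heaviside step).
    done = torch.heaviside(state - 11.0 + 0.5, torch.tensor(0.0))
    halt_cum = torch.clamp(halt_cum + done, max=1.0)  # cumulative, saturating
    step = state + 1.0                                # loop body: x = x + 1
    # Soft-mux: once halt_cum saturates, freeze at the pre-step value.
    state = halt_cum * pre + (1.0 - halt_cum) * step

print(float(state), float(halt_cum))                  # 11.0 1.0
```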
Constant memory in recursion depth. The state vector that
the loop body updates is fixed-width: [semantic | synthetic],
total dimensionality set at compile time and unchanged from the
first iteration to the T-th. A tail-recursive loop in Sutra
therefore consumes O(1) memory in its recursion depth — there is
no per-step stack frame, no growing context, no heap allocation
keyed by depth. The compiler's emitted artifact for a loop is a
sequence of T identical tensor-op cell evaluations against the
same state tensor, with the soft-halt mask determining which
cells contribute. Doubling T doubles the static graph size but
does not change runtime memory; halving T does the opposite.
Compared with sequence models that accumulate a context window
linearly with input length and with stack-based recursive
languages whose memory footprint grows with call depth, Sutra's
recurrent-tail-recursive form folds an arbitrary execution
trajectory into a single fixed-width vector via VSA superposition
and pays no memory cost as the trajectory deepens.
This is the property that makes Sutra a candidate for substrate-bounded computation: a program written in Sutra can specify a deeper recurrence at compile time without expanding the runtime memory budget, and the upper bound on what fits in T iterations is determined by the binding capacity of the substrate (§3.1) rather than by available RAM. To the authors' knowledge, no other HDC system or HDC compiler exposes user-program-level recursion at all (HDCC compiles classification pipelines only, with no general control flow; TorchHD requires the user to write Python loops over hypervectors, which are not constant-memory in either depth or context).
3.4 Embedded codebook store
The compile-time codebook is stored in an embedded vector
database (internally called SutraDB) that ships as part of the
compiler — analogous to SQLite being embedded in an application
rather than run as a separate service. It holds the (embedding,
label) pairs that arise from basis_vector("...") and
embed("...") calls in the source. The data model is RDF
triples with f32-vector literals as the object position, indexed
by a built-in HNSW index for nearest-neighbor decode. The
on-disk format is a .sdb file that travels alongside the
compiled Python module. There is no external service, no
separate install, and no network dependency.
Every embedded string in a Sutra program is inserted into the
compile-time .sdb codebook, with the embedding as the object
of a triple typed <http://sutra.dev/f32vec>. The runtime decode
operation _VSA.nearest_string(query) is the inverse of embed:
given any vector, return the nearest-string label from the
substrate-resident codebook. Strings declared but unused in
expressions are still inserted, so they remain decodable. The
compiled module's Python data section never carries the
embeddings — they live in the .sdb file, which is an artifact
of compilation, not a service the runtime contacts.
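The decode direction can be sketched independently of SutraDB (which uses an HNSW index; a brute-force cosine scan stands in here, and the random vectors stand in for the frozen-LLM embeddings that the real codebook stores):

```python
import torch

# Compile-time codebook: label -> embedding. In the real system these come
# from the configured frozen LLM and live in the .sdb file.
labels = ["alice", "red", "circle"]
codebook = torch.nn.functional.normalize(torch.randn(len(labels), 768), dim=1)

def nearest_string(query: torch.Tensor) -> str:
    """Inverse of embed: return the label whose vector is closest to query."""
    q = torch.nn.functional.normalize(query, dim=0)
    return labels[int((codebook @ q).argmax())]

noisy = codebook[1] + 0.1 * torch.randn(768)   # a vector near "red"
print(nearest_string(noisy))                   # "red" (with high probability)
```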
3.5 Project manifest (atman.toml)
A Sutra project is described by an atman.toml manifest at the
project root. The manifest declares the entry source file, the
embedding substrate (provider, model, dimensionality, and whether
to mean-center), and compile-time settings. A minimal example:
```toml
[project]
name = "sutra-examples"
entry = "hello_world.su"
substrate = "silicon"
[project.embedding]
provider = "ollama"
model = "nomic-embed-text"
dim = 768
mean_center = true
[project.compile]
loop_max_iterations = 50
```

The compiler reads [project.embedding] to know which LLM to
query for embed("...") and basis_vector("...") calls at
compile time and to fix the dimensionality of the runtime
tensor-op graph. Changing the substrate (e.g. swapping
nomic-embed-text for a different 768-d model, or for a 1536-d
model with a corresponding dim update) re-runs the embed step
at compile time and produces a different .sdb codebook; the
source code does not change. [project.compile] loop_max_iterations
sets the soft-halt loop unroll depth T discussed in §1.2 and
§3.3; the default is 50 and programs requiring deeper recursion
raise it. The manifest format is intentionally narrow — it covers
what the compiler needs to deterministically produce a .sdb
and emit a PyTorch module, and nothing else.
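A minimal sketch of consuming the manifest from Python (standard-library tomllib, Python 3.11+; field names as in the example above, error handling omitted — the compiler's own manifest-loading code may differ):

```python
import tomllib

with open("atman.toml", "rb") as f:
    manifest = tomllib.load(f)

entry = manifest["project"]["entry"]                # "hello_world.su"
emb = manifest["project"]["embedding"]              # provider / model / dim
loop_T = manifest["project"]["compile"]["loop_max_iterations"]   # 50 here
print(entry, emb["model"], emb["dim"], loop_T)
```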
4. The Sutra Compiler
The compiler is a five-stage pipeline:
- Lex + parse — .su source → AST.
- Inline + simplify — stdlib operator definitions inlined; an egglog-based simplifier folds equivalent expressions and runs common-subexpression elimination over the algebra.
- Codegen — AST → Python source emitting PyTorch tensor ops. The emitted module includes the runtime class (_TorchVSA) as inline source so the artifact is self-contained.
- Compile-time substrate population — embed_batch fetches embeddings for every string literal; populate_sutradb pushes the codebook into SutraDB; prewarm_rotation_cache precomputes role rotations.
- Execute — emitted module loaded; chosen device (CUDA or CPU) initialized at module import; main() called; result returned.
The runtime class is emitted inline rather than imported because
the emitted module is the substrate-pure tensor-op graph; the
compile-time decisions (extended-state-vector dimensions, codebook
contents, role rotations, SutraDB path, optional torch.compile)
are all baked into the emitted source. Re-running a compiled
module hits the disk-cached embeddings and the precomputed
rotations on second-and-later runs.
4.1 Substrate-purity invariants
Three invariants the compiler enforces:
- Every primitive runs on the substrate. Numpy is allowed only at compile time (codebook construction, role-rotation pre-warm, SutraDB ingestion) and in monitoring/decoding (cosine for debugging output). Numpy on the runtime hot path is forbidden.
- No scalar extraction inside an operation. Operations may not pull a Python float out of a substrate vector, do scalar arithmetic on it, and pack the result back. Historical bug fixed: complex multiplication had been implemented with scalar extraction; correct implementation is three cached matrices and two tensor multiplies.
- No Python control flow inside an operation. if, for, while on scalar predicates break uniformity. Loop halt uses substrate primitives (heaviside, saturate_unit) instead of Python ternaries.
4.2 Compile-time resolution to tensor normal form
Two compile-time mechanisms are central to how the compiler achieves tensor normal form:
- Precomputed rotation matrices. Every role rotation is constructed at compile time (prewarm_rotation_cache) and stored as a constant tensor. At runtime, bind(role, filler) is a single matmul against a precomputed matrix — the compile-time resolution eliminates the QR construction from the runtime graph entirely.
- Fixed-depth loop unroll. Tail-recursive loops compile to a fixed-T iteration over the RNN cell body. The compiler fixes T at compile time (configurable, default 50), and the soft-halt gating ensures convergence typically occurs in far fewer steps. With torch.compile (opt-in via SUTRA_TORCH_COMPILE=1), the tracer folds the unrolled iteration into a single fused kernel.
Both are instances of the same principle: the compiler resolves structure at compile time so the runtime is a straight-line tensor-op graph. Role rotations become constant matrices; recursion becomes a fixed-depth cell. This is how beta reduction to tensor normal form works in practice.
5. Demonstration Programs
The smoke test (examples/_smoke_test.py) runs 13 demonstration
programs end-to-end against the compiler+runtime pipeline; the
full examples/ directory holds 23 .su files including legacy
syntax tours and feature demos. The 13 smoke-tested programs are:
hello-world, fuzzy branching, role-filler record, classifier,
analogy, knowledge graph, predicate lookup, fuzzy dispatch,
nearest-phrase retrieval, sequence reduction, loop rotation,
concept search, and counter loop. Each exercises a different part
of the language; the subsections below describe four canonical
examples in detail.
5.1 Hello world
```
function vector main() {
return embed("hello world");
}
```

Compiles to a single-call program that returns the
nomic-embed-text embedding of the literal string. The compile-
time disk cache makes second-run cost approximately zero.
5.2 Fuzzy dispatch
A program that compares an input string's embedding against several prototype embeddings via similarity, then routes through a soft-mux on the resulting truth-axis scores. All arithmetic is substrate-pure; the dispatch is differentiable end-to-end (every intermediate is a tensor on the substrate).
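In host-language terms the dispatch is roughly as follows. This is a PyTorch paraphrase, not the emitted code: the compiled program keeps the scores on the truth axis and routes through the substrate soft-mux, for which softmax is a stand-in here.

```python
import torch
import torch.nn.functional as F

def fuzzy_dispatch(query: torch.Tensor, prototypes: torch.Tensor,
                   outputs: torch.Tensor, temperature: float = 10.0) -> torch.Tensor:
    """Route a query vector through a soft-mux over prototype similarities."""
    sims = F.cosine_similarity(query.unsqueeze(0), prototypes, dim=1)   # (n,)
    weights = torch.softmax(temperature * sims, dim=0)                  # soft-mux
    return weights @ outputs          # weighted blend of candidate outputs

protos = F.normalize(torch.randn(3, 768), dim=1)
outs = torch.randn(3, 768)
result = fuzzy_dispatch(protos[1] + 0.05 * torch.randn(768), protos, outs)
# result is dominated by outs[1]; every intermediate stays a tensor.
```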
5.3 Role-filler record
A bundled role-filler structure (agent: "cat", action: "sit")
that supports unbind-snap retrieval. Demonstrates that the VSA
algebra works as a structured-data primitive in the language:
construction, retrieval, and multi-hop composition (extract a
filler from one structure, insert it into another, retrieve from
the second) all return correct results.
5.4 Loop demonstrations
The loop demos confirm substrate-pure recurrent computation:
- do_while addNumber(x < 11, int x) { return addNumber(x + 1); } starting from x = 9 returns 11 after the soft-halt cell runs to convergence.
- An iterative_loop with count = 1000 and T = 50 does not converge: the local computation runs but _program_halt ≈ 0, so the function's return total * _program_halt wipes program output to zero, signaling "this didn't finish" via a substrate-side mechanism rather than a host-side exception.
6. Limitations and Future Work
6.1 Object encapsulation as load-bearing
Sutra's design includes ontology-oriented objects (closer to OWL classes than to OOP) for compile-time semantic checking. Today's compiler implements free functions cleanly; object methods parse but their encapsulation rules (no closure across class boundary) are not enforced. Implementing the encapsulation pass and the class-boundary closure check is straightforward future work.
6.2 Codebook integration depth
The embedded codebook store covers the compile-time embed →
runtime decode path today. Extended features (hashmap routing,
persistent codebook across runs via SUTRA_DB_PATH) are
deferred until there is a concrete requirement beyond the
current demonstration corpus.
6.3 Numpy backend retirement
The compiler has historically had two backends; the numpy one
(codegen.py) is deprecated. Behavior tests run on PyTorch; the
numpy backend is retained only for emit-shape tests and gets
fully removed in a follow-up.
7. Conclusion
Sutra demonstrates that a programming language whose compile target is a single tensor-op graph over a frozen embedding substrate is a tractable design — not a research thought experiment but a working compiler with running demonstration programs. The design choice that makes it tractable is uniform shape: every value is the same vector layout, every operation is one tensor op, the compiler treats the whole program as a dataflow graph with no type dispatch at the leaves.
The substrate-purity story is what makes the language useful for the empirical question we built it to address: which embedding operations actually compose, at what capacity, on which substrates. With the language in hand, those questions become programs to write rather than scripts to glue together.
References
- Bordes, A., Usunier, N., García-Durán, A., Weston, J., & Yakhnenko, O. (2013). Translating embeddings for modeling multi-relational data. NeurIPS.
- Darwiche, A., & Marquis, P. (2002). A knowledge compilation map. JAIR 17:229–264.
- Gayler, R. W. (2003). Vector symbolic architectures answer Jackendoff's challenges for cognitive neuroscience. Joint International Conference on Cognitive Science.
- Kanerva, P. (2009). Hyperdimensional computing: An introduction to computing in distributed representation with high-dimensional random vectors. Cognitive Computation 1(2):139–159.
- Kleene, S. C. (1952). Introduction to Metamathematics. North- Holland. The strong three-valued logic system used as the ground for Sutra's polynomial fuzzy connectives (§1.2-1).
- Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. ICLR Workshop.
- Badreddine, S., Garcez, A. d., Serafini, L., & Spranger, M. (2022). Logic Tensor Networks. Artificial Intelligence 303.
- Heddes, M., Nunes, I., Vergés, P., Kleyko, D., Abraham, D., Givargis, T., Nicolau, A., & Veidenbaum, A. (2023). Torchhd: An open source python library to support research on hyperdimensional computing and vector symbolic architectures. Journal of Machine Learning Research 24(255):1–10.
- Li, Z., Huang, J., & Naik, M. (2023). Scallop: A Language for Neurosymbolic Programming. Proceedings of the ACM on Programming Languages 7(PLDI):1463–1487. arXiv:2304.04812.
- Manhaeve, R., Dumancic, S., Kimmig, A., Demeester, T., & De Raedt, L. (2018). DeepProbLog: Neural Probabilistic Logic Programming. NeurIPS.
- Serafini, L. & Garcez, A. d. (2016). Logic Tensor Networks: Deep Learning and Logical Reasoning from Data and Knowledge. NeSy Workshop.
- Vergés, P., Heddes, M., Nunes, I., Givargis, T., & Nicolau, A. (2023). HDCC: A Hyperdimensional Computing compiler for classification on embedded systems and high-performance computing. arXiv:2304.12398.
- Yang, Z., Ishay, A., & Lee, J. (2020). NeurASP: Embracing Neural Networks into Answer Set Programming. IJCAI.
- Plate, T. A. (1995). Holographic reduced representations. IEEE Transactions on Neural Networks 6(3):623–641.
- Siegelmann, H. T. & Sontag, E. D. (1992). On the computational power of neural nets. COLT '92. Establishes that recurrent neural networks with rational weights are Turing-complete; the result Sutra inherits via tail-recursive loops over a fixed-width state vector.
- Smolensky, P. (1990). Tensor product variable binding and the representation of symbolic structures in connectionist systems. Artificial Intelligence 46(1–2):159–216.
- Sun, Z., Deng, Z. H., Nie, J. Y., & Tang, J. (2019). RotatE: Knowledge graph embedding by relational rotation in complex space. ICLR.
- Wang, Z., Zhang, J., Feng, J., & Chen, Z. (2014). Knowledge graph embedding by translating on hyperplanes. AAAI.
Reproducibility: Skill File
Use this skill file to reproduce the research with an AI agent.
---
name: sutra-language
description: Reproduce the demonstration programs and substrate-purity claims for "Sutra: A Programming Language for Vector-Symbolic Computation in Frozen Embedding Spaces" — the working Sutra compiler + PyTorch tensor-op runtime, 13 demonstration programs in a smoke test (with 23 .su files in examples/ total), loop function decls + soft-halt RNN cells, embedded SutraDB codebook with nearest_string decode, opt-in torch.compile wrapping.
allowed-tools: Bash(python *), Bash(pip *), Bash(cd *), Bash(cargo *)
---
# Sutra: A Programming Language for Vector-Symbolic Computation in Frozen Embedding Spaces
**Author: Emma Leonhart**
This skill reproduces the demonstration programs and verifiable
substrate-purity claims of the paper. The paper takes the
algebraic structure of frozen embedding spaces as established by
the prior knowledge-graph-embedding literature (TransE, RotatE,
the word-analogy line) and presents the algorithms and language
that consolidate that structure into composable primitives.
Learned-matrix binding is positioned as next-implementation, not
a finished result; nothing to reproduce there yet.
## What this reproduces
1. **Working compiler end-to-end.** `.su` source → parse → simplify
→ codegen (PyTorch) → execute. Three demonstration programs
(`hello_world.su`, `fuzzy_dispatch.su`, `role_filler_record.su`)
   plus loop demonstrations all run and produce the expected outputs.
2. **Substrate-pure operations.** Bind (rotation), unbind, bundle,
similarity, arithmetic on canonical synthetic axes, soft-halt
RNN cells — all execute as tensor operations on the substrate.
3. **First-class loop functions with halt propagation.** Four
loop kinds (`do_while`, `while_loop`, `iterative_loop`,
`foreach_loop`); `pass values` and `return NAME(args)` tail-
call surfaces both supported. Convergent loops return correct
values; non-convergent loops wipe program output to ~0.
4. **Embedded SutraDB codebook.** Every embedded string in a
compiled program is in a `.sdb` file at module init. The
decode operation `_VSA.nearest_string(query)` returns the
nearest string label for any vector. Round-trips correctly
including unicode labels.
5. **Opt-in torch.compile wrapping.** With
`SUTRA_TORCH_COMPILE=1`, every loop function is wrapped with
`torch.compile(backend='eager')` so Dynamo unrolls the
per-tick loop at trace time. Programs still produce correct
results.
## Prerequisites
```bash
pip install torch
# Ollama running locally with nomic-embed-text model installed:
ollama pull nomic-embed-text
# SutraDB FFI shared library:
cd sutraDB && cargo build --release -p sutra-ffi
```
The runtime uses PyTorch (CPU or CUDA) for tensor ops, Ollama for
embedding fetches via `nomic-embed-text` (768-dim), and the
SutraDB FFI for the embedded codebook. Without the FFI build the
codebook decode path returns `None` gracefully; the rest of the
language still works.
## Reproducing each result
All commands run from the repo root. The compiler entry point is
the `sutra_compiler` Python module under `sdk/sutra-compiler/`.
### Working compiler (test suite)
```bash
cd sdk/sutra-compiler
python -m pytest tests/ -q --ignore=tests/test_simplify_egglog.py
```
Expected: **244+ tests pass**. The egglog test is skipped because
its import takes >20 minutes on Windows; the test itself is fine.
### Demonstration programs
```bash
cd sdk/sutra-compiler
PYTHONPATH=. python -m sutra_compiler --run ../../examples/hello_world.su
PYTHONPATH=. python -m sutra_compiler --run ../../examples/fuzzy_dispatch.su
PYTHONPATH=. python -m sutra_compiler --run ../../examples/role_filler_record.su
```
Each program prints its result. The hello-world program emits the
nomic-embed-text embedding of "hello world"; fuzzy_dispatch routes
through soft-mux scoring; role_filler_record demonstrates VSA
algebra with bind/bundle/unbind round-trips.
### Loop demonstrations (function-decl form)
```bash
cd sdk/sutra-compiler
python -m pytest tests/test_loop_function_decl.py -q
```
Expected: **23 tests pass** covering all four loop kinds plus the
`pass`-vs-`return NAME(args)` tail-call equivalence and program-
level halt propagation (a non-convergent `iterative_loop` returns
~0 because the unconverged halt-cum wipes the output).
### Embedded SutraDB codebook
```bash
cd sdk/sutra-compiler
python -m pytest tests/test_sutradb_embedded.py -q
```
Expected: **7 tests pass** covering FFI roundtrip, three-orthogonal-
vector nearest neighbor, top-k, unicode label round-trip, env-var
path override.
If the FFI DLL isn't built, all 7 tests skip; the test runner
prints a hint pointing at the cargo build command.
### Substrate-purity verification (host-language scaffolding)
```bash
cd sdk/sutra-compiler
python -c "from sutra_compiler.codegen_pytorch import PyTorchCodegen; from sutra_compiler import ast_nodes; cg = PyTorchCodegen(); cg._prefetch_strings = []; py = cg.translate(ast_nodes.Module(items=[], span=None)); print('saturate_unit' in py, 'heaviside' in py, 'truth_axis' in py)"
```
Expected: `True True True` — the substrate-pure scalar primitives
are emitted in every module.
### Optional: torch.compile wrapping
```bash
cd sdk/sutra-compiler
SUTRA_TORCH_COMPILE=1 python -m pytest tests/test_torch_compile_wrap.py -q
```
Expected: **3 tests pass**. Backend defaults to `eager`; override
with `SUTRA_TORCH_COMPILE_BACKEND=inductor` for fused CUDA kernels
(requires Triton install).
## What this does NOT reproduce
- **The algebraic-structure premise.** The paper takes as given
that frozen embedding spaces have algebraic structure; that is
established by the prior knowledge-graph-embedding literature
(TransE, RotatE, word-analogy work) and is not re-derived here.
- **Object encapsulation as load-bearing.** Parser handles object
decls; encapsulation is not enforced. Queued.
## Repository layout
- `sdk/sutra-compiler/` — the compiler + runtime + tests
- `examples/` — `.su` demonstration programs
- `planning/sutra-spec/` — language specification
- `planning/findings/` — dated experimental findings
- `sutraDB/` — sibling RDF + HNSW triplestore (Rust)
- `paper/` — this paper + skill + reproduction docs
- `DEVLOG.md` — full project history