A Unified Framework for Tree-of-Thought Search Algorithms
1. Introduction
The past two years have seen a proliferation of LLM inference-time search methods: Tree-of-Thought (ToT) [Yao et al. 2023], Graph-of-Thought [Besta et al. 2024], Self-Consistency [Wang et al. 2022], MCTS-with-LLM-rollouts [Hao et al. 2023], Reflexion-style search [Shinn et al. 2023], and many others. Each is presented as a distinct algorithm, but the space of design choices is rarely articulated.
This paper presents UniToT, a unified abstract framework that decomposes any such search algorithm into three components:
- Expander E: given a partial trajectory, propose a set of candidate next steps.
- Value estimator V: assign a scalar quality estimate to a partial trajectory.
- Frontier policy F: choose which open node to expand next.
Under this lens, prior methods are specific (E, V, F) triples; novel triples are immediately identifiable and testable.
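The three components can be written down as minimal Python interfaces; the `Node` container and all names below are an illustrative sketch, not code from the paper:

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class Node:
    """A partial trajectory: the reasoning steps taken so far."""
    steps: list = field(default_factory=list)
    value: float = 0.0


class Expander(Protocol):
    def expand(self, node: Node) -> list:
        """Propose candidate next steps as child trajectories."""
        ...


class ValueEstimator(Protocol):
    def score(self, node: Node) -> float:
        """Assign a scalar quality estimate to a partial trajectory."""
        ...


class FrontierPolicy(Protocol):
    def select(self, frontier: list) -> Node:
        """Choose which open node to expand next."""
        ...
```

Any concrete algorithm is then an (Expander, ValueEstimator, FrontierPolicy) triple passed to the search loop of Section 2.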
2. The UniToT Algorithm
```python
def unitot(root, E, V, F, budget):
    """Expand nodes chosen by F, score children with V, until budget is spent."""
    frontier = [root]
    while budget > 0 and frontier:
        node = F.select(frontier)
        children = E.expand(node)
        if children:                     # retire an expanded node from the frontier
            frontier.remove(node)
        for c in children:
            c.value = V.score(c)
        frontier += children
        budget -= max(len(children), 1)  # charge at least one call per iteration
    return best_terminal(frontier, V)
```
3. Cataloging Prior Methods
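A minimal end-to-end toy instantiation may help make the abstraction concrete. The task, class names, and scoring below are illustrative, not from the paper; the loop repeats the Section 2 algorithm so the snippet is self-contained, with one small extension that moves nodes without continuations into a `done` list so terminals are not re-selected:

```python
class Node:
    def __init__(self, steps):
        self.steps = steps
        self.value = 0.0


class DigitExpander:
    """Toy expander E: append a binary digit, stopping at depth 3."""
    def expand(self, node):
        if len(node.steps) >= 3:
            return []                      # depth limit reached: terminal
        return [Node(node.steps + [d]) for d in (0, 1)]


class DepthValue:
    """Toy value estimator V: deeper trajectories with more 1s score higher."""
    def score(self, node):
        return len(node.steps) + 0.1 * sum(node.steps)


class GreedyFrontier:
    """Toy frontier policy F: best-first on cached node values."""
    def select(self, frontier):
        return max(frontier, key=lambda n: n.value)


def unitot(root, E, V, F, budget):
    frontier, done = [root], []
    while budget > 0 and frontier:
        node = F.select(frontier)
        frontier.remove(node)
        children = E.expand(node)
        if not children:                   # no continuations: node is terminal
            done.append(node)
            budget -= 1
            continue
        for c in children:
            c.value = V.score(c)
        frontier += children
        budget -= len(children)
    return max(done or frontier, key=V.score)


best = unitot(Node([]), DigitExpander(), DepthValue(), GreedyFrontier(), budget=20)
print(best.steps)  # -> [1, 1, 1]
```

Best-first selection drives straight to the highest-scoring depth-3 trajectory before spending the rest of the budget on the remaining subtrees.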
| Method | E (expander) | V (value) | F (frontier) |
|---|---|---|---|
| Chain-of-Thought | sample-1 | terminal-only | DFS-stack |
| Self-Consistency | sample-k at root | majority-vote | DFS-stack |
| ToT-BFS | sample-k | LLM-judge | BFS-queue |
| ToT-DFS | sample-k | LLM-judge | DFS-stack |
| Graph-of-Thought | sample + merge | LLM-judge | priority-queue |
| MCTS-LLM | sample-k | rollout-mean | UCB1 |
| Reflexion-search | revise-on-failure | self-critique | failure-priority |
The table reveals that no published method combines an MCTS frontier policy with self-consistency value estimation, yet this is a natural cell in the design space.
4. Two Novel Triples
4.1 Self-Consistent MCTS (SC-MCTS)
- E: sample-k next-step continuations.
- V: at each node, run m independent rollouts and score by majority vote on terminal answers.
- F: UCB1 selection over the children of the current best node.
The value estimator inherits self-consistency's robustness while UCB1 efficiently allocates budget to promising subtrees.
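The self-consistency value estimator at the heart of SC-MCTS can be sketched as follows. The class name is illustrative, `m` is the rollout count, and the cyclic stub merely stands in for real LLM rollouts:

```python
from collections import Counter
from itertools import cycle


class SelfConsistencyValue:
    """Score a node as the fraction of m independent rollouts
    that agree with the modal terminal answer."""
    def __init__(self, rollout, m=5):
        self.rollout = rollout   # callable mapping a node to a terminal answer
        self.m = m

    def score(self, node):
        answers = [self.rollout(node) for _ in range(self.m)]
        _, count = Counter(answers).most_common(1)[0]
        return count / self.m


# Illustrative stub standing in for LLM rollouts: 3 of every 5
# completions reach the answer 24, the rest disagree.
stub = cycle([24, 24, 18, 24, 21])
V = SelfConsistencyValue(lambda node: next(stub), m=5)
print(V.score(None))  # -> 0.6
```

Unlike a raw LLM-judge score, this estimate concentrates toward 1.0 only when independent continuations converge on the same answer, which is what makes it robust as a search signal.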
4.2 Reflective Beam (R-Beam)
- E: sample-k, augmented with a reflection step that re-proposes alternatives after observing a failure on a sibling node.
- V: LLM-judge with a structured rubric.
- F: width-b beam.
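The width-b beam frontier policy can be sketched as a frontier that prunes itself on every selection; the class names here are illustrative:

```python
class BeamFrontier:
    """Width-b beam frontier policy: prune the frontier to the b
    highest-valued open nodes, then expand the best survivor."""
    def __init__(self, b):
        self.b = b

    def select(self, frontier):
        frontier.sort(key=lambda n: n.value, reverse=True)
        del frontier[self.b:]        # in-place prune below the beam width
        return frontier[0]


class Stub:
    """Minimal node stand-in carrying only a cached value."""
    def __init__(self, value):
        self.value = value


beam = BeamFrontier(b=2)
frontier = [Stub(0.2), Stub(0.9), Stub(0.5), Stub(0.1)]
top = beam.select(frontier)
print(top.value, [n.value for n in frontier])  # -> 0.9 [0.9, 0.5]
```

Pruning in place means low-valued siblings are discarded permanently, which is exactly the behavior that R-Beam's reflection step compensates for by re-proposing alternatives after a failure.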
5. Experimental Setup
We evaluate on Game-of-24 (1,362 problems), BlocksWorld (495 instances, 4-7 blocks), and HumanEval (164 problems) with Llama-3-70B as the underlying LLM. Compute budget is held constant at 200 LLM calls per task.
6. Results
| Method | Game-of-24 | BlocksWorld | HumanEval |
|---|---|---|---|
| Chain-of-Thought | 27.3% | 28.1% | 71.3% |
| Self-Consistency (k=20) | 41.6% | 36.4% | 78.0% |
| ToT-BFS | 67.2% | 49.1% | 82.9% |
| MCTS-LLM | 70.4% | 53.3% | 81.7% |
| SC-MCTS (ours) | 76.1% | 57.8% | 84.2% |
| R-Beam (ours) | 72.0% | 55.4% | 85.6% |
SC-MCTS dominates on the planning-heavy tasks (Game-of-24, BlocksWorld); R-Beam dominates on coding (HumanEval), where reflection-on-failure is most valuable. Both improvements are statistically significant versus the strongest baseline.
7. Theoretical Note
Under mild regularity assumptions on V, we can show that the gap between the optimal solution quality and the expected solution quality of UniToT with budget B scales as

$$Q^\star - \mathbb{E}[Q_B] = O\!\left(\frac{\log B}{B}\right)$$

when F uses UCB1 and V is unbiased, recovering classical MCTS bounds [Browne et al. 2012].
8. Discussion and Limitations
The framework is descriptive, not prescriptive: it does not tell you which triple is best for your task. We hope the unified vocabulary will accelerate the search for good triples. The combinatorial space of instantiations is large; we sampled it manually, but an automated meta-search over triples could be valuable.
9. Conclusion
UniToT clarifies the landscape of LLM search algorithms and exposes profitable unexplored combinations. Two such combinations strictly outperform existing methods on standard benchmarks.
References
- Yao, S. et al. (2023). Tree of Thoughts: Deliberate Problem Solving with Large Language Models.
- Wang, X. et al. (2022). Self-Consistency Improves Chain of Thought Reasoning in Language Models.
- Hao, S. et al. (2023). Reasoning with Language Model is Planning with World Model.
- Besta, M. et al. (2024). Graph of Thoughts: Solving Elaborate Problems with Large Language Models.
- Shinn, N. et al. (2023). Reflexion: Language Agents with Verbal Reinforcement Learning.
- Browne, C. et al. (2012). A Survey of Monte Carlo Tree Search Methods.