How the engine cheats
Before we look at solving jmbld blind, watch how the engine solves it with full knowledge of the answer, and why that matters for difficulty calibration.
Every jmbld board has a hidden set of five real English words, drawn at random from a curated answers dictionary of 2,309 entries. The engine builds a board by shuffling the 25 letters of those five words into a 5×5 grid. To solve, the player swaps tiles until each row reads as a real word.
The engine ships two oracle solvers; both see the target words and
therefore "cheat". Solvers.Star
runs A* over the permutation space with a Hamming-distance
heuristic and computes the minimum
number of swaps. Solvers.Cyclic
skips the search by decomposing the required permutation into
cycles and breaking each one. Faster, sometimes a few moves
longer.
Star's count isn't trivia. The engine uses it to set the diamond / gold / bronze thresholds on every board, so "diamond on board A" and "diamond on board B" mean the same effort across players. Without this oracle, difficulty would drift wildly between seeds.
On this board, Star finds a sequence of 19 swaps; Cyclic finishes the same board in 19 swaps as well. Star's number is our working ceiling: no blind solver should be able to do better. Or so it seems; the next section pokes a hole in that guarantee.
Why A* with a Hamming heuristic?
The state space is the set of permutations of 25 tiles, so the natural distance metric is the number of out-of-place tiles (Hamming distance). It's cheap to compute and gives A* something useful to steer with.
Hamming distance isn't strictly admissible for this problem,
though. A single swap fixes at most two
tiles at once, so the true minimum number of swaps from any state
is at least ⌈Hamming / 2⌉, not the full Hamming
count. An admissible heuristic would be ⌈Hamming / 2⌉;
the exact minimum comes from cycle structure
(N − cycles, the quantity §11's MinSwap computes).
Steering with the larger Hamming value, Star's A* search can
overcommit to a path and return a swap count that is
optimal under its own inflated heuristic, but not the
actual minimum.
That's why you'll occasionally see a blind solver in this article report fewer total moves than Star on the same board. Not magic. Star's "optimum" is, in places, a small overestimate. §8 has a concrete example.
How Cyclic differs
A permutation can always be decomposed into disjoint cycles.
Breaking a cycle of length k
costs k − 1
swaps, and the total swap count is fixed once the permutation is
known. Cyclic computes that permutation in linear time, walks
the cycles, and emits swaps. No search, no heuristic. The
trade-off: when duplicate letters create ambiguous assignments,
it picks greedily and can end up a few swaps above optimal.
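The cycle-breaking idea fits in a few lines. This is an illustrative Python sketch, not the engine's Solvers.Cyclic; it assumes a duplicate-free mapping where perm[i] is the slot the tile at position i belongs to:

```python
def cycle_swaps(perm):
    """Emit a swap sequence that sorts `perm`; each length-k cycle
    contributes exactly k - 1 swaps, so the total is N - cycles."""
    pos = list(perm)   # working copy, sorted in place
    swaps = []
    for i in range(len(pos)):
        while pos[i] != i:          # slot i still holds a stray tile
            j = pos[i]
            pos[i], pos[j] = pos[j], pos[i]
            swaps.append((i, j))
    return swaps
```

A 3-cycle like [2, 0, 1] resolves in two swaps; an already-sorted permutation emits none.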
From the player's point of view
A real player has none of the engine's advantages. You see 25 shuffled letters and have no idea which five words they came from. Each swap returns two bits of feedback per affected tile: was the letter at the right position in its row, and is that letter in the solution at all?
That second flag turns out to be a freebie. The board's 25 letters are exactly the multiset of the five hidden words, so every letter is "in the solution" trivially. The real signal is the first flag, per cell: green when the letter sits at the right column of its target row, not green otherwise.
With only that signal, the puzzle becomes pure Bayesian inference. For each of the five rows we maintain a set of candidate words. Every swap prunes those sets. The question is: which swap teaches us the most per move?
How big is the hidden state space?
The dictionary has 2,309 valid answer words. An ordered tuple of five distinct words picks one word per row, so the hypothesis space starts at
2,309 × 2,308 × 2,307 × 2,306 × 2,305 ≈ 6.5 × 10¹⁶
Sixty-five quadrillion. Before any swap, though, the global multiset constraint slashes that. Most random 5-word tuples don't use the exact letters available on the board. The multiset constraint and per-row pruning together make this tractable in practice.
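The arithmetic is quick to check (Python; math.perm computes the falling factorial):

```python
import math

DICT_SIZE = 2309  # curated answers dictionary

# ordered 5-tuples of distinct words, one per row
hypotheses = math.perm(DICT_SIZE, 5)  # 2309 * 2308 * 2307 * 2306 * 2305
```

That lands at roughly 6.5 × 10¹⁶ before the multiset constraint prunes anything.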
What does a swap teach us?
A swap exchanges tiles at two positions i
and j. The engine then re-checks every cell against
the hidden solution and reports, for each of the two changed
positions: was the new letter at the right column for its row?
Two booleans, one per affected cell, partition the future of the
hypothesis space into four buckets: (green, green), (green, ¬green), (¬green, green), (¬green, ¬green).
That's the feedback we use to choose swaps wisely.
Algorithm A: Shannon entropy maximization
The textbook information-theoretic approach: pick the swap whose outcome is hardest to predict in advance. Hard-to-predict outcomes carry the most information about which hypothesis is correct.
Concretely: for each of the 300 candidate swaps, we estimate the
probability that each affected cell will turn green, combine into
the four-bucket distribution above, and compute its Shannon
entropy H = −Σ p log p. The swap with maximum entropy
is the swap we want.
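As a concrete sketch (illustrative Python, not the engine's Elixir; treating the two cells' green probabilities as independent is a simplification the real belief does not guarantee):

```python
import math

def shannon_entropy(probs):
    """H = -sum p * log2(p), in bits, over non-zero outcomes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def swap_score(p_i_green, p_j_green):
    """Entropy of the four-bucket feedback distribution for one swap,
    assuming the two affected cells turn green independently."""
    buckets = [p_i_green * p_j_green,
               p_i_green * (1 - p_j_green),
               (1 - p_i_green) * p_j_green,
               (1 - p_i_green) * (1 - p_j_green)]
    return shannon_entropy(buckets)
```

A maximally uncertain swap (both cells 50/50) scores the full 2 bits; a swap whose outcome is already known scores 0.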
On this board, Entropy ran for 19 moves. It learns until every row's candidate set collapses to a single word, then hands off to the same A* oracle from §0 to finish the physical swaps. The interesting number is "how few moves spent learning". The rest is mechanical.
The information-gain identity
Mutual information between the belief and the feedback equals the entropy of the prior minus the expected entropy of the posterior:
I(belief; feedback) = H(prior) − E[H(posterior)]
Because the feedback is a deterministic function of the hidden word tuple (given the chosen swap), H(feedback | hypothesis) = 0, and the identity collapses for our four-outcome partition: I = H(outcome). Maximising outcome entropy is
therefore equivalent to maximising expected information gain.
That's the one line of math that justifies the whole algorithm.
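The identity is easy to check numerically. The toy prior and feedback map below are invented for illustration; the point is only that deterministic feedback makes the two sides coincide:

```python
import math
from collections import defaultdict

def H(ps):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in ps if p > 0)

# Toy belief: four hypotheses and the feedback bucket each would
# produce for some fixed swap (deterministic, as in the game).
prior = {"h1": 0.4, "h2": 0.3, "h3": 0.2, "h4": 0.1}
feedback = {"h1": "GG", "h2": "GN", "h3": "GN", "h4": "NN"}

# Outcome distribution: P(y) = prior mass mapped onto y.
p_y = defaultdict(float)
for h, p in prior.items():
    p_y[feedback[h]] += p

# E[H(posterior)]: entropy of the renormalised prior in each bucket.
expected_posterior = 0.0
for y, py in p_y.items():
    post = [prior[h] / py for h in prior if feedback[h] == y]
    expected_posterior += py * H(post)

info_gain = H(prior.values()) - expected_posterior
outcome_entropy = H(p_y.values())
# deterministic feedback => the two quantities are identical
```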
What the belief tracks per row
The belief isn't just a set of candidate words. For each row R we maintain four kinds of evidence, all derived from per-cell feedback flags:
- Greens (locked[R][C]): letter L is at column C of row R. Hard constraint.
- Forbidden (forbidden[R][C]): letter L is NOT at column C of row R, because we just observed it there as non-green.
- Required (required[R][L]): row R's word contains L at least K times, where K is the max number of yellow L's we've ever seen in a single snapshot of row R.
- Multiset budget: row R's letters together with the other rows' must equal the board's 25-letter multiset.
Each new swap snapshot feeds all four. A candidate word is kept only if it satisfies every constraint that applies to its row.
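A per-row prune along these lines (illustrative Python; the constraint shapes locked, forbidden, required are hypothetical simplifications of the structures named above):

```python
from collections import Counter

def keep_candidate(word, locked, forbidden, required):
    """Keep `word` only if it satisfies every per-row constraint.
    locked:    column -> letter that must be there (greens)
    forbidden: column -> set of letters observed non-green there
    required:  letter -> minimum count in this row's word"""
    for col, letter in locked.items():
        if word[col] != letter:
            return False
    for col, letters in forbidden.items():
        if word[col] in letters:
            return False
    counts = Counter(word)
    for letter, k in required.items():
        if counts[letter] < k:
            return False
    return True
```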
Algorithm B: candidate-count proxy
Shannon entropy is theoretically clean but computationally
finicky. Every score requires a handful of log
calls. A cheaper proxy: score swaps by the expected reduction in
total candidate count Σᵣ |Cᵣ|. No logs, just arithmetic.
The two should agree most of the time. They're both rewarding swaps that split the hypothesis space evenly. Where they disagree is interesting. Entropy prefers balanced splits even when the absolute split sizes are small. The proxy prefers large absolute eliminations even if the split is lopsided.
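A sketch of the proxy's arithmetic (illustrative Python; taking each bucket's probability proportional to its size is an assumption about how the expectation is formed):

```python
def expected_remaining(split_sizes):
    """Expected total candidates left after a swap whose feedback
    splits the current candidates into `split_sizes` buckets, with
    P(bucket) proportional to bucket size: sum(s^2) / total.
    Minimising this maximises the expected reduction."""
    total = sum(split_sizes)
    return sum(s * s for s in split_sizes) / total
```

An even four-way split of 100 candidates leaves 25 in expectation; a lopsided 97/1/1/1 split leaves about 94, which is why both scores reward balanced splits.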
On this board, Proxy ran for 19 moves.
Compare to Entropy's count above. The proxy is competitive even
though it doesn't touch a single log.
Joint-assignment pigeonhole
The per-row prune just described misses one inference: even if row 0 and row 1 each individually look fine against the multiset, the joint constraint can be violated. If 50 words in row 0 and 30 words in row 1 all need an 'X' and the board only has one 'X', at most one of those candidate sets can use it.
The full prune does a depth-first feasibility check: for each
candidate w
in row R, does there exist any 5-tuple (one word per row, w
fixed at row R) whose concatenated letters fit in the board's
multiset? Candidates with no feasible joint are dropped. This
runs only when the candidate space is small enough that the
DFS terminates quickly, late in the run, when it matters most.
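A sketch of that feasibility check (illustrative Python; a two-row toy stands in for the five-row board):

```python
from collections import Counter

def joint_feasible(rows, board_letters, fixed_row, fixed_word):
    """With `fixed_word` pinned at `fixed_row`, can we pick one
    candidate per remaining row so all the words' letters fit the
    board's multiset? Depth-first, prunes on the first deficit."""
    def take(word, budget):
        # subtract word's letters from budget, or None if it overdraws
        need = Counter(word)
        if any(budget[l] < c for l, c in need.items()):
            return None
        out = budget.copy()
        out.subtract(need)
        return out

    def dfs(r, budget):
        if r == len(rows):
            return True
        if r == fixed_row:               # already consumed up front
            return dfs(r + 1, budget)
        return any(
            (b := take(w, budget)) is not None and dfs(r + 1, b)
            for w in rows[r]
        )

    start = take(fixed_word, Counter(board_letters))
    return start is not None and dfs(0, start)
```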
Algorithm C: probe, then exploit
Entropy and Proxy ask "what's the most informative swap right now?" Probe asks a different question: "what's the cheapest swap that drives some row toward a single candidate?" Once any row's candidate set is size 1, the multiset constraint often collapses the rest of the rows automatically. At that point we know the answer.
When the algorithm has uniquely determined all five rows, it
switches phases. It now knows the target words, so it invokes the
same A* oracle from Section 0 (Solvers.Star) to
finish the physical swaps in the provable minimum. The probe
algorithm earns the right to cheat
by first learning what the engine already knew at board
generation, and only then borrowing the engine's optimal
finisher.
On this board, Probe ran for 21 moves total: probe phase plus oracle finishing combined. With the per-row constraint suite from §4 plus the joint-feasibility pass from §5, all three algorithms now reach a unique-tuple collapse before handing off to Star, so the distinction between Probe and the information-theoretic algorithms has narrowed considerably. The key difference is just how each one drives toward collapse.
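The post doesn't pin down Probe's exact scoring, so the sketch below is one plausible reading: target the smallest not-yet-unique row, and within it the column where its candidates agree most without agreeing fully, since placing the majority letter there has the best chance of collapsing the row in one swap (all names here are hypothetical):

```python
from collections import Counter

def choose_probe(rows):
    """Hypothetical probe target: (row, column, majority fraction,
    majority letter) for the smallest row candidate set that is not
    yet a singleton; returns None once every row is unique."""
    live = [r for r in range(len(rows)) if len(rows[r]) > 1]
    if not live:
        return None
    row = min(live, key=lambda r: len(rows[r]))
    words = rows[row]
    best = None
    for col in range(len(words[0])):
        letter, votes = Counter(w[col] for w in words).most_common(1)[0]
        frac = votes / len(words)
        if frac < 1.0 and (best is None or frac > best[2]):
            best = (row, col, frac, letter)
    return best
```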
The showdown
Three algorithms, one board, one optimal Star baseline. The table below shows how each algorithm fared. The "+N vs. Star" column is how many extra moves beyond the theoretical minimum each took.
| Algorithm | Moves | vs. Star |
|---|---|---|
| probe | 21 | +2 |
| proxy | 19 | +0 |
| entropy | 19 | +0 |
Top-anchored dashed verticals (▼) mark where each algorithm
reached uniquely determined
- every row's candidate set collapsed to a single word. Bottom-anchored
dotted verticals (◆) mark where each algorithm finished
- the total move count, including the A* oracle's swap sequence
after collapse. The horizontal gap between the two is the cost
of physically moving the tiles once you already know the answer.
The white solid line is the provable minimum number of swaps for
this board, computed by MinSwap
(see §11); the
gold dashed line is what Star
reports. When the gold sits to the right of the white, you're
seeing Hamming's inadmissibility - Star paid for extra swaps it
didn't need. Any algorithm beating the white line would have to
be wrong.
When does each algorithm shine?
All three algorithms learn the hidden words by accumulating constraints. Once a solver has narrowed every row's candidate set to a single word, the same A* oracle from Section 0 takes over and finishes the swap sequence in the provable minimum. The interesting comparison is therefore how fast each algorithm reaches that collapse.
Across the seeds we've sampled, the candidate-count Proxy is often the quickest learner. The Shannon-entropy maximizer is close behind - and a teaching point: a much cheaper score function (no logs) often tracks the rigorous information measure almost exactly.
Probe is the most variable. When the most-likely letter at the most-uncertain column turns out to be wrong, Probe spends extra moves backtracking. When that letter is right, Probe collapses a whole row in one swap and leaves Entropy and Proxy behind.
The honest finding: solving a puzzle is two problems - learning what the answer is, then physically moving tiles to match. Information theory is the right tool for the first; A* is the right tool for the second.
A natural question the showdown raises is whether Star
- the A* oracle - is itself optimal. It isn't: plain Hamming
overcounts, so A* with that heuristic returns a count that's at least
the true minimum, often a swap or two above it. The actual
provable minimum is what MinSwap
returns: it skips A* entirely and optimises over
letter-bijection cycle structure
(min_swaps = N − cycles), enumerating every valid duplicate-letter assignment and
keeping the one with the most cycles. The white solid line on
the §7 chart is now MinSwap's count - Star is always at or to the right of it. §11
walks the derivation.
Appendix: Hamming distance
A one-paragraph definition - referenced in §7 and throughout the post.
For any board state, the Hamming distance to the solution is just "how many tiles are at the wrong column." Below, every red-outlined square is out of place. The total - 22 for this initial board - is exactly the Hamming distance.
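In code the definition is one line (illustrative Python; representing boards as 25-character row-major strings is an assumption):

```python
def hamming(board, solution):
    """Number of tiles not at their solved position."""
    return sum(a != b for a, b in zip(board, solution))
```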
Appendix: MinSwap, the true-minimum oracle
How the white "true min" line on the §7 chart is computed - without running A* at all.
Skip the search entirely
Minimum swaps to sort a permutation isn't really a search
problem. For an N-position bijection, the answer is N − C, where C is the number of disjoint cycles in
the permutation. A cycle of length k
needs exactly k − 1
swaps to resolve: lift one element out, rotate the rest into
place, drop the lifted element where it belongs. Sum across
cycles and you have a closed-form answer, no priority queue
required.
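For a duplicate-free bijection the closed form is a single linear walk (illustrative Python, not the engine's Elixir):

```python
def min_swaps(perm):
    """N - C: positions minus disjoint cycles
    (fixed points count as 1-cycles)."""
    seen = [False] * len(perm)
    cycles = 0
    for i in range(len(perm)):
        if not seen[i]:
            cycles += 1
            j = i
            while not seen[j]:
                seen[j] = True
                j = perm[j]
    return len(perm) - cycles
```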
The duplicate-letter wrinkle
jmbld boards have duplicate letters, so the mapping pos → target_slot
isn't unique. The two "R"s in RIVER
could go to either R-slot in the target, and the choice
changes the cycle structure. A greedy assignment - lock any
positions already at one of their valid target slots
(1-cycles, the densest cycle structure available), then pair
the leftovers in sorted order - is cheap and right when there's
at most one duplicate letter, but on dense duplicate
distributions the greedy leftover-pairing isn't always the
cycle-maximising one. On the canonical §9 seed the greedy
version returns 22 swaps for a board Star
finishes in 16; A* is allowed to discover bijections the
greedy step skipped, so the "oracle" was overshooting.
Solvers.MinSwap
closes that gap. For each letter L
with k_L
instances and f_L
already at a same-letter slot, it enumerates every bijection of
the k_L − f_L
leftover sources against the leftover targets, so the per-letter
candidate count is (k_L − f_L)!. The global candidate count is the product across
letters; on a real 5×5 board the fixed-point lock keeps that
product in the thousands, well under 100 ms. For each
candidate, we count cycles via a single linear walk and keep
the assignment with the largest cycle count. Same closed-form N − cycles
answer; no priority queue.
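The enumeration can be sketched as follows. This illustrative Python version omits the fixed-point lock described above, so it enumerates all k_L! bijections per letter; fine for toy inputs, and not the engine's Solvers.MinSwap:

```python
from collections import defaultdict
from itertools import permutations, product

def true_min_swaps(board, target):
    """Enumerate every bijection of board positions onto same-letter
    target slots, count cycles for each, and keep the cycle-maximising
    one: min swaps = N - C."""
    n = len(board)
    slots = defaultdict(list)      # letter -> target slot indices
    for j, letter in enumerate(target):
        slots[letter].append(j)
    positions = defaultdict(list)  # letter -> board position indices
    for i, letter in enumerate(board):
        positions[letter].append(i)

    def count_cycles(perm):
        seen, c = [False] * n, 0
        for i in range(n):
            if not seen[i]:
                c += 1
                j = i
                while not seen[j]:
                    seen[j] = True
                    j = perm[j]
        return c

    # one slot-assignment choice per letter; global candidates
    # are the cross product across letters
    per_letter = [
        [list(zip(positions[l], p)) for p in permutations(slots[l])]
        for l in positions
    ]
    best_cycles = 0
    for combo in product(*per_letter):
        perm = [0] * n
        for pairs in combo:
            for src, dst in pairs:
                perm[src] = dst
        best_cycles = max(best_cycles, count_cycles(perm))
    return n - best_cycles
```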
How the post uses it
JmbldEngine.Solvers.MinSwap
is the implementation. JmbldPhx.Blog.SeedSampler.true_min_swaps/1
calls it on every seed the chart renders, and feeds the count
into entropy_chart
as the white solid vertical line. The result is cached per {seed_id, lang}
in BlindSolverCache
and pre-computed at boot alongside the algorithm traces, so
the chart's "true min" line is free on every step. The gold
dashed line on the same chart is what plain Star
reports; the horizontal gap between the two is the price of
running A* with an inadmissible heuristic, paid in extra
swaps - and with MinSwap
as the reference, Star's count is guaranteed to sit at or to the right of
it.
Appendix: Star vs MinSwap vs Entropy
Three solvers, same hidden words. Step through and watch each one pick its next swap - and the reasoning that drove the pick.
Star runs the A* oracle directly: every candidate swap is
scored by f = g + h
where h
is the Hamming distance to the target. The chosen swap is the
one whose resulting
f-score is smallest, so the line under Star's board reports the
Hamming drop from the upcoming move. MinSwap (§11) skips A*
entirely: it picks the duplicate-letter bijection with the most
cycles and walks each cycle, spending k − 1 swaps on a length-k cycle; the line
under MinSwap's board reports both the Hamming drop (for
visual symmetry with Star) and how many swaps remain in the
cycle decomposition. Entropy doesn't know the target at all;
it picks the swap whose four-outcome feedback distribution has
the highest Shannon entropy - equivalently, whose expected
information gain over the belief is largest. The line under
Entropy's board reports those bits, alongside the total number
of worlds (sum of candidate counts across rows) the belief
still considers possible.