
How to solve jmbld blind

Three information-theoretic algorithms - and what they reveal about the engine's own 'optimal' solver.

How the engine cheats

Before we look at solving jmbld blind, watch how the engine solves it with full knowledge of the answer, and why that matters for difficulty calibration.

Every jmbld board has a hidden set of five real English words, drawn at random from a curated answers dictionary of 2,309 entries. The engine builds a board by shuffling the 25 letters of those five words into a 5×5 grid. To solve, the player swaps tiles until each row reads as a real word.

The engine ships two oracle solvers; both see the target words and therefore "cheat". Solvers.Star runs A* over the permutation space with a Hamming-distance heuristic and computes the minimum number of swaps. Solvers.Cyclic skips the search by decomposing the required permutation into cycles and breaking each one. Faster, sometimes a few moves longer.

Star's count isn't trivia. The engine uses it to set the diamond / gold / bronze thresholds on every board, so "diamond on board A" and "diamond on board B" mean the same effort across players. Without this oracle, difficulty would drift wildly between seeds.

[Interactive board - i e d u r / a r e v i / n c d r d / s a p a e / o g k y c]

On this board, Star reports an optimal sequence of 19 swaps; Cyclic finishes the same board in 19 as well. Star's number looks like our ceiling - no blind solver should be able to do better, by definition. The next section complicates that.

Why A* with a Hamming heuristic?

The state space is the set of permutations of 25 tiles, so the natural distance metric is the number of out-of-place tiles (Hamming distance). It's cheap to compute and gives A* something useful to steer with.

Hamming distance isn't strictly admissible for this problem, though. A single swap fixes at most two tiles at once, so the true minimum number of swaps from any state is at least ⌈Hamming / 2⌉, not the full Hamming count. That ⌈Hamming / 2⌉ is the admissible heuristic; the exact minimum is N minus the number of cycles in the permutation (see §11). Steering by the larger Hamming value, Star's A* search can overcommit to a path and return a swap count that is provably optimal under its own inflated heuristic, but not the actual minimum.
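In code, the gap between the two heuristics is a single ceiling division. A minimal sketch, assuming boards are flat 25-element lists (names illustrative, not the engine's):

```elixir
defmodule Heuristics do
  # Plain Hamming: how many tiles aren't showing the target letter.
  # Inadmissible as a swap heuristic - one swap can fix two tiles at once.
  def hamming(state, target) do
    Enum.zip(state, target) |> Enum.count(fn {a, b} -> a != b end)
  end

  # Admissible lower bound: each swap fixes at most two tiles,
  # so at least ceil(hamming / 2) swaps remain.
  def admissible(state, target) do
    div(hamming(state, target) + 1, 2)
  end
end
```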

That's why you'll occasionally see a blind solver in this article report fewer total moves than Star on the same board. Not magic. Star's "optimum" is, in places, a small overestimate. §8 has a concrete example.

How Cyclic differs

A permutation can always be decomposed into disjoint cycles. Breaking a cycle of length k costs k − 1 swaps, and the total swap count is fixed once the permutation is known. Cyclic computes that permutation in linear time, walks the cycles, and emits swaps. No search, no heuristic. The trade-off: when duplicate letters create ambiguous assignments, it picks greedily and can end up a few swaps above optimal.
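The cycle walk is short enough to sketch. This isn't Solvers.Cyclic itself, just its shape, assuming the required permutation has already been resolved into a map of position → target position:

```elixir
defmodule CycleSwaps do
  # perm: %{position => target_position}. Emits the swap list that sorts
  # the permutation; a cycle of length k contributes exactly k - 1 swaps.
  def swaps(perm) do
    perm
    |> Map.keys()
    |> Enum.reduce({[], MapSet.new()}, fn start, {acc, seen} ->
      if MapSet.member?(seen, start) do
        {acc, seen}
      else
        {cycle, seen} = walk(perm, start, [], seen)
        # Swapping the cycle head with each later member, in walk order,
        # sends one tile home per swap.
        {acc ++ Enum.map(tl(cycle), &{hd(cycle), &1}), seen}
      end
    end)
    |> elem(0)
  end

  defp walk(perm, pos, acc, seen) do
    if MapSet.member?(seen, pos) do
      {Enum.reverse(acc), seen}
    else
      walk(perm, perm[pos], [pos | acc], MapSet.put(seen, pos))
    end
  end
end
```

Fixed points come out as 1-cycles and contribute no swaps, which is why the total is fixed the moment the permutation is known.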

From the player's point of view

A real player has none of the engine's advantages. You see 25 shuffled letters and have no idea which five words they came from. Each swap returns two bits of feedback per affected tile: was the letter at the right position in its row, and is that letter in the solution at all?

That second flag turns out to be a freebie. The board's 25 letters are exactly the multiset of the five hidden words, so every letter is "in the solution" trivially. The real signal is the first flag, per cell: green when the letter sits at the right column of its target row, not green otherwise.

With only that signal, the puzzle becomes pure Bayesian inference. For each of the five rows we maintain a set of candidate words. Every swap prunes those sets. The question is: which swap teaches us the most per move?

How big is the hidden state space?

The dictionary has 2,309 valid answer words. An ordered tuple of five distinct words picks one word per row, so the hypothesis space starts at

2,309 × 2,308 × 2,307 × 2,306 × 2,305 ≈ 6.5 × 10¹⁶

Sixty-five quadrillion. Before any swap, though, the global multiset constraint slashes that. Most random 5-word tuples don't use the exact letters available on the board. The multiset constraint and per-row pruning together make this tractable in practice.

What does a swap teach us?

A swap exchanges tiles at two positions i and j. The engine then re-checks every cell against the hidden solution and reports, for each of the two changed positions: was the new letter at the right column for its row?

[Interactive demo - the same board; step 0 / 3: play a swap and watch the per-cell flags come back]

Two booleans, one per affected cell, partition the future of the hypothesis space into four buckets: (green, green), (green, ¬green), (¬green, green), (¬green, ¬green). That's the feedback we use to choose swaps wisely.

Algorithm A: Shannon entropy maximization

The textbook information-theoretic approach: pick the swap whose outcome is hardest to predict in advance. Hard-to-predict outcomes carry the most information about which hypothesis is correct.

Concretely: for each of the 300 candidate swaps, we estimate the probability that each affected cell will turn green, combine into the four-bucket distribution above, and compute its Shannon entropy H = −Σ p log₂ p, in bits. The swap with maximum entropy is the swap we want.
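A sketch of that loop, assuming a hypothetical outcome_probs/2 that returns the four bucket probabilities for a swap under the current belief (the probability estimation itself belongs to the engine and isn't shown here):

```elixir
defmodule EntropyScore do
  # Shannon entropy, in bits, of a discrete distribution.
  def entropy(probs) do
    probs
    |> Enum.filter(&(&1 > 0))
    |> Enum.reduce(0.0, fn p, acc -> acc - p * :math.log2(p) end)
  end

  # Score all 300 position pairs on a 25-tile board and keep the swap
  # whose four-bucket feedback distribution is hardest to predict.
  def best_swap(belief, outcome_probs) do
    for(i <- 0..23, j <- (i + 1)..24, do: {i, j})
    |> Enum.max_by(fn swap -> entropy(outcome_probs.(belief, swap)) end)
  end
end
```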

On this board, Entropy ran for 19 moves. It learns until every row's candidate set collapses to a single word, then hands off to the same A* oracle from §0 to finish the physical swaps. The interesting number is "how few moves spent learning". The rest is mechanical.

The information-gain identity

Mutual information between the belief and the feedback equals the entropy of the prior minus the expected entropy of the posterior:

I(belief; feedback) = H(prior) − E[H(posterior)]

A nice identity collapses this for our four-outcome partition: I = H(outcome). Maximising outcome entropy is therefore equivalent to maximising expected information gain. That's the one line of math that justifies the whole algorithm.
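The step that makes the identity collapse: once the swap is fixed, the feedback is a deterministic function of the hidden word tuple, so H(feedback | hidden tuple) = 0 and the chain reads

I(belief; feedback) = H(feedback) − H(feedback | hidden tuple) = H(feedback) − 0 = H(outcome)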

What the belief tracks per row

The belief isn't just a set of candidate words. For each row R we maintain four kinds of evidence, all derived from per-cell feedback flags:

  • Greens (locked[R][C]): letter L is at column C of row R. Hard constraint.
  • Forbidden (forbidden[R][C]): letter L is NOT at column C of row R, because we just observed it there as non-green.
  • Required (required[R][L]): row R's word contains L at least K times, where K is the max number of yellow L's we've ever seen in a single snapshot of row R.
  • Multiset budget: row R's letters together with the other rows' must equal the board's 25-letter multiset.

Each new swap snapshot feeds all four. A candidate word is kept only if it satisfies every constraint that applies to its row.
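A sketch of the per-row filter, with illustrative shapes: locked as %{col => letter}, forbidden as %{col => letters}, required as %{letter => min_count}. The multiset budget is a cross-row constraint, so it lives in the joint check of §5 rather than here:

```elixir
defmodule RowFilter do
  # Keep a candidate word (a charlist) only if it satisfies every green,
  # forbidden, and required constraint accumulated for its row.
  def keep?(word, locked, forbidden, required) do
    Enum.all?(locked, fn {col, letter} -> Enum.at(word, col) == letter end) and
      Enum.all?(forbidden, fn {col, letters} -> Enum.at(word, col) not in letters end) and
      Enum.all?(required, fn {letter, k} -> Enum.count(word, &(&1 == letter)) >= k end)
  end
end
```

For instance, RowFilter.keep?(~c"river", %{0 => ?r}, %{1 => [?r]}, %{?r => 2}) passes: the green at column 0 holds, column 1 isn't a forbidden r, and the word carries two r's.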

Algorithm B: candidate-count proxy

Shannon entropy is theoretically clean but computationally finicky. Every score requires a handful of log calls. A cheaper proxy: score swaps by the expected reduction in total candidate count Σ |Cr|. No logs, just arithmetic.

The two should agree most of the time: both reward swaps that split the hypothesis space evenly. Where they disagree is interesting. Entropy prefers balanced splits even when the absolute split sizes are small. The proxy prefers large absolute eliminations even if the split is lopsided.
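The proxy's arithmetic, assuming a hypothetical buckets_for/1 that returns one {probability, surviving_candidate_count} pair per feedback outcome of a swap:

```elixir
defmodule ProxyScore do
  # Expected total candidates left after a swap: sum over the feedback
  # buckets of p(bucket) * |candidates consistent with that bucket|.
  # Plain arithmetic - no log calls anywhere.
  def expected_remaining(buckets) do
    Enum.reduce(buckets, 0.0, fn {p, survivors}, acc -> acc + p * survivors end)
  end

  # The best move is the swap that minimises that expectation.
  def best_swap(swaps, buckets_for) do
    Enum.min_by(swaps, &expected_remaining(buckets_for.(&1)))
  end
end
```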

On this board, Proxy ran for 19 moves. Compare to Entropy's count above. The proxy is competitive even though it doesn't touch a single log.

Joint-assignment pigeonhole

The per-row prune just described misses one inference: even if row 0 and row 1 each individually look fine against the multiset, the joint constraint can be violated. If 50 words in row 0 and 30 words in row 1 all need an 'X' and the board only has one 'X', at most one of those candidate sets can use it.

The full prune does a depth-first feasibility check: for each candidate w in row R, does there exist any 5-tuple (one word per row, w fixed at row R) whose concatenated letters fit in the board's multiset? Candidates with no feasible joint are dropped. This runs only when the candidate space is small enough that the DFS terminates quickly, late in the run, when it matters most.
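A sketch of the DFS, assuming candidate words are charlists and the budget is the board's letter-frequency map (illustrative, not the engine's implementation):

```elixir
defmodule JointFeasible do
  # rows: one candidate list per row. True iff at least one tuple
  # (one word per row) fits within the letter budget; we bail the
  # moment any letter count would go negative.
  def feasible?([], _budget), do: true

  def feasible?([candidates | rest], budget) do
    Enum.any?(candidates, fn word ->
      case spend(word, budget) do
        {:ok, left} -> feasible?(rest, left)
        :over -> false
      end
    end)
  end

  defp spend(word, budget) do
    Enum.reduce_while(word, {:ok, budget}, fn letter, {:ok, b} ->
      case Map.get(b, letter, 0) do
        0 -> {:halt, :over}
        n -> {:cont, {:ok, Map.put(b, letter, n - 1)}}
      end
    end)
  end
end
```

To test a single candidate w in row R, pin it by passing [w] as that row's list and the full candidate lists everywhere else; w survives the prune iff feasible?/2 returns true.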

Algorithm C: probe, then exploit

Entropy and Proxy ask "what's the most informative swap right now?" Probe asks a different question: "what's the cheapest swap that drives some row toward a single candidate?" Once any row's candidate set is size 1, the multiset constraint often collapses the rest of the rows automatically. At that point we know the answer.
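One plausible reading of that strategy in code - a guess at the shape, not the engine's Probe - is to target the row closest to collapse and the plurality letter at one of its contested columns:

```elixir
defmodule ProbeTarget do
  # The row closest to collapse: smallest candidate set still above 1.
  # rows: one candidate list per row.
  def target_row(rows) do
    rows
    |> Enum.with_index()
    |> Enum.filter(fn {cands, _i} -> length(cands) > 1 end)
    |> Enum.min_by(fn {cands, _i} -> length(cands) end)
  end

  # The most common letter at a column across a row's candidates - the
  # letter a probe swap would try to place (and confirm) there.
  def plurality_letter(cands, col) do
    cands
    |> Enum.map(&Enum.at(&1, col))
    |> Enum.frequencies()
    |> Enum.max_by(fn {_letter, n} -> n end)
    |> elem(0)
  end
end
```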

When the algorithm has uniquely determined all five rows, it switches phases. It now knows the target words, so it invokes the same A* oracle from §0 (Solvers.Star) to finish the physical swaps in what Star reports as the minimum. The probe algorithm earns the right to cheat by first learning what the engine already knew at board generation, and only then borrowing the engine's optimal finisher.

On this board, Probe ran for 21 moves total: probe phase plus oracle finishing combined. With the per-row constraint suite from §4 plus the joint-feasibility pass from §5, all three algorithms now reach a unique-tuple collapse before handing off to Star, so the distinction between Probe and the information-theoretic algorithms has narrowed considerably. The key difference is just how each one drives toward collapse.

The showdown

Three algorithms, one board, one optimal Star baseline. The table below shows how each algorithm fared. The "+N vs. Star" column is how many extra moves beyond the theoretical minimum each took.

Algorithm   Moves   vs. Star
probe         21       +2
proxy         19       +0
entropy       19       +0
[Interactive replay, step 0 / 21 - entropy, proxy, and probe boards side by side, each starting at 150 candidates]
[Chart: candidates remaining vs. move number - collapse markers (▼) at proxy@3, entropy@3, probe@5; finish markers (◆) at proxy 19, entropy 19, probe 21; true min = 17 (white solid), Star = 19 (gold dashed)]

Top-anchored dashed verticals (▼) mark where each algorithm reached unique determination - every row's candidate set collapsed to a single word. Bottom-anchored dotted verticals (◆) mark where each algorithm finished - the total move count, including the A* oracle's swap sequence after collapse. The horizontal gap between the two is the cost of physically moving the tiles once you already know the answer. The white solid line is the provable minimum number of swaps for this board, computed by MinSwap (see §11); the gold dashed line is what Star reports. When the gold sits to the right of the white, you're seeing Hamming's inadmissibility - Star paid for extra swaps it didn't need. Any algorithm beating the white line would have to be wrong.

When does each algorithm shine?

All three algorithms learn the hidden words by accumulating constraints. Once a solver has narrowed every row's candidate set to a single word, the same A* oracle from §0 takes over and finishes the swap sequence in what Star reports as the minimum. The interesting comparison is therefore how fast each algorithm reaches that collapse.

Across the seeds we've sampled, the candidate-count Proxy is often the quickest learner, with the Shannon-entropy maximizer close behind - a teaching point in itself: a much cheaper score function (no logs) often tracks the rigorous information measure almost exactly.

Probe is the most variable. When the most-likely letter at the most-uncertain column turns out to be wrong, Probe spends extra moves backtracking. When that letter is right, Probe collapses a whole row in one swap and leaves Entropy and Proxy behind.

The honest finding: solving a puzzle is two problems - learning what the answer is, then physically moving tiles to match. Information theory is the right tool for the first; A* is the right tool for the second.

A natural question the showdown raises is whether Star - the A* oracle - is itself optimal. It isn't: plain Hamming overcounts, so A* with that heuristic returns a count that's at least the true minimum, often a swap or two above it. The actual provable minimum is what MinSwap returns: it skips A* entirely and optimises over letter-bijection cycle structure (min_swaps = N − cycles), enumerating every valid duplicate-letter assignment and keeping the one with the most cycles. The white solid line on the §7 chart is now MinSwap's count - Star is always at or to the right of it. §11 walks the derivation.

Appendix: Hamming distance

A one-paragraph definition - referenced in §7 and throughout the post.

For any board state, the Hamming distance to the solution is just "how many cells hold the wrong letter." Below, every red-outlined square is out of place. The total - 22 for this initial board - is exactly the Hamming distance.

[Hamming demo board - o k s l u / h n f o y / l t t e v / v e i r r / t i n f n - 22 tiles out of place]

Appendix: MinSwap, the true-minimum oracle

How the white "true min" line on the §7 chart is computed - without running A* at all.

Skip the search entirely

Minimum swaps to sort a permutation isn't really a search problem. For an N-position bijection, the answer is N − C, where C is the number of disjoint cycles in the permutation. A cycle of length k needs exactly k − 1 swaps to resolve: lift one element out, rotate the rest into place, drop the lifted element where it belongs. Sum across cycles and you have a closed-form answer, no priority queue required.
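The closed form in code, assuming the bijection is already resolved into a map of position → target slot:

```elixir
defmodule ClosedForm do
  # Minimum swaps to sort a bijection: N - (number of disjoint cycles).
  # Fixed points are 1-cycles and contribute zero swaps.
  # ClosedForm.min_swaps(%{0 => 1, 1 => 0, 2 => 2}) #=> 1
  def min_swaps(perm) do
    perm
    |> Map.keys()
    |> Enum.reduce({0, MapSet.new()}, fn start, {cycles, seen} ->
      if MapSet.member?(seen, start),
        do: {cycles, seen},
        else: {cycles + 1, absorb(perm, start, seen)}
    end)
    |> then(fn {cycles, _seen} -> map_size(perm) - cycles end)
  end

  defp absorb(perm, pos, seen) do
    if MapSet.member?(seen, pos),
      do: seen,
      else: absorb(perm, perm[pos], MapSet.put(seen, pos))
  end
end
```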

The duplicate-letter wrinkle

jmbld boards have duplicate letters, so the mapping pos → target_slot isn't unique. The two "E"s in RIVER could go to either E-slot in the target, and the choice changes the cycle structure. A greedy assignment - lock any positions already at one of their valid target slots (1-cycles, the densest cycle structure available), then pair the leftovers in sorted order - is cheap, and it's right when there's at most one duplicate letter. But on dense duplicate distributions the greedy leftover-pairing isn't always the cycle-maximising one. On the canonical §9 seed the greedy version returns 22 swaps for a board Star finishes in 16; A* is free to discover bijections the greedy step skipped, so the greedy "oracle" was overshooting.

Solvers.MinSwap closes that gap. For each letter L with k_L instances and f_L already at a same-letter slot, it enumerates every bijection of the k_L − f_L leftover sources against the leftover targets, so the per-letter candidate count is (k_L − f_L)!. The global candidate count is the product across letters; on a real 5×5 board the fixed-point lock keeps that product in the thousands, well under 100 ms. For each candidate, we count cycles via a single linear walk and keep the assignment with the largest cycle count. Same closed-form N − cycles answer; no priority queue.
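A sketch of the enumeration, reusing ClosedForm.min_swaps/1 from the sketch above. Here leftover maps each letter to its {sources, targets} lists after fixed-point locking, and locked maps every already-forced position (fixed points and single-instance letters) to its slot - illustrative shapes, not the engine's structs:

```elixir
defmodule MaxCycles do
  # Enumerate every bijection of leftover sources onto leftover targets,
  # letter by letter ((k_L - f_L)! orderings per letter); the assignment
  # with the most cycles yields the fewest swaps.
  def true_min_swaps(leftover, locked) do
    leftover
    |> Enum.map(fn {_letter, {sources, targets}} ->
      for p <- permutations(targets), do: Enum.zip(sources, p)
    end)
    |> cartesian()
    |> Enum.map(fn choice ->
      full = Map.merge(locked, Map.new(List.flatten(choice)))
      ClosedForm.min_swaps(full)
    end)
    |> Enum.min()
  end

  defp permutations([]), do: [[]]
  defp permutations(l), do: for(x <- l, r <- permutations(l -- [x]), do: [x | r])

  defp cartesian([]), do: [[]]
  defp cartesian([h | t]), do: for(c <- h, rest <- cartesian(t), do: [c | rest])
end
```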

How the post uses it

JmbldEngine.Solvers.MinSwap is the implementation. JmbldPhx.Blog.SeedSampler.true_min_swaps/1 calls it on every seed the chart renders, and feeds the count into entropy_chart as the white solid vertical line. The result is cached per {seed_id, lang} in BlindSolverCache and pre-computed at boot alongside the algorithm traces, so the chart's "true min" line is free on every step. The gold dashed line on the same chart is what plain Star reports; the horizontal gap between the two is the price of running A* with an inadmissible heuristic, paid in extra swaps - and with MinSwap as the reference, Star's count is guaranteed to sit at or to the right of it.

Appendix: Star vs MinSwap vs Entropy

Three solvers, same hidden words. Step through and watch each one pick its next swap - and the reasoning that drove the pick.

Star runs the A* oracle directly: every candidate swap is scored by f = g + h where h is the Hamming distance to the target. The chosen swap is the one whose resulting f-score is smallest, so the line under Star's board reports the Hamming drop from the upcoming move. MinSwap (§11) skips A* entirely: it picks the duplicate-letter bijection with the most cycles and walks each cycle k → k − 1; the line under MinSwap's board reports both the Hamming drop (for visual symmetry with Star) and how many swaps remain in the cycle decomposition. Entropy doesn't know the target at all; it picks the swap whose four-outcome feedback distribution has the highest Shannon entropy - equivalently, whose expected information gain over the belief is largest. The line under Entropy's board reports those bits, alongside the total number of worlds (sum of candidate counts across rows) the belief still considers possible.

[Interactive replay, step 0 / 18 - Star (18 total): H = 22, next swap h 22 → 20 (Δ = −2). MinSwap (16 total): H = 22, next swap h 22 → 21 (Δ = −1), 15 swaps left. Entropy (16 total): H = 22, next swap 1.80 bits expected info, 177 worlds.]