We will address the unique matching problem and its variants — a classic metaphor for decision, information, and bounded rationality.

Similar concerns appear as early as the 18th century: Abraham de Moivre discusses related probability problems in The Doctrine of Chances (De Moivre 1718), although he does not formulate the matching problem directly in this manner. In the algorithmic literature, similar structural operations appear later: Quicksort (Hoare 1961) solves, through a divide-and-conquer strategy, subproblems that resonate with the logic of pair matching; the same idea was subsequently adapted to the problem known in the literature as nuts and bolts (Alon et al. 1994).

The unique matching problem — in its simplest instance represented by the case with 4 doors and 4 keys — provides a minimal framework for analyzing decision-making processes under conditions of incomplete information. Even in this extremely simplified setting, relevant theoretical questions remain: what does it mean, formally and operationally, for an algorithm to be “more efficient”? And how can a “smarter” algorithm fail in terms of total cost or the effort perceived by the agent?

This historical trajectory illustrates two important methodological observations. First, conceptual connections between domains — here, between classical probability theory and algorithmic analysis — can offer new perspectives on an apparently simple problem. Second, optimization according to a single criterion (e.g., execution time) does not guarantee optimality in multidimensional decision terms (e.g., cognitive cost, number of experimental comparisons, or robustness to perception errors).

Academic formulation:

Given n doors and n keys, where each key opens exactly one door, determine the minimum number of trials required to open all doors.

Note: This formulation focuses on actually opening the doors, not on identifying the exact bijection between keys and doors.

In this article we will compare several solving strategies – from systematic testing to controlled random choice – to show that, regardless of method, there exists a structural limit of efficiency. This limit does not depend on luck, but on the symmetry of information.

The Classical Solution

We start with the first door and try keys until we find the right one; then we move to the next door and repeat, eliminating the used key. In the worst case the algorithm needs $ \frac{n(n+1)}{2} $ trials, so it guarantees opening all doors within that bound.
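A minimal simulation sketch of this strategy (the function name and sample size are illustrative choices, not part of the original formulation):

```python
import random

def classical_trials(n: int) -> int:
    """Trials needed by the classical strategy on a random key arrangement."""
    keys = list(range(n))
    random.shuffle(keys)              # the unknown key-to-door assignment
    trials = 0
    for door in range(n):
        for i, key in enumerate(keys):
            trials += 1
            if key == door:           # the right key: open the door...
                keys.pop(i)           # ...and discard the used key
                break
    return trials

samples = [classical_trials(4) for _ in range(100_000)]
print(max(samples), sum(samples) / len(samples))  # at most 10 = 4*5/2; mean near 7
```

Over many runs the maximum stays at $ 10 = \frac{4 \cdot 5}{2} $ and the mean settles near 7, a value we will recover exactly below.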

This method seems trivial and guaranteed, but it is not necessarily optimal; other strategies certainly exist that also guarantee at most $ \frac{n(n+1)}{2} $ trials.

The problem becomes: how can we prove that no better strategy can be found, i.e., one which, at a given significance level, would require fewer trials to open all doors?

The Principle Used

To show that no method is better than the classical one, we will use Bellman’s Principle of Optimality.

Define V(k) = the optimal number of trials remaining when we have just opened k doors and are about to attempt to open a new door among the n - k remaining. Clearly, $ V(0) $ is maximal, while $ V(n)=0$.

Thus, $ V(0) $ is the global solution of the problem:


$$ V(0) = \sum_{t=0}^{n-1} \min_{a_t \in \Gamma(x_t)} c(x_t, a_t) \tag{1} $$

Let the initial state be $ x_0 $ and the set of possible actions $ \Gamma(x_0) $. We choose an optimal action $ a_0 \in \Gamma(x_0) $, and the next state is obtained through the transition function: $ x_1 = T(x_0, a_0) $.

The associated value function is:


$$ V(x_0) = \min_{a_0 \in \Gamma(x_0)} \Big[\, c(x_0, a_0) + \mathbb{E}\big[\, V(x_1) \mid x_0 \overset{a_0}{\to} x_1 \,\big] \Big] \tag{Bellman} $$

where:

  • $ \mathbb{E}[ V(x_1) \mid x_0 \overset{a_0}{\to} x_1] $ represents the expected value of $ V $ at the next state, conditional on choosing $ a_0 \in \Gamma(x_0) $,

  • $ c(x_0, a_0) $ represents the marginal cost associated with choosing the action $ a_0 $ in state $ x_0 $, constant and equal to 1 because we want to count the trials. Note that it does not represent the total cost of the transition $ x_0 \overset{a_0}{\to} x_1 $,

  • When we set the marginal cost $ c(x_0,a_0) = 1 $ for every trial, the associated value function $ V(x_0)$ measures the maximum guaranteed number of trials required to open all doors from state $ x_0 $.

  • If we instead set the correct average marginal cost per attempt supporting the complete transition $ x_0 \overset{a_0}{\to} x_1 $, namely $ c(k, a_k) = \frac{n-k+1}{2} \cdot \frac{1}{n-k} $ for $ k < n-1 $ and $ c(n-1, a_{n-1}) = 1 $, the value function would yield the expected number of trials (verified numerically in the sketch below).

  • Here we choose an initial action, denoted by $ a_0 $, knowing that our choice determines the system state at the next moment, $ x_1 = T(x_0, a_0) $, as well as the transition probability.

If the Markov process induced by the fixed action $ a_0 $ has the discrete transition kernel $ p_{a_0}(x' \mid x_0) $ on the state space $ \mathcal{S} $, then, since only two transitions have positive probability here (we either stay in $ x_0 $ or advance to $ x_1 $),

$$ \mathbb{E}\big[\, V(x_1) \mid x_0 \overset{a_0}{\to} x_1 \,\big] = V(x_0)\, p_{a_0}(x_0 \mid x_0) + V(x_1)\, p_{a_0}(x_1 \mid x_0) \tag{2} $$

At each step (trial):

  • we make one attempt (fixed cost 1);
  • with probability $ \dfrac{1}{n-k} $ we succeed → we move to state $ k+1 $, with value $ V(k+1) $;
  • with probability $ 1 - \dfrac{1}{n-k} $ we fail → we remain in state $ k $, with value $ V(k) $.

From these observations follows Bellman’s equation for $ V(k) $:


$$ V(k) = 1 + \frac{1}{n-k}\, V(k+1) + \left(1 - \frac{1}{n-k}\right) V(k) $$

Rearranging gives $ V(k) = (n-k) + V(k+1) $, and therefore $ V(0) = \sum_{k=0}^{n-1} (n-k) = \frac{n(n+1)}{2} $, exactly the classical bound.
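As a check, a short backward-induction sketch covering both cost structures discussed above (names are illustrative):

```python
# Solving the recursion V(k) = (n-k)*c(k) + V(k+1) by backward induction.
from fractions import Fraction

def value(n: int, expected_cost: bool = False) -> Fraction:
    V = Fraction(0)                               # V(n) = 0: all doors are open
    for k in range(n - 1, -1, -1):
        if expected_cost and k < n - 1:
            c = Fraction(n - k + 1, 2 * (n - k))  # average marginal cost per attempt
        else:
            c = Fraction(1)                       # unit cost: count every trial
        V = (n - k) * c + V                       # V(k) from V(k+1)
    return V

print(value(4), value(12))              # 10 and 78, i.e. n(n+1)/2
print(value(4, True), value(12, True))  # 7 and 45: expected numbers of trials
```

With unit cost the recursion reproduces the classical bound $ \frac{n(n+1)}{2} $; with the average marginal cost from the bullet above it yields the expected number of trials, anticipating the values 7 and 45 computed exactly below.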

Checking applicability

To apply Bellman’s Principle of Optimality correctly, the following assumptions must hold:

  1. Sequential structure — the problem can be divided into successive stages

    
    $$ x_0 \to x_1 \to \dots \to x_T $$
    

    each stage involving a decision $ a_t $ that influences the next state.

  2. Memoryless system — only the current state and action matter, not how you got there.

    
    $$ P(x_{t+1} \mid x_t, a_t, x_{t-1}, \dots) = P(x_{t+1} \mid x_t, a_t) $$
    
  3. Additivity of the cost/reward function — the total objective is the sum of stage-wise costs.

  4. Independence of future decisions — for a given state $ x_t$, decisions can be optimized independently of the past, assuming optimal behavior from that point onward:

    
    $$ V(x_t) = \min_{a_t \in \Gamma(x_t)} \left[ c(x_t, a_t) + \mathbb{E}\big( V(x_{t+1}) \mid x_t, a_t \big) \right] $$
    
  5. Well-defined recursive relation — the problem must allow a Bellman-type equation expressing the optimal value in one step as a function of the next-step values.

  6. Dynamic consistency — decisions considered optimal today remain optimal as time evolves and new information becomes available.

  7. Complete state specification — the state variable $ x_t $ contains all relevant information for the optimal decision; if the state is incomplete, the principle no longer holds.

Practical note: before applying the principle, check whether the decision model satisfies these conditions — especially the memoryless property and the additivity of the cost function.
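For the doors problem this check is immediate; a minimal sketch of the model, with illustrative names (the state is simply $ k $, the number of opened doors):

```python
# The doors/keys problem written out as a Markov decision process.
def kernel(n: int, k: int) -> dict[int, float]:
    """Transition probabilities from state k (doors already opened).
    They depend only on k, never on which keys failed before:
    conditions 2 (memoryless) and 7 (complete state)."""
    p = 1.0 / (n - k)
    return {k + 1: p, k: 1.0 - p}

def stage_cost(k: int) -> float:
    """Cost of one attempt; the objective sums these costs (condition 3)."""
    return 1.0

# Sanity check: every kernel row is a probability distribution, and the
# process moves through the stages k = 0, 1, ..., n (condition 1).
n = 4
assert all(abs(sum(kernel(n, k).values()) - 1.0) < 1e-12 for k in range(n))
```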

If they hold, the principle guarantees that the global optimum can be obtained through locally optimal decisions. ⚠️ Crucial limitation: Bellman’s principle does not apply to non-additive measures (e.g., percentiles, variance, or the solvency capital requirement, SCR). Percentiles, for instance, do not add across stages: for $ n = 12 $ the 99.5% quantiles of the individual stages sum to 78 trials, while, as computed below, the 99.5% quantile of the total is only 63.

To overcome these limits, we need:

The Actuarial Solution

If we focus on the optimal way to open the first door (we try keys until one actually opens it), then, once this stage is complete, the remaining problem is an independent copy of the same problem, one size smaller. This independence ensures that we can simply sum the random variables corresponding to each stage. Therefore, denoting by RV(1:k) the random variable defined on the equiprobable support 1:k, we can determine:

1 + RV(1:2) + RV(1:3) + RV(1:4): a random variable with 7 outcomes, µ = 7, σ = 1.47, entropy 1.78:

  Rᵢ:    4     5     6     7     8     9    10
  pᵢ: 1/24   1/8  5/24   1/4  5/24   1/8  1/24
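The same distribution can be reproduced by direct convolution of the stage variables; a minimal sketch (not the tool that produced the output above; names are illustrative):

```python
# Exact law of 1 + RV(1:2) + ... + RV(1:n) via stage-wise convolution.
from fractions import Fraction

def trial_distribution(n: int) -> dict[int, Fraction]:
    dist = {1: Fraction(1)}                   # the last door takes exactly 1 trial
    for k in range(2, n + 1):                 # convolve with the uniform RV(1:k)
        new: dict[int, Fraction] = {}
        for total, p in dist.items():
            for t in range(1, k + 1):
                new[total + t] = new.get(total + t, Fraction(0)) + p / k
        dist = new
    return dist

d4 = trial_distribution(4)
print({v: str(d4[v]) for v in sorted(d4)})
# {4: '1/24', 5: '1/8', 6: '5/24', 7: '1/4', 8: '5/24', 9: '1/8', 10: '1/24'}
```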

For $ n=12 $ there are $ 12! $ possible permutations, an enormous number, yet:

1 + RV(1:2) + RV(1:3) + RV(1:4) + RV(1:5) + RV(1:6) + RV(1:7) + RV(1:8) + RV(1:9) + RV(1:10) + RV(1:11) + RV(1:12): a random variable with 65 displayed outcomes, µ = 45, σ = 7.29, entropy 3.4:

  Rᵢ:         41         42        43         44           45         46        47         48         49
  pᵢ: 3088/66297 3463/69999 958/18551 3491/65886 33969/635639 3491/65886 958/18551 3463/69999 3088/66297

The nine most probable values are displayed; outcomes with negligible probability are truncated, so the displayed range is [13:77] while the full support is [12:78]; VaR(99.5%) = 63.

Thus, not only can we determine the number of trials that guarantees opening all doors, but we can also evaluate the exact probability of doing so across the entire spectrum of possible values.

Therefore, even with 12 keys and 12 doors, we most likely open all doors in approximately 45 trials; at the 99.5% confidence level, the number of trials required rises to 63. Guaranteed opening happens only at 78 trials, yet this worst case occurs with an extremely small probability of 1/12! = 1/479,001,600. Opening all doors in only 12 trials is equally likely, since the random variable has a symmetric distribution.
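Continuing the convolution sketch above, the $ n = 12 $ figures can be verified directly (reusing trial_distribution and Fraction from that sketch):

```python
d12 = trial_distribution(12)
support = sorted(d12)
print(support[0], support[-1])   # 12 and 78: the full support
print(d12[78])                   # 1/479001600, i.e. 1/12!

# VaR(99.5%): the smallest t with P(T <= t) >= 0.995.
acc = Fraction(0)
for t in support:
    acc += d12[t]
    if acc >= Fraction(995, 1000):
        print(t)                 # 63
        break
```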

For general $ n $, the sum is none other than the Mahonian distribution (the law of the number of inversions of a uniformly random permutation; Canfield, Janson, and Zeilberger 2011) shifted by $ +n $, reflecting the progressive accumulation of effort depending on the order in which the keys are matched.
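This identification is a standard computation with generating functions: writing $ [k]_q = 1 + q + \dots + q^{k-1} $, the total number of trials $ T $ satisfies

$$ \mathbb{E}\big[q^{T}\big] = q \prod_{k=2}^{n} \frac{q + q^2 + \dots + q^{k}}{k} = \frac{q^{\,n}}{n!} \prod_{k=1}^{n} [k]_q = \frac{q^{\,n}}{n!} \sum_{\pi \in S_n} q^{\operatorname{inv}(\pi)}, $$

so $ T $ has the law of $ n + \operatorname{inv}(\pi) $ for a uniformly random permutation $ \pi $: precisely the Mahonian distribution shifted by $ +n $.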

In conclusion,

The Principle of Optimality cannot by itself determine the optimal action to follow, nor the probabilities associated with the transitions; what it guarantees is that we do not need to solve the entire dynamic process at once. It is enough to find the optimal decision at each stage, because a globally optimal solution consists of locally optimal decisions. The principle does not tell you what to do, but how an optimal solution looks.


References

Alon, Noga, Manuel Blum, Amos Fiat, Sampath Kannan, Moni Naor, and Rafail Ostrovsky. 1994. “Matching Nuts and Bolts.” ACM-SIAM Symposium on Discrete Algorithms. https://web.cs.ucla.edu/~rafail/PUBLIC/17.pdf.

Canfield, E. Rodney, Svante Janson, and Doron Zeilberger. 2011. “The Mahonian Probability Distribution on Words Is Asymptotically Normal.” Advances in Applied Mathematics 46 (1): 109–24. https://doi.org/10.1016/j.aam.2009.10.001.

De Moivre, Abraham. 1718. The Doctrine of Chances. London: W. Pearson.

Hoare, C. A. R. 1961. “Algorithm 64: Quicksort.” Communications of the ACM 4 (7): 321. https://doi.org/10.1145/366622.366644.