Should you give a job offer to the candidate you’ve just interviewed or keep searching? Stop for gas or look for a cheaper gas station? Marry the person you’re with or keep dating?
With some details abstracted, these problems share a similar structure. The goal is to pick the best option from a pool whose size is roughly known, where the options are evaluated one at a time and you can't go back to a previously rejected choice. The general form of this is called the "secretary problem", after a formulation in which secretary candidates are interviewed in succession. The classic solution to the secretary problem is to go through the first 1/e (~37%) of candidates without committing, then choose the next one who ranks higher than all of the previous options.
The “secretary algorithm” picks the best option at least 37% of the time, the last option 37% of the time (in cases where the single best choice came in the early group and set the bar too high), and a pretty good option the remaining times. Can we improve on this?
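To make that baseline concrete, here is a quick simulation of the 1/e rule. This is a sketch, not the post's R code; the pool size of 100 and the trial count are arbitrary choices, and `secretary_trial` is a name I made up.

```python
import random

def secretary_trial(n, cutoff):
    """Run one secretary-problem trial with n candidates.
    Returns True if the 1/e rule picks the single best candidate."""
    scores = random.sample(range(n), n)  # a random permutation of ranks
    best_seen = max(scores[:cutoff])     # the bar set by the rejected group
    for s in scores[cutoff:]:
        if s > best_seen:
            return s == n - 1            # accepted the first record-setter
    return scores[-1] == n - 1           # forced to take the last option

random.seed(0)
n, trials = 100, 20_000
cutoff = round(n / 2.718281828)          # reject the first n/e candidates
wins = sum(secretary_trial(n, cutoff) for _ in range(trials))
print(f"picked the best candidate in {wins / trials:.0%} of trials")
```

The success rate hovers around 37%, as the theory promises.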
The secretary algorithm only uses an ordinal ranking of the options: which option is best, second-best, etc. But in the real-life examples above, we often have a cardinal measure for each option as well. We can grade interviewees on a rubric or a test, we see the price of gas at each station, and we can use Putanumonit's decision matrix guide to rate potential partners.
We’ll use the example of my dating life in New York to look for a better algorithm. This example will use gendered language; although secretaries come in all genders, people I’ve dated have all identified as the same one.
For illustration purposes, here are the retrospective spreadsheet scores for the first 20 women I went on dates with in New York: 4.9, 6.2, 5.8, 8.1, 4.8, 6.5, 5.1, 7.2, 5.2, 4.5, 4.9, 5.5, 5.3, 5.9, 4.8, 6, 5.5, 6.5, 5, 7.8. Reminder: these scores measure only someone’s compatibility with me, which is driven mostly by the kind of weird person I am. This is not an assessment of anyone’s objective value as a unique and beautiful human being.
This chart suggests a probability distribution of potential partner compatibility, not just an ordinal ranking. The go-to probability distribution on Putanumonit is the normal bell curve, but it seems like a poor fit here – the distribution is clearly skewed, with the majority on the left and a single tail of high scorers on the right side. Also, the tails of a normal distribution drop off very quickly, going as $e^{-x^2}$. This is bad for the Chinese soccer team and also bad for romance; there must be more great partners out there than a normal tail would allow.
Let's try the exponential distribution. It's single-tailed, easy to work with, and a lot more romantic, with a gentler drop-off rate of $e^{-x}$. For my dates, an exponential distribution fit looks like this.
The formula for the exponential distribution density is simply the decaying exponential function: $f(x) = \frac{1}{\theta}e^{-x/\theta}$ for $x > 0$. $\theta$ is a scale parameter – when it increases, the entire density curve is squeezed lower and stretched wider to the right. Conveniently, $\theta$ is both the mean and the standard deviation of the distribution.
In my data, the mean is 5.8 and the standard deviation is 1, so in our model the distribution will start from 5.8 − 1 = 4.8, ignoring the single outlier lady who snuck in below that point (I must have been drunk). The distribution density is $f(x) = e^{-(x - 4.8)}$ for $x \ge 4.8$.
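As a sketch, fitting this shifted exponential to the twenty scores above takes only a few lines; the offset is mean minus standard deviation, as described, and `density` is a name of my choosing.

```python
import math

scores = [4.9, 6.2, 5.8, 8.1, 4.8, 6.5, 5.1, 7.2, 5.2, 4.5,
          4.9, 5.5, 5.3, 5.9, 4.8, 6.0, 5.5, 6.5, 5.0, 7.8]

mean = sum(scores) / len(scores)
sd = math.sqrt(sum((x - mean) ** 2 for x in scores) / len(scores))
offset = mean - sd                       # where the density starts

def density(x):
    """Shifted exponential density with scale sd, starting at offset."""
    return math.exp(-(x - offset) / sd) / sd if x >= offset else 0.0

print(f"mean ≈ {mean:.2f}, sd ≈ {sd:.2f}, offset ≈ {offset:.2f}")
```

The fitted values come out to roughly 5.8, 1, and 4.8, matching the rounded numbers used in the rest of the post.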
Here are some consequences of this model:
- $1 - e^{-(6 - 4.8)} \approx 70\%$, so 70% of the women I meet will score between 4.8 and 6 on compatibility.
- Only 1 in 25 will score above 8 ($e^{-3.2} \approx 1/25$), and only 1 in 67 will score above 9 ($e^{-4.2} \approx 1/67$). Romance is hard.
- If I keep meeting women in the 5-6 range, it shouldn't cause me to update the model a lot. If I start regularly meeting women who score below 4.8 or above 7.5, it's a sign that the model is way off and I should reassess.
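These consequences follow directly from the model's tail probability $e^{-(x-4.8)}$. A quick sketch, assuming the fitted offset 4.8 and scale 1 from above (`p_above` is a hypothetical helper name):

```python
import math

def p_above(x, offset=4.8, theta=1.0):
    """P(score > x) under the shifted exponential model."""
    return math.exp(-(x - offset) / theta) if x > offset else 1.0

print(f"between 4.8 and 6: {1 - p_above(6.0):.0%}")   # ≈ 70%
print(f"above 8: 1 in {1 / p_above(8.0):.0f}")        # ≈ 1 in 25
print(f"above 9: 1 in {1 / p_above(9.0):.0f}")        # ≈ 1 in 67
```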
If I feel like I have 34 more dates in me, the secretary algorithm would advise me to stop fooling around after the 20th date ($54/e \approx 20$) and commit to the next one who scores above 8.1, the highest so far. The chance that I'll meet someone like that in the next 33 tries is $1 - (1 - e^{-(8.1 - 4.8)})^{33} \approx 71\%$. Conditional on that happening, I expect that she will score $8.1 + 1 = 9.1$ on average. If I don't meet this amazing woman, I will have to settle for the 34th and final option, who will score 5.8 on average. So if I follow the secretary algorithm, my expected value is 71% × 9.1 + 29% × 5.8 ≈ 8.14.
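The arithmetic behind that expected value, as a sketch under the same fitted model:

```python
import math

p_single = math.exp(-(8.1 - 4.8))       # P(a single date scores above 8.1)
p_find = 1 - (1 - p_single) ** 33       # P(someone beats 8.1 in 33 tries)
ev = p_find * (8.1 + 1) + (1 - p_find) * 5.8
print(f"chance of finding her: {p_find:.0%}, expected value: {ev:.2f}")
```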
To come up with a better algorithm given the actual distribution, we can use backward induction, working out the optimal decision at each step starting from the final date.
- If I’m down to my last date, I have to be happy with whatever I get – the average is 5.8.
- If I'm on the penultimate date, I will reject anyone below 5.8 (since I know I can get that on average on my last try) and accept anyone above it. The average of scores above 5.8 is 5.8 + 1 = 6.8. The chance of finding someone above 5.8 is, conveniently, $e^{-(5.8 - 4.8)} = 1/e \approx 37\%$. Thus, my expectation on the penultimate date is $0.37 \times 6.8 + 0.63 \times 5.8 \approx 6.17$.
- Let's assume we've calculated that if I'm down to $n$ dates left, my expected outcome is $a_n$. Thus, with $n+1$ dates left, I will accept someone who scores above $a_n$ and reject someone below it. The chance of seeing a score above $a_n$ is $e^{-(a_n - 4.8)}$, and if I do, her average score will be $a_n + 1$. So my expectation with $n+1$ dates left is $a_{n+1} = e^{-(a_n - 4.8)}(a_n + 1) + (1 - e^{-(a_n - 4.8)})\,a_n = a_n + e^{-(a_n - 4.8)}$. With this recursion, we can start from the last date and calculate $a_n$ for each $n$.
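The backward induction is a few lines of code. This is a sketch assuming the fitted offset 4.8 and scale θ = 1 from earlier; `expected_scores` is a name I invented.

```python
import math

def expected_scores(n_max, offset=4.8, theta=1.0):
    """a[n] = expected final score with n dates left, under the shifted
    exponential model, via a[n+1] = a[n] + theta * exp(-(a[n]-offset)/theta)."""
    a = [None, offset + theta]               # a[1]: take whoever shows up
    for n in range(1, n_max):
        a.append(a[n] + theta * math.exp(-(a[n] - offset) / theta))
    return a

a = expected_scores(34)
print(f"a[2] ≈ {a[2]:.2f}, a[34] ≈ {a[34]:.2f}")
```

This reproduces the 6.17 from the penultimate-date calculation and gives roughly 8.4 with 34 dates left.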
Here’s my expectation as a function of the number of dates I have left.
Let’s see how this “exponential secretary” algorithm does compared to the regular one.
With 34 dates left, my expectation is 8.4, up from 8.14. Great success! However, the new algorithm says that if I keep dating unsuccessfully, my standards should go down. When I'm down to 24 dates left, I dip below the original secretary algorithm threshold of 8.1. When I get into the single digits, my willingness to settle should increase, well, exponentially.
My standards can also go up! If I improve my OkCupid skills and start going on dates with more compatible partners (7s and 8s instead of 5s and 6s), I should refit the model and get a distribution with a higher mean. This will raise my expectations, as it should.
The biggest difference between the algorithms concerns meeting someone amazing early in the process. The original secretary algorithm rejects all the early candidates regardless of how good they are. If instead of 34 dates I had 180 dates left (200 in total), it would tell me to ignore all candidates before the 74th one (200/e ≈ 74), no matter how exceptional. The exponential secretary algorithm is more lenient. If I have enough data points to nail down the model and I meet someone who's a huge positive outlier (like a 10 on my scale), I should go for it regardless of the number of potential candidates remaining. I strongly endorse this conclusion.
In summary, here’s what you can do to improve on the secretary algorithm for secretary problems:
- Obtain a cardinal measure of the candidates, not just a rank order. If no obvious measure presents itself, putanumonit anyway.
- After a few data points have been gathered, plot them to see if they fall in a reasonable distribution that you can model. Err towards distributions with fatter tails, since you don’t want the model to exclude possible points that are outside the initial range of data.
- Once you have a distribution, you have a cumulative probability function: the chance of meeting a candidate who scores above some threshold.
- By induction from the last candidate, you can estimate the expected score you can achieve based on the number of candidates left. The expectation at n candidates left is your threshold at n+1.
- As you gather more data, keep refitting the model, especially if the data points fall far outside the predicted pattern.
- At times you will feel regret and desire to call a secretary you have rejected a few months ago. Don’t. She has moved on and is happier without you.
 When the marriage question popped up on Reddit, a link to my decision matrix post popped up within an hour of a link to the secretary algorithm wiki page. As a reminder, the point of decision matrices is to allow you to think of many values and variables at once, not to replace all of them with a single arbitrary number.
 The R code I used to generate these charts is on my Github.
 One of the cool things about the exponential distribution is that the truncated distribution looks exactly the same, just with a different starting point. So if my exponential distribution has a mean and standard deviation of θ and I truncate it to only look at points above a cutoff Y, the resulting distribution will have the same deviation and a mean of Y + θ.
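 A quick numerical check of this memorylessness property, as a sketch: sample from the fitted model (offset 4.8, θ = 1) and truncate at an arbitrary cutoff of 8.

```python
import random

random.seed(0)
theta, offset, cutoff = 1.0, 4.8, 8.0
samples = [offset + random.expovariate(1 / theta) for _ in range(200_000)]
tail = [x for x in samples if x > cutoff]
print(f"mean above {cutoff}: {sum(tail) / len(tail):.2f}")  # ≈ cutoff + theta
```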