Dating: a Research Journal, Part 1.5

Someone arrived at this blog today by Googling ‘china cant play soccer’.

In utterly coincidental news, this blog is banned in China [edit: apparently now WordPress is again accessible in China].


These detailed schedules of upcoming updates that I add at the bottom of each post? I hope people have learned not to take them seriously. ¯\_(ツ)_/¯

This is part 2 in the dating sequence, except that it’s more like part 1.5 because it’s about profile optimization. Part 1 is about finding your advantage and kicking ass in online dating. I started writing part 3 on date strategy and stopped after I had written 2,000 words developing concepts in game theory… The point is, kind reader, that these posts happen to me; I have no control over the process.


Part 1.5 – Game (the system), Set (your ratings) and Match (percentage)

I’m going to explain how the match percentage on OkCupid works, and how to make it work in your favor. If you’re not on OkCupid, there are two possible outcomes:

  1. You’ll achieve a profound epiphany in the realization that large parts of your life are governed by mysterious integers, and that untold power comes with control of these integers.
  2. You’ll be bored. If that happens to be the case, please email me for a full refund.

A few years ago, Chris McKinlay became famous as “the mathematician who hacked OkCupid”. He went from having a few women with a match percentage above 90% to having several around 99% using a complex machine learning project that involved programming a spyware crawler that imitated the typing speed of his friend in order to illegally collect info on 6 million profiles. In the end, his now-wife found him by searching for 6-foot guys with blue eyes, but that’s neither here nor there.

Here’s the meat of the story, emphasis mine:

He’d already decided he would fill out his answers honestly—he didn’t want to build his future relationship on a foundation of computer-generated lies. But he’d let his computer figure out how much importance to assign each question, using a machine-learning algorithm called adaptive boosting to derive the best weightings.

McKinlay also p. Can we figure out how to get similar results, a big boost to match rating, without all that craziness? Would I be writing if we didn’t?


OkCupid calculates a match percentage for any two people based on their answers to an intrusive, interminable, somewhat addictive and occasionally funny questionnaire. It’s the first piece of info you see about someone along with their face and their age, and OkCupid uses it to decide which matches you are shown first. I was skeptical of the importance of the number at first, but I did notice that practically all my best dates were >90% matches; Rachel and I are a 95% match. 95% is also the threshold at which reply rates start going up significantly:

[Chart: reply rate by match percentage. Source: blog.okcupid.com]

Scores of 95% and above are pretty rare. At these scores, a significant number of people who would be on the fence after seeing your mug will give you a chance in conversation based on the match percentage. Your two goals for match percentage should be:

  1. Answer enough questions honestly that the match % gives you a good indicator of compatible people.
  2. Have as many people as possible in the high nineties.

The classified algorithm for the match calculation is locked away in… oh wait, the formula is right here!

Let’s call our two potential lovers R and J, which could stand for Rachel and Jacob, Romeo and Juliet, or my favorite couple: Riyad and Jamie. OkCupid calculates the match of R’s answers to J’s preferences (RaJp%) and vice versa (JaRp%) separately. The match percentage is the square root of RaJp% × JaRp% (their geometric mean), which is basically the average of the two adjusted slightly downward. The final number shown is that result minus 1/(total number of questions answered), so you want to answer at least 100 questions to keep the penalty at 1% or less.
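To make the combination step concrete, here is a minimal Python sketch of the calculation as described above: geometric mean of the two one-sided scores, then the 1/n penalty. The function name `displayed_match` is hypothetical, scores are written as fractions rather than percentages, and n follows this post’s description (total questions answered). It illustrates the formula; it is not OkCupid’s code.

```python
import math


def displayed_match(rajp: float, jarp: float, questions_answered: int) -> float:
    """Combine two one-sided scores (fractions in [0, 1]) into the shown match.

    Hypothetical helper following the description in this post: take the
    geometric mean of RaJp and JaRp, then subtract 1/n as a penalty.
    """
    geometric_mean = math.sqrt(rajp * jarp)
    penalty = 1 / questions_answered
    return geometric_mean - penalty


# The geometric mean never exceeds the plain average: 0.99 and 0.81 average 0.90,
# but their geometric mean is only about 0.8955.
print(f"{math.sqrt(0.99 * 0.81):.4f}")

# With 100 answered questions, the penalty is one percentage point.
print(f"{displayed_match(0.95, 0.95, 100):.2%}")
```

The more lopsided the two one-sided scores are, the further the geometric mean drops below the plain average, so a match that is great in only one direction gets pulled down.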

RaJp% is calculated in the following way: J assigns each question an importance rating that gives it a weight in the calculation: 1 point for “a little important”, 10 points for “somewhat important” and 250 for “very important”. Then, J lists the acceptable answers. If R gives one of the acceptable answers, they get full points for the question. If they give an unacceptable answer: 0.

RaJp% = (points in questions that R answered acceptably) / (total points in questions R answered).
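The one-sided formula is just a weighted pass rate. Here is a minimal Python sketch of it, assuming the 1/10/250 weights described above; `WEIGHTS` and `one_sided_score` are hypothetical names, and each question is modeled as an (importance, acceptable) pair. The usage line reproduces J’s example from this post (36/48, 36/48 and 4/4).

```python
# Importance weights as described in this post (1 / 10 / 250).
WEIGHTS = {"little": 1, "somewhat": 10, "very": 250}


def one_sided_score(answers: list[tuple[str, bool]]) -> float:
    """RaJp: how well R's answers fit J's preferences.

    `answers` holds one (importance, acceptable) pair per question J rated and
    R answered: importance is J's rating, acceptable is whether R's answer is
    one that J marked as acceptable.
    """
    earned = sum(WEIGHTS[importance] for importance, ok in answers if ok)
    possible = sum(WEIGHTS[importance] for importance, _ in answers)
    return earned / possible  # raises ZeroDivisionError if no questions overlap


# J's example: 36/48 "little", 36/48 "somewhat" and 4/4 "very" answered acceptably.
example = (
    [("little", True)] * 36 + [("little", False)] * 12
    + [("somewhat", True)] * 36 + [("somewhat", False)] * 12
    + [("very", True)] * 4
)
print(f"{one_sided_score(example):.2%}")  # 91.36%
```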

The 1/10/250 point system is extremely skewed, with huge differences between the categories. Here’s what the scoring tells us:

  1. Questions that the other person didn’t answer, or that you’ll accept any answer for, don’t matter at all. Ergo: ignore your match percentage with people who answered fewer than 20 questions.
  2. “A little important” questions practically don’t matter. Ergo: “a little important” = “don’t care about in the least”.
  3. If you have more than a couple of those, the “very important” questions determine the match percentage almost entirely. Ergo: Put “somewhat important” for most of the questions and use that to assess actual compatibility. Put “very important” for absolute hard filters, for a few “gimme” questions where you are certain to get a compatible answer, and never for equivocal questions that could cause confusion.

For example, let’s say that J marked 48 questions as “little important”, 48 questions as “somewhat” and 4 questions as “very”. R gave answers that J accepts to 36/48, 36/48 and 4/4, respectively. This side of their match percentage will be:

RaJp\% = \frac{36 \times 1 + 36 \times 10 + 4 \times 250}{48 \times 1 + 48 \times 10 + 4 \times 250} = \frac{1396}{1528} = 91.36\%

Now J considers adding another question:

  • If J marks it “a little important”, a bad answer by R will change the match from 91.36% to 91.30% and a good answer to 91.37% – barely any difference.
  • If J marks it “somewhat important”, a bad answer by R will make the match 90.77% and a good answer will make it 91.42% – there’s a little downside to an incompatible answer.
  • If J marks it “very important”, a bad answer by R will make the match 78.52% and a good answer will make it 92.58% – this one question now single-handedly determines whether R is a great match or a meh match.
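The three cases above come from adding one question’s weight to the denominator (and, for a good answer, to the numerator too). A small Python sketch that recomputes them from J’s totals of 1396 out of 1528 points; `add_question` is a hypothetical helper name and the weights are the ones described in this post.

```python
# J's running totals from the example: 1396 points earned out of 1528 possible.
EARNED, POSSIBLE = 1396, 1528
WEIGHTS = {"a little important": 1, "somewhat important": 10, "very important": 250}


def add_question(earned: int, possible: int, weight: int) -> tuple[float, float]:
    """Return R's one-sided score after J adds one question of `weight` points:
    first if R answers it unacceptably (0 points), then if acceptably (full points)."""
    bad = earned / (possible + weight)
    good = (earned + weight) / (possible + weight)
    return bad, good


for label, weight in WEIGHTS.items():
    bad, good = add_question(EARNED, POSSIBLE, weight)
    print(f"{label}: bad answer {bad:.2%}, good answer {good:.2%}")
```

The asymmetry is the whole story: a 250-point question is worth more than 16% of J’s existing 1528-point denominator, so its outcome swamps dozens of smaller ones.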

If you mark a lot of questions “very important” then none of them will have an outsize effect, but you’ll also lose any ability to be flexible with the weights since all questions that aren’t “very important” will barely count at all. Instead, you can probably find a few unequivocal questions that almost everyone in your relevant dating pool will answer the same way. For me, it’s things like How many children do you have? Is homosexuality a sin? Are you racist? Do you put more weight in science or faith? and the horrifying Would you sabotage contraceptives to have kids even though your mate doesn’t want kids?

Here’s the hack: you mark enough “obvious” questions like that as “very important” (say, ten of them) that every relevant person will score 10/10 on those. You mark the rest of the questions “somewhat important”. If someone answers 100 of your “somewhat important” questions, the entire range from 0/100 acceptable answers to 100/100 will correspond to one-sided match percentages from 71% to 100%. You just take the entire range and skew it way upward: 83/100 on actual compatibility questions will get a one-sided match score of 95.14%! By ensuring a match on all the “very important” questions, you can artificially inflate the match percentages you get (and the ones that the other person sees at the top of their match list) while still seeing enough variance in scores to deduce your actual compatibility.
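The arithmetic behind the hack, as a small Python sketch. It assumes, as the “10/10” above implies, ten “very important” questions that everyone passes plus 100 “somewhat important” ones; `inflated_score` is a hypothetical helper name.

```python
VERY, SOMEWHAT = 250, 10  # point values for "very" and "somewhat important"


def inflated_score(matched: int, n_somewhat: int = 100, n_very: int = 10) -> float:
    """One-sided score when the other person passes all `n_very` "very important"
    questions and `matched` of the `n_somewhat` "somewhat important" ones."""
    earned = n_very * VERY + matched * SOMEWHAT
    possible = n_very * VERY + n_somewhat * SOMEWHAT
    return earned / possible


for matched in (0, 83, 100):
    print(f"{matched}/100 compatible -> {inflated_score(matched):.2%}")
# 0/100 -> 71.43%, 83/100 -> 95.14%, 100/100 -> 100.00%
```

The 2,500 guaranteed points set a floor of 2500/3500 ≈ 71.43%, and each compatible “somewhat important” answer adds a fixed 10/3500 ≈ 0.29 points on top, so the mapping stays linear and still ranks people by actual compatibility.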

Let’s say you only want to engage with people who actually match you on 85% of non-obvious questions. If you do the hack, their match to you will show up as 95.7%. Your match to them will still be 85%, and the number that you’ll both see will be \sqrt{95.7\% \times 85\%} = 90.2\%. You know that you should actually focus on the 90+ crowd if you want 85+, but people who are actually 92% compatible will now see a match of around 95% and are more likely to reply to your message.
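A sketch of the number both people see, under the assumptions in the paragraph above: you’ve applied the hack (ten guaranteed “very important” questions plus 100 “somewhat important” ones), the other person hasn’t, so your score against their preferences equals the actual compatibility. The 1/n penalty is left out, and `shown_match` is a hypothetical name.

```python
import math


def shown_match(actual: float, n_somewhat: int = 100, n_very: int = 10) -> float:
    """Match both people see, given the actual share (fraction in [0, 1]) of
    non-obvious questions that line up. Ignores the 1/n penalty."""
    # Their answers scored against your hacked weights (250 and 10 points).
    guaranteed = n_very * 250
    theirs_to_you = (guaranteed + actual * n_somewhat * 10) / (guaranteed + n_somewhat * 10)
    # Your answers scored against their ordinary weights: assumed equal to `actual`.
    you_to_them = actual
    return math.sqrt(theirs_to_you * you_to_them)


for actual in (0.85, 0.92):
    print(f"actually {actual:.0%} compatible -> shown {shown_match(actual):.1%}")
```

The 85% case lands at about 90.2%, and a 92% match comes out near 94.8%, right around the 95% threshold where reply rates pick up.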

Is inflating your match percentages sneaky? Yeah, it is. So is adding two inches to your height and $20,000 to your income, which everyone still does. You can argue that dating is a marketplace and catching up to everyone else’s unfair advantages isn’t cheating. You can argue that a hack is a cheat is a lie. I won’t argue with you, I’m just doing some math.


When answering the questions yourself, the only rule I’ll recommend is: be honest but don’t outsmart yourself. Here’s what I mean by outsmarting yourself:

[Screenshot: Ms. W’s answer to “Are you racist?”]

This is Ms. W, one of the smartest women I’ve ever met. Her response is the smartest, most genuine answer you can give to “Are you racist?” Everyone’s a little bit racist; you can easily measure your own implicit biases about race right here. The only problem is that before it has a chance to impress anyone, Ms. W’s answer will kill her match percentage with all the guys she actually wants to date and increase her match percentage with Donald Trump.

There’s a question similar to the one above: Would you consider dating someone who has vocalized a strong negative bias towards a certain race of people?

Here’s what went through my head when I read that question: Well, what do you mean “consider”? I “consider” a lot of things that I don’t end up doing. Also, is there an expiry date on “expressing a bias”? What if someone was racist in elementary school, would I hold it against them? What if someone really hates the Inuit but is otherwise awesome? Also, the question isn’t whether I am a racist, it’s about whether I would date a racist. By indicating the answer I want to see others give, I’m actually deciding whether I would date someone who would date a racist. But that actually means that my own answer can be interpreted as whether I would date someone who would date someone who would date someone…

Here’s what I actually answered: No.

Be smart, be honest, don’t be racist.


Part 3 will not arrive before the middle of next week because I’m attending a CFAR workshop where I shall receive my robe, dagger and sacred tome of Bayesian incantations.

12 thoughts on “Dating: a Research Journal, Part 1.5”

  1. I was looking forward to this, it didn’t disappoint. This was very entertaining and very practical advice. The part about Mrs. W had me chuckling.

    On a side note, “Hacking” OKC is a whole lot easier when you’re 6′ tall and blue eyed =P


  2. Hi there, thanks for writing this helpful article. I’m a bit confused by this wording:

    “You know that you should actually focus on the 90+ crowd if you want 85+, but your 92%-ers now start seeing 95% and are more likely to reply to your message.”

    I understand what you’re saying in the first sentence. Because your match is “artificially” inflated, your 90% match is really an 85% match. But for the second sentence, the square root of .957*.92 comes out to .938 rather than .95. How do the 92%ers start seeing 95%?

    Thank you again!


    1. 95.7% is your match percentage to people who are actually 85%. Someone who’s actually 92% will match you at around 97%, and sqrt(0.97*0.92) is close enough to 95%. In either case, the exact numbers aren’t important, but I do appreciate your nitpicking of my algebra – that’s exactly what I do when I read articles with math in them and what I hope to inspire my readers to do as well.


      1. Cool, thanks for the reply that makes sense. I was just wondering how to get to .95 because it said that that was a breaking point where response % went way up : ) Thanks again for the great piece!

