Birthday Paradox

The Birthday Problem (Birthday Paradox) asks for the probability that, in a group of n people, at least two share a birthday. For simplicity, it ignores leap years and the fact that births are seasonal, so every one of the 365 days is treated as equally likely.

Today Lex Fridman posted the following tweet:

@lexfridman: In a room of 23 people, there's a 50% chance that two people have the same birthday.

I tried to calculate it using probabilities, but my brain usually finds that hard. Of course, I could check the Wikipedia page and copy the calculations from it, but I prefer to use my favourite tool instead: the mighty Monte Carlo method.

Fuck mathematics! Fuck probabilities! We have cheap CPUs to do this stuff for us. I'll simply simulate the problem and estimate the probabilities myself.

We take 23 people, assign each of them a random birthday, and check whether two or more of them share one. And since we have computers at our disposal, we can repeat this process a million times, who cares!

It took me a couple of minutes to confirm Fridman's tweet. To be precise, in a room of 23 people, there's about a 50.7% chance that at least two people have the same birthday.

Here is the code:

    import numpy as np

    rng = np.random.default_rng()
    trials = 1_000_000
    collisions = 0

    for _ in range(trials):
        # Give each of the 23 people a random birthday (day 1 to 365).
        birthdays = rng.integers(1, 366, size=23)
        # A set drops duplicates, so a shorter set means a shared birthday.
        collisions += len(set(birthdays)) < len(birthdays)

    # The fraction of simulated rooms with at least one shared birthday.
    print('% of collisions: {:.1%}'.format(collisions / trials))
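
As a sanity check on the simulated figure, the exact value is cheap to compute too. The sketch below multiplies the chances that each new person misses every birthday already taken, under the same assumptions as the simulation (365 equally likely days, no leap years). The function name `exact_collision_probability` is my own, not from any library.

```python
def exact_collision_probability(people=23, days=365):
    """Chance that at least two of `people` share a birthday."""
    p_all_distinct = 1.0
    for k in range(people):
        # Person k must avoid the k birthdays that are already taken.
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct


if __name__ == "__main__":
    print('exact: {:.1%}'.format(exact_collision_probability()))
```

For 23 people this lands on roughly 50.7%, in line with the Monte Carlo estimate.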

Can you parallelize it?
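
One possible answer, as a rough sketch rather than a tuned solution: vectorize each batch of rooms with NumPy, then spread the batches over a thread pool. The names `count_collisions` and `estimate`, the chunk count, and the seed are all my own choices. Threads only help to the extent that NumPy releases the GIL inside its heavy calls, and I haven't benchmarked that here; a process pool would be the other obvious option.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def count_collisions(seed_seq, trials, people=23, days=365):
    """Count how many of `trials` simulated rooms contain a shared birthday."""
    rng = np.random.default_rng(seed_seq)
    # One row per room, one column per person.
    birthdays = rng.integers(1, days + 1, size=(trials, people))
    # After sorting a row, a shared birthday shows up as two equal neighbours.
    birthdays.sort(axis=1)
    return int((birthdays[:, 1:] == birthdays[:, :-1]).any(axis=1).sum())


def estimate(total_trials=1_000_000, workers=4, chunks=20, seed=12345):
    """Split the trials into chunks and run the chunks on a thread pool."""
    base = total_trials // chunks
    sizes = [base] * chunks
    sizes[-1] += total_trials - base * chunks  # leftovers go to the last chunk
    # spawn() hands every chunk its own independent random stream.
    seeds = np.random.SeedSequence(seed).spawn(chunks)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        hits = sum(pool.map(count_collisions, seeds, sizes))
    return hits / total_trials


if __name__ == "__main__":
    print('% of collisions: {:.1%}'.format(estimate()))
```

Even on a single thread, the vectorized `count_collisions` avoids the Python-level loop over a million rooms, which is probably where most of the time goes in the original version.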