The Pigeonhole Principle
Say you have n pigeons, and m pigeonholes. Given that n > m, the principle states that there will always be a pigeonhole with more than 1 pigeon in it.
"That's obvious! How does this relate in any way to cryptographic hashes or the hairs on people's heads? Are you employing a clickbait title like those people on Youtube?"
Don't worry, I'm getting there. First, let's generalize the principle:
Given that you have n items, and m containers, and n > m, there will always be a container that contains more than 1 item.
Relevance to hashing functions:
Most hash functions have a fixed output size, and a practically unlimited input size. For example, the cryptographic hash function SHA-256 has an output size of 256 bits, as its name suggests.
There are 2 256 possible permutations of output. However, the possible permutations of input are far larger - about 2 2 64 - an obscenely large number. If we say that n is the number of permutations of input, and m is the number of permutations of output, we can say that at least 1 hash that is generated will have multiple inputs which result in that hash - because of the pigeonhole principle. n > m, so therefore there are multiple permutations of input that result in the same permutation of output.
This occurrence is known as a collision - where multiple inputs lead to a single output in a hashing function.
Oh no! Is this a problem?
Not really. The chances of a collision happening is very low. To make the chance of at least 1 collision happening equal to 1 in 2, you would have to have 5.06 billion hashes stored.
Because most passwords on websites are stored as hashes, sometimes even having a relatively low chance of hash collision is unacceptable. Passwords that are hashed on secure websites are salted - a random sequence of characters is appended to the input string, and then the combination is hashed. Both the hash and the salt are stored. When the user tries to log in, the system adds the hash to the end or start of the password, then hashes it. If the hash stored is exactly the same as the hash generated, then the user is allowed into the website.
While hashing functions that result in a 1-1 match between input and output exist (dubbed perfect hashing functions), many regard them as impractical and employ the use of a salt.
In short, hash collision really isn't a problem if the system is designed correctly.
Et tu, toupee?
We can use the pigeonhole principle to prove that there are no people in London that have a unique number of hairs on their heads.
There are approximately 8.788 million people living in London, and there are 100,000 hairs on average on a human head. As an upper bound, let's take about 10x that value - 1,000,000 hairs. Assign 1,000,000 to n, and 8,788,000 to m, and you can prove that there are at least 2 people in London with the same number of hairs on their heads, as n > m.
Interesting, right?
Note to readers:
Thank you so much for reading my first article! I found this subject really interesting - and I hope I have explored the concept in a thourough and enjoyable manner. Please do upvote the post if you enjoyed reading. If you have any further examples of how the pigeonhole principle can be applied, or you would like to ask a question, feel free to do so in the replies section!