What is a hash?
Category: primers
15 Jan 2017
Image by Psyomjesus
If you’ve ever read anything about Bitcoin, passwords, or verifying software, you’ll have come across the word ‘hash’. Hashes are essentially just fingerprints of a given input, whether that input is a file or a string of text. The most important thing to understand about hashing is that if you make any change to the input, regardless of how minor it is, the hash (fingerprint) will change completely. Conversely, if you hash the exact same input multiple times, you will always get the same hash (output). This is extremely useful. For example, if you want to send someone a file and make sure they receive it without it being tampered with, you can make a hash of that file, send the hash to them through another channel, and then they can hash the file they receive and compare it to the hash you sent. If the hash is different, then it isn’t the same file you sent them and someone might be up to something rather nefarious.
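To make that concrete, here is a minimal sketch in Python using the standard library's `hashlib`. SHA-256 is my choice of algorithm for the example, and the helper names `fingerprint` and `file_fingerprint` are made up for illustration. The same idea applies to any other hash algorithm.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 hash of some bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def file_fingerprint(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large files never have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


original = fingerprint(b"The Tin Hat")
tweaked = fingerprint(b"The Tin Hat.")  # one extra character

print(original)
print(tweaked)  # looks nothing like the line above
print(original == fingerprint(b"The Tin Hat"))  # same input, same hash
```

To verify a download, the recipient would run something like `file_fingerprint` on the file they received and compare the result with the hash you sent them separately.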
If there is one thing to take home after reading this, it is what I just described, but there’s a lot more to hashes than just that. Another important property of hashes is that if someone gives you a hash, it should be practically impossible for you to figure out what the original input was without simply making a bunch of guesses. In fact, you shouldn’t even be able to tell whether a given hash is the result of hashing a movie or a password. This is why hashes are often described as one-way functions (strictly speaking they are not encryption, since there is no key and nothing to decrypt): you can find the output of a given input very efficiently, but it is extremely hard to find the input of a given output.
This presents a second useful function for hashes, which is to protect passwords (the first being to verify files). Good software developers never want to store your actual password. That way, if they get hacked, the hackers don’t learn the passwords either and can’t try them on other sites where you might have reused the same password. So what a website will do, for example, is put your password through a hash before storing it. This means that the website can still verify that you have entered the correct password (an incorrect password would create a different hash) without ever keeping the original password on record.
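A deliberately simplified sketch of that idea in Python follows. The in-memory `stored_hashes` dictionary and the `register` and `check_login` functions are hypothetical stand-ins for a real site's database and login code, and SHA-256 is just an example algorithm. A single pass of a fast hash like this is not enough on its own for real password storage, for reasons covered further down.

```python
import hashlib
import hmac

# Hypothetical in-memory "database": username -> stored password hash.
stored_hashes = {}


def hash_password(password: str) -> str:
    """Hash a password with SHA-256 and return the hex string."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def register(username: str, password: str) -> None:
    # Only the hash is kept; the password itself is never stored.
    stored_hashes[username] = hash_password(password)


def check_login(username: str, password: str) -> bool:
    stored = stored_hashes.get(username)
    if stored is None:
        return False
    # Hash the attempt and compare it with the stored hash.
    # compare_digest avoids leaking information through timing.
    return hmac.compare_digest(stored, hash_password(password))


register("alice", "correct horse battery staple")
print(check_login("alice", "correct horse battery staple"))  # True
print(check_login("alice", "wrong guess"))  # False
```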
In order to accomplish this, however, hashes must also be a fixed length. In other words, the hash of a string of ten words must be the same length as the hash of a string of sixty words or of a 45 GB movie file. For example, hashing the words “The Tin Hat” using the MD5 hash algorithm results in a hash of dad773314839ce751caff08af311442e, while hashing one of my 12 GB zip files results in a hash of fe1cefe33df394fc565214cb96e6228f. Of course, some hash algorithms produce longer hashes than others. MD5 produces a rather short hash compared to the SHA-512 hash algorithm (the SHA-512 hash of “The Tin Hat” is 128 hexadecimal characters long compared to MD5’s 32).
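You can check the fixed-length property yourself with a few lines of Python. This is only a sketch: the inputs are arbitrary, and the `lengths` dictionary is just a convenient place to collect the results.

```python
import hashlib

short_input = b"The Tin Hat"
long_input = b"The Tin Hat " * 100_000  # roughly 1.2 MB of text

lengths = {}
for name in ("md5", "sha256", "sha512"):
    short_hash = hashlib.new(name, short_input).hexdigest()
    long_hash = hashlib.new(name, long_input).hexdigest()
    lengths[name] = (len(short_hash), len(long_hash))
    print(name, len(short_hash), len(long_hash))

# md5 32 32
# sha256 64 64
# sha512 128 128
```

Each algorithm always produces a hash of its own fixed size, no matter how much data goes in.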
The benefit of using a longer hash is that it produces a larger fingerprint to verify. Because hashes have a fixed length but inputs can be anything, two different inputs can occasionally produce the same output. This is known as a collision, and a longer hash is less likely to produce one than a shorter hash. With that said, there’s little practical need to worry about accidental collisions with most modern hash algorithms; you’re probably more likely to get struck by lightning while creating a hash than for that hash to collide with another. MD5 is the exception: it is getting long in the tooth, and researchers have shown how to construct MD5 collisions deliberately, so it is best avoided where tampering is a concern.
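To see why length matters, here is a small Python experiment with a deliberately crippled hash. `tiny_hash` is something I made up for the demonstration: it keeps only the first four hexadecimal characters of a SHA-256 hash, leaving just 65,536 possible fingerprints. With so few possibilities, a collision is guaranteed within 65,537 attempts and usually shows up after a few hundred.

```python
import hashlib


def tiny_hash(data: bytes, hex_chars: int = 4) -> str:
    """A deliberately weak 'hash': the first few hex characters of SHA-256."""
    return hashlib.sha256(data).hexdigest()[:hex_chars]


def find_collision(hex_chars: int = 4):
    """Hash "0", "1", "2", ... until two different inputs share a tiny hash."""
    seen = {}  # tiny hash -> the input that produced it
    counter = 0
    while True:
        candidate = str(counter).encode("ascii")
        digest = tiny_hash(candidate, hex_chars)
        if digest in seen:
            return seen[digest], candidate, digest
        seen[digest] = candidate
        counter += 1


first, second, shared = find_collision()
print(first, second, shared)  # two different inputs, one shared fingerprint
```

Every extra character in the hash multiplies the number of possible fingerprints by 16, which is why the same search against a full-length modern hash is hopeless.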
Nevertheless, all hashes can be cracked, theoretically at least. But, assuming the hash algorithm you’re using is designed properly, the speed at which you can crack a hash (i.e. figure out what the original input was) is limited to the speed at which your computer can make random guesses. Because of the way that hashing works, for all intents and purposes it is impossible to do this for large files. On the other hand, for small inputs, like passwords, a modern computer can make billions of guesses every second. It is because of this that it is incredibly important to use a long and strong password, and it is also the reason that any decent software developer will do as much as they can to make even easily guessed passwords harder to crack (for example, hashing a user’s password, and then hashing the hash over and over again several hundred thousand times).
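Both halves of that paragraph can be sketched in Python. The function names, the tiny word list, and the figure of 200,000 rounds are all illustrative assumptions of mine. Real systems usually reach for a purpose-built function such as PBKDF2 (available in Python as `hashlib.pbkdf2_hmac`) rather than a hand-rolled loop like this one.

```python
import hashlib


def stretched_hash(password: str, rounds: int = 200_000) -> str:
    """Hash the password, then keep hashing the result, `rounds` times in total."""
    digest = password.encode("utf-8")
    for _ in range(rounds):
        digest = hashlib.sha256(digest).digest()
    return digest.hex()


def guess_password(target_hash: str, guesses, rounds: int = 200_000):
    """Try each guess in turn; return the one that matches, or None."""
    for guess in guesses:
        if stretched_hash(guess, rounds) == target_hash:
            return guess
    return None


leaked = stretched_hash("letmein")  # pretend this hash was stolen in a breach
wordlist = ["123456", "password", "letmein", "qwerty"]
print(guess_password(leaked, wordlist))  # letmein
```

The attacker never reverses the hash; they simply hash guesses until one matches. Repeating the hash makes every single guess that many times more expensive, which barely matters for one legitimate login but adds up quickly across billions of guesses.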
- Hashes take an input and provide an output of a fixed length.
- It is easy to determine the output if you only have the input, but it is incredibly hard to determine the input if you only have the output.
- If you can determine the input from the output faster than just making a bunch of random guesses, the hash algorithm is considered weak, if not broken.
- Hashes can be useful both for verifying that files haven’t changed and for protecting passwords. But these are just two common uses for hashes out of many, many others.