Most attacks on hash algorithms are concerned with finding 2 messages with the same hash code. Previously we listed the criteria for a cryptographically strong hash, and breaking these would be the main objectives of an attacker.

A brute force attack basically means creating a very large number of messages, and checking the hash of each one until a match is found. This method will obtain a random message with a particular hash.

A related attack is to create a message where some parts of the message are determined by the attacker, other parts are random. For example, suppose an attacker wanted to introduce a virus into a binary file. If he can do this without altering its hash code, he has a better chance of avoiding detection. He could take the original file, and add his virus code. He would than have to find some area of the file which didn't matter too much - perhaps the area where the copyright message is stored or some other non-critical region. By trying lots of different variations of the data in that part of the file, he might eventually discover one which gave the correct overall hash for the file.

For an ideal hash algorithm, changing anything in the message (even a single bit) should result in an overwhelming probability that the hash code is as different as it could be. This means that attempting to change 2 parts of the message such that they cancel each other out and result in the same hash is equivalent to a blind search of all possible hash codes (because any one attempt will generate a "random" hash code).

The defence against brute force attacks is basically to chose an algorithm with a large enough hash size. It then takes so long to search all the possibilities that the attack is not feasible.

The name is based on the so called Birthday Paradox - the strange statistical fact that if you gather 23 randomly selected people in a room, then the chance that there will be 2 people in the room with the same birthday is approximately 50%. This isn’t really a paradox, it is simply a counter-intuitive mathematical fact.

The reason this seems strange is that if you wanted a better than evens chance of someone in the room having the same birthday as YOU, you would need 183 people in the room. If you are willing to accept any 2 people with the same birthday (whatever day that might be), then you only need 23. It does seem like a very small number, but it is true.

The same effect applies to hash values. With a 128 bit hash, if you want to find a message with a particular hash value h, you will need to search an average of 2^{127} randomly generated messages. If you simply want to find any 2 messages which have the same hash, you only need to search 2^{64} messages - still difficult, but consideraby easier.

How can this be used in an attack? Suppose the aim of the attacker is to get you to sign a fair contract and then produce a fraudulent contract with the same hash, and claim you actually signed that?. We know it is very hard to create a fraudulent contract which matches the hash of the fair one. But what if he starts making random changes to both contracts (adding spaces, newlines, hidden characters etc which don’t affect the meaning of the contracts). He is eventually going to find a fair contract and a fraudulent contract which match. He can then get you to digitally sign the particular version of the fair contract which is vulnerable to attack. After that he can claim you signed the fraudulent contract.

You cannot avoid birthday attacks, they are a mathematical fact of life. It reduces the effective hash length by about 50%. In situations where an attacker might exploit the attack, it is important to choose an algorithm which has a sufficient hash length to withstand attack - 256 bits would be recommended.

Previously we mentioned collision resistance. If an algorithm is not collision resistant, it means that it has algorithmic weaknesses which make this type of attack easier. Provided the algorithm is collision resistant, and has a decent hash length, the Birthday attack is not effective.

A cryptanalysis attack is the result of analysing the algorithm of the hash in an attempt to find some weakness which will allow collisions to be found faster than a brute force attack.

These attacks are highly algorithm specific. Normally any attack which defeats part of the algorithm, or even theoretically might do so, will raise serious suspicions about the algorithm and tend to indicate that it should not be used.

Generally this type of attack is best avoided by using a well respected mainstream algorithm such as SHA or RIPEMD.

Copyright (c) Axlesoft Ltd 2020