Martin McBride, 2017-04-09

Tags cryptography message authentication code mac hmac

Categories cryptography

HMAC is a hash-based MAC algorithm defined in RFC 2104. It can use any iterative hash function (such as MD5 or SHA-1), which we will call H. HMAC also requires a user-supplied secret key, which is a string of bytes of any length.

The hash algorithm H has two important properties which feed into the algorithm. The first is the hash size, L. For example, MD5 has a hash size of 128 bits (16 bytes). The second quantity is slightly less obvious: it is the block size B of the [[data formats:cryptography:Iterative hashes]]. In general B is greater than L. For example, MD5 has a block size of 512 bits (64 bytes).
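As a quick illustration, Python's standard hashlib module exposes both quantities on its hash objects, as digest_size (L) and block_size (B). The sketch below simply collects them for a few common hashes; the choice of hashes and the name sizes are only for illustration.

```python
import hashlib

# Map each hash name to (L, B), both measured in bytes.
sizes = {}
for name in ("sha1", "sha256", "sha512"):
    h = hashlib.new(name)
    sizes[name] = (h.digest_size, h.block_size)
    print(name, "L =", h.digest_size, "B =", h.block_size)
```

In each case B is larger than L, as described above.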

The first stage of the algorithm is to convert the key to be exactly B bytes long. If the key length is less than B bytes, this is done by adding zero bytes to the end of the key, to form K of exactly B bytes.

However, if the key has more than B bytes to start with, first hash it using H. Then pad the hash value with zeros to make K (again, exactly B bytes).
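The key preparation step can be sketched in Python as follows. This is a minimal illustration using hashlib, not a reference implementation; the function name normalise_key and the default choice of SHA-256 are assumptions made for the example.

```python
import hashlib

def normalise_key(key: bytes, hash_name: str = "sha256") -> bytes:
    """Return K: the key converted to exactly B bytes (hypothetical helper)."""
    B = hashlib.new(hash_name).block_size
    if len(key) > B:
        # Key is too long: hash it first, which gives L bytes.
        key = hashlib.new(hash_name, key).digest()
    # Pad on the right with zero bytes up to B bytes.
    return key.ljust(B, b"\x00")

short_K = normalise_key(b"secret")     # padded with zeros
long_K = normalise_key(b"x" * 100)     # hashed, then padded
print(len(short_K), len(long_K))
```

Both results are exactly B bytes long, whatever the length of the original key.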

Now create 2 variants of K, by a simple XOR procedure:

The inner key, Ki, is formed from K by XORing each byte with 0x36.

The outer key, Ko, is formed from K by XORing each byte with 0x5C.
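As a small sketch, the two variants can be formed in Python like this. The helper name xor_bytes and the example key are assumptions for illustration; K is taken to be already padded to B = 64 bytes.

```python
def xor_bytes(data: bytes, value: int) -> bytes:
    """XOR every byte of data with the same single-byte value."""
    return bytes(b ^ value for b in data)

# Example K: a short key already padded with zeros to 64 bytes.
K = b"key".ljust(64, b"\x00")

Ki = xor_bytes(K, 0x36)  # inner key
Ko = xor_bytes(K, 0x5C)  # outer key
```

Note that the zero padding bytes of K become 0x36 in Ki and 0x5C in Ko.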

We use the notation H(x) to represent the hash of byte sequence x. We use H(x, y) to represent the hash of the concatenation of byte sequence x followed by y. Then the MAC of message m is:

H(Ko, H(Ki, m))

In other words, concatenate the inner key with the message, and calculate the hash. Then concatenate the outer key with that hash value, and calculate the hash of the result.
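Putting the steps together, the whole calculation might be sketched in Python as below. The function name hmac_sketch and the example key and message are assumptions; the result is compared with Python's standard hmac module as a sanity check. In real code you would normally call the hmac module directly.

```python
import hashlib
import hmac

def hmac_sketch(key: bytes, message: bytes, hash_name: str = "sha256") -> bytes:
    """Compute H(Ko, H(Ki, m)) step by step (illustrative only)."""
    B = hashlib.new(hash_name).block_size
    if len(key) > B:
        key = hashlib.new(hash_name, key).digest()
    K = key.ljust(B, b"\x00")
    Ki = bytes(b ^ 0x36 for b in K)
    Ko = bytes(b ^ 0x5C for b in K)
    inner = hashlib.new(hash_name, Ki + message).digest()  # H(Ki, m)
    return hashlib.new(hash_name, Ko + inner).digest()     # H(Ko, H(Ki, m))

key = b"secret key"
message = b"hello world"
expected = hmac.new(key, message, "sha256").digest()
print(hmac_sketch(key, message) == expected)
```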

This method creates a MAC of length L (the hash size of H). It is possible to create a shorter MAC, if required, by truncating the MAC to t bits. To do this, simply use the leftmost t bits and discard the remainder. The HMAC specification recommends that t should not be less than half of the hash size, and in any case should never be less than 80 bits, otherwise the MAC might not be secure.
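A truncated MAC could be sketched as follows, using Python's standard hmac module for the full MAC. The function name truncated_mac is an assumption, and for simplicity this sketch only accepts values of t that are a whole number of bytes.

```python
import hmac

def truncated_mac(key: bytes, message: bytes, t_bits: int,
                  hash_name: str = "sha256") -> bytes:
    """Return the leftmost t_bits of the HMAC (hypothetical helper)."""
    full = hmac.new(key, message, hash_name).digest()
    hash_bits = len(full) * 8
    # Enforce the recommended lower limits on t.
    if t_bits < max(80, hash_bits // 2):
        raise ValueError("t is below the recommended minimum")
    if t_bits % 8 != 0 or t_bits > hash_bits:
        raise ValueError("this sketch needs a whole number of bytes, at most L")
    return full[: t_bits // 8]

short_mac = truncated_mac(b"secret key", b"hello world", 128)
print(len(short_mac))
```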

The initial key can be any byte sequence of any length. Ideally it should be a random sequence, generated by a cryptographically strong random number generator. For the sake of security, it should not be less than L bytes. However, there is probably not a great deal to be gained by making the key larger than L, and certainly there is no point making it larger than B because then it will simply be hashed back down to L bytes.
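As a sketch, a random key of L bytes can be generated with Python's secrets module, which draws on the operating system's cryptographically strong random source. SHA-256 is assumed here as the hash, so L is 32 bytes.

```python
import hashlib
import secrets

# L for the chosen hash (32 bytes for SHA-256).
L = hashlib.sha256().digest_size

# A fresh random key of exactly L bytes.
random_key = secrets.token_bytes(L)
print(len(random_key))
```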

If you are using a password or passphrase, the situation is different because an L character password is much less random than L random bytes. There is an advantage in using longer passwords, even phrases which are longer than B (even though the phrase will be hashed down to L bytes, a longer passphrase will create more randomness in the final L bytes). Of course, in practical terms a password of 32 characters or more can start to become cumbersome.

As discussed in the general description of [[data formats:cryptography:Messages authentication codes|MACs]], a hash based MAC can be formed by adding a key at the start of a message before calculating the hash. It is necessary to also protect the end of the message with a key, to avoid an attacker being able to add blocks to the end of a message and recalculate the hash incrementally. HMAC does both these things, and its form is believed to be secure.

Of course, one factor which requires explanation is where B fits into this. Why do we go to the trouble of forcing the key length to match B, which is after all just an internal parameter of the chosen hash algorithm?

Essentially it allows for a clever optimisation. Most [hash algorithms|iterhash] contain a compression function with a block size of B bytes. The hash starts off with its internal state set to a fixed initial value, IV. Every time the compression function has processed a block of B bytes, it is ready to accept the next block, just as it was at the start, but with a new value in the internal state.

Since Ki is exactly B bytes long, we can precalculate what the internal state of the hash will be after processing it, and create a variant of H which is initialised to this state. That is, Hi(x) is equivalent to H(Ki, x). If we do the same for the outer hash, giving Ho, we get a new MAC calculation:

Ho(Hi(m))

Remember that the variant hashes are identical to H, just with a different IV. So we have completely eliminated the cost of hashing the two keys, simply by a small modification to the hash algorithm. If you need to MAC a lot of small messages with the same key, this is a significant saving.
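The idea can be sketched in Python without modifying the hash algorithm itself. Hash objects in hashlib have a copy() method which duplicates their current state, so a hash object which has already been fed Ki can stand in for the variant hash Hi, and likewise for Ho. This is only an illustration of the principle, and the class name PrecomputedHmac is an assumption.

```python
import hashlib
import hmac

class PrecomputedHmac:
    """Keeps hash objects that have already absorbed Ki and Ko (illustrative)."""

    def __init__(self, key: bytes, hash_name: str = "sha256"):
        B = hashlib.new(hash_name).block_size
        if len(key) > B:
            key = hashlib.new(hash_name, key).digest()
        K = key.ljust(B, b"\x00")
        # These two objects play the role of the variant hashes Hi and Ho.
        self._inner = hashlib.new(hash_name, bytes(b ^ 0x36 for b in K))
        self._outer = hashlib.new(hash_name, bytes(b ^ 0x5C for b in K))

    def mac(self, message: bytes) -> bytes:
        inner = self._inner.copy()    # start from the state after Ki
        inner.update(message)
        outer = self._outer.copy()    # start from the state after Ko
        outer.update(inner.digest())
        return outer.digest()         # Ho(Hi(m))

mac_maker = PrecomputedHmac(b"secret key")
for m in (b"first message", b"second message"):
    print(mac_maker.mac(m) == hmac.new(b"secret key", m, "sha256").digest())
```

The keys are processed once, in the constructor, and every subsequent call to mac only pays for hashing the message and the inner hash value.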

Of course this optimisation is entirely optional, and in most applications you will simply use the unmodified hash to calculate the MAC normally.

Copyright (c) Axlesoft Ltd 2020