Methods that protect information from tampering by illegitimate parties are referred to as data integrity protection methods. When information is carried in messages over communications channels, integrity protection is typically provided by message authentication mechanisms. In order to provide data integrity protection for a message, the sender needs to provide a proof of authenticity for the message. In real life, legal documents are signed by the involved parties. The signatures not only provide proof of authenticity, since each person's signature is unique, but also protect the documents against forgery.
[Figure: authentication exchange between the Initiator (entity A) and the Responder (entity B). Message labels: Authentication request (optional); (optional TokenID), Cert I, TokenRI; (optional TokenID), Cert R, TokenIR2; (optional TokenID), TokenIR1.]
Of course, anybody who has asked a big sister to forge a parent's signature on a report card knows that there are artistic ways to get around the weaknesses of authentication methods. With that analogy in mind, understanding message authentication in the world of digital communications becomes very simple: the sender of a message can provide proof of authenticity for the message by signing it with a secret that is unknown to the outside world and adding the signature to the end of the message. However, it should be noted that, in contrast to personal signatures, which always look the same and hence can be easily forged, signatures in the digital world depend on the message content and take a different form (bit string) every time. It is as if your parents and the teacher had a secret agreement that your parent would sign your report card in a different way every season. This way, if a man-in-the-middle (MITM) tries to change the message contents without knowledge of the secret, she cannot reproduce a signature that matches the altered content. Similarly, a typed legal document that has cross-outs and handwritten changes on it must be signed again by the involved parties; otherwise it has no legal bearing. To produce the digital signature, the sender needs to run the message through an algorithm that takes the secret (key) as a secondary input. However, since running these algorithms over entire messages is computationally expensive, the sender compresses the data using a so-called hash algorithm (H) and arrives at a digest value, which is typically called a message authentication code (MAC). The MAC value is added by the sender to the end of the message sent to the receiver and is checked by the receiver.
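The append-and-check procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the standard `hmac` and `hashlib` modules with SHA-256 as the hash algorithm, and the names `protect`, `verify`, `TAG_LEN` and the key value are hypothetical, chosen for this example rather than taken from any specification.

```python
import hashlib
import hmac

# Length of the MAC in bytes (32 for SHA-256, the hash assumed in this sketch).
TAG_LEN = hashlib.sha256().digest_size

def protect(key: bytes, message: bytes) -> bytes:
    """Sender side: compute the MAC and append it to the end of the message."""
    tag = hmac.new(key, message, hashlib.sha256).digest()
    return message + tag

def verify(key: bytes, packet: bytes) -> bytes:
    """Receiver side: recompute the MAC and compare it with the received one."""
    message, tag = packet[:-TAG_LEN], packet[-TAG_LEN:]
    expected = hmac.new(key, message, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("MAC mismatch: message was altered or the key is wrong")
    return message

key = b"shared-secret-known-to-A-and-B"  # illustrative value only
packet = protect(key, b"grade: B+")
assert verify(key, packet) == b"grade: B+"

# A man-in-the-middle changes the content but cannot produce a matching MAC.
forged = b"grade: A+" + packet[-TAG_LEN:]
try:
    verify(key, forged)
except ValueError as err:
    print("rejected:", err)
```

The comparison uses `hmac.compare_digest` rather than `==` so that the time taken does not depend on how many leading bytes of the two values match.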
Hash algorithms are based on hash functions that are mathematical one-way functions, meaning that it is close to impossible (depending on the available amount of time and computing power, of course) to guess the input of the hash function, given its output. This is a very important characteristic of a hash function that is to be used for message authentication: the attacker should not be able to guess the input of the hash function from its output. Now an observant reader might say that when the data is simply sent in the clear, the attacker can easily read the message as well as the MAC. If both the message and the MAC value are readable, then both the input and the output of the hash function are exposed and there is no use for the hash function. Well, not quite: the sender and receiver also share a secret that they use as an input to the hash function while calculating the hash value (MAC). Hash functions that can accept secret keys are often referred to as keyed (or secure) hash functions. An attacker who does not know the secret cannot tamper with the message data without being exposed, since she cannot re-calculate the hash value based on the altered data. However, keyed hash algorithms have been put to the test by hackers as well as cryptanalysts, who attempt to break the existing algorithms as part of their day jobs. Experience and science have shown that in the majority of cases increasing the size of the key, i.e. the number of bits in the key, tends to make the keyed hash function more resilient to attacks.
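The role of the shared secret can be made concrete with a small comparison. The sketch below assumes SHA-256 and Python's standard `hashlib` and `hmac` modules; the messages and key values are made up for illustration. With an unkeyed digest, the attacker simply recomputes the digest over the altered data; with a keyed one, her value does not match the receiver's.

```python
import hashlib
import hmac

altered = b"pay 99 to Eve"  # data as modified by the attacker

# Unkeyed digest: anyone, including the attacker, can recompute it.
attacker_digest = hashlib.sha256(altered).digest()
receiver_digest = hashlib.sha256(altered).digest()
unkeyed_forgery_accepted = hmac.compare_digest(attacker_digest, receiver_digest)

# Keyed digest: the attacker does not know the key and has to guess one.
key = b"secret shared by sender and receiver"   # illustrative value only
wrong_key = b"whatever the attacker tries"      # illustrative value only
attacker_mac = hmac.new(wrong_key, altered, hashlib.sha256).digest()
receiver_mac = hmac.new(key, altered, hashlib.sha256).digest()
keyed_forgery_accepted = hmac.compare_digest(attacker_mac, receiver_mac)

print(unkeyed_forgery_accepted, keyed_forgery_accepted)  # True False
```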
Over the years, many hash functions have been developed for message authentication. In the following section, we describe HMAC as a standardized mechanism for providing MACs.
2.1.3.1 HMAC-MD5
As mentioned earlier, the value of a hash function is based on the difficulty with which the input of the hash function can be guessed from its output, so that an attacker cannot easily alter the message and re-calculate the hash to present the forged message to the unsuspecting receiver. Many hash functions were developed during the course of several decades of research on cryptography. However, as the processing power of computer CPUs and the skill of cryptanalysts grow, many previously developed hash functions are rendered less effective, and the key sizes that were previously deemed long enough seem too short to provide adequate protection against cryptographic attacks. Hash function design can be compared to clothing from the fashion industry: both are cool when they come out and passé a few years later, with the difference that hash functions tend not to make any comebacks. That could (only a guess) be due to the more vivid imagination of hackers and cryptographers in their own field, even though they typically have an unusual sense of fashion.
With all that digression, it should now be obvious that when providing integrity protection, the need to replace less secure hash functions with newer and more secure ones arises frequently. For that reason, providing a framework that defines the usage of a generic hash function was deemed useful. A framework developed by the IETF, called HMAC [HMAC2104], enables designers and implementers to deploy any generic keyed hash function for the generation of MACs. We provide an overview of the HMAC framework in the following. In order to prove that the message is authentic, the sender compresses the data to be protected using a hash algorithm (H). If the data is long, it is divided into blocks of B octets each, which are then fed into the hash function; the function produces an output value of length L (L = 16 bytes for MD5 and L = 20 bytes for SHA-1). The HMAC value is calculated as follows:
HMAC = H(K ⊕ opad, H(K ⊕ ipad, M))
where K is the secret key shared between the sender and the receiver of the message (M), ⊕ denotes the bitwise exclusive OR (XOR), and the comma denotes concatenation. To maintain a minimum strength for the security of the HMAC procedure, the HMAC specification recommends that the key length be at least as large as the length of the output of the hash function (L).
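As a quick check of the block length B, the output length L and the key-length recommendation, the following sketch queries Python's standard `hashlib` module and draws a random key of exactly L bytes with `secrets.token_bytes`. The helper name `hash_parameters` is hypothetical and used only for this example.

```python
import hashlib
import secrets

def hash_parameters(name: str) -> tuple[int, int]:
    """Return (B, L): the block length and output length of a hash, in bytes."""
    h = hashlib.new(name)
    return h.block_size, h.digest_size

for name in ("md5", "sha1"):
    B, L = hash_parameters(name)
    key = secrets.token_bytes(L)  # random key of the recommended minimum length
    print(f"{name}: B={B} bytes, L={L} bytes, key length={len(key)} bytes")
```

For both MD5 and SHA-1 the block length B is 64 bytes, while L is 16 and 20 bytes, respectively.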
The ipad and opad refer to the inner and outer padding, respectively. They are combined with the key rather than with the message, and are simply constant values that are B bytes long, i.e. as long as one input block of the hash function:
● ipad is the one-byte value 0x36 (the 0x prefix denotes hexadecimal notation) repeated B times (to make B bytes).
● opad is the one-byte value 0x5C repeated B times.
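The calculation can be spelled out step by step and compared with a library implementation. The sketch below is a minimal illustration, not a replacement for a vetted library: the function name `hmac_manual` and the sample key and message are hypothetical, MD5 is chosen only because it is the hash discussed in this section, and the key preparation (hashing keys longer than B bytes, then padding the key with zero bytes up to B bytes) follows RFC 2104 rather than the text above.

```python
import hashlib
import hmac

def hmac_manual(key: bytes, message: bytes, hash_name: str = "md5") -> bytes:
    """Compute H(K xor opad, H(K xor ipad, M)), where ',' is concatenation."""
    B = hashlib.new(hash_name).block_size  # block length in bytes

    # Key preparation as in RFC 2104: hash over-long keys, then zero-pad to B.
    if len(key) > B:
        key = hashlib.new(hash_name, key).digest()
    key = key.ljust(B, b"\x00")

    ipad = bytes([0x36]) * B  # inner padding: 0x36 repeated B times
    opad = bytes([0x5C]) * B  # outer padding: 0x5C repeated B times

    k_ipad = bytes(k ^ p for k, p in zip(key, ipad))
    k_opad = bytes(k ^ p for k, p in zip(key, opad))

    inner = hashlib.new(hash_name, k_ipad + message).digest()
    return hashlib.new(hash_name, k_opad + inner).digest()

key = b"an illustrative shared key"
message = b"message to be protected"

# The manual computation should agree with the standard library's HMAC.
assert hmac_manual(key, message) == hmac.new(key, message, hashlib.md5).digest()
print(hmac_manual(key, message).hex())
```

In practice one would call `hmac.new` directly; the manual version is shown only to make the roles of ipad and opad visible.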
Now that we have gone through the details of HMAC calculations, for the rest of this book we use the following simplified form to refer to an HMAC calculated over a message M, using a key K:
HMAC(K, M)
The sender simply appends the calculated HMAC value to the message M, before sending the message to the receiver: