Last modified on November 4, 2019, at 03:00

Hashing Algorithm

In computing, a hashing algorithm is a computational process which takes an input of bytes and reduces it to a fixed length base64 string which is unique (within boundaries) to the input. Hashes are mainly used for data integrity, and with further asymmetrical algorithms as digital signatures.

Method

Hashes are computed by churning down a large size byte array until it reaches the target size. If the initial byte array length is not divisible by the target size then the initial array is buffered with blank bits or the provided bits are repeated, to ensure the algorithm will compute down to the target length. One important characteristic of hashing algorithms is that the hash should not be reversible.

Collisions

A hash is intended to be completely unique to a given set of bytes. However, this is not always the case. Given enough processing effort, an identical hash can be produced using alternative input data. This is known as a hash collision, when two pieces of data produce the same hash. It is a security risk for such collisions to be feasibly created, since original data can be replaced with alternate data, and the hash still indicates that nothing was changed. For this reason, large key sizes are preferred for hashing, since they make it much more difficult to produce a collision. Typically, any collision which is found would only replace valid data with gibberish. However, this is not guaranteed; given enough computing effort, it is possible to replace data in a meaningful way while still matching the hash.
Algorithms such as MD5[1][2] and SHA-1[3] have been "broken" by producing collisions.

Algorithms

MD5

MD5 became a computing standard [4] for hashing but was subsequently found to be insecure[5] and was replaced with SHA1. MD5 breaks the input up into 512-bit blocks and churns it down into a 128-bit bit hash. For example, the text "The quick brown fox jumps over the lazy dog" would become (base64 encoded): 50842ef1bd9a95a284b395a0f0e0e2f8
MD5 was intended to solve a weakness in MD4, although both were actually published as open standards at the same time. For this reason, MD4 was never largely implemented and MD5 became the algorithm of choice.

RIPEMD

RIPEMD (Race Integrity Primitives Evaluation Message Digest)[6] was released in 1992, and based on MD4. This original RIPEMD-128 ran two instances of the MD4 algorithm in parallel. It was therefore less efficient than MD5, and not entirely trusted by many, since MD4 had a weakness which had been known since its public release. A hash collision was found with PIREMD in 2004, and the industry stopped using it. However, RIPEMD-160 (the most popular), RIPEMD-256, are RIPEMD-320 are strengthened (utilizing larger keys) versions of this algorithm which were released in 1996 and are still in use today.

SHA-1

SHA (Secure Hash Algorithm) is a set of functions for hashing which fixed the problems found in MD5 and has since become the industry standard. SHA1 hashes down to a 160-bit hash. For example, the text "The quick brown fox jumps over the lazy dog" would become (base64 encoded): 1305dca26f7ef6e660c5c54948c1050cc3253b18 SHA-1 has been found not to have sufficient complexity to reduce collisions, so it is now disfavored.

SHA-2

SHA-2 (typically SHA-256) is an updated version of SHA-1, which adds complexity to help reduce the chance of hash collisions. "SHA-224," "SHA-384," and "SHA-512" are also forms of SHA-2, which are using other key lengths; the trailing number states the key length in use. The greater the key size, the more difficult it is to have a hash collision, as this longer key size produces a longer hash. For example, the text "The quick brown fox jumps over the lazy dog" using SHA-256 would produce: C8554AB924067605CFEB06DE2EC69D7FAF1648EFBF027273FA4DAB8F7CED086B

Using a 512 bit key (SHA-512), it would produce this much longer hash (spaces added for readability): 07E547D9586F6A73F73FBAC0435ED76951218FB 7D0C8D788A309D785436BBB642E93A252A954F2 3912547D1E8A3B5ED6E1BFD7097821233FA0538 F3DB854FEE6

References

  1. https://blog.avira.com/md5-the-broken-algorithm/
  2. http://cryptocrats.com/crypto/md5-the-hash-algorithm-is-now-broken/
  3. https://www.thesslstore.com/blog/sha-1-collision-created/
  4. http://tools.ietf.org/html/rfc1321
  5. http://merlot.usc.edu/csac-f06/papers/Wang05a.pdf
  6. "RIPEMD." TheFreeDictionary.com. 2019. Farlex, Inc. 15 May. 2019 https://acronyms.thefreedictionary.com/RIPEMD