Hash Functions in Swift

Cryptography Series: The importance of hashing


Hello Queens and Kings, Leo here.

Today we'll explore one of the basic concepts of cryptography: hashing. Hashing is so important that it's baked into the Swift Standard Library, because it powers two of the three main Swift collection types, Dictionary and Set, among other things. On iOS 13 and later, we also have CryptoKit to help us with various cryptographic operations.

Even if you don't use hash functions directly, it's an important concept to know, because hashing works behind the scenes to make everything run well.

The painting I chose is Wood Splitters, an 1886 work by Thomas William Roberts (8 March 1856 – 14 September 1931), an English-born Australian artist and a key member of the Heidelberg School art movement, also known as Australian Impressionism. I chose it because the word "hash" comes from the French "hacher" (to chop), which shares its root with "hatchet", and a hatchet is a small axe... so we have a wood-chopping painting.

Stay tuned for this cryptography trip through hash functions. Let's go!

Problem

You are asked to generate a SHA512 hash of some data and send it to the backend.
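A minimal sketch of the task, assuming the backend expects the SHA512 digest as a lowercase hex string (a common convention, but check your own API contract). The helper name `sha512Hex` is hypothetical, and CryptoKit requires iOS 13 / macOS 10.15 or later.

```swift
import Foundation
import CryptoKit

// Hypothetical helper: SHA512 digest encoded as lowercase hex (two characters per byte).
func sha512Hex(of data: Data) -> String {
    let digest = SHA512.hash(data: data)
    return digest.map { String(format: "%02x", $0) }.joined()
}

let payload = Data("abc".utf8)
let hex = sha512Hex(of: payload)
print(hex) // 128 hex characters, encoding the 64-byte digest
```

In a real request you would place `hex` in whatever field your backend defines; the field name and encoding are up to that API.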

Let's take a step back and discover what hashing is and why it matters.

Hashing is an algorithm that calculates a fixed-size value from input data of any kind and length. It transforms the data into a value, or key, that is usually far shorter than the input and that represents it; some hash functions target a range of output lengths instead of a single fixed length. At the end of the day, a hash function is a mapping function that produces a summary of the input data.
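To make "a fixed-size value from any input" concrete, here is a toy hash function: 64-bit FNV-1a. It is NOT cryptographic and is not what CryptoKit uses; it only illustrates the mapping. Whatever the input length, it always returns a single `UInt64`. The function name `fnv1a64` is our own.

```swift
// 64-bit FNV-1a: a simple, NON-cryptographic hash used here only for illustration.
func fnv1a64<S: Sequence>(_ bytes: S) -> UInt64 where S.Element == UInt8 {
    var hash: UInt64 = 0xcbf2_9ce4_8422_2325   // FNV-1a 64-bit offset basis
    let prime: UInt64 = 0x0000_0100_0000_01b3  // FNV 64-bit prime
    for byte in bytes {
        hash ^= UInt64(byte)                   // mix in one byte...
        hash = hash &* prime                   // ...then multiply, wrapping on overflow
    }
    return hash
}

let short = fnv1a64("Hi".utf8)
let long = fnv1a64(String(repeating: "Lorem ipsum ", count: 1_000).utf8)
// Both results are exactly 64 bits, no matter how long the input was.
print(String(short, radix: 16), String(long, radix: 16))
```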

Hashing

One main use of hashing is to check two pieces of data for equality. Instead of opening two documents and comparing them bit by bit, you can compare their hash values: if the digests differ, the owner knows immediately that the files are different, and if they match, the files are almost certainly identical.
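A minimal sketch of comparing two documents by digest with CryptoKit. Here the "documents" are in-memory `Data` values; for real files you could load them with `Data(contentsOf:)`. The function name `haveSameContents` is hypothetical.

```swift
import Foundation
import CryptoKit

// Hypothetical helper: two blobs are treated as equal when their SHA256 digests match.
func haveSameContents(_ a: Data, _ b: Data) -> Bool {
    SHA256.hash(data: a) == SHA256.hash(data: b)
}

let original = Data("Quarterly report, final version".utf8)
let copy = Data("Quarterly report, final version".utf8)
let edited = Data("Quarterly report, final version.".utf8)

print(haveSameContents(original, copy))   // true
print(haveSameContents(original, edited)) // false
```

The real benefit appears when only the digest is stored or sent: you compare a 32-byte value instead of transferring or re-reading the whole file.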

The main properties of cryptographic hash functions are:

  1. Preimage resistance: given a digest, it should be computationally hard to find any input that produces it (the hash should be hard to reverse).
  2. Second preimage resistance: given an input and its hash, it should be computationally hard to find a different input with the same hash.
  3. Collision resistance: it should be computationally hard to find any two distinct inputs with the same hash.
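These properties go hand in hand with the avalanche effect: changing a single character of the input should flip roughly half of the output bits, so similar inputs give unrelated digests. A small sketch with CryptoKit; the `differingBits` helper is our own.

```swift
import Foundation
import CryptoKit

// Counts how many bits differ between two equal-length byte sequences.
func differingBits(_ a: [UInt8], _ b: [UInt8]) -> Int {
    zip(a, b).reduce(0) { count, pair in count + (pair.0 ^ pair.1).nonzeroBitCount }
}

let first = Array(SHA256.hash(data: Data("Pay Bob 100 dollars".utf8)))
let second = Array(SHA256.hash(data: Data("Pay Bob 900 dollars".utf8)))

// For a 256-bit digest we expect a value close to 128.
print(differingBits(first, second), "of", first.count * 8, "bits differ")
```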

Collision resistance is especially important in hash functions because of the birthday problem. This problem asks for the probability that, in a set of n randomly chosen people, at least one pair shares a birthday, and the answer is higher than most people expect. In a group of 23 people, the probability of a shared birthday exceeds 50%, while a group of 70 has a 99.9% chance of a shared birthday. By the pigeonhole principle, the probability reaches 100% when the number of people reaches 367, since there are only 366 possible birthdays, including February 29.
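The birthday numbers are easy to check. A small sketch, assuming 365 equally likely birthdays (it ignores February 29), that computes the probability of at least one shared birthday as 1 minus the probability that all n birthdays are distinct:

```swift
// P(at least one shared birthday among n people), with `days` equally likely birthdays.
func sharedBirthdayProbability(people n: Int, days: Int = 365) -> Double {
    guard n <= days else { return 1.0 }  // pigeonhole: a repeat is guaranteed
    var allDistinct = 1.0
    for i in 0..<n {
        // The (i+1)-th person must avoid the i birthdays already taken.
        allDistinct *= Double(days - i) / Double(days)
    }
    return 1.0 - allDistinct
}

for n in [22, 23, 70] {
    print(n, sharedBirthdayProbability(people: n))
}
// Roughly 0.476, 0.507 and 0.999: 23 is the first group size above 50%.
```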

This problem relates to hashing through the birthday attack. Here is an example: a message m is typically signed by first computing f(m), where f is a cryptographic hash function, and then using a secret key to sign f(m). Suppose Mallory wants to trick Bob into signing a fraudulent contract. Mallory prepares a fair contract m and a fraudulent one m′. She then finds a number of positions where m can be changed without changing its meaning, such as inserting commas, empty lines, one versus two spaces after a sentence, replacing synonyms, etc. By combining these changes, she can create a huge number of variations of m, all of which are fair contracts. She does the same with m′, then hashes the variants until she finds a fair version and a fraudulent version with the same hash, f(m) = f(m′). Bob signs the fair version, and his signature is equally valid for the fraudulent one. Thanks to the birthday problem, for an n-bit hash she needs only around 2^(n/2) variants of each contract, not 2^n. This is why collisions must be hard to find: any change to the input data should produce a new, unrelated output.
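We can't collide full SHA256, but we can watch a birthday search succeed against a deliberately weakened version. This sketch keeps only the first 3 bytes (24 bits) of each SHA256 digest, an assumption made purely so the search finishes quickly, and hashes numbered contract variants until two of them share a truncated digest. The birthday bound predicts a few thousand tries, far fewer than the 2^24 (about 16.7 million) possible truncated values. The variant text and helper names are hypothetical.

```swift
import Foundation
import CryptoKit

// Truncated digest: only the first `bytes` bytes of SHA256, packed into an Int.
func truncatedHash(_ text: String, bytes: Int = 3) -> Int {
    SHA256.hash(data: Data(text.utf8)).prefix(bytes).reduce(0) { ($0 << 8) | Int($1) }
}

// Hashes numbered variants until two share a truncated digest.
func findCollision() -> (String, String, Int) {
    var seen: [Int: String] = [:]  // truncated digest -> first variant that produced it
    // 2^24 + 1 variants over 2^24 possible values must collide (pigeonhole).
    for i in 0...(1 << 24) {
        let variant = "I agree to pay 100 dollars. (variant \(i))"
        let h = truncatedHash(variant)
        if let earlier = seen[h] {
            return (earlier, variant, i + 1)
        }
        seen[h] = variant
    }
    preconditionFailure("pigeonhole principle guarantees a collision")
}

let (a, b, attempts) = findCollision()
print("Collision after \(attempts) attempts:")
print(a)
print(b)
```

Each extra byte kept from the digest multiplies the expected work by about 16 (256 times more values, whose square root is 16); at the full 256 bits the search becomes hopeless.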

As the properties state, collisions are hard to find but not impossible. Let's do a "what if" exercise. Because of the birthday bound, finding a SHA256 collision by brute force takes about 2^128 hash computations (finding an input that matches an already given SHA256 output is far harder, about 2^256). Suppose we took the entire chip manufacturing capability of the world (currently circa 300+ billion US dollars a year), devoted its whole output for just one year to dedicated attack chips costing one dollar each, and that each chip could compute 2^30 hashes per second. That gives us about 3×10^11 chips, and the attack would still take about 33 billion years.

So yes, it's breakable but it's unlikely to break in the near future.
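The back-of-the-envelope estimate can be checked in a few lines. The inputs mirror the scenario above: about 2^128 hash evaluations for a SHA256 collision, 300 billion one-dollar chips, and an assumed speed of 2^30 hashes per second per chip.

```swift
import Foundation

let collisionWork = pow(2.0, 128.0)          // ~hash evaluations to find a SHA256 collision
let chips = 300e9                            // $300 billion of output at $1 per chip
let hashesPerChipPerSecond = pow(2.0, 30.0)  // assumed speed of each attack chip
let secondsPerYear = 365.25 * 24 * 60 * 60

let hashesPerSecond = chips * hashesPerChipPerSecond
let years = collisionWork / hashesPerSecond / secondsPerYear
print("About \(Int(years / 1e9)) billion years") // About 33 billion years
```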

Examples

First, let's see what a hash result (also called a digest) looks like:

import Foundation
import CryptoKit

let data = Data("This text will be hashed".utf8) // 24 bytes of UTF-8 input
print(SHA256.hash(data: data)) // 32-byte (256-bit) digest
print(SHA384.hash(data: data)) // 48-byte (384-bit) digest
print(SHA512.hash(data: data)) // 64-byte (512-bit) digest

[Screenshot: console output of the SHA256, SHA384 and SHA512 digests]

As you can see in the image above, hashing the 24-character text "This text will be hashed" produces outputs of different lengths depending on the digest size of each algorithm. Every digest is longer than the input here, because each SHA-2 variant always produces a fixed-size digest (32 bytes for SHA256, 48 for SHA384 and 64 for SHA512), however short the input is. But what happens if the input is bigger than the generated digest? Let's see:

let data = Data("Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Aenean commodo ligula eget dolor. Aenean massa. Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Donec".utf8)
let sha256 = SHA256.hash(data: data)
let sha384 = SHA384.hash(data: data)
let sha512 = SHA512.hash(data: data)
// Array(digest) exposes the raw digest bytes (description also adds an algorithm prefix)
print(sha256, "Bytes: \(Array(sha256).count)") // 32
print(sha384, "Bytes: \(Array(sha384).count)") // 48
print(sha512, "Bytes: \(Array(sha512).count)") // 64

And the result:

[Screenshot: console output of the three digests of the long input and their lengths]

This demonstrates one characteristic of hashing: the fixed-size output. Whatever the input, a given algorithm always produces a digest of the same size: 32 bytes for SHA256, 48 for SHA384 and 64 for SHA512. Hash-based data structures build on the same idea: hashing a key tells the structure where to look, so a lookup takes O(1) time on average, instead of the O(n) search you need in a linked list or an unsorted array. (Swift's Dictionary and Set use the standard library's Hasher for this, not SHA-2.)
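To see why a hash gives O(1) average lookups, here is a deliberately tiny hash set with separate chaining. It is a sketch, not how Swift's own Set works (the standard library uses a different, open-addressing layout). `TinyHashSet` is a hypothetical type: the hash value picks one bucket, so `contains` scans only that bucket instead of every element.

```swift
// Hypothetical minimal hash set using separate chaining (one array per bucket).
struct TinyHashSet<Element: Hashable> {
    private var buckets: [[Element]]

    init(bucketCount: Int = 16) {
        buckets = Array(repeating: [], count: bucketCount)
    }

    // The hash value decides which bucket an element lives in.
    private func bucketIndex(for element: Element) -> Int {
        // Reinterpret as unsigned so the remainder is never negative.
        Int(UInt(bitPattern: element.hashValue) % UInt(buckets.count))
    }

    mutating func insert(_ element: Element) {
        let index = bucketIndex(for: element)
        if !buckets[index].contains(element) {
            buckets[index].append(element)
        }
    }

    // Only one bucket is scanned, not the whole collection.
    func contains(_ element: Element) -> Bool {
        buckets[bucketIndex(for: element)].contains(element)
    }
}

var fruits = TinyHashSet<String>()
for fruit in ["apple", "banana", "cherry"] {
    fruits.insert(fruit)
}
print(fruits.contains("banana"), fruits.contains("grape")) // true false
```

A real hash table also grows its bucket array as it fills up so buckets stay short; with a fixed 16 buckets, lookups would degrade toward O(n) as the set grows.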

If you are curious, SHA-2 stands for Secure Hash Algorithm 2, a set of cryptographic hash functions designed by the United States National Security Agency (NSA) and first published in 2001. They are built using the Merkle–Damgård construction, from a one-way compression function that is itself built with the Davies–Meyer structure from a specialized block cipher. CryptoKit's SHA256, SHA384 and SHA512 are all SHA-2 functions (CryptoKit also ships the legacy Insecure.MD5 and Insecure.SHA1, for compatibility only).
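Because the Merkle–Damgård construction processes the message one block at a time, a SHA-2 hash can be computed incrementally. CryptoKit exposes this through `update(data:)` and `finalize()`, which is handy for large files read in chunks. A small sketch checking that chunked hashing matches the one-shot result; the 4-byte chunk size is arbitrary.

```swift
import Foundation
import CryptoKit

let message = Data("Merkle-Damgard hashes data one block at a time".utf8)

// One-shot hashing.
let oneShot = SHA256.hash(data: message)

// Incremental hashing: feed the same bytes in small chunks.
var hasher = SHA256()
var offset = message.startIndex
while offset < message.endIndex {
    let end = min(offset + 4, message.endIndex)
    hasher.update(data: message[offset..<end])
    offset = end
}
let incremental = hasher.finalize()

print(oneShot == incremental) // true
```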

Swift Standard Library

Swift has hashing built in through the Hashable protocol.

Many types in the standard library conform to Hashable: strings, integers, floating-point and Boolean values, and even sets are hashable by default. Other types, such as optionals, arrays and ranges, automatically become hashable when their type arguments are hashable.

It's important to notice that, even if you don't realize it, you are using hash functions all day long while coding in Swift. Dictionary keys and Set elements must be Hashable; look at their declarations in Swift:

@frozen public struct Dictionary<Key, Value> where Key : Hashable {...}

@frozen public struct Set<Element> where Element : Hashable {...}

// and even string

extension String : Hashable {...}
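You can make your own types work as Dictionary keys or Set elements by conforming to Hashable. A minimal sketch with a hypothetical `User` type: the compiler could synthesize `==` and `hash(into:)` from all stored properties, but here we customize both so that only `id` counts, keeping the rule that equal values must produce equal hashes.

```swift
// Hypothetical model type: two users are the same user when their ids match.
struct User: Hashable {
    let id: Int
    var nickname: String

    static func == (lhs: User, rhs: User) -> Bool {
        lhs.id == rhs.id
    }

    // Hash exactly the properties used by ==, so equal values hash equally.
    func hash(into hasher: inout Hasher) {
        hasher.combine(id)
    }
}

var onlineUsers: Set<User> = [User(id: 1, nickname: "Leo")]
onlineUsers.insert(User(id: 1, nickname: "Leo (renamed)")) // same id: not added again
onlineUsers.insert(User(id: 2, nickname: "Ana"))

var lastSeen: [User: String] = [:]
lastSeen[User(id: 2, nickname: "Ana")] = "today"

print(onlineUsers.count)                              // 2
print(lastSeen[User(id: 2, nickname: "")] ?? "never") // today
```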

Conclusion

Hashing is a very important subject to understand, even if only to know what it is. Today we learned what a hash is, its properties, how to use it with CryptoKit (iOS 13+) and the importance of Hashable in the Swift Standard Library. This is my second article about security and iOS; in the next one we'll talk about another widely used cryptography technique, so stay tuned!

That's all, my people. I hope you liked reading this as much as I liked writing it. If you want to support this blog, you can Buy Me a Coffee or just leave a comment saying hello.

Thanks for reading and... that's all folks.

credits: image

Interested in reading more such articles from Leonardo Maia Pugliese?
