Trie in Swift - Autocorrect Data Structure: now you know what to blame

Subscribe to my newsletter and never miss my upcoming articles

Hello ladies and gentlemen, Leo here.

Today we'll study a tree like data structure called Trie. It's a special type of tree and the main use in modern world is the core of autocorrect(autocomplete) algorithms. At the end of the article you'll know exactly what to blame when you want to say Duck and it transform to Luck.

Theory-crafting

First thing first, what is a Trie?

In computer science, a trie, also called digital tree or prefix tree, is a kind of search tree —an ordered tree data structure used to store a dynamic set or associative array where the keys are usually strings. [...] All the descendants of a node have a common prefix of the string associated with that node, and the root is associated with the empty string. Keys tend to be associated with leaves, though some inner nodes may correspond to keys of interest. Hence, keys are not necessarily associated with every node.

The Trie is have a faster lookup than a imperfect hash map, doesn't has key collision and the main use is to represent string dictionaries.

Let's code!

The problem

Let's suppose that you have to write an autocomplete algorithm. For example: you enter the prefix "car" and the algorithm must return all the stored complete words that have "car" prefix.

First we have to write the TrieNode and the Trie itself structure:

class TrieNode<T: Hashable> { // 1
    var value: T? // 2
    weak var parentNode: TrieNode?
    var children: [T: TrieNode] = [:] // 3
    var isTerminating = false // 4
    var isLeaf: Bool {
        return children.count == 0
    }

    init(value: T? = nil, parentNode: TrieNode? = nil) {
        self.value = value
        self.parentNode = parentNode
    }

    func add(value: T) {
        guard children[value] == nil else {
            return
        }
        children[value] = TrieNode(value: value, parentNode: self)
    }
}


class Trie { //this is the Trie itself, the root node is always empty

    public var count: Int {
        return wordCount
    }

    fileprivate let root: TrieNode<Character>
    fileprivate var wordCount: Int

    init() { // the initialization of the root empty node
        root = TrieNode<Character>()
        wordCount = 0
    }
}
  1. Note that the Node is generic but the Trie isn't
  2. The value of each node (like any other tree) but this is optional, because you need to be able to have an empty root node.
  3. The related characters
  4. This is important because you can have in your Trie a middle node(not leaf) that is also a word termination.

With this two classes we can now start to write our two algorithms: the insert func and search words in subTrie.

The insert func:

extension Trie {

    func insert(word: String) {
        guard !word.isEmpty else {
            return
        }
        var currentNode = root
        for character in word.lowercased() { // 1
            if let childNode = currentNode.children[character] {
                currentNode = childNode
            } else {
                currentNode.add(value: character)
                currentNode = currentNode.children[character]!
            }
        }

        guard !currentNode.isTerminating else { // 2
            return
        }
        wordCount += 1
        currentNode.isTerminating = true
    }
}
  1. The first part we are traversing all the children nodes until the end of the word and if you find the character you just add it just became the new current if not, we just add it to the current node and it became the current node.
  2. This is just to prevent count two times the same word.

And now... it's the time we're all waiting: the func words in sub Trie.

extension Trie {

    func findWordsWithPrefix(prefix: String) -> [String] {
        var words = [String]()
        let prefixLowerCased = prefix.lowercased()
        if let lastNode = findLastNodeOf(word: prefixLowerCased) { //1
            if lastNode.isTerminating { // 1.1
                words.append(prefixLowerCased)
            }
            for childNode in lastNode.children.values { //2
                let childWords = getSubtrieWords(rootNode: childNode, partialWord: prefixLowerCased)
                words += childWords
            }
        }
        return words // 3 
    }

    private func findLastNodeOf(word: String) -> TrieNode<Character>? { // this just check is the prefix exist in the Trie
        var currentNode = root
        for character in word.lowercased() { 
            guard let childNode = currentNode.children[character] else { // traverse the Trie with each of prefix character
                return nil
            }
            currentNode = childNode
        }
        return currentNode
    }
}

We'll start explaining the findWordsWithPrefix:

  1. This traverse the Trie trying to find the last node that match with the prefix and after that(1.1) checks if the prefix itself is a word( isTerminating)
  2. Now we begin traverse the last node children to get all the words that have the prefix.
  3. Return the found words.

And finally the wordsInSubtrie func and it's the heart of all this algorithm so we have to give it a special attention:

extension Trie {
    fileprivate func getSubtrieWords(rootNode: TrieNode<Character>, partialWord: String) -> [String] {
        var subtrieWords = [String]()
        var previousLetters = partialWord
        if let value = rootNode.value { // 1
            previousLetters.append(value)
        }
        if rootNode.isTerminating { //2 
            subtrieWords.append(previousLetters)
        }
        for childNode in rootNode.children.values { //3
            let childWords = getSubtrieWords(rootNode: childNode, partialWord: previousLetters)
            subtrieWords += childWords
        }
        return subtrieWords
    }
}
  1. This check and unwrap if the node has a value ( always should have but it's safe to do the "if let" check here).
  2. Now we have to check if the currentNode is a terminating node, that means it's a word!
  3. And again we traverse the child nodes searching for all the terminating words recursively.

And Voilà we finished the basic structure of the autocorrect/autocomplete algorithm! I hardly recommend to copy and paste all the code to the playgrounds and you can play/study around it. It's a amazing algorithm and you can upgrade it to a way more complex implementation, like giving weights to each node so when returning to the user you can also suggest the most used word or the most "correct" word.

To test it just write:

var trie = Trie()

trie.insert(word: "car")
trie.insert(word: "care")
trie.insert(word: "carrot")
trie.insert(word: "carb")
trie.insert(word: "carburized")
trie.insert(word: "cardboard")
trie.insert(word: "carrousel")
trie.insert(word: "carries")

print(trie.findWordsWithPrefix(prefix: "carb"))
print(trie.findWordsWithPrefix(prefix: "carr"))
print(trie.findWordsWithPrefix(prefix: "car"))
print(trie.findWordsWithPrefix(prefix: "cat"))

And the result should be:

I hope you had a great time discovering/remembering this algorithm like I had, and now when the autocorrect just turn your life into hell we can certainly say: BLAME YOU TRIE!

Thanks for the reading and... That's all folks!

PS: If you want to learn more, the algorithm is well explained in: RayWenderlich and his MUST CHECK repository PS2: The tree in cover image is touristic place in Jericoacoara called "Árvore preguiçosa".

Image credit: Trie image ; cover image

No Comments Yet