Holy Swift

Sound Stream Analysis Using AI in Swift

Hello friends, Leo here.

In iOS 15, Apple launched a built-in sound classification tool with a pre-trained model. It's amazing that from now on every developer in the world has a built-in tool that can guess what type of sound the iPhone is capturing.

When we talk about AI, one of the steps to get any result is to create a very good model. The model is used to compare results and to reach a level of confidence within the analysis. This is why the new API is so handy: Apple provides an already trained model with more than 200 sound classes. This is a big shortcut for anyone trying to capture sounds and build great interactions with them.

Let's code and check how easy it is to implement, but first...

The Painting

This is a 1910 masterpiece called Listening to the Gramophone by Vladimir Yegorovich Makovsky. Makovsky, who lived from 1846 to 1920, was a Russian painter, art collector, and teacher. He was the son of the collector Egor Ivanovich Makovsky, one of the founders of the Moscow Art School, and he studied at the Moscow School of Painting, Sculpture, and Architecture. He finished his studies in 1869 and the following year became one of the founding members of the Association of Travelling Art Exhibitions, where many years of prolific work brought him to a leading position. His work was defined by perpetual humor as well as blatant irony and scorn.

I chose this painting because of the gramophone. Since we are talking about the iPhone listening to us, something that looks like a big ear seems very suitable, doesn't it?

The Problem

You were assigned to use the iPhone to listen to a stream of sounds and classify them.

So the first thing is to read all the docs and the Apple tutorial. It's also important to note that the motivation for this post is the WWDC21 talk about sound analysis. Apple, as always, brings free content to all developers to inspire them and push new ideas forward.

Our goal here is to listen to and classify a stream of sounds. First, download Xcode 13+ and start a new iOS project with Storyboards.

As you will listen to the user's microphone, you must add a description for the NSMicrophoneUsageDescription key in your project's Info.plist file, as the image below shows, with the message "please let me use the mic":

[Screenshot: the Info.plist entry with the NSMicrophoneUsageDescription key and the message "please let me use the mic"]
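
Adding the key is all the setup this sample strictly needs, since the system shows the permission prompt automatically the first time the audio tap starts. If you prefer to check or request microphone access up front, here is a small optional sketch using AVAudioSession (also part of AVFAudio) that you could drop into the ViewController; checkMicrophonePermission is just a suggested name:

    // Optional: check or request microphone access up front instead of waiting
    // for the system prompt that appears when recording starts.
    func checkMicrophonePermission() {
        let session = AVAudioSession.sharedInstance()
        switch session.recordPermission {
        case .undetermined:
            // The user has not been asked yet, so trigger the prompt now.
            session.requestRecordPermission { granted in
                print(granted ? "Microphone access granted." : "Microphone access denied.")
            }
        case .denied:
            print("Microphone access denied. Enable it in Settings.")
        case .granted:
            print("Microphone access already granted.")
        @unknown default:
            break
        }
    }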

After that Info.plist setup, you must import these frameworks in your ViewController:

import AVFAudio
import SoundAnalysis

The imports here are pretty obvious: one to use the mic and listen to the stream of sound, and the other to analyze the captured stream.

Now create these properties in the ViewController:

    private let audioEngine: AVAudioEngine = AVAudioEngine() // Mark 1
    private let inputBus: AVAudioNodeBus = AVAudioNodeBus(0) // Mark 2
    private var inputFormat: AVAudioFormat!
    private var streamAnalyzer: SNAudioStreamAnalyzer!
    private let resultsObserver = SoundResultsObserver() // Mark 3
    private let analysisQueue = DispatchQueue(label: "com.example.AnalysisQueue")

On Mark 1 we create the audioEngine that will be used to generate and process audio signals and perform audio input and output. On Mark 2 we just create a bus where the audio can be processed. And on Mark 3 we create a class that receives the sound analysis results.

You don't have the SoundResultsObserver yet, so let's create it in another file.

Create a file called SoundResultsObserver.swift and copy-paste the code below:

import Foundation
import SoundAnalysis

class SoundResultsObserver: NSObject, SNResultsObserving {

    func request(_ request: SNRequest, didProduce result: SNResult) { // Mark 1

        guard let result = result as? SNClassificationResult else  { return } // Mark 2

        guard let classification = result.classifications.first else { return } // Mark 3

        let timeInSeconds = result.timeRange.start.seconds // Mark 4

        let formattedTime = String(format: "%.2f", timeInSeconds)
        print("Analysis result for audio at time: \(formattedTime)")

        let confidence = classification.confidence * 100.0
        let percentString = String(format: "%.2f%%", confidence)

        print("\(classification.identifier): \(percentString) confidence.\n") // Mark 5
    }

    func request(_ request: SNRequest, didFailWithError error: Error) {
        print("The the analysis failed: \(error.localizedDescription)")
    }

    func requestDidComplete(_ request: SNRequest) {
        print("The request completed successfully!")
    }
}

Mark 1 is the callback that will receive the sounds already classified. Of course, problems can happen and no sound may be classified, but we have the guard statements for that!

Mark 2 is our checkpoint to get the classification results, and in Mark 3 we only take the first item from the list of SNClassification objects because it is the most confident guess. You may look into the other SNClassification objects if you need a deeper look at everything captured by the microphone.

Mark 4 shows how you can get the time of the capture inside the stream. And Mark 5 is an example of how you can identify a sound with a certain confidence level. This observer class will be responsible for receiving the results of the sound analysis.
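
If the single best guess is not enough for your use case, you can inspect more of the list. Here is a minimal sketch of a helper you could add to SoundResultsObserver that logs every classification above a confidence threshold (the 0.6 default is an arbitrary choice for illustration):

    // Logs every classification above a confidence threshold instead of only the best guess.
    // Call it from request(_:didProduce:) after the cast to SNClassificationResult.
    func logConfidentClassifications(in result: SNClassificationResult,
                                     threshold: Double = 0.6) {
        // classifications is sorted with the highest confidence first.
        let confidentMatches = result.classifications.filter { $0.confidence > threshold }
        for match in confidentMatches {
            let percent = String(format: "%.2f%%", match.confidence * 100)
            print("\(match.identifier): \(percent) confidence")
        }
    }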

Now that the project builds and runs again, we can go back to the ViewController and add some code to viewDidLoad. Replace your viewDidLoad with this:

    override func viewDidLoad() {
        super.viewDidLoad()

        inputFormat = audioEngine.inputNode.inputFormat(forBus: inputBus) // Mark 1

        do {
            try audioEngine.start() // Mark 2

            audioEngine.inputNode.installTap(onBus: inputBus,
                                             bufferSize: 8192,
                                             format: inputFormat, block: analyzeAudio(buffer:at:)) // Mark 3

            streamAnalyzer = SNAudioStreamAnalyzer(format: inputFormat) // Mark 4

            let request = try SNClassifySoundRequest(classifierIdentifier: SNClassifierIdentifier.version1) // Mark 5

            try streamAnalyzer.add(request,
                                   withObserver: resultsObserver) // Mark 6


        } catch {
            print("Unable to start AVAudioEngine: \(error.localizedDescription)")
        }

    }

The code is not building yet, but wait a minute. Let's explain what's going on first:

Mark 1 - Returns the input format for the specified bus. We need to know the audio format to be able to handle it in the future.

Mark 2 - Literally starts the audio engine.

Mark 3 - Installs an audio tap on the bus to record, monitor, and observe the output of the node. It receives the inputBus, the requested size of the incoming buffers, the inputFormat, and a closure to handle the captured buffer.

Mark 4 - An audio stream analyzer generates an SNResult each time any of its active requests recognizes a sound. With this analyzer you can run an SNRequest on an audio stream, which is our main goal.

Mark 5 - This is where the magic happens. You create a request that classifies sounds using a Core ML model. The big thing here is that Apple has an out-of-the-box solution for common sounds: SNClassifierIdentifier.version1 is a model already embedded in iOS 15, ready to use! How cool is that!? (You can even list every label it knows, see the sketch right after Mark 6.)

Mark 6 - Now you try to add the request to the streamAnalyzer, passing the resultsObserver as its callback.
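
By the way, if you are curious about which labels the built-in classifier can produce, SNClassifySoundRequest exposes them through its knownClassifications property. A quick sketch you could drop right after Mark 5:

            // Print every label the built-in sound classifier knows about.
            print("The model knows \(request.knownClassifications.count) sound classes:")
            print(request.knownClassifications.joined(separator: ", "))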

And finally, add this function to the ViewController:

    func analyzeAudio(buffer: AVAudioBuffer, at time: AVAudioTime) {
        analysisQueue.async {
            self.streamAnalyzer.analyze(buffer,
                                        atAudioFramePosition: time.sampleTime)
        }
    }

We made it async so it doesn't block the UI or any other user input, but you could also do it without the async queue.
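
One thing the sample never does is stop listening. If you want to tear everything down when the screen goes away, a minimal sketch in the ViewController could look like this (stopAnalyzing is just a suggested name):

    // Stops capturing and analyzing audio. Call it, for example, from viewWillDisappear.
    func stopAnalyzing() {
        audioEngine.stop()                               // stop the audio engine
        audioEngine.inputNode.removeTap(onBus: inputBus) // remove the tap installed in viewDidLoad
        streamAnalyzer.completeAnalysis()                // tell the analyzer the stream has ended
        streamAnalyzer = nil                             // release the analyzer
    }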

Now build and run on an iOS 15+ simulator or device and you should see something like this in the console:

[Screenshot: console output showing the analysis time and each classification with its confidence percentage]

That's it! We finished the project!

Summary

Today we explored how Apple has embedded an ML model with various sound classes that you can use to analyze audio right out of the box in iOS 15. It's amazing how many possibilities this kind of technology opens up!

That's all! I hope you liked reading this article as much as I enjoyed writing it. If you want to support this blog you can Buy Me a Coffee or just leave a comment saying hello. You can also sponsor posts, and I'm open to freelance writing! Just reach me on LinkedIn or Twitter for details.

Thanks for reading and... That's all folks.

Credits: image
