Use SwiftUI and Machine Learning To Identify Nearly 1,000 Objects With Your Camera
Daily Coding Tip 084

Download the MobileNetV2 model from Apple’s CoreML website and add it to a new, blank iOS app project. This model has already been trained to identify 999 objects, which, as the title suggests, is nearly 1,000.
Add an NSCameraUsageDescription row to your Info.plist. This string can be anything, but if you release your app on the App Store, it needs to be an explanation of your need for camera access that Apple will accept.
Most of the logic here involves extensions that make Apple’s frameworks easier to use. We’ll start by making it easy to create an AVCaptureVideoPreviewLayer as a sublayer of any UIView.
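A rough sketch of what that extension could look like (the addPreviewLayer(for:) name is illustrative, not necessarily the article’s exact code):

```swift
import AVFoundation
import UIKit

extension UIView {
    /// Creates a preview layer for the given capture session, adds it as a
    /// sublayer of this view, and returns it so the caller can manage its frame.
    @discardableResult
    func addPreviewLayer(for session: AVCaptureSession) -> AVCaptureVideoPreviewLayer {
        let previewLayer = AVCaptureVideoPreviewLayer(session: session)
        previewLayer.videoGravity = .resizeAspectFill
        previewLayer.frame = bounds
        layer.addSublayer(previewLayer)
        return previewLayer
    }
}
```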
I have added a way to create an AVCaptureDeviceInput that uses the most obvious device: the rear camera of an iPhone.
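Something along these lines would do it; the rearCamera name is an assumption, but AVCaptureDevice.default(_:for:position:) is the standard way to ask for the back camera:

```swift
import AVFoundation

extension AVCaptureDeviceInput {
    /// The rear wide-angle camera wrapped as an input, or nil if the device
    /// is unavailable or the input can't be created (e.g. in the simulator).
    static var rearCamera: AVCaptureDeviceInput? {
        guard let device = AVCaptureDevice.default(.builtInWideAngleCamera,
                                                   for: .video,
                                                   position: .back) else { return nil }
        return try? AVCaptureDeviceInput(device: device)
    }
}
```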
Here’s another extension that makes it easier to work with AVCaptureSession. It’s not strictly necessary to know what’s going on here; we’re basically tidying away complex setup logic so it won’t take up space in our DataModel or SwiftUI views.
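The gist of what gets tidied away is session configuration: adding the camera input and a video data output that delivers frames to a delegate. A sketch, reusing the rearCamera helper from above:

```swift
import AVFoundation

extension AVCaptureSession {
    /// Adds the rear camera input and a video data output whose frames are
    /// delivered to the given delegate on a background queue.
    func configure(delegate: AVCaptureVideoDataOutputSampleBufferDelegate) {
        beginConfiguration()
        defer { commitConfiguration() }

        if let input = AVCaptureDeviceInput.rearCamera, canAddInput(input) {
            addInput(input)
        }

        let output = AVCaptureVideoDataOutput()
        output.setSampleBufferDelegate(delegate,
                                       queue: DispatchQueue(label: "camera.frames"))
        if canAddOutput(output) {
            addOutput(output)
        }
    }
}
```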
The final extensions make it possible to load a CoreML machine learning model from a file, like the MobileNetV2 model you should already have downloaded from Apple’s CoreML website and added to your project.
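One way to write such a loading extension is to look the compiled model up in the app bundle and wrap it for Vision; Xcode compiles an added .mlmodel file into an .mlmodelc bundle at build time, which is why this sketch looks for that extension (the load(named:) name is illustrative):

```swift
import CoreML
import Vision

extension VNCoreMLModel {
    /// Loads a compiled Core ML model (.mlmodelc) from the app bundle and
    /// wraps it so Vision can use it, or returns nil if anything fails.
    static func load(named name: String) -> VNCoreMLModel? {
        guard let url = Bundle.main.url(forResource: name, withExtension: "mlmodelc"),
              let mlModel = try? MLModel(contentsOf: url) else { return nil }
        return try? VNCoreMLModel(for: mlModel)
    }
}
```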
I have added a debug string here, which makes it easier to format the data that I will be displaying in my views.
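As a rough sketch, that debug string could be a computed property on the observation itself, something like:

```swift
import Vision

extension VNClassificationObservation {
    /// A short, display-friendly summary of this classification,
    /// e.g. "87% laptop".
    var debugString: String {
        "\(Int(confidence * 100))% \(identifier)"
    }
}
```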
Now we finally have our DataModel class. This acts as the delegate that captures output from the camera and hands it to the Vision framework, which compares what it sees against the machine learning model. We construct a VNRequest by passing it a closure, which allows us to set the debugString in the DataModel to the debugString computed property that our extension added to VNClassificationObservation. This is essentially a single result, as I am only concerned with the machine learning model’s best guess.
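Putting the pieces together, a DataModel along these lines would capture frames and publish the top classification. This is a sketch built on the helpers above, not the article’s exact code:

```swift
import AVFoundation
import Combine
import Vision

final class DataModel: NSObject, ObservableObject, AVCaptureVideoDataOutputSampleBufferDelegate {
    /// The formatted best guess displayed by the overlay.
    @Published var debugString = ""

    let session = AVCaptureSession()

    /// The Vision request is built once; its closure publishes the top result.
    private lazy var request: VNCoreMLRequest? = {
        guard let model = VNCoreMLModel.load(named: "MobileNetV2") else { return nil }
        return VNCoreMLRequest(model: model) { [weak self] request, _ in
            // Only the first (highest-confidence) classification is used.
            guard let best = request.results?.first as? VNClassificationObservation else { return }
            DispatchQueue.main.async {
                self?.debugString = best.debugString
            }
        }
    }()

    override init() {
        super.init()
        session.configure(delegate: self)
        // startRunning blocks, so keep it off the main thread.
        DispatchQueue.global(qos: .userInitiated).async { [session] in
            session.startRunning()
        }
    }

    /// Called for every camera frame; hands the pixel buffer to Vision.
    func captureOutput(_ output: AVCaptureOutput,
                       didOutput sampleBuffer: CMSampleBuffer,
                       from connection: AVCaptureConnection) {
        guard let request = request,
              let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
        try? VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:]).perform([request])
    }
}
```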
Now I’m ready to add the UI, which is made up of a simple ZStack of the camera feed and an overlay. The overlay simply displays the debugString on a rounded background, telling you the level of confidence the model has and the object it thinks it sees.