NatML

Vision Predictors

Making ML predictions on images and videos.
Computer vision is perhaps the most common use case for machine learning. ML makes it possible to solve a range of vision tasks, such as image classification, object detection, and segmentation. Across all of these tasks, the NatML workflow is largely the same:

Creating a Vision Predictor

First, we fetch a vision model:
```csharp
// Fetch model data from NatML
var modelData = await MLModelData.FromHub("@natsuite/ssd-lite");
// Deserialize the model
var model = modelData.Deserialize();
```
Vision models typically consume an image and produce one or more outputs, depending on the task the model performs:
```csharp
// Inspect the model
Debug.Log(model);
// Prints:
// MLEdgeModel
// Input: input.1: (1, 3, 300, 300) System.Single <-- INPUT IMAGE
// Output: 836: (1, 3000, 21) System.Single
// Output: 863: (1, 3000, 4) System.Single
```
Then we instantiate the predictor class for the model:
```csharp
// Create the SSD Lite predictor
var predictor = new SSDLitePredictor(model, modelData.labels);
```
When using models from NatML Hub, you can download the model's package to get its predictor class implementation.
If you are using your own model file, you will have to write a predictor class implementation for it. See MLEdgeModel for more details on this.
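Most of the work in a custom predictor is turning raw output tensors into usable results. The following is a minimal, framework-free sketch of that step for SSD-style outputs like the `(1, 3000, 21)` scores and `(1, 3000, 4)` boxes printed above. It is not the implementation of `SSDLitePredictor` or any NatML API. `SSDDecodeSketch`, `Detection`, and `Decode` are hypothetical names, and the sketch assumes that the scores are already per-class probabilities, that class 0 is background (21 columns would be consistent with 20 object classes plus background, but check your model), and that the box coordinate convention is model-specific.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical post-processing sketch for SSD-style detector outputs.
// NOT NatML's or SSDLitePredictor's implementation. Assumptions:
//   - scores is the flattened (1, N, C) output, already per-class probabilities
//   - class 0 is the background class
//   - boxes is the flattened (1, N, 4) output; its coordinate convention is model-specific
public static class SSDDecodeSketch
{
    public sealed class Detection
    {
        public int Anchor { get; }   // index of the candidate box, 0..N-1
        public int Label { get; }    // index of the winning class, 1..C-1
        public float Score { get; }  // probability of the winning class
        public float[] Box { get; }  // the 4 raw box values for this candidate

        public Detection(int anchor, int label, float score, float[] box)
        {
            Anchor = anchor;
            Label = label;
            Score = score;
            Box = box;
        }
    }

    public static List<Detection> Decode(float[] scores, float[] boxes, int numAnchors, int numClasses, float minScore)
    {
        if (numClasses < 2)
            throw new ArgumentException("numClasses must include background plus at least one class");
        if (scores.Length != numAnchors * numClasses)
            throw new ArgumentException("scores must contain numAnchors * numClasses values");
        if (boxes.Length != numAnchors * 4)
            throw new ArgumentException("boxes must contain numAnchors * 4 values");

        var detections = new List<Detection>();
        for (int i = 0; i < numAnchors; i++)
        {
            // Arg-max over all classes (including background) for this candidate.
            int bestLabel = 0;
            float bestScore = scores[i * numClasses];
            for (int c = 1; c < numClasses; c++)
            {
                float s = scores[i * numClasses + c];
                if (s > bestScore)
                {
                    bestScore = s;
                    bestLabel = c;
                }
            }
            // Drop candidates that are most likely background or below the threshold.
            if (bestLabel == 0 || bestScore < minScore)
                continue;

            var box = new float[4];
            Array.Copy(boxes, i * 4, box, 0, 4);
            detections.Add(new Detection(i, bestLabel, bestScore, box));
        }
        // A full detector pipeline would typically also apply non-maximum suppression here.
        return detections;
    }
}
```

In a real predictor, `numAnchors` and `numClasses` would come from the output shapes reported by the model (3000 and 21 for the model above), and the resulting class indices would be mapped to names using the model's labels.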
Fetch the model data, deserialize the model, and create the predictor only once, usually in your Start method.

Making Vision Predictions

To make a prediction, we first create an image feature for the model to consume:
```csharp
// Create an image feature
Texture2D image = ...;
var imageFeature = new MLImageFeature(image);
// Apply normalization
(imageFeature.mean, imageFeature.std) = modelData.normalization;
// Apply aspect mode
imageFeature.aspectMode = modelData.aspectMode;
```
See MLImageFeature for all the different ways to create an image feature from existing images or image data.
It is crucial to apply the proper normalization and aspect mode to the image feature, because many vision models produce incorrect outputs when their input is not preprocessed the same way it was during training.
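To make those two settings concrete, here is a conceptual, framework-free sketch of what per-channel normalization and aspect-preserving resizing do. It is not NatML's internal implementation, and `PreprocessSketch`, `NormalizeToPlanar`, and `AspectScale` are hypothetical names. It assumes interleaved RGBA bytes in row-major order, pixel values scaled to [0, 1] before `(value - mean) / std` is applied, and a planar `(1, 3, H, W)` float input like the `(1, 3, 300, 300)` input printed earlier. The "fit" and "fill" strategies are common aspect-preserving approaches, not necessarily the exact names of NatML's aspect modes.

```csharp
using System;

// Conceptual preprocessing sketch; NOT NatML's internals. Assumptions:
//   - input pixels are interleaved RGBA bytes, row-major
//   - values are scaled to [0, 1] before (value - mean) / std
//   - the model expects a planar (1, 3, H, W) float tensor
public static class PreprocessSketch
{
    // Convert interleaved RGBA bytes into a planar (1, 3, H, W) float tensor,
    // applying (value - mean[c]) / std[c] to each RGB channel.
    public static float[] NormalizeToPlanar(byte[] rgba, int width, int height, float[] mean, float[] std)
    {
        if (rgba.Length != width * height * 4)
            throw new ArgumentException("rgba must contain width * height * 4 values");
        if (mean.Length < 3 || std.Length < 3)
            throw new ArgumentException("mean and std need one value per RGB channel");

        int planeSize = width * height;
        var tensor = new float[3 * planeSize];
        for (int p = 0; p < planeSize; p++)
        {
            for (int c = 0; c < 3; c++) // the alpha channel (index 3) is dropped
            {
                float value = rgba[p * 4 + c] / 255f;
                tensor[c * planeSize + p] = (value - mean[c]) / std[c];
            }
        }
        return tensor;
    }

    // Uniform scale that preserves the source aspect ratio when resizing to the model's input:
    // "fit" keeps the whole image visible (the remainder is padded),
    // "fill" covers the whole input (the overflow is cropped).
    public static float AspectScale(int srcWidth, int srcHeight, int dstWidth, int dstHeight, bool fill)
    {
        float sx = (float)dstWidth / srcWidth;
        float sy = (float)dstHeight / srcHeight;
        return fill ? Math.Max(sx, sy) : Math.Min(sx, sy);
    }
}
```

For example, a 600x300 image resized into a 300x300 input is scaled by 0.5 under "fit" (giving 300x150 plus padding) and by 1.0 under "fill" (giving 600x300, cropped to 300x300). A model trained on one strategy can behave poorly when fed the other, which is why the model's own settings should be applied.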
We can now make a prediction on the image feature:
```csharp
// Make a prediction on the image feature
var results = predictor.Predict(imageFeature);
```
Some vision predictors require multiple inputs. See your predictor's README for which features to pass in.

Predictions on Videos

INCOMPLETE.