MLAudioFeature

class NatML.Features.MLAudioFeature : MLFeature, IMLEdgeFeature, IEnumerable<(MLAudioFeature feature, long timestamp)>

This feature contains raw audio data. Currently, NatML only supports floating-point linear PCM audio data.

Creating the Feature

The audio feature can be created from several different audio inputs:

From an AudioClip

/// <summary>
/// Create an audio feature from an audio clip.
/// </summary>
/// <param name="clip">Audio clip.</param>
/// <param name="duration">Optional duration to extract in seconds.</param>
MLAudioFeature (AudioClip clip, float duration = ...);

The audio feature can be created from an AudioClip, optionally specifying the duration of the clip to extract.

From a Sample Buffer

/// <summary>
/// Create an audio feature from a sample buffer.
/// </summary>
/// <param name="sampleBuffer">Linear PCM sample buffer.</param>
/// <param name="sampleRate">Sample rate.</param>
/// <param name="channelCount">Channel count.</param>
MLAudioFeature (float[] sampleBuffer, int sampleRate, int channelCount);

The audio feature can be created from a sample buffer in managed memory, along with audio format information.

The sample buffer must be linear PCM and interleaved by channel.
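
As an illustration of the interleaved layout, the sketch below converts planar audio (one array per channel) into a single buffer ordered frame by frame. The `InterleaveSketch` class and `Interleave` method are hypothetical helpers written for this example and are not part of NatML.

```csharp
using System;

public static class InterleaveSketch {

    // Hypothetical helper (not a NatML API): interleave planar channel buffers
    // into one buffer laid out frame by frame, e.g. [L0, R0, L1, R1, ...] for stereo.
    public static float[] Interleave (float[][] channels) {
        int channelCount = channels.Length;
        int frameCount = channelCount > 0 ? channels[0].Length : 0;
        // Every channel must hold the same number of frames
        foreach (var channel in channels)
            if (channel.Length != frameCount)
                throw new ArgumentException("All channels must have the same length", nameof(channels));
        var interleaved = new float[frameCount * channelCount];
        for (int frame = 0; frame < frameCount; frame++)
            for (int channel = 0; channel < channelCount; channel++)
                interleaved[frame * channelCount + channel] = channels[channel][frame];
        return interleaved;
    }
}
```

The resulting array could then be passed as the `sampleBuffer` argument of the constructor above, with `channelCount` set to the number of planar channels.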

From a Native Array

The audio feature can be created from a NativeArray<float> sample buffer, along with audio format information.

From a Native Buffer

The audio feature can be created from a sample buffer in native memory, along with audio format information.

From a Buffer List

The audio feature can be created from an audio buffer list. This is useful for audio-based predictors that make predictions on longer segments of audio data, like speech-to-text models.

This constructor will combine each buffer in the list into one contiguous sample buffer. As such, this constructor allocates memory.
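
To make the allocation concrete, here is a minimal sketch of combining several buffers into one contiguous array. The element type of the buffer list is not shown on this page, so `float[]` is used as a stand-in, and `BufferListSketch.Concatenate` is a hypothetical helper rather than the library's implementation. It also assumes that every buffer shares the same sample rate and channel count.

```csharp
using System;
using System.Collections.Generic;

public static class BufferListSketch {

    // Hypothetical helper (not a NatML API): copy each buffer, in order,
    // into one newly allocated contiguous sample buffer.
    public static float[] Concatenate (IReadOnlyList<float[]> buffers) {
        // First pass: compute the total sample count
        int total = 0;
        foreach (var buffer in buffers)
            total += buffer.Length;
        // Second pass: copy each buffer to its offset in the combined array
        var combined = new float[total];
        int offset = 0;
        foreach (var buffer in buffers) {
            Array.Copy(buffer, 0, combined, offset, buffer.Length);
            offset += buffer.Length;
        }
        return combined;
    }
}
```

Because the combined array is allocated on every call, a caller that builds features frequently may want to keep the buffer list short.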

Inspecting the Feature

Refer to the Inspecting the Feature section of the MLFeature class for more information.

The type is always an MLAudioType.

Audio Preprocessing

The audio feature supports preprocessing when creating an MLEdgeFeature for Edge predictions that use raw waveform data:

Sample Rate

For Edge predictors that make predictions on raw audio waveform data, the audio feature can resample audio data to the specified sampleRate.

The sampleRate is initialized to that of the audio data used to create the feature.
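
This page does not describe how the resampling is performed. Purely to illustrate what resampling an interleaved buffer involves, the sketch below uses linear interpolation between neighboring frames. `ResampleSketch.Resample` is a hypothetical helper, not the library's resampler, and it applies no low-pass filter, so downsampling with it can introduce aliasing.

```csharp
using System;

public static class ResampleSketch {

    // Hypothetical helper (not a NatML API): resample an interleaved buffer
    // from sourceRate to targetRate using linear interpolation per channel.
    public static float[] Resample (float[] source, int channelCount, int sourceRate, int targetRate) {
        int sourceFrames = source.Length / channelCount;
        int targetFrames = (int)((long)sourceFrames * targetRate / sourceRate);
        var result = new float[targetFrames * channelCount];
        for (int frame = 0; frame < targetFrames; frame++) {
            // Fractional position of this output frame in the source timeline
            double position = (double)frame * sourceRate / targetRate;
            int lower = (int)position;
            int upper = Math.Min(lower + 1, sourceFrames - 1); // clamp at the last frame
            float weight = (float)(position - lower);
            for (int channel = 0; channel < channelCount; channel++) {
                float a = source[lower * channelCount + channel];
                float b = source[upper * channelCount + channel];
                result[frame * channelCount + channel] = a + (b - a) * weight;
            }
        }
        return result;
    }
}
```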

Channel Count

For Edge predictors that make predictions on raw audio waveform data, the audio feature can multiplex or demultiplex audio data to the specified channelCount.

The channelCount is initialized to that of the audio data used to create the feature.
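
The exact channel conversion is not specified here. As a rough sketch of one common approach, the helpers below average all channels of each frame to produce mono, and duplicate a mono signal across channels to produce multichannel audio. `ChannelSketch`, `ToMono`, and `FromMono` are hypothetical names, and the averaging and duplication strategies are assumptions rather than documented NatML behavior.

```csharp
public static class ChannelSketch {

    // Hypothetical helper (not a NatML API): average the channels of each
    // interleaved frame into a single mono sample.
    public static float[] ToMono (float[] interleaved, int channelCount) {
        int frameCount = interleaved.Length / channelCount;
        var mono = new float[frameCount];
        for (int frame = 0; frame < frameCount; frame++) {
            float sum = 0f;
            for (int channel = 0; channel < channelCount; channel++)
                sum += interleaved[frame * channelCount + channel];
            mono[frame] = sum / channelCount;
        }
        return mono;
    }

    // Hypothetical helper (not a NatML API): copy each mono sample into
    // every channel of an interleaved output buffer.
    public static float[] FromMono (float[] mono, int channelCount) {
        var interleaved = new float[mono.Length * channelCount];
        for (int frame = 0; frame < mono.Length; frame++)
            for (int channel = 0; channel < channelCount; channel++)
                interleaved[frame * channelCount + channel] = mono[frame];
        return interleaved;
    }
}
```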

Normalization

When making Edge predictions on audio features, some models might require that input data be normalized to within a certain range. The audio feature provides the mean and standard deviation properties described below as an easy way to perform any required normalization.

The default range for audio features is [-1.0, 1.0].

When using NatML Hub, the normalization coefficients can be specified when creating a predictor:

Specifying normalization coefficients on NatML Hub.

The specified normalization coefficients can then be used like so:

Mean

The audio feature supports specifying a normalization mean when creating an MLEdgeFeature.

Standard Deviation

The audio feature supports specifying a normalization standard deviation when creating an MLEdgeFeature.
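
The formula applied is not spelled out on this page. Assuming the conventional definition, where each sample becomes `(sample - mean) / std`, the effect of the two coefficients can be sketched as follows. `NormalizationSketch.Normalize` is a hypothetical helper for illustration only, not the library's implementation.

```csharp
public static class NormalizationSketch {

    // Hypothetical helper (not a NatML API): normalize each sample as
    // (sample - mean) / std, which is an assumed, conventional formula.
    public static float[] Normalize (float[] samples, float mean, float std) {
        var result = new float[samples.Length];
        for (int i = 0; i < samples.Length; i++)
            result[i] = (samples[i] - mean) / std;
        return result;
    }
}
```

Under this assumption, a mean of 0 and a standard deviation of 1 leave samples in the default [-1.0, 1.0] range unchanged.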

Creating an Edge Feature

INCOMPLETE.
