NatML
BERTBasicTokenizer
class NatSuite.MLX.Tokenizers.BERTBasicTokenizer : ITokenizer
This tokenizer performs basic whitespace and punctuation tokenization for BERT and DistilBERT natural language models.
This class is part of the NatMLX extension library.
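To illustrate what "basic whitespace and punctuation tokenization" means, here is a minimal, self-contained sketch. It is not the NatMLX implementation: `BasicTokenizerSketch` and its helper are hypothetical names, and the sketch assumes that text is split on whitespace, that each punctuation or symbol character becomes its own token, and that lowercasing is optional. The actual tokenizer may apply further normalization that is not shown here.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical illustration only; not part of NatML or NatMLX.
public static class BasicTokenizerSketch {

    /// <summary>
    /// Split text on whitespace and emit each punctuation character as its own token.
    /// </summary>
    public static string[] Tokenize (string text, bool lowercase = true) {
        var tokens = new List<string>();
        var current = new StringBuilder();
        foreach (var c in text) {
            if (char.IsWhiteSpace(c)) {
                // Whitespace ends the current word and is discarded
                Flush(current, tokens);
            } else if (char.IsPunctuation(c) || char.IsSymbol(c)) {
                // Punctuation ends the current word and becomes a separate token
                Flush(current, tokens);
                tokens.Add(c.ToString());
            } else {
                current.Append(lowercase ? char.ToLowerInvariant(c) : c);
            }
        }
        Flush(current, tokens);
        return tokens.ToArray();
    }

    // Move the word accumulated so far (if any) into the token list
    private static void Flush (StringBuilder current, List<string> tokens) {
        if (current.Length == 0)
            return;
        tokens.Add(current.ToString());
        current.Clear();
    }
}
```

With this sketch, `BasicTokenizerSketch.Tokenize("Hello, World!")` yields the four tokens `hello`, `,`, `world`, and `!`.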

Creating the Tokenizer

/// <summary>
/// Create the basic tokenizer.
/// </summary>
/// <param name="lowercase">Lowercase all tokens.</param>
BERTBasicTokenizer (bool lowercase = true);
INCOMPLETE.

Tokenizing Text

/// <summary>
/// Tokenize a piece of text into its BERT tokens.
/// </summary>
/// <param name="text">Input text.</param>
/// <returns>BERT tokens.</returns>
string[] Tokenize (string text);
Refer to the Tokenizing Text section of the ITokenizer interface for more information.
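As a usage sketch, the constructor and `Tokenize` method documented above can be combined as follows. This assumes the NatMLX extension library is already installed in your project; the `TokenizerExample` class and the input string are hypothetical and only for illustration. The exact tokens returned depend on the library, so no expected output is shown.

```csharp
using System;
using NatSuite.MLX.Tokenizers;

// Hypothetical wrapper class for illustration; requires the NatMLX library.
public static class TokenizerExample {

    public static void Run () {
        // Create the tokenizer, lowercasing all tokens (the documented default)
        var tokenizer = new BERTBasicTokenizer(lowercase: true);
        // Tokenize a piece of text into its BERT tokens
        string[] tokens = tokenizer.Tokenize("NatML runs BERT models, on device!");
        // Print the resulting tokens, one per line
        foreach (var token in tokens)
            Console.WriteLine(token);
    }
}
```

Pass `lowercase: false` to the constructor if the model you are targeting expects tokens with their original casing.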