Function pre_tokenize

pub fn pre_tokenize(text: &str, do_lower_case: bool) -> Vec<String>


Pre-tokenize a text string into word-level tokens.

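Applies, in a single pass over the input, lowercasing (when do_lower_case is true), accent stripping, CJK character splitting, whitespace splitting, and punctuation splitting, and returns the resulting tokens as owned strings.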
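
The following is a minimal sketch of how such a pre-tokenizer might behave, not the crate's actual implementation. The names pre_tokenize_sketch, is_cjk, and is_punct are hypothetical. The sketch assumes that each CJK ideograph and each ASCII punctuation character becomes its own token, and it omits accent stripping because that step needs Unicode normalization, which the Rust standard library does not provide.

```rust
/// Hypothetical helper: true for code points in the main CJK ideograph blocks.
fn is_cjk(c: char) -> bool {
    matches!(
        c as u32,
        0x4E00..=0x9FFF | 0x3400..=0x4DBF | 0xF900..=0xFAFF | 0x20000..=0x2A6DF
    )
}

/// Hypothetical helper: this sketch treats only ASCII punctuation as punctuation.
fn is_punct(c: char) -> bool {
    c.is_ascii_punctuation()
}

/// Illustrative stand-in for `pre_tokenize`; accent stripping is intentionally omitted.
fn pre_tokenize_sketch(text: &str, do_lower_case: bool) -> Vec<String> {
    let mut tokens = Vec::new();
    let mut current = String::new();

    for c in text.chars() {
        if c.is_whitespace() {
            // Whitespace ends the current word and is itself discarded.
            if !current.is_empty() {
                tokens.push(std::mem::take(&mut current));
            }
        } else if is_cjk(c) || is_punct(c) {
            // CJK characters and punctuation end the current word
            // and are emitted as single-character tokens.
            if !current.is_empty() {
                tokens.push(std::mem::take(&mut current));
            }
            tokens.push(c.to_string());
        } else if do_lower_case {
            // `to_lowercase` may yield more than one char for some inputs.
            current.extend(c.to_lowercase());
        } else {
            current.push(c);
        }
    }

    // Flush the trailing word, if any.
    if !current.is_empty() {
        tokens.push(current);
    }
    tokens
}
```

Under these assumptions, pre_tokenize_sketch("Hello, 世界 World!", true) returns ["hello", ",", "世", "界", "world", "!"]. With do_lower_case set to false, the same input keeps "Hello" and "World" in their original case.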
