FlashTokenizer is a high-performance C++ implementation of the BertTokenizer used for LLM inference. It provides fast and accurate tokenization for large language model applications.

Features

Efficient Tokenization

Provides fast, accurate tokenization for BertTokenizer applications and is faster than most commonly used tokenizers.

Parallel Processing Support

Supports parallel processing at the C++ level through OpenMP, which speeds up tokenization in multi-threaded environments.
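
As an illustration of the general pattern, here is a minimal sketch of batch tokenization with an OpenMP parallel loop. It does not reproduce FlashTokenizer's actual API or internals: `tokenize_one` and `tokenize_batch` are hypothetical names, and a whitespace splitter stands in for the real WordPiece logic.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a per-text tokenizer: splits on whitespace.
// The real BertTokenizer logic (WordPiece) is not reproduced here.
std::vector<std::string> tokenize_one(const std::string& text) {
    std::vector<std::string> tokens;
    std::istringstream stream(text);
    std::string token;
    while (stream >> token) {
        tokens.push_back(token);
    }
    return tokens;
}

// Tokenizes a batch of texts. Each iteration writes only to its own
// output slot, so the loop needs no locking.
std::vector<std::vector<std::string>> tokenize_batch(
    const std::vector<std::string>& texts) {
    std::vector<std::vector<std::string>> results(texts.size());
    const int n = static_cast<int>(texts.size());
    // With OpenMP enabled (for example -fopenmp on GCC/Clang), iterations
    // are distributed across threads. Without it, the pragma is ignored
    // and the loop runs serially with the same result.
#pragma omp parallel for schedule(dynamic)
    for (int i = 0; i < n; ++i) {
        const std::size_t idx = static_cast<std::size_t>(i);
        results[idx] = tokenize_one(texts[idx]);
    }
    assert(results.size() == texts.size());
    return results;
}
```

Calling `tokenize_batch({"hello world", "a b c"})` returns one token list per input, in input order. Dynamic scheduling is used in this sketch because texts of uneven length make per-item work uneven; a static schedule may be preferable when inputs are uniform.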

Cross-Platform Compatibility

Runs on multiple operating systems, including Windows, macOS, and Ubuntu, and can be used from Python through pybind11 bindings.