FlashTokenizer is a high-performance C++ implementation of the BertTokenizer, built for fast and accurate tokenization in large language model (LLM) inference.
Provides fast, accurate tokenization for applications that rely on the BERT tokenizer, and runs faster than most commonly used tokenizers.
Supports parallel processing at the C++ level using OpenMP, which speeds up tokenization in multi-threaded environments.
Runs on multiple operating systems, including Windows, macOS, and Ubuntu, and can be used from Python via pybind11.
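
To illustrate the OpenMP point above, here is a minimal sketch of batch-level parallelism, assuming each input text can be tokenized independently. The names `tokenize_one` and `tokenize_batch` are hypothetical and are not FlashTokenizer's actual API; `tokenize_one` is a whitespace splitter standing in for a real WordPiece tokenizer.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a real tokenizer: splits on whitespace only.
// This is NOT FlashTokenizer's WordPiece implementation.
std::vector<std::string> tokenize_one(const std::string& text) {
    std::istringstream stream(text);
    std::vector<std::string> tokens;
    std::string token;
    while (stream >> token) {
        tokens.push_back(token);
    }
    return tokens;
}

// Hypothetical batch helper: every input writes to its own output slot,
// so loop iterations share no mutable state and can run in parallel.
std::vector<std::vector<std::string>> tokenize_batch(
    const std::vector<std::string>& texts) {
    std::vector<std::vector<std::string>> results(texts.size());
    // A signed index is used because older OpenMP versions require it.
    const int n = static_cast<int>(texts.size());
    // Without OpenMP enabled, the pragma is ignored and the loop runs serially.
    #pragma omp parallel for schedule(dynamic)
    for (int i = 0; i < n; ++i) {
        const std::size_t idx = static_cast<std::size_t>(i);
        results[idx] = tokenize_one(texts[idx]);
    }
    return results;
}
```

Calling `tokenize_batch({"hello world", "a b c"})` returns one token list per input, in input order. Build with your compiler's OpenMP flag (for example `-fopenmp` on GCC and Clang, `/openmp` on MSVC) to enable the parallel loop; dynamic scheduling is chosen here on the assumption that input lengths vary widely.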