Miscellaneous scripts
This repository contains miscellaneous scripts that do not fit in any single repository but that I occasionally use for personal tasks. Some of the scripts may contain hardcoded paths and opinionated presets, so inspect them before use.
Functions
bool contains_chinese(str text)
None dump_long_token(int n = -1)
Variables
enc = tiktoken.get_encoding("o200k_base")
bool dirty_tokens.contains_chinese(str text)
Check if the input text contains any Chinese characters. Returns True if at least one Chinese character is found.
Definition at line 8 of file dirty_tokens.py.
Referenced by dump_long_token().
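The body of contains_chinese() is not shown on this page, so the following is only a minimal sketch of one way such a check could be written. It assumes that "Chinese character" means a code point in the basic CJK Unified Ideographs block (U+4E00 to U+9FFF); the actual script may use a different range or a regular expression.

```python
def contains_chinese(text: str) -> bool:
    """Return True if text has at least one CJK Unified Ideograph."""
    # Assumption: only the basic block U+4E00..U+9FFF counts as "Chinese".
    # Extension blocks and CJK punctuation are deliberately ignored here.
    return any("\u4e00" <= ch <= "\u9fff" for ch in text)


if __name__ == "__main__":
    print(contains_chinese("hello 世界"))
    print(contains_chinese("hello"))
```

any() stops at the first match, which mirrors the documented behaviour of returning True as soon as one Chinese character is found.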
None dirty_tokens.dump_long_token(int n = -1)
Definition at line 15 of file dirty_tokens.py.
References contains_chinese() and dump_long_token().
Referenced by dump_long_token().
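This page gives no description for dump_long_token(), so its behaviour can only be guessed from its name and from the fact that it references contains_chinese(). One plausible reading is that it lists the longest vocabulary tokens that contain Chinese characters. The sketch below illustrates that idea under loud assumptions: the helper name longest_chinese_tokens is hypothetical, it returns a list instead of printing, it treats n = -1 as "no limit", and it takes the vocabulary as plain byte strings so that it runs without downloading the real o200k_base vocabulary.

```python
from typing import Iterable, List


def contains_chinese(text: str) -> bool:
    # Assumption: basic CJK Unified Ideographs block only.
    return any("\u4e00" <= ch <= "\u9fff" for ch in text)


def longest_chinese_tokens(vocab: Iterable[bytes], n: int = -1) -> List[str]:
    """Hypothetical helper: Chinese-containing tokens, longest first."""
    # Token bytes are not always valid UTF-8 on their own, so undecodable
    # fragments are dropped rather than raising an error.
    decoded = (tok.decode("utf-8", errors="ignore") for tok in vocab)
    hits = [t for t in decoded if contains_chinese(t)]
    # Sort by character count, longest first; ties keep their input order.
    hits.sort(key=len, reverse=True)
    # Assumption: a negative n means "return everything".
    return hits if n < 0 else hits[:n]


if __name__ == "__main__":
    sample = [b"abc", "你好".encode(), "中华人民".encode(), b"x"]
    for token in longest_chinese_tokens(sample, n=1):
        print(token)
```

In the real script the byte strings would presumably come from the module-level enc object, for example through tiktoken's decode_single_token_bytes(); that step is left out here to keep the example self-contained.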
dirty_tokens.enc = tiktoken.get_encoding("o200k_base")
Definition at line 5 of file dirty_tokens.py.