Faster Superword Tokenization
arXiv:2604.05192v1 Announce Type: new Abstract: Byte Pair Encoding (BPE) is a widely used tokenization algorithm, whose tokens cannot extend across pre-tokenization boundaries, functionally limiting it …
Craig W. Schmidt, Chris Tanner, Yuval Pinter
12 views