Story Detail of id 48397936 | Liveview Hacker News

yencabulator 7 hours ago | on: They’re made out of weights

A tokenizer is roughly Huffman-coding sequences of input (bytes of English, etc.) into shorter sequences (lists of tokens), as a performance optimization.

As you said, it's not in any way intrinsic to the LLM, though it may be a very necessary optimization on today's hardware.
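To make the "shorter sequences" point concrete, here is a minimal sketch using greedy byte-pair merging. This is an assumption for illustration: it is a toy in the spirit of BPE, not Huffman coding proper and not any real tokenizer's implementation, and the names `toy_bpe` and `decode` are made up here. It starts with one token per byte and repeatedly replaces the most frequent adjacent pair with a new token id, so the model would see fewer positions for the same text.

```python
from collections import Counter


def toy_bpe(data: bytes, num_merges: int):
    """Greedy byte-pair merging (toy sketch, hypothetical helper)."""
    seq = list(data)   # one token per byte, ids 0..255
    merges = {}        # (left_id, right_id) -> new token id
    next_id = 256
    for _ in range(num_merges):
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            break      # the most frequent pair occurs only once; stop merging
        merges[pair] = next_id
        out, i = [], 0
        while i < len(seq):
            # Replace each non-overlapping occurrence of the pair, left to right.
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(next_id)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
        next_id += 1
    return seq, merges


def decode(seq, merges):
    """Expand merged token ids back into the original bytes."""
    inverse = {new_id: pair for pair, new_id in merges.items()}

    def expand(tok):
        if tok < 256:
            return bytes([tok])
        left, right = inverse[tok]
        return expand(left) + expand(right)

    return b"".join(expand(t) for t in seq)


text = b"the cat sat on the mat, the cat sat on the hat"
tokens, merges = toy_bpe(text, 20)
print(len(text), "bytes ->", len(tokens), "tokens")
```

The merge table is lossless, so `decode(tokens, merges)` returns the original bytes. Skipping the merges entirely and feeding raw bytes would still work; the sequence would just be longer.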

phire 6 hours ago | parent

I wouldn't use the word necessary.

IMO, we are probably talking about a 6x slowdown (for typical English). You would need to be absolutely stupid not to implement some kind of optimisation along these lines.

Slower and maybe a little dumber, but it would work.
