The raw tokens in the vocabulary have Ġ where you'd expect a space. I ran into this too when writing an inference engine for Qwen. You have to "normalize" those special characters back into the bytes they stand for before you print the text.
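For anyone hitting the same thing, here is a rough sketch of that normalization. It assumes the tokenizer uses the GPT-2 style byte-level BPE table, where every byte is stored as a printable character (so the space byte 0x20 shows up as Ġ, U+0120); I haven't checked this against every Qwen release. The names `bytes_to_unicode`, `UNICODE_TO_BYTE` and `decode_tokens` are just illustrative, not from any particular engine.

```python
def bytes_to_unicode():
    """Build the GPT-2 style byte -> printable-character table (assumed here)."""
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # The remaining bytes (controls, space, etc.) are shifted to U+0100 and up.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


# Inverse table: printable character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_tokens(tokens):
    """Turn vocabulary token strings back into normal text.

    Assumes every character in every token comes from the table above.
    """
    raw = bytes(UNICODE_TO_BYTE[ch] for tok in tokens for ch in tok)
    # A multi-byte character can be split across tokens, so decode leniently.
    return raw.decode("utf-8", errors="replace")


print(decode_tokens(["Hello", "Ġworld"]))
```

Doing a plain string replace of Ġ with a space works for simple English text, but going through the byte table also handles newlines (Ċ) and non-ASCII characters that are split into several bytes.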