They created this in service of their video generation model, which "clusters and reorders tokens based on semantic similarity using k-means":

http://arxiv.org/pdf/2505.18875
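
For anyone wondering what "clusters and reorders tokens" could look like in practice, here is a minimal sketch. It is not the paper's implementation: it assumes tokens are plain NumPy vectors, uses a basic Lloyd-style k-means, and the names `kmeans` and `reorder_by_cluster` are made up for illustration.

```python
import numpy as np

def kmeans(x, k, iters=10, seed=0):
    """Plain Lloyd-style k-means; returns one cluster label per row of x."""
    rng = np.random.default_rng(seed)
    # Start from k distinct tokens as centroids (fancy indexing copies them).
    centroids = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        # Squared Euclidean distance from every token to every centroid: (n, k).
        dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members):  # leave an empty cluster's centroid where it was
                centroids[j] = members.mean(axis=0)
    return labels

def reorder_by_cluster(tokens, k):
    """Permute tokens so that members of the same cluster sit next to each other."""
    labels = kmeans(tokens, k)
    # Stable sort keeps the original relative order inside each cluster.
    perm = np.argsort(labels, kind="stable")
    return tokens[perm], perm, labels

# Hypothetical input: 16 tokens with 4-dimensional embeddings.
tokens = np.random.default_rng(1).normal(size=(16, 4))
reordered, perm, labels = reorder_by_cluster(tokens, k=3)

# The inverse permutation puts the tokens back in their original order.
restored = reordered[np.argsort(perm)]
```

The point of the reorder is that similar tokens end up contiguous in memory, which is presumably what makes block-wise processing of each cluster practical; the permutation is kept so the original order can be restored afterwards.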