
wood_spirit 9 hours ago | on: Flash-KMeans: Fast and Memory-Efficient Exact K-Means

Does this have corresponding speedups or memory gains for normal CPUs too? Just thinking about all the cups of coffee that have been made and drunk while scikit-learn k-means chugs through a notebook :)

snovv_crash 8 hours ago | parent | next

On a CPU with larger K you would put the centroids in a search tree to take advantage of the sparsity, whereas a GPU would calculate the full NxK distance matrix. So, from my understanding, the bottleneck they are fixing doesn't show up on CPU.
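
To make the contrast concrete, here is a minimal sketch of the assignment step done both ways on a CPU: a brute-force scan over all K centroids per point (the per-row equivalent of the full NxK matrix) and a lookup through a small k-d tree built over the centroids. The helper names (`build_kdtree`, `nearest`, `brute_force`) and the random 3-D data are assumptions made up for illustration; this is not taken from the paper or from scikit-learn.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_kdtree(indices, centroids, depth=0):
    """Build a k-d tree over centroid indices.

    Each node is (centroid index, split axis, left subtree, right subtree).
    """
    if not indices:
        return None
    axis = depth % len(centroids[0])
    indices = sorted(indices, key=lambda i: centroids[i][axis])
    mid = len(indices) // 2
    return (indices[mid], axis,
            build_kdtree(indices[:mid], centroids, depth + 1),
            build_kdtree(indices[mid + 1:], centroids, depth + 1))

def nearest(node, point, centroids, best=None):
    """Return (squared distance, centroid index) of the nearest centroid."""
    if node is None:
        return best
    idx, axis, left, right = node
    cand = (sq_dist(point, centroids[idx]), idx)
    if best is None or cand < best:
        best = cand
    diff = point[axis] - centroids[idx][axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, point, centroids, best)
    # Only descend into the far side if the splitting plane is closer
    # than the best centroid found so far; otherwise the subtree is pruned.
    if diff * diff < best[0]:
        best = nearest(far, point, centroids, best)
    return best

def brute_force(point, centroids):
    """Reference: scan all K centroids (one row of the N x K matrix)."""
    return min((sq_dist(point, c), i) for i, c in enumerate(centroids))

# Hypothetical low-dimensional data (D = 3), where trees tend to prune well.
random.seed(0)
centroids = [tuple(random.random() for _ in range(3)) for _ in range(200)]
points = [tuple(random.random() for _ in range(3)) for _ in range(1000)]

tree = build_kdtree(list(range(len(centroids))), centroids)
labels = [nearest(tree, p, centroids)[1] for p in points]
```

The tree has to be rebuilt whenever the centroids move, i.e. once per k-means iteration, which is cheap compared with the N lookups when K is much smaller than N.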

xavxav 8 hours ago | root | parent

Search trees tend not to scale well to higher dimensions though, right?

From what I've seen, I had the impression that Yinyang k-means was the best way to take advantage of the sparsity.
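
For context, a rough sketch of the idea that bound-based methods rely on. This is not Yinyang k-means itself (which maintains grouped bounds across iterations); it is only the basic triangle-inequality test that this family of algorithms builds on: if d(c_best, c_j) >= 2 * d(x, c_best), then c_j cannot be closer to x than c_best, so its distance never needs computing. The function name `assign_with_pruning` and the synthetic clustered data are assumptions for illustration only.

```python
import math
import random

def dist(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def assign_with_pruning(points, centroids):
    """Assign each point to its nearest centroid, skipping centroids that
    the triangle inequality proves cannot win.

    Returns (labels, number of skipped distance computations).
    """
    k = len(centroids)
    # K x K centroid-to-centroid distances, computed once per iteration.
    cc = [[dist(centroids[a], centroids[b]) for b in range(k)]
          for a in range(k)]
    labels, skipped = [], 0
    for p in points:
        best, best_d = 0, dist(p, centroids[0])
        for j in range(1, k):
            # d(p, c_j) >= d(c_best, c_j) - d(p, c_best) >= d(p, c_best)
            # whenever d(c_best, c_j) >= 2 * d(p, c_best).
            if cc[best][j] >= 2.0 * best_d:
                skipped += 1
                continue
            d = dist(p, centroids[j])
            if d < best_d:
                best, best_d = j, d
        labels.append(best)
    return labels, skipped

# Hypothetical higher-dimensional, well-clustered data (D = 32).
random.seed(0)
dim, k = 32, 16
centroids = [[random.gauss(0, 10) for _ in range(dim)] for _ in range(k)]
points = [[c + random.gauss(0, 1) for c in random.choice(centroids)]
          for _ in range(500)]

labels, skipped = assign_with_pruning(points, centroids)
print(f"skipped {skipped} of {len(points) * (k - 1)} candidate distances")
```

Unlike a spatial tree, this test depends only on distances, not on coordinates, which is why it is not tied to the dimensionality in the same way; how much it prunes depends on how well separated the clusters are.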

snovv_crash 3 hours ago | root | parent

Most of the data I've worked with is geospatial with D<=4 (xyzt), so search trees worked great for me. But for things like descriptor or embedding clustering, yes, trees wouldn't be useful.

openclaw01 8 hours ago | parent

[dead]
