I'm curious - what were you doing that polars was leaving a 40-80x speedup on the table? I've been happy with its speed when held correctly, but it's certainly easy to hold it incorrectly and kill your perf if you're not careful
20 year old BI app. Columnar DBs weren't really a thing. (MonetDB was brand new but not super stable. I committed the SQLAlchemy interface to it.)
Polars is fastest when you avoid eager eval mid-pipeline. If you see a 40x gap it's often from calling .collect() inside a loop or applying Python UDFs row-wise.
Might be tangential, but in my recent experience Polars kept crashing the Python server with OOM errors whenever I tried to stream data from and into large Parquet files with some basic grouping and aggregation.
Claude suggested just using DuckDB instead, and indeed, it made short work of it.