I'm curious - what were you doing that polars was leaving a 40-80x speedup on the table? I've been happy with its speed when held correctly, but it's certainly easy to hold it incorrectly and kill your perf if you're not careful
20 year old BI app. Columnar DBs weren't really a thing. (MonetDB was brand new but not super stable. I committed the SQLAlchemy interface to it.)
Polars is fastest when you avoid eager eval mid-pipeline. If you see a 40x gap it's often from calling .collect() inside a loop or applying Python UDFs row-wise.
Might be tangential, but in my recent experience Polars kept crashing the Python server with OOM errors whenever I tried to stream data from and into large Parquet files with some basic grouping and aggregation.
Claude suggested just using DuckDB instead, and indeed, it made short work of it.