When I teach, I use "big data" for data that won't fit in a single machine. "Small data" fits on a single machine in memory and medium data on disk.
Having said that duckDB is awesome. I recently ported a 20 year old Python app to modern Python. I made the backend swappable, polars or duckdb. Got a 40-80x speed improvement. Took 2 days.
loading story #47357331
loading story #47353338
I'm curious - what were you doing that polars was leaving a 40-80x speedup on the table? I've been happy with it's speed when held correctly, but it's certainly easy to hold it incorrectly and kill your perf if you're not careful
loading story #47352042
loading story #47350738
loading story #47350732
loading story #47352012