Cool, would this be better than using a ClickHouse / DuckDB extension that reads Postgres and saves to Parquet?

What would be the recommended way to regularly output old data to S3 as Parquet files? A cron job that launches a second Postgres process connecting to the database and extracting the data, or the regular database instance itself? Doesn't that slow the instance down too much?

This alone wouldn't be a full replacement. We do have a full product that does that, with customers seeing great performance in production. Crunchy Bridge for Analytics does something similar by embedding DuckDB inside Postgres, though for users that is largely an implementation detail. We support Iceberg as well and have a lot more coming, basically to allow for seamless analytics on Postgres: building on what Postgres is good at, Iceberg for storage, and DuckDB for vectorized execution.

That isn't fully open source at this time, but it has been production grade for some time. This was one piece that makes getting there easier for folks and felt like a good standalone bit to open source and share with the broader community. We can also see where this by itself makes sense for certain use cases. As you sort of point out, if you had time-series partitioned data and leveraged pg_partman for new partitions plus pg_cron (which this same set of people authored), you could automatically archive old partitions to Parquet but still have things around for analysis if needed.
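
Roughly, that pattern could look something like the sketch below. This assumes pg_parquet's COPY ... WITH (format 'parquet') support for s3:// destinations and pg_cron's cron.schedule(); the table, bucket, and job names are just placeholders, and a real setup would pick the partition to archive dynamically rather than hard-coding it.

    -- export one old partition to S3 as Parquet (placeholder names)
    COPY (SELECT * FROM measurements_2024_01)
      TO 's3://my-archive-bucket/measurements/2024_01.parquet'
      WITH (format 'parquet');

    -- schedule the export nightly with pg_cron; pg_partman keeps
    -- creating and detaching partitions on its own schedule
    SELECT cron.schedule(
      'archive-old-partition',
      '0 3 * * *',
      $$COPY (SELECT * FROM measurements_2024_01)
          TO 's3://my-archive-bucket/measurements/2024_01.parquet'
          WITH (format 'parquet')$$
    );

Since the COPY runs inside the regular instance, you'd probably schedule it off-peak or run it against a replica if the partitions are large.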