Hacker News new | past | comments | ask | show | jobs | submit
I am trying to gain a basic understanding of this: Right now I have a 4TB DB on one large box. Is the idea that using a proxy tool like PGDog I could spin up 8 smaller boxes handling ~500GB each and then one medium box for the proxy?

Right now I have a project that has very heavy write traffic from multiple services and a web app that reads from this. We are starting to hit the point where no amount of indexing, query optimisation, caching or box upgrades is helping us. We are looking at maybe moving the bulk of the static data to clickhouse to reduce the DB size but I would love to hear if PgDog or other kind of sharding could be useful for this use case.

> 8 smaller boxes handling ~500GB each and then one medium box for the proxy?

That's exactly right. Get in touch (lev@pgdog.dev), happy to help or at the very least tell you what current works (or doesn't) so you know what your options are.

That's the idea of sharding. If you read the pgdog docs, you'll notice you need to tell it which shard server to route your request to - it doesn't just magically work. It's still providing value by reusing connections, which are particularly expensive in postgres.

Because it's not magic, you do still have to know what's going on under the hood, e.g. no cross-shard transactions.

I'd see if my application can benefit from read replicas before doing sharding, because sharding is difficult (if you care about data consistency). With replicas, each replica does have a full copy of the data and you only write to the master - you have to decide which transactions are suitable for running against replicas, which can lag slightly behind realtime. E.g. reading data to build a webpage is probably safe to do from a replica - any read-modify-write is not.

fwiw, we support cross-shard transactions. They are not magic though, just good old 2pc and a bit of coordination.
2pc is only safe if every part of the system has guaranteed uptime, which it never does. Assume that cross-shard transactions only work in the happy case and may result in inconsistent data otherwise.

They also reduce the benefit of sharding, possibly down to worse performance than a non-sharded DB.