PgDog is funded and coming to a database near you
https://pgdog.dev/blog/our-funding-announcementI've used Postgres at a few places and the #1 problem was always high availability, not scaling. One Postgres cluster could easily handle 100000 transactions per minute, but when a primary node went down it was a page and manually failing over to the spare then manually replacing the spare. The manual tooling was very finicky but at least it worked, no automated solution came even close. Lack of a good HA story is why I avoid self-managed Postgres as much as possible.
Load balancer with health checks and failover, works out of the box. :) Battle-tested at this point too, so could be worth a look.
Couldn't be a better why us :)
Right now I have a project that has very heavy write traffic from multiple services and a web app that reads from this. We are starting to hit the point where no amount of indexing, query optimisation, caching or box upgrades is helping us. We are looking at maybe moving the bulk of the static data to clickhouse to reduce the DB size but I would love to hear if PgDog or other kind of sharding could be useful for this use case.
That's exactly right. Get in touch (lev@pgdog.dev), happy to help or at the very least tell you what current works (or doesn't) so you know what your options are.
Because it's not magic, you do still have to know what's going on under the hood, e.g. no cross-shard transactions.
I'd see if my application can benefit from read replicas before doing sharding, because sharding is difficult (if you care about data consistency). With replicas, each replica does have a full copy of the data and you only write to the master - you have to decide which transactions are suitable for running against replicas, which can lag slightly behind realtime. E.g. reading data to build a webpage is probably safe to do from a replica - any read-modify-write is not.
1. Control plane to manage multi-node deployments; "works out of the box" experience to make PgDog easy to deploy and use
2. QoS (quality of service): automatically block bad queries from taking down the database
Last but not least, you get SLA-backed support from us (up to P0).
New features are broken down into two categories:
1. Sharding / running Postgres at scale: always open source.
2. Infra management / making it easy to run PgDog at scale: enterprise.
We sharded over 20 TB that we know about.
This is probably a typo, right? 20TB isn't that big. I would imagine they've sharded a lot more than thatJust to say we're happy pgdog users here! One feature we quite like (of the proxy) is the handling of different connection settings per connection (i.e. statement_timeout). When we investigated RDS proxy (ages ago) it wasn't supported, I think the same was true for pgbouncer so it required a bunch of application changes. With pgdog, it just works transparently.
Also had an issue with it because it cached authentication requests when doing passthrough it seems, I'd changed the roles password, but it kept using the old one, which was no bueno ;).
PgDog seems to make more sense when you really care about a few databases that need massive scale, rather than a simple proxy in front of postgres. I'll keep following the development though, it is much needed in this space, postgres can use all the investment it can get to get it past the single machine scale that it excels at currently.
We'll get there.
If you’re already sharding by tenant for other reasons, OK… But I see CDC to a true OLAP system as more scalable.
PostgreSQL still needs real columnar tables in the core, hopefully one day
SELECT tenant_id, COUNT(clicks)
FROM users
GROUP BY tenant_id
ORDER BY 2 DESC
LIMIT 25;
Performance is a side effect - definitely needed and we'll do everything we can, but we are not competing with ClickHouse or Snowflake - just trying to make sharded Postgres work with your app.Edit: It also might be interesting to point out how your solution differs from what the folks at Planetscale are building https://planetscale.com/neki
1. pool exhaustion from idle connections inside open long-running transactions
2. SQLAlchemy's client-side pool using dead connections that PgBouncer had already killed, causing periodic request errors
3. Some tasks have to bypass PgBouncer when they use SET or prepared statements
I've already sharded large datasets at the application layer, but looks like PgDog solves the above problems for any future work?
This is years of product development with a three person team. If Enterprise sales and support are a big part of your business plan it will suck up a lot more than that.
Still trying to figure out how this works technically, is the performance gain really just re-write in rust?
Edit:
Performance gains are from having the ability to load balance reads (horizontal scaling for read queries) and scale out writes (with sharding). Once instance bottleneck in Postgres has many faces:
1. Behind schedule vacuums because of too many dead tuples (too many writes)
2. The WALWriter is single-threaded and IO-bound - Postgres can only do about 200-300MB/sec in writes per instance (real prod numbers on EC2 with NVMes and ZFS, basically best case scenario).
3. Bulkheading: single primary is a single point of failure. With 12 primaries, if one fails, 91% of your customers don't notice.
The list goes on. Rust is just a side effect. We love it because it's fast and correct - the perfect match for a database product.
Is there a binary I can run directly?
My question is, has any of them been talked about being upstreamed to postgres itself? Or, adding a custom built in feature to postgres itself?
The same old processes vs. threads debate, plus having the ability to scale the coordinator past a single machine. So, if you're OLTP, definitely consider PgDog. OLAP - Citus still wins because of its advanced query engine. We'll get there.
Expanding on that a bit, mongo drivers even have a shared specification of the state machine for monitoring topology changes[1] and algorithm for selecting the server to send an operation to[2] (along with various declarative test cases that the drivers use to validate them alongside the specs in the repo). I think people sometimes underestimate how important the client-side work is to this sort of experience; for all of the faults mongo has had over the years, the amount of investment that they put into the client libraries is something I've never seen anywhere else (although having spent several years working on some of these libraries, my take is likely very biased).
[1]: https://github.com/mongodb/specifications/blob/master/source... [2]: https://github.com/mongodb/specifications/blob/master/source...
Don't pay a startup for your DB proxy, you should own that layer yourself inside of your infrastructure.
This kind of tool will help in this case?
Would love to hear the advantages of moving to PgDog.
1. Let it crash. Increase the RAM, try again.
2. Page to disk (swap), make it slow but ultimately work.
Both have their trade-offs. There is no free lunch here.
I don't know how the pg scaling story gets fixed unless certain things are rewritten. that's my fear of going all in pg.
mysql has vitess etc & even upgrades are easier. though pg is more extensible.
However 95% of projects are going to be fine with a normal single-machine database and another 4% are going to be well served by upgrading the hell out of that machine. Only the absolute busiest projects actually need a distributed database and you can cross that bridge when you actually get to it.
They say Amazon processes 20k orders per second. That seems not unachievable for postgres with fast SSDs and careful query optimization, though they don't choose do it that way. You're not Amazon, you have at most 20 orders per second and that's nothing.
Wrt. the pooler, how do you compare with pgbouncer?
I'm interested because I have a postgres instance, low-traffic but still like ... tens of r(eads)ps. I was not running anything close to the machine limits but still added pgbouncer to improve performance and didn't see a noticeable difference. I was stress-testing the machine obv., I'm not talking about the 10 rps, lol.
For context, my numbers were something like 10k rps +/- 1k vanilla postgres and like 9k rps +/- 1k with pgbouncer in front of it. So ... slightly slower but big error bars so I wouldn't say for sure. I ended up not using pgbouncer as the benefit was immaterial.
Also yeah, in case you want to check it out, it's the db that backs this project: https://httpstate.com.
Surfacing where and how PG is better than Dynamo or any other database is probably a good starting point instead of calling out PG a silver bullet for everything. At the end of the day its all a trade-off.
https://github.com/pgdogdev/pgdog/commit/36434f93f03dec1d7d4...
I want to have as much fun as the next developer, but that makes me worry, what with supply chain attacks in the news and all.
In all seriousness, we review every single line of code that goes in and only people who work for PgDog Inc are allowed to merge.
Not quite. The reason "DBs" like those exist is purely due to fashion. Lets not kid ourselves into thinking they do anything better, save the exception of making data hard to access, which might be a project goal in some cases.