PgDog is funded and coming to a database near you

https://pgdog.dev/blog/our-funding-announcement

526levkk | 1 day ago | 246 | HN

loading story #48488884

> The reason DBs like Mongo or Dynamo exist is because Postgres has a scaling problem.

I've used Postgres at a few places and the #1 problem was always high availability, not scaling. One Postgres cluster could easily handle 100000 transactions per minute, but when a primary node went down it was a page and manually failing over to the spare then manually replacing the spare. The manual tooling was very finicky but at least it worked, no automated solution came even close. Lack of a good HA story is why I avoid self-managed Postgres as much as possible.

levkk1 day ago | parent | next

Good thing we support HA as well: https://docs.pgdog.dev/features/load-balancer/

Load balancer with health checks and failover, works out of the box. :) Battle-tested at this point too, so could be worth a look.

r7n23 hours ago | root | parent | next

I've extensively used Dynamo (internally at Amazon and externally) and even founded a DB startup with it at it's core. Boiling down scalability of Postgres vs Dynamo as it's written in blog is a bit terse. Dynamo scales writes horizontally with the keyspace, forever. Postgres simply can't, and no number of layers between the machines and the developer changes that. Sharding, pooling, Citus are all layered on top of an engine where a given row's writes still land on one primary.

bofaGuy14 hours ago | root | parent | next

Dynamo DB isn’t even good at being a KV store. Almost every time we have to also back it with S3 because of size limitations.

moomoo1111 hours ago | root | parent

did you use single table design?

and yeah you have to spend a lot of upfront time designing your data models

loading story #48481090

loading story #48481174

loading story #48481990

loading story #48479643

doctorpangloss1 day ago | root | parent

Is a load balancer HA?

loading story #48480070

loading story #48480090

dotancohen23 hours ago | root | parent

What happens when the load balancer fails?

loading story #48482068

loading story #48480526

pinkgolem14 hours ago | parent | next

Have you tried cnpg? Worked amazingly well for my usecases

loading story #48480223

loading story #48479529

loading story #48481697

codegeek1 day ago | parent | next

"Why Us" => "I ran Postgres at Instacart, where we scaled the company 5x in April of 2020. The biggest problem we had was making Postgres serve 100,000s of grocery delivery orders per minute"

Couldn't be a better why us :)

loading story #48480376

loading story #48480514

loading story #48480348

loading story #48480818

loading story #48479800

loading story #48479353

chrisvenum1 day ago | parent | next

I am trying to gain a basic understanding of this: Right now I have a 4TB DB on one large box. Is the idea that using a proxy tool like PGDog I could spin up 8 smaller boxes handling ~500GB each and then one medium box for the proxy?

Right now I have a project that has very heavy write traffic from multiple services and a web app that reads from this. We are starting to hit the point where no amount of indexing, query optimisation, caching or box upgrades is helping us. We are looking at maybe moving the bulk of the static data to clickhouse to reduce the DB size but I would love to hear if PgDog or other kind of sharding could be useful for this use case.

levkk1 day ago | parent | next

> 8 smaller boxes handling ~500GB each and then one medium box for the proxy?

That's exactly right. Get in touch (lev@pgdog.dev), happy to help or at the very least tell you what current works (or doesn't) so you know what your options are.

inigyou16 hours ago | parent

That's the idea of sharding. If you read the pgdog docs, you'll notice you need to tell it which shard server to route your request to - it doesn't just magically work. It's still providing value by reusing connections, which are particularly expensive in postgres.

Because it's not magic, you do still have to know what's going on under the hood, e.g. no cross-shard transactions.

I'd see if my application can benefit from read replicas before doing sharding, because sharding is difficult (if you care about data consistency). With replicas, each replica does have a full copy of the data and you only write to the master - you have to decide which transactions are suitable for running against replicas, which can lag slightly behind realtime. E.g. reading data to build a webpage is probably safe to do from a replica - any read-modify-write is not.

loading story #48485152

loading story #48489265

yabones1 day ago | parent | next

I'm curious how this might help with our biggest downtime-causer with postgres, which is major version upgrades. Poolers do a great job for failover and load balancing, but we consistently need ~10-20 minutes of downtime once or twice a year to do upgrades. Logical replication between old->new versions could probably help, but it would still require flipping everything over to the new cluster without partial writes or anything silly. Anybody have experience with this?

loading story #48477583

loading story #48477774

tschellenbach1 day ago | parent | next

Logical replication solves this. You roll the cluster, downtime is minimal. like 60s maybe.

loading story #48478502

loading story #48478282

loading story #48477435

tschellenbach1 day ago | parent | next

PgDog, Neki, multigres, awesome to see. And yes this is the main issue with postgres. Well this and not having index hints, looking forward to 19

welder1 day ago | parent | next

Don't forget the original PgBouncer. Hard to setup, but with the help of AI these days it's easier to configure.

paulryanrogers1 day ago | parent

The pg_hint_plan extension isn't in core, yet is pretty competent when you need to override planner.

aejm1 day ago | parent | next

I notice there is an Enterprise Edition, can you please specify which features are not open source? Do you predict new features you add will be ee licensed as a way to pay back your VC funders?

levkk1 day ago | parent

Two big ones:

1. Control plane to manage multi-node deployments; "works out of the box" experience to make PgDog easy to deploy and use

2. QoS (quality of service): automatically block bad queries from taking down the database

Last but not least, you get SLA-backed support from us (up to P0).

New features are broken down into two categories:

1. Sharding / running Postgres at scale: always open source.

2. Infra management / making it easy to run PgDog at scale: enterprise.

loading story #48482062

loading story #48479780

Ozzie_osman1 day ago | parent | next

  We sharded over 20 TB that we know about.

This is probably a typo, right? 20TB isn't that big. I would imagine they've sharded a lot more than that

loading story #48481994

loading story #48478957

loading story #48477967

loading story #48477598

loading story #48489222

mijoharas9 hours ago | parent | next

Congrats on the funding Lev!

Just to say we're happy pgdog users here! One feature we quite like (of the proxy) is the handling of different connection settings per connection (i.e. statement_timeout). When we investigated RDS proxy (ages ago) it wasn't supported, I think the same was true for pgbouncer so it required a bunch of application changes. With pgdog, it just works transparently.

karolist20 hours ago | parent | next

Love PgDog. I don't need it honestly, but using it in my on-prem k8s because I heard about you in Postgres FM podcast randomly when I had nothing to listen to on a hike in the woods and it picked up my interest.

https://open.spotify.com/episode/6qgpfiW68KcvRASs6649Fb

levkk20 hours ago | parent

Thanks!

ParadisoShlee1 day ago | parent | next

I've moved from pgbouncer to pgdog a few months ago without issue. Huge fan.

frollogaston17 hours ago | parent | next

Reminds me of long ago, before Postgres even had things like parallel scan to utilize multiple CPU cores on a single machine, I used to have Python helpers to split up queries by ranges of IDs. If a query was complicated, I'd EXPLAIN it first then pick either the innermost or outermost index scan, and often get a linear speedup. But it was quite manual, required using temp tables for SELECTs, and ofc had no consistency.

kjuulh1 day ago | parent | next

I tried out PgDog a while ago, but couldn't find a good way of handling the config except for having this users / pgdog toml file, which makes it a bit awkward to handle in kubernetes where we often do multi-tenancy in postgres - or rather having many databases on the same instance(s), and have them come and go at will.

Also had an issue with it because it cached authentication requests when doing passthrough it seems, I'd changed the roles password, but it kept using the old one, which was no bueno ;).

PgDog seems to make more sense when you really care about a few databases that need massive scale, rather than a simple proxy in front of postgres. I'll keep following the development though, it is much needed in this space, postgres can use all the investment it can get to get it past the single machine scale that it excels at currently.

levkk1 day ago | parent | next

Not the place and not the time, but we are building an enterprise edition that "just works" out of the box. Not saying that the open source experience cannot be better - it always can and we'll keep improving. What you've experienced is definitely a known issue with our specific implementation of passthrough auth. Scram made things a bit harder, since we can't validate user's passwords at login time anymore (that's what makes scram secure fwiw).

We'll get there.

loading story #48477177

loading story #48478994

floriferous10 hours ago | parent | next

Is this comparable to Supabase's just announced multigres?

drchaim1 day ago | parent | next

Good stuff, although I’m not quite sure about the fast OLAP use case.

If you’re already sharding by tenant for other reasons, OK… But I see CDC to a true OLAP system as more scalable.

PostgreSQL still needs real columnar tables in the core, hopefully one day

loading story #48486856

levkk1 day ago | parent | next

OLAP means different things to different people. For us, it's just making sure your admin dashboard keeps working basically:

  SELECT tenant_id, COUNT(clicks)
  FROM users
  GROUP BY tenant_id
  ORDER BY 2 DESC
  LIMIT 25;

Performance is a side effect - definitely needed and we'll do everything we can, but we are not competing with ClickHouse or Snowflake - just trying to make sharded Postgres work with your app.

loading story #48479218

gen22023 hours ago | parent | next

Is there an explainer for people who are broadly familiar with the DB space? It sounds like you're building an equivalent to Vitesse for Postgres, but it's not super clear from the article (which I know is not the point of this, but still :) ).

Edit: It also might be interesting to point out how your solution differs from what the folks at Planetscale are building https://planetscale.com/neki

loading story #48480552

mnbbrown1 day ago | parent | next

I've loved using pgdog for the last 6 months. It's been incredibly stable. It's nifty how they've solved the LISTEN/NOTIFY on a transaction pooler problem.

welder1 day ago | parent | next

Three real-world issues I've run into recently with PgBouncer + Postgres are:

1. pool exhaustion from idle connections inside open long-running transactions

2. SQLAlchemy's client-side pool using dead connections that PgBouncer had already killed, causing periodic request errors

3. Some tasks have to bypass PgBouncer when they use SET or prepared statements

I've already sharded large datasets at the application layer, but looks like PgDog solves the above problems for any future work?

loading story #48485023

loading story #48478596

jeremyjh1 day ago | parent | next

> With $5.5M from Basis Set, YC, Pioneer Fund and other great investors, we have years of runway,

This is years of product development with a three person team. If Enterprise sales and support are a big part of your business plan it will suck up a lot more than that.

loading story #48479373

htrp1 day ago | parent | next

>PgDog is a sharder, connection pooler and load balancer for PostgreSQL. Written in Rust, PgDog is fast, reliable and scales databases horizontally without requiring changes to application code.

Still trying to figure out how this works technically, is the performance gain really just re-write in rust?

levkk1 day ago | parent

Not quite. The performance gain is to bring those features to Postgres!

Edit:

Performance gains are from having the ability to load balance reads (horizontal scaling for read queries) and scale out writes (with sharding). Once instance bottleneck in Postgres has many faces:

1. Behind schedule vacuums because of too many dead tuples (too many writes)

2. The WALWriter is single-threaded and IO-bound - Postgres can only do about 200-300MB/sec in writes per instance (real prod numbers on EC2 with NVMes and ZFS, basically best case scenario).

3. Bulkheading: single primary is a single point of failure. With 12 primaries, if one fails, 91% of your customers don't notice.

The list goes on. Rust is just a side effect. We love it because it's fast and correct - the perfect match for a database product.

hylaride1 day ago | root | parent | next

So to oversimplify, is the idea to bring an AWS Aurora-style storage mechanism natively to Postgres?

inigyou16 hours ago | root | parent | next

Aurora is one big database, isn't it? PgDog is just a proxy where you tell it which shard to access.

levkk1 day ago | root | parent

Yes, except it doesn't have any cross-dependencies on the same volume, so the uptime here should be higher.

jeremyjh1 day ago | root | parent

Aurora has a completely different storage backend. PgDog is a front end proxy - each server in the cluster is still using standard Postgres right?

levkk1 day ago | root | parent

Yup!

VeninVidiaVicii1 day ago | root | parent

Oh thanks for clearing that up.

levkk1 day ago | root | parent

Sorry, out walking the dog (not a pun). I'll post more details in a few.

simonw1 day ago | parent | next

Suggestion: have more than just helm and Docker in your quickstart documentation. I'd like to try this out just to see what it can do, but not quite enough to fire up one of those systems for it.

Is there a binary I can run directly?

levkk1 day ago | parent | next

We should add it to brew/apt/etc for sure. Also, we could add it to crates.io so you could do something like `cargo install pgdog`. Distribution, distribution, distribution.

loading story #48477439

loading story #48477391

loading story #48477157

valorzard1 day ago | parent | next

I've seen a couple of these "distributed" postgres extensions.

My question is, has any of them been talked about being upstreamed to postgres itself? Or, adding a custom built in feature to postgres itself?

inigyou7 hours ago | parent | next

It doesn't actually distribute postgres. It lets you use one connection to talk to multiple postgres databases by switching between them and if you're very careful you can sort of see it like a single database, ht it's not really.

levkk1 day ago | parent

This is not an extension, it's a proxy! Very different. You can deploy it anywhere already without having to wait for upstreaming or your cloud provider adding support for it. It's one of the two reasons why we built it this way, the other being performance (it's much faster to do this in the proxy than inside Postgres).

maherbeg1 day ago | parent | next

I'm a big PGDog fan! It really helped us scale our connection proxy needs pretty substantially and it has great features like auto mode to support Aurora failovers neatly. It's infra that just works.

jeremyjh1 day ago | parent | next

It’s surprising they don’t mention advantages over other sharding systems like Citus. Maybe it’s just the fact that it’s only a proxy and not core extensions? But that could limit capabilities.

levkk1 day ago | parent

We do, just buried deep in our blog: https://pgdog.dev/blog/pgdog-vs-citus

The same old processes vs. threads debate, plus having the ability to scale the coordinator past a single machine. So, if you're OLTP, definitely consider PgDog. OLAP - Citus still wins because of its advanced query engine. We'll get there.

loading story #48479207

bourbonproof1 day ago | parent | next

the reason mongo is a joy to use in scaled env is because no additional setup/software needed and all drivers natively support secondary/primary writes/reads and topological changes. so it's end to end, and adding is as a new proxy in frontend of postgres leads to all clients being incompatible or the code itself has no control anymore about when to use a secondary and what allowed stall is acceptable for a particular query. Any solutions to this by pgdog?

saghm1 day ago | parent | next

> all drivers natively support secondary/primary writes/reads and topological changes.

Expanding on that a bit, mongo drivers even have a shared specification of the state machine for monitoring topology changes[1] and algorithm for selecting the server to send an operation to[2] (along with various declarative test cases that the drivers use to validate them alongside the specs in the repo). I think people sometimes underestimate how important the client-side work is to this sort of experience; for all of the faults mongo has had over the years, the amount of investment that they put into the client libraries is something I've never seen anywhere else (although having spent several years working on some of these libraries, my take is likely very biased).

[1]: https://github.com/mongodb/specifications/blob/master/source... [2]: https://github.com/mongodb/specifications/blob/master/source...

loading story #48481001

BowBun22 hours ago | parent | next

I really wish they'd acknowledge the prior art and name that they've taken inspiration from - https://github.com/postgresml/pgcat

Don't pay a startup for your DB proxy, you should own that layer yourself inside of your infrastructure.

xyzzy_plugh21 hours ago | parent | next

The creator of pgdog is also the creator of pgcat, so I think they probably don't need to do this.

levkk21 hours ago | root | parent | next

This reminds me of college. We had to cite our own papers from prior semesters or risk getting kicked out for plagiarism. I don't miss those days :)

loading story #48483518

loading story #48482779

loading story #48482361

loading story #48481955

TurdF3rguson11 hours ago | parent | next

let's say i have a primary with 100M rows of addresses and indexes on things like city, state, zip code (all in memory). I also have 3 read replicas that struggle to do 1000 lookups per minute each. Does PgDog help?

andrey-g14 hours ago | parent | next

How does this compare to Aurora Serverless?

mamcx1 day ago | parent | next

I do tenant per PG schema, most are smallish some are bigger (not much, can do all in a single box) but moving forward eventually will need something like this. Also plan to provide "get your own VPS" for more enterprise customers.

This kind of tool will help in this case?

levkk1 day ago | parent

Yup. We support schema-based sharding: https://docs.pgdog.dev/configuration/pgdog.toml/sharded_sche...

zadikian15 hours ago | parent | next

This is exciting. INSERT (SELECT ...) doesn't work though, right? The docs only mention VALUES inserts.

loading story #48485796

bart3r18 hours ago | parent | next

We are still using Pgpool-II and it's been very solid, but would be interested in moving to PgDog.

Would love to hear the advantages of moving to PgDog.

snihalani20 hours ago | parent | next

I'd love to advocate for PgDog if there were more than 2 managed service providers. Adding a single company with no substitute in your supply chain feels hard

philippemnoel19 hours ago | parent | next

Let's go. Very bullish on PgDog. Lev understands this space better than anyone else. If you are sharding Postgres, you should talk to him.

fulafel1 day ago | parent | next

Does making it "just work" here come with any caveats vs standard PG?

levkk1 day ago | parent | next

Getting there! Cross-shard writes do because of 2pc. Reads are eventually consistent.

loading story #48476878

Wonnk131 day ago | parent | next

I wish them all the best. Supabase, Timescale, etc etc. there's a whole cottage industry of extending postgres to whatever you need.

SamInTheShell22 hours ago | parent | next

Scratching my head. Wondering why I would reach for this over just running a Yugabyte cluster.

sgt10 hours ago | parent | next

Is this like on prem RDS?

redmonduser1 day ago | parent | next

How is this different from Citus?

melon_tsui1 day ago | parent | next

2M qps in production is legit. Curious how much RAM and CPU that takes on average per deployment though

levkk1 day ago | parent

Depends. Only pooling, very little. Load balancing/sharding needs to parse queries, so a bit more. Could go up to a GB per pod, sometimes more if you have a lot of unique SQL queries (unique by text, not by parameters). We cache query ASTs to avoid parsing them on each request - that's the bulk of memory usage.

parthdesai1 day ago | root | parent

Semi related question - I have always wondered, how do you tackle OOM issues at the proxy layer, i.e. let's say a particular SQL query requires proxy to fan out the query to multiple shards, which return a pretty large dataset. I'm assuming you would need to load this dataset in the ram to perform certain operations. What happens if the resulting dataset causes the proxy pod to go OOM?

levkk1 day ago | root | parent

Two schools of thought:

1. Let it crash. Increase the RAM, try again.

2. Page to disk (swap), make it slow but ultimately work.

Both have their trade-offs. There is no free lunch here.

dzonga23 hours ago | parent | next

I us pg. not that I know much about database internals, besides the 'b-tree' stuff we learned in college.

I don't know how the pg scaling story gets fixed unless certain things are rewritten. that's my fear of going all in pg.

mysql has vitess etc & even upgrades are easier. though pg is more extensible.

inigyou7 hours ago | parent

Any strongly consistent database is going to be limited by a single machine's throughput. That's just what you trade for strong consistency. You can shard it yourself but then the DBMS isn't giving you consistency so you'd better be very careful. You can use a tool like PgDog to aid with sharding but it's not doing magic, you still have to be aware how it works and the limitations of sharding.

However 95% of projects are going to be fine with a normal single-machine database and another 4% are going to be well served by upgrading the hell out of that machine. Only the absolute busiest projects actually need a distributed database and you can cross that bridge when you actually get to it.

They say Amazon processes 20k orders per second. That seems not unachievable for postgres with fast SSDs and careful query optimization, though they don't choose do it that way. You're not Amazon, you have at most 20 orders per second and that's nothing.

Pet_Ant1 day ago | parent | next

I hope people pronounce this as „pig-dog” and has a mascot that looks like „man-bear-pig”

levkk1 day ago | parent

Crap! Missed opportunity.

faangguyindia1 day ago | parent | next

i am not using any tool like pgbouncer and have not run into any issues so far. Is it even required these days? Have you guys tested your setup without these connection poolers/multiplexers?

loading story #48477763

moralestapia1 day ago | parent | next

Cool work, thanks.

Wrt. the pooler, how do you compare with pgbouncer?

I'm interested because I have a postgres instance, low-traffic but still like ... tens of r(eads)ps. I was not running anything close to the machine limits but still added pgbouncer to improve performance and didn't see a noticeable difference. I was stress-testing the machine obv., I'm not talking about the 10 rps, lol.

For context, my numbers were something like 10k rps +/- 1k vanilla postgres and like 9k rps +/- 1k with pgbouncer in front of it. So ... slightly slower but big error bars so I wouldn't say for sure. I ended up not using pgbouncer as the benefit was immaterial.

Also yeah, in case you want to check it out, it's the db that backs this project: https://httpstate.com.

levkk1 day ago | parent

Old benchmark, but still good: https://pgdog.dev/blog/pgbouncer-vs-pgdog

sandeepkd1 day ago | parent | next

Nit-Pick: It might be anti-marketing, still it would be helpful if the use cases can be articulated in a way where it would make sense to use this Vs any other type of database. Honesty goes a long way with the more technical folks for anything related to infrastructure.

Surfacing where and how PG is better than Dynamo or any other database is probably a good starting point instead of calling out PG a silver bullet for everything. At the end of the day its all a trade-off.

levkk1 day ago | parent

Always is. Marketing is not our strong suit (only engineers here). We'll get better at it.

loading story #48483851

9999000009991 day ago | parent | next

How are 3 developers going to QA this properly ?

loading story #48477531

loading story #48478624

loading story #48488819

skiwithuge1 day ago | parent | next

we are using PG bouncer in production. Interesting, I will follow the evolution of this project

gregaccount19 hours ago | parent | next

Fix the bad license.

afr0ck20 hours ago | parent | next

Is this vibe-coded?

loading story #48483840

esafak15 hours ago | parent | next

I think sharding is the wrong approach; who wants to mess about with sharding logic? Distributed key-value stores are the way to go. But cockroach already offers that so I suppose you can try the other way.

orliesaurus1 day ago | parent | next

how does it compare to PlanetScale ?

loading story #48479238

octernion22 hours ago | parent | next

congrats, lev! brings back fond memories of database fires.

i'm sure you'll get 100x comments about "why not just have one fast SSD? it can do 2000 trillion writes/s"

levkk21 hours ago | parent

Thanks! Yup...to be expected. If you know, you know, and have the scars to prove it :)

loading story #48488934

xenophonf1 day ago | parent | next

This commit looks... odd.

https://github.com/pgdogdev/pgdog/commit/36434f93f03dec1d7d4...

I want to have as much fun as the next developer, but that makes me worry, what with supply chain attacks in the news and all.

loading story #48483929

levkk1 day ago | parent

I see you met Sage, our newest founding engineer :) If you're not having fun at your job...

In all seriousness, we review every single line of code that goes in and only people who work for PgDog Inc are allowed to merge.

loading story #48488941

RedMagicBox19 hours ago | parent | next

[dead]

exabrial23 hours ago | parent

> The reason DBs like Mongo or Dynamo exist is because

Not quite. The reason "DBs" like those exist is purely due to fashion. Lets not kid ourselves into thinking they do anything better, save the exception of making data hard to access, which might be a project goal in some cases.

inigyou16 hours ago | parent

Dynamo definitely scales better than anything else at the tradeoff of not guaranteeing durability in the case of enough node failures and (like most distributed databases) not allowing interaction between different pieces of data.

#visit	13,742,173
#session	74,665
#live-session	0