They call it [0] a database engine, and go on to say that "Aurora includes a high-performance storage subsystem":
> "Amazon Aurora (Aurora) is a fully managed relational database engine that's compatible with MySQL and PostgreSQL."
To your point, though, the same docs do say that it's "part of RDS."
> The architecture has been used to implement vastly different query engines on top of it (PgSQL, MySQL, DocumentDB – a MongoDB alternative, and Neptune – a property graph database / triple store).
Do you have a source for this? That's new information to me.
> Aurora does not support RDS Proxy for PostgreSQL
Yes it does [1].
> I do not think it is correct to state that Aurora does not use the OS page cache – it does
It does not [2]:
> "Conversely, in Amazon Aurora PostgreSQL, the default value [for shared_buffers] is derived from the formula SUM(DBInstanceClassMemory/12038, -50003). This difference stems from the fact that Amazon Aurora PostgreSQL does not depend on the operating system for data caching." [emphasis mine]
Even without that explicit statement, you could infer it from the fact that the default value for `effective_cache_size` in Aurora Postgres is the same formula as `shared_buffers`, given above. `effective_cache_size` is the planner's estimate of the total cache available to a query – `shared_buffers` plus the OS page cache – so setting the two equal implies the OS cache is expected to contribute nothing.
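To make that concrete, here's a back-of-envelope comparison (a sketch only – the 32 GiB instance size is hypothetical, I'm assuming the parameter value is expressed in 8 KB pages to match the `shared_buffers` GUC, and the stock RDS formula of DBInstanceClassMemory/32768 is from memory):

```python
# Back-of-envelope: what the formulas yield for a hypothetical 32 GiB
# instance (e.g. db.r6g.xlarge). Assumes the parameter value is in
# 8 KB pages, matching the unit of the shared_buffers GUC.

PAGE_SIZE = 8192                       # bytes per Postgres buffer page
db_instance_class_memory = 32 * 2**30  # 32 GiB in bytes (hypothetical)

# Aurora PostgreSQL default: SUM(DBInstanceClassMemory/12038, -50003)
aurora_pages = db_instance_class_memory // 12038 - 50003

# Stock RDS PostgreSQL default, which I believe is DBInstanceClassMemory/32768
rds_pages = db_instance_class_memory // 32768

for name, pages in [("Aurora", aurora_pages), ("RDS", rds_pages)]:
    gib = pages * PAGE_SIZE / 2**30
    pct = pages * PAGE_SIZE / db_instance_class_memory
    print(f"{name}: {pages:,} pages = {gib:.1f} GiB ({pct:.0%} of RAM)")
```

Roughly two-thirds of instance memory versus the stock 25% – about what you'd expect when `shared_buffers` is the only cache layer.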
> Switching to Aurora Serverless resulted in a 30% reduction in the monthly bill for the dev and uat environments right off the bat
Agreed – for lower-traffic clusters you can probably realize savings by doing this. However, for Dev/Stage/UAT environments, it's also likely you could achieve the same or greater savings with an EventBridge rule that stops the cluster overnight and starts it again in the morning (assuming the company doesn't have a globally distributed workforce); something like the sketch below.
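A minimal sketch of that, assuming a Lambda function wired to two EventBridge cron rules; the cluster identifier and schedules are made up, and note that AWS automatically restarts a stopped Aurora cluster after seven days:

```python
# Lambda handler: stop/start an Aurora cluster on a schedule.
# Wire two EventBridge rules to it with constant JSON input, e.g.
#   cron(0 20 ? * MON-FRI *) -> {"action": "stop"}
#   cron(0 6  ? * MON-FRI *) -> {"action": "start"}
import boto3

rds = boto3.client("rds")
CLUSTER_ID = "myapp-dev"  # hypothetical dev cluster identifier

def handler(event, context):
    if event["action"] == "stop":
        rds.stop_db_cluster(DBClusterIdentifier=CLUSTER_ID)
    elif event["action"] == "start":
        rds.start_db_cluster(DBClusterIdentifier=CLUSTER_ID)
```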
What bothers me most about Aurora's pricing model is charging for I/O. And yes, I know they have an alternative pricing model that doesn't (though the baseline is of course higher); it's the principle of the thing. The amortized cost of wear to disks should be baked into the price. It would be difficult even for a skilled DBA with plenty of Linux experience to accurately estimate how many I/Os a given query might incur. In a vacuum, with a cold cache, it's not that bad (see the sketch below): estimate or look up statistics for row sizes, determine whether any predicates can use an index (and if so, the correlation of the column[s]), estimate index selectivity, if any, confirm the expected disk block size vs. the Postgres page size, and make an educated guess. Add concurrent queries that may be altering the tuples you're viewing, and it's now much harder. Add a distributed storage layer – which I assume attempts to boxcar data blocks for transmission, much like EBS does – and it's nearly impossible. Now try doing that if you're a "cloud native" type who hasn't the faintest idea what blktrace [3] is.
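For the cold-cache case, the arithmetic might look something like this (a sketch under heavy assumptions – every statistic and the per-million-I/O price are made-up placeholders, and it ignores everything that makes the real problem hard):

```python
# Naive cold-cache I/O estimate for a single index scan, following the
# steps above. Every input here is a made-up stand-in for what you'd
# actually pull from pg_class / pg_stats.

PAGE_SIZE = 8192              # Postgres page size, bytes

rows_in_table = 50_000_000    # pg_class.reltuples (hypothetical)
avg_row_width = 120           # pg_stats.avg_width, bytes (hypothetical)
selectivity = 0.001           # fraction of rows matching the predicate (guess)
correlation = 0.1             # pg_stats.correlation; near 0 means heap pages
                              # are visited in essentially random order

rows_returned = rows_in_table * selectivity
rows_per_page = PAGE_SIZE // avg_row_width

if abs(correlation) < 0.5:
    heap_pages = rows_returned                  # ~one heap page per matching row
else:
    heap_pages = rows_returned / rows_per_page  # matching rows packed contiguously

index_pages = rows_returned / 300               # rough: hundreds of index tuples/page
total_ios = heap_pages + index_pages

PRICE_PER_MILLION = 0.20                        # USD; ballpark Aurora standard I/O rate
print(f"~{total_ios:,.0f} page reads, "
      f"~${total_ios / 1e6 * PRICE_PER_MILLION:.4f} for this one query")
```

And even this assumes each 8 KB page read maps one-to-one onto a billed I/O request, which the distributed storage layer's batching makes anything but guaranteed.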
[0]: https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide...
[1]: https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide...
[2]: https://aws.amazon.com/blogs/database/determining-the-optima...