Story Detail of id 43074230 | Liveview Hacker News

inkyoto6 days ago | on: We were wrong about GPUs

Thanks for the correction, it is Aurora, indeed, although it can be found under the «RDS» console in AWS.

Aurora is actually not a database but is a scalable storage layer that operates over the network and is decoupled from the query engine (compute). The architecture has been used to implement vastly different query engines on top of it (PgSQL, MySQL, DocumentDB – a MongoDB alternative, and Neptune – a property graph database / triple store).

The closest abstraction I can think of to describe Aurora is a VAX/VMS cluster – where the consumer sees a single entity, regardless of size, whilst the scaling (out or back in) remains entirely opaque.

Aurora does not support RDS Proxy for PostgreSQL or its equivalents for other query engine types because it addresses cluster access through cluster endpoints. There are two types of endpoints: one for read-only queries («reader endpoints» in Aurora parlance) and one for read-mutate queries («writer endpoint»). Aurora supports up to 15 reader endpoints, but there can be only one writer endpoint.

Reader endpoints improve the performance of non-mutating queries by distributing the load across read replicas. The default Aurora cluster endpoint always points to the writer instance. Consumers can either default to the writer endpoint for all queries or segregate non-mutating queries to reader endpoints for faster execution.

This behaviour is consistent across all supported query engines, such as PostgreSQL, Neptune, and DocumentDB.

I do not think it is correct to state that Aurora does not use the OS page cache – it does, as there is still a server with an operating system somewhere, despite the «serverless» moniker. In fact, due to its layered distributed architecture, there is now more than one OS page cache, as described in [0].

Since Aurora is only accessible over the network, it introduces unique peculiarities where the standard provisions of storage being local do not apply.

Now, onto the subject of costs. A couple of years ago, an internal client who ran provisioned RDS clusters in three environments (dev, uat, and prod) reached out to me with a request to create infrastructure clones of all three clusters. After analysing their data access patterns, peak times, and other relevant performance metrics, I figured that they did not need provisioned RDS and would benefit from Aurora Serverless instead – which is exactly what they got (unbeknownst to them, which I consider another net positive for Aurora). The dev and uat environments were configured with lower upper ACU's, whilst production had a higher upper ACU configuration, as expected.

Switching to Aurora Serverless resulted in a 30% reduction in the monthly bill for the dev and uat environments right off the bat and nearly a 50% reduction in production costs compared to a provisioned RDS cluster of the same capacity (if we use the upper ACU value as the ceiling). No code changes were required, and the transition was seamless.

Ironically, I have discovered that the AWS cost calculator consistently overestimates the projected costs, and the actual monthly costs are consistently lower. The cost calculator provides a rough estimate, which is highly useful for presenting the solution cost estimate to FinOps or executives. Unintentionally, it also offers an opportunity to revisit the same individuals later and inform them that the actual costs are lower. It is quite amusing.

[0] https://muratbuffalo.blogspot.com/2024/07/understanding-perf...

sgarland6 days ago | parent

> Aurora is actually not a database but is a scalable storage layer that operates over the network and is decoupled from the query engine (compute).

They call it [0] a database engine, and go on to say "Aurora includes a high-performance storage subsystem.":

> "Amazon Aurora (Aurora) is a fully managed relational database engine that's compatible with MySQL and PostgreSQL."

To your point re: part of RDS, though, they do say that it's "part of RDS."

> The architecture has been used to implement vastly different query engines on top of it (PgSQL, MySQL, DocumentDB – a MongoDB alternative, and Neptune – a property graph database / triple store).

Do you have a source for this? That's new information to me.

> Aurora does not support RDS Proxy for PostgreSQL

Yes it does [1].

> I do not think it is correct to state that Aurora does not use the OS page cache – it does

It does not [2]:

> "Conversely, in Amazon Aurora PostgreSQL, the default value [for shared_buffers] is derived from the formula SUM(DBInstanceClassMemory/12038, -50003). This difference stems from the fact that Amazon Aurora PostgreSQL does not depend on the operating system for data caching." [emphasis mine]

Even without that explicit statement, you could infer it from the fact that the default value for `effective_cache_size` in Aurora Postgres is the same as that of `shared_buffers`, the formula given above.

> Switching to Aurora Serverless resulted in a 30% reduction in the monthly bill for the dev and uat environments right off the bat

Agreed, for lower-traffic clusters you can probably realize savings by doing this. However, it's also likely that for Dev/Stage/UAT environments, you could achieve the same or greater via an EventBridge rule that starts/stops the cluster such that it isn't running overnight (assuming the company doesn't have a globally distributed workforce).

What bothers me most about Aurora's pricing model is charging for I/O. And yes, I know they have an alternative pricing model that doesn't do so (but the baseline is of course higher); it's the principal of the thing. The amortized cost of wear to disks should be baked into the price. It would be difficult for a skilled DBA with plenty of Linux experience to accurately estimate how many I/O a given query might take. In a vacuum for a cold cache, it's not that bad: estimate or look up statistics for row sizes, determine if any predicates can use an index (and if so, the correlation of the column[s]), estimate index selectivity, if any, confirm expected disk block size vs. Postgres page size, and make an educated guess. If you add any concurrent queries that may be altering the tuples you're viewing, it's now much harder. If you then add a distributed storage layer, which I assume attempts to boxcar data blocks for transmission much like EBS does, it's nearly impossible. Now try doing that if you're a "cloud native" type who hasn't the faintest idea what blktrace [3] is.

[0]: https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide...

[1]: https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide...

[2]: https://aws.amazon.com/blogs/database/determining-the-optima...

[3]: https://linux.die.net/man/8/blktrace

#visit	12125450
#session	46877
#live-session	0