  We sharded over 20 TB that we know about.
This is probably a typo, right? 20TB isn't that big. I would imagine they've sharded a lot more than that.
If you think 20TB "isn't that big" I want to know what size of DBs you're working with 0_0
It's big but it's not so big it wouldn't fit on SSD on one particularly beefy server (two for redundancy). Sharding this would be more about the transaction rate. Actually, sharding would always be about the transaction rate.
If your working set is 20 TB, then it's pretty big. Each database has its own mix of hot/cold data, so it's impossible to compare without more information. A better measure might be IOPS. RDS has fairly low maximum IOPS unless you spend a lot more for provisioned IOPS or use Aurora.
You are correct. As a point of comparison: almost ten years ago at Segment we had a single Aurora PostgreSQL instance with ~50 TB of data. It was used to index potential identity data in a much larger corpus of files stored in S3.
For the vast majority of use cases, 20TB is positively enormous.