Hacker News new | past | comments | ask | show | jobs | submit
And the fact that adding real utf-8 support limited (limits?) the length of strings that can be indexed
Postgres limits btree keys to 2704 bytes, which is actually slightly smaller than MySQL's limit of 3072 bytes, assuming the default InnoDB storage engine.

That said, when using utf8mb4 in an index key, MySQL uses the "worst case" of each character being 4 bytes. So it effectively limits the max key size to 3072/4 = 768 characters, when a column is using the utf8mb4 character set.

For practical purposes, this doesn't cause much pain, as it's generally inadvisable to use complete long-ish strings as a key. And there are various workarounds, like using prefixes or hashes as the key, or using binary strings as keys to get the full 3072 bytes (if you don't need collation behaviors).

> So it effectively limits the max key size to 3072/4 = 768 characters, when a column is using the utf8mb4 character set.

This is exactly what I mean. 768 characters for an index is woefully bad. And for no obviously great reason: you can just index the encoded UTF-8 text.

This was literally reason why a former company (who will remain nameless) refused to add Unicode support. It's not even an imagined problem.

loading story #41876347
loading story #41876001