This is exactly what I mean. 768 characters for an index is woefully bad. And for no obviously great reason: you can just index the encoded UTF-8 text.
This was literally reason why a former company (who will remain nameless) refused to add Unicode support. It's not even an imagined problem.
Spoiler: I literally did this a couple of days ago. The index size bloat means that any possible savings you might have gained from collisions are obliterated from page fetches. I tested with a measly 128 characters vs. a prefix of 16, and that was enough for the average query time to be equal, with the smaller index winning for the minimum.
And why is a 768 character limit woefully bad, but a 2704 character limit is totally fine?
Hell, even just being able to sort user-submitted strings up to a kilobyte. Why up to a kilobyte? Some users have strings that are kind of long. If I have to define a second column that's the truncated prefix, that's just a silly waste of space because MySQL decided to use utf-32 under the hood.
No, it can't. URL doesn't have any length limit, regardless of the fact that different software will impose different limits.
VARBINARY is typically ok for that I'd think? Then you can utilize the full 3072 byte limit for the key, since there's no character set applied.
> even just being able to sort user-submitted strings up to a kilobyte
As a software engineer, I completely agree. But as a DBA, I am obligated to make a "tsk" sound and scowl disapprovingly!
If you need to keep a million long URL's in a defined sort order, my first recommendation would be, don't -- see if there's another way to achieve your end result. But if you absolutely have to, then create a new integer column to be your sort key, and use a little bit of extra code to give it values that produce the same sort order.
Creating short numerical primary keys for long strings is a common database technique.