Yep. Pity about getting chars / string encoding wrong though. (Java chars are 16 bits).

But it’s not alone in that mistake; all the languages invented in that era made the same one (C#, JavaScript, etc.).

Java was just unlucky: it standardised its strings at the wrong time, when Unicode code points still fit in 16 bits. Java was announced in May 1995, and the following comment from the Unicode history wiki page makes clear what happened: "In 1996, a surrogate character mechanism was implemented in Unicode 2.0, so that Unicode was no longer restricted to 16 bits. ..."
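
To make the consequence concrete, here is a minimal standalone Java snippet (hypothetical example, not from the thread) showing that char is a 16-bit UTF-16 code unit, so any code point above U+FFFF takes two chars (a surrogate pair):

    // char is a 16-bit UTF-16 code unit; U+1F600 needs a surrogate pair.
    public class SurrogateDemo {
        public static void main(String[] args) {
            String s = "\uD83D\uDE00"; // U+1F600 GRINNING FACE as a surrogate pair
            System.out.println(s.length());                            // 2 (UTF-16 code units)
            System.out.println(s.codePointCount(0, s.length()));       // 1 (Unicode code point)
            System.out.println(Integer.toHexString(s.codePointAt(0))); // 1f600
            System.out.println(Character.isSurrogate(s.charAt(0)));    // true
        }
    }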
Java strings are stored as byte[]s when their contents contain only Latin-1 values (the first 256 code points of Unicode). This shipped in Java 9; see the sketch after the link below.

JEP 254: Compact Strings

https://openjdk.org/jeps/254
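
For intuition, here is a rough sketch of the idea only (not the JDK's actual String internals): keep the characters as Latin-1 bytes when every char fits in one byte, otherwise fall back to two bytes per char, and record which layout was used in a coder flag.

    import java.nio.charset.StandardCharsets;

    // Simplified illustration of the JEP 254 idea: 1 byte/char when possible,
    // 2 bytes/char otherwise, plus a flag saying which encoding is in use.
    final class CompactishString {
        static final byte LATIN1 = 0, UTF16 = 1;
        final byte[] value;
        final byte coder;

        CompactishString(String s) {
            if (s.chars().allMatch(c -> c <= 0xFF)) {            // every char fits in one byte
                value = s.getBytes(StandardCharsets.ISO_8859_1); // Latin-1 storage
                coder = LATIN1;
            } else {
                value = s.getBytes(StandardCharsets.UTF_16BE);   // 2 bytes per char
                coder = UTF16;
            }
        }

        int byteLength() { return value.length; }

        public static void main(String[] args) {
            System.out.println(new CompactishString("hello").byteLength());        // 5
            System.out.println(new CompactishString("héllo").byteLength());        // 5 (é is U+00E9, still Latin-1)
            System.out.println(new CompactishString("héllo\u03B1").byteLength());  // 12 (α forces the wide layout)
        }
    }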

What's the right way?
UTF-8

When D was first implemented, circa 2000, it wasn't clear whether UTF-8, UTF-16, or UTF-32 was going to be the winner. So D supported all three.

UTF-8, for essentially the reasons mentioned in this manifesto: https://utf8everywhere.org/
Yep. Notably supported by Go, Python 3, Rust, and Swift, and probably by all new programming languages created from here on.
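
A small illustration in Java (hypothetical example) of why UTF-8 tends to win for interchange: ASCII-heavy text is half the size of UTF-16, and code points outside the BMP still round-trip cleanly.

    import java.nio.charset.StandardCharsets;

    public class Utf8Demo {
        public static void main(String[] args) {
            String ascii = "hello world";
            String emoji = "I \uD83D\uDC96 UTF-8";   // contains U+1F496

            System.out.println(ascii.getBytes(StandardCharsets.UTF_8).length);    // 11
            System.out.println(ascii.getBytes(StandardCharsets.UTF_16BE).length); // 22

            byte[] utf8 = emoji.getBytes(StandardCharsets.UTF_8);
            String back = new String(utf8, StandardCharsets.UTF_8);
            System.out.println(back.equals(emoji));                    // true, lossless round trip
            System.out.println(back.codePointCount(0, back.length())); // code points, not chars
        }
    }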