
C++ proposal: There are 8 bits in a byte

https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2024/p3477r0.html
Previously, in JF's "Can we acknowledge that every real computer works this way?" series: "Signed Integers are Two’s Complement" <https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p09...>
During an internship in 1986 I wrote C code for a machine with 10-bit bytes, the BBN C/70. It was a horrible experience, and the existence of the machine in the first place was due to a cosmic accident of the negative kind.
D made a great leap forward with the following:

1. bytes are 8 bits

2. shorts are 16 bits

3. ints are 32 bits

4. longs are 64 bits

5. arithmetic is 2's complement

6. IEEE floating point

and a big chunk of wasted time trying to abstract these away and getting it wrong anyway was saved. Millions of people cried out in relief!

Oh, and Unicode was the character set. Not EBCDIC, RADIX-50, etc.
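
For comparison, a quick sketch (C++17) of pinning the same assumptions down in today's C++; the proposal under discussion would make the CHAR_BIT line redundant:

    #include <climits>
    #include <limits>

    // The sizes D fixed, expressed as compile-time checks in C++.
    static_assert(CHAR_BIT == 8);
    static_assert(sizeof(short) * CHAR_BIT == 16);
    static_assert(sizeof(int) * CHAR_BIT == 32);
    static_assert(sizeof(long long) * CHAR_BIT == 64);  // C++'s long is only 32 bits on some ABIs
    static_assert(std::numeric_limits<double>::is_iec559);  // IEEE 754 floating point
    // Two's-complement arithmetic is guaranteed by C++20 itself (P0907).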

Some people are still dealing with DSPs.

https://thephd.dev/conformance-should-mean-something-fputc-a...

Me? I just dabble with documenting an unimplemented "50% more bits per byte than the competition!" 12-bit fantasy console of my own invention - replete with inventions such as "UTF-12" - for shits and giggles.

Is C++ capable of deprecating or simplifying anything?

Honest question, I haven't followed closely. rand() is broken, I'm told unfixably so, and last I heard it still wasn't deprecated.

Is this proposal a test? "Can we even drop support for a solution to a problem literally nobody has?"

This is both uncontroversial and incredibly spicy. I love it.
I'm totally fine with enforcing that int8_t == char == 8 bits; however, I'm not sure about spreading the misconception that a byte is 8 bits. A byte with 8 bits is called an octet.

At the same time, C++17 already added `std::byte` anyway[1]; it has the same size and representation as `unsigned char`, though strictly speaking it's a distinct type rather than an alias.

[1] https://en.cppreference.com/w/cpp/types/byte
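
For reference, a small sketch of that distinction (C++17 or later):

    #include <cstddef>      // std::byte
    #include <type_traits>

    // std::byte shares unsigned char's size and alignment, but it is a
    // distinct type, not an alias: no implicit arithmetic, only bitwise ops.
    static_assert(sizeof(std::byte) == sizeof(unsigned char));
    static_assert(!std::is_same_v<std::byte, unsigned char>);

    std::byte low_nibble(std::byte b) {
        // b + 1 would not compile; bitwise masking does.
        return b & std::byte{0x0F};
    }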

I just put static_assert(CHAR_BIT == 8); in one place and move on. It hasn't fired since it was an #if equivalent.
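A minimal form of that guard:

    #include <climits>   // CHAR_BIT

    // Fail the build immediately on any platform where bytes aren't 8 bits.
    static_assert(CHAR_BIT == 8, "this code assumes 8-bit bytes");
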
Hmm, I wonder if any modern languages can work on computers that use trits instead of bits.

https://en.wikipedia.org/wiki/Ternary_computer

Not sure about that, seems pretty controversial to me. Are we forgetting about the UNIVACs?
Hopefully we are; it's been a long time, but as I remember, indexing into strings on them was a disaster.
They still exist. You can still run OS 2200 on a Clearpath Dorado.[1] Although it's actually Intel Xeon processors doing an emulation.

Yes, indexing strings of 6-bit FIELDATA characters was a huge headache. UNIVAC had the unfortunate problem of having to settle on a character code in the early 1960s, before ASCII was standardized. At the time, a military 6-bit character set looked like the next big thing. It was better than IBM's code, which mapped to punch card holes and the letters weren't all in one block.

[1] https://www.unisys.com/siteassets/collateral/info-sheets/inf...

idk. By now most software already assumes 8 bits == 1 byte in subtle ways all over the place, to the point that on such a platform you'd have to use a fully custom, or at least fully self-reviewed and patched, stack of C libraries.

So delegating such very rare edge cases to non-standard C seems fine, i.e. IMHO it doesn't change much at all in practice.

And C/C++ compilers are full of non-standard extensions anyway; it's not that CHAR_BIT would go away, and a compiler could still, as a non-standard extension, let it be something other than 8.

Do UNIVACs care about modern C++ compilers? Do modern C++ compilers care about UNIVACs?

Given that Wikipedia says UNIVAC was discontinued in 1986, I'm pretty sure the answer is no and no!

So please do excuse my ignorance, but is there a "logic"-related reason, other than hardware cost limitations à la "8 was cheaper than 10 for the same number of memory addresses", that bytes are 8 bits instead of 10? Genuinely curious; as a high-level dev of twenty years, I don't know why 8 was selected.

To my naive eye, it seems like moving to 10 bits per byte would be both logical and make learning the trade just a little bit easier?

But how many bytes are there in a word?
If you're on x86, the word size can simultaneously be 16, 32, and 64 bits.
"Word" is an outdated concept we should try to get rid of.
You're right. To be consistent with bytes we should call it a snack.
It's very useful on hardware that is not an x86 CPU.
How exactly? How else do you suggest CPUs do addressing?

Or are you suggesting we increase the size of a byte until it's the same size as a word, and merge both concepts?

And then we lose communication with Europa Clipper.
This is entertaining and probably a good idea but the justification is very abstract.

Specifically, has there ever been a C++ compiler on a system where bytes weren't 8 bits? If so, when was it last updated?

Honestly, at first I thought this might be an Onion headline. But then I stopped to think about it.
Why? Pls no. We've been told (in school!) that a byte is a byte. It's only sometimes 8 bits long (ok, most of the time these days). Do not destroy the last bits of fun. Is network order little endian too?
As a person who designed and built a hobby CPU with a sixteen-bit byte, I’m not sure how I feel about this proposal.
Amazing stuff guys. Bravo.
There are FOUR bits.

Jean-Luc Picard

Incredible things are happening in the C++ community.
But think of ternary computers!
Doesn't matter, ternary computers just have ternary bits, 8 of them ;)
Ternary computers have 8 tits to a byte.
Supposedly, "bit" is short for "binary digit", so we'd need a separate term for "ternary digit", but I don't wanna go there.
JF Bastien is a legend for this, haha.

I would be amazed if there's any even remotely relevant code that deals meaningfully with CHAR_BIT != 8 these days.

(... and yes, it's about time.)

DSP chips are a common exception that people bring up. I think some TI-made ones have 64-bit chars.

Edit: I see TFA mentions them but questions how relevant C++ is in that sort of embedded environment.

Yes, but you're already in specialized territory if you're using that
The TMS320C28x DSPs have a 16-bit char, so e.g. the Opus audio codec codebase works with 16-bit char (or at least it did at one point -- I wouldn't be shocked if it broke from time to time, since I don't think anyone runs regression tests on such a platform).

For some DSP-ish sort of processors I think it doesn't make sense to have addressability at char level, and the gates to support it would be better spent on better 16 and 32 bit multipliers. ::shrugs::

I feel kind of ambivalent about the standards proposal. We already have fixed size types. If you want/need an exact type, that already exists. The non-fixed size types set minimums and allow platforms to set larger sizes for performance reasons.

Having no fast 8-bit level access is a perfectly reasonable decision for a small DSP.

Might it be better instead to migrate many users of char to (u)int8_t?

The proposed alternative of CHAR_BIT congruent to 0 mod 8 also sounds pretty reasonable, in that it captures the existing non-8-bit char platforms and also the justification for non-8-bit char platforms (that if you're not doing much string processing but instead doing all math processing, the additional hardware for efficient 8 bit access is a total waste).
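
As an illustration of that trade-off, a small sketch (my own, assuming a hypothetical 16-bit-char target like the C28x) of preferring the least-width types, which the standard guarantees to exist everywhere, over the exact-width ones, which it doesn't:

    #include <cstdint>

    // uint8_t only exists where the target really has an 8-bit type;
    // uint_least8_t always exists and would simply be 16 bits wide on a
    // 16-bit-char DSP.
    using octet = std::uint_least8_t;

    // Mask back down to 8 significant bits after arithmetic, since the
    // underlying type may be wider than an octet.
    constexpr octet add_mod_256(octet a, octet b) {
        return static_cast<octet>((a + b) & 0xFFu);
    }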

I think it's fine to relegate non-8-bit chars to non-standard C, given that a lot of software already implicitly assumes 8-bit bytes. Non-standard extensions for certain use cases aren't anything new for C compilers. Also, it's a C++ proposal; I'm not sure you program DSPs with C++ :think:
Ignoring this C++ proposal, especially because C and C++ seem like a complete nightmare when it comes to this stuff, I've almost gotten into the habit of treating a "byte" as an abstract concept. Many serial protocols will define a "byte", and it might be 7, 8, 9, 11, 12, or whatever bits long.
I wish I knew what a 9-bit byte means.

One fun fact I found the other day: ASCII is 7 bits, but when it was used with punch cards there was an 8th bit to make sure you didn't punch the wrong number of holes. https://rabbit.eng.miami.edu/info/ascii.html
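
As a toy illustration of that scheme (even parity here; my own sketch, not taken from the linked page):

    #include <bitset>
    #include <cstdio>

    // Set bit 7 so the total number of 1 bits in the octet is even.
    unsigned char with_even_parity(unsigned char ascii7) {
        bool odd = std::bitset<7>(ascii7).count() % 2 != 0;
        return static_cast<unsigned char>(ascii7 | (odd ? 0x80 : 0x00));
    }

    int main() {
        // 'A' is 0x41 with two 1 bits (already even), so it prints 41.
        std::printf("%02X\n", with_even_parity('A'));
    }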

The fact that this isn't already done after all these years is one of the reasons why I no longer use C/C++. It takes years and years to get anything done, even the tiniest, most obvious drama-free changes. Contrast with Go, which has had this since version 1, in 2012:

https://pkg.go.dev/builtin@go1#byte

In a char, not in a byte. Byte != char
A common programming error in C is reading input as char rather than int.

https://man7.org/linux/man-pages/man3/fgetc.3.html

fgetc(3) and its companions always return character-by-character input as an int, and the reason is that EOF is represented as -1. An unsigned char is unable to represent EOF. If you're using the wrong return value, you'll never detect this condition.

However, if you don't receive an EOF, then it should be perfectly fine to cast the value to unsigned char without loss of precision.
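
A minimal sketch of the two variants (illustrative, not taken from the man page):

    #include <cstdio>

    // Correct: keep the result as int so EOF (-1) stays distinguishable
    // from every valid unsigned char value (0..255).
    void copy_stream(std::FILE* in, std::FILE* out) {
        int c;
        while ((c = std::fgetc(in)) != EOF) {
            std::fputc(c, out);
        }
    }

    // Buggy: if char is unsigned, (char)EOF becomes 255 and the loop never
    // terminates; if char is signed, a legitimate 0xFF byte is mistaken for EOF.
    void copy_stream_buggy(std::FILE* in, std::FILE* out) {
        char c;
        while ((c = std::fgetc(in)) != EOF) {  // narrowing loses the EOF sentinel
            std::fputc(c, out);
        }
    }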