
C++ proposal: There are 8 bits in a byte

https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2024/p3477r0.html
Previously, in JF's "Can we acknowledge that every real computer works this way?" series: "Signed Integers are Two’s Complement" <https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p09...>
During an internship in 1986 I wrote C code for a machine with 10-bit bytes, the BBN C/70. It was a horrible experience, and the existence of the machine in the first place was due to a cosmic accident of the negative kind.
D made a great leap forward with the following:

1. bytes are 8 bits

2. shorts are 16 bits

3. ints are 32 bits

4. longs are 64 bits

5. arithmetic is 2's complement

6. IEEE floating point

and a big chunk of wasted time trying to abstract these away and getting it wrong anyway was saved. Millions of people cried out in relief!

Oh, and Unicode was the character set. Not EBCDIC, RADIX-50, etc.
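
For comparison, a quick sketch (C++17) of pinning the same assumptions down in today's C++; the proposal under discussion would make the CHAR_BIT line redundant:

    #include <climits>
    #include <limits>

    // The sizes D fixed, expressed as compile-time checks in C++.
    static_assert(CHAR_BIT == 8);
    static_assert(sizeof(short) * CHAR_BIT == 16);
    static_assert(sizeof(int) * CHAR_BIT == 32);
    static_assert(sizeof(long long) * CHAR_BIT == 64);  // C++'s long is only 32 bits on some ABIs
    static_assert(std::numeric_limits<double>::is_iec559);  // IEEE 754 floating point
    // Two's-complement arithmetic is guaranteed by C++20 itself (P0907).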

Some people are still dealing with DSPs.

https://thephd.dev/conformance-should-mean-something-fputc-a...

Me? I just dabble with documenting an unimplemented "50% more bits per byte than the competition!" 12-bit fantasy console of my own invention - replete with inventions such as "UTF-12" - for shits and giggles.

Is C++ capable of deprecating or simplifying anything?

Honest question, I haven't followed closely. rand() is broken, I'm told unfixably so, and last I heard it still wasn't deprecated.

Is this proposal a test? "Can we even drop support for a solution to a problem literally nobody has?"

This is both uncontroversial and incredibly spicy. I love it.
I'm totally fine with enforcing that int8_t == char == 8 bits; however, I'm not sure about spreading the misconception that a byte is 8 bits. A byte with 8 bits is called an octet.

At the same time, C++17 already added `std::byte` anyway[1]; it has the same size and representation as `unsigned char`, though strictly speaking it's a distinct type rather than an alias.

[1] https://en.cppreference.com/w/cpp/types/byte
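
For reference, a small sketch of that distinction (C++17 or later):

    #include <cstddef>      // std::byte
    #include <type_traits>

    // std::byte shares unsigned char's size and alignment, but it is a
    // distinct type, not an alias: no implicit arithmetic, only bitwise ops.
    static_assert(sizeof(std::byte) == sizeof(unsigned char));
    static_assert(!std::is_same_v<std::byte, unsigned char>);

    std::byte low_nibble(std::byte b) {
        // b + 1 would not compile; bitwise masking does.
        return b & std::byte{0x0F};
    }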

I just put static_assert(CHAR_BIT == 8); in one place and move on. It hasn't fired since it was an #if equivalent.
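A minimal form of that guard:

    #include <climits>   // CHAR_BIT

    // Fail the build immediately on any platform where bytes aren't 8 bits.
    static_assert(CHAR_BIT == 8, "this code assumes 8-bit bytes");
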
Hmm, I wonder if any modern languages can work on computers that use trits instead of bits.

https://en.wikipedia.org/wiki/Ternary_computer

Not sure about that, seems pretty controversial to me. Are we forgetting about the UNIVACs?
Hopefully we are; it's been a long time, but as I remember, indexing into strings on them was a disaster.
They still exist. You can still run OS 2200 on a Clearpath Dorado.[1] Although it's actually Intel Xeon processors doing an emulation.

Yes, indexing strings of 6-bit FIELDATA characters was a huge headache. UNIVAC had the unfortunate problem of having to settle on a character code in the early 1960s, before ASCII was standardized. At the time, a military 6-bit character set looked like the next big thing. It was better than IBM's code, which mapped to punch card holes and the letters weren't all in one block.

[1] https://www.unisys.com/siteassets/collateral/info-sheets/inf...

idk. By now most software already assumes 8 bits == 1 byte in subtle ways all over the place, to the point that on such a platform you'd have to use a fully custom, or at least fully self-reviewed and patched, stack of C libraries.

So delegating such very rare edge cases to non-standard C seems fine, i.e. IMHO it doesn't change much at all in practice.

And C/C++ compilers are full of non-standard extensions anyway; it's not that CHAR_BIT would go away, and a compiler could still, as a non-standard extension, let it be something other than 8.

Do UNIVACs care about modern C++ compilers? Do modern C++ compilers care about UNIVACs?

Given that Wikipedia says UNIVAC was discontinued in 1986, I'm pretty sure the answer is no and no!

So please do excuse my ignorance, but is there a "logic"-related reason, other than hardware cost limitations à la "8 was cheaper than 10 for the same number of memory addresses", that bytes are 8 bits instead of 10? Genuinely curious; as a high-level dev of twenty years, I don't know why 8 was selected.

To my naive eye, it seems like moving to 10 bits per byte would be both logical and make learning the trade just a little bit easier?

But how many bytes are there in a word?
If you're on x86, the word size can simultaneously be 16, 32, and 64 bits.
"Word" is an outdated concept we should try to get rid of.
You're right. To be consistent with bytes we should call it a snack.
It's very useful on hardware that is not an x86 CPU.
How exactly? How else do you suggest CPUs do addressing?

Or are you suggesting we increase the size of a byte until it's the same size as a word, and merge both concepts?

And then we lose communication with Europa Clipper.
This is entertaining and probably a good idea but the justification is very abstract.

Specifically, has there ever been a C++ compiler on a system where bytes weren't 8 bits? If so, when was it last updated?

Honestly, at first I thought this might be an Onion headline. But then I stopped to think about it.
Why? Pls no. We've been told (in school!) that a byte is a byte. It's only sometimes 8 bits long (ok, most of the time these days). Do not destroy the last bits of fun. Is network order little endian too?
As a person who designed and built a hobby CPU with a sixteen-bit byte, I’m not sure how I feel about this proposal.
Amazing stuff guys. Bravo.
There are FOUR bits.

Jean-Luc Picard

Incredible things are happening in the C++ community.
But think of ternary computers!
Doesn't matter, ternary computers just have ternary bits, 8 of them ;)
Ternary computers have 8 tits to a byte.
Supposedly, "bit" is short for "binary digit", so we'd need a separate term for "ternary digit", but I don't wanna go there.
JF Bastien is a legend for this, haha.

I would be amazed if there's any even remotely relevant code that deals meaningfully with CHAR_BIT != 8 these days.

(... and yes, it's about time.)

DSP chips are a common exception that people bring up. I think some TI-made ones have 64-bit chars.

Edit: I see TFA mentions them but questions how relevant C++ is in that sort of embedded environment.

Yes, but you're already in specialized territory if you're using that
The TMS320C28x DSPs have a 16-bit char, so e.g. the Opus audio codec codebase works with 16-bit char (or at least it did at one point -- I wouldn't be shocked if it broke from time to time, since I don't think anyone runs regression tests on such a platform).

For some DSP-ish sort of processors I think it doesn't make sense to have addressability at char level, and the gates to support it would be better spent on better 16 and 32 bit multipliers. ::shrugs::

I feel kind of ambivalent about the standards proposal. We already have fixed size types. If you want/need an exact type, that already exists. The non-fixed size types set minimums and allow platforms to set larger sizes for performance reasons.

Having no fast 8-bit level access is a perfectly reasonable decision for a small DSP.

Might it be better instead to migrate many users of char to (u)int8_t?

The proposed alternative of CHAR_BIT congruent to 0 mod 8 also sounds pretty reasonable, in that it captures the existing non-8-bit char platforms and also the justification for non-8-bit char platforms (that if you're not doing much string processing but instead doing all math processing, the additional hardware for efficient 8 bit access is a total waste).
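
As an illustration of that trade-off, a small sketch (my own, assuming a hypothetical 16-bit-char target like the C28x) of preferring the least-width types, which the standard guarantees to exist everywhere, over the exact-width ones, which it doesn't:

    #include <cstdint>

    // uint8_t only exists where the target really has an 8-bit type;
    // uint_least8_t always exists and would simply be 16 bits wide on a
    // 16-bit-char DSP.
    using octet = std::uint_least8_t;

    // Mask back down to 8 significant bits after arithmetic, since the
    // underlying type may be wider than an octet.
    constexpr octet add_mod_256(octet a, octet b) {
        return static_cast<octet>((a + b) & 0xFFu);
    }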

I think it's fine to relegate non-8-bit chars to non-standard C, given that a lot of software already implicitly assumes 8-bit bytes. Non-standard extensions for certain use cases aren't anything new for C compilers. Also, it's a C++ proposal; I'm not sure you program DSPs with C++ :think:
Ignoring this C++ proposal, especially because C and C++ seem like a complete nightmare when it comes to this stuff, I've almost gotten into the habit of treating a "byte" as an abstract concept. Many serial protocols will define a "byte", and it might be 7, 8, 9, 11, 12, or whatever bits long.
I wish I knew what a 9-bit byte means.

One fun fact I found the other day: ASCII is 7 bits, but when it was used with punch cards there was an 8th bit to make sure you didn't punch the wrong number of holes. https://rabbit.eng.miami.edu/info/ascii.html
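
As a toy illustration of that scheme (even parity here; my own sketch, not taken from the linked page):

    #include <bitset>
    #include <cstdio>

    // Set bit 7 so the total number of 1 bits in the octet is even.
    unsigned char with_even_parity(unsigned char ascii7) {
        bool odd = std::bitset<7>(ascii7).count() % 2 != 0;
        return static_cast<unsigned char>(ascii7 | (odd ? 0x80 : 0x00));
    }

    int main() {
        // 'A' is 0x41 with two 1 bits (already even), so it prints 41.
        std::printf("%02X\n", with_even_parity('A'));
    }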

The fact that this isn't already done after all these years is one of the reasons why I no longer use C/C++. It takes years and years to get anything done, even the tiniest, most obvious drama-free changes. Contrast with Go, which has had this since version 1, in 2012:

https://pkg.go.dev/builtin@go1#byte

In a char, not in a byte. Byte != char
A common programming error in C is reading input as char rather than int.

https://man7.org/linux/man-pages/man3/fgetc.3.html

fgetc(3) and its companions always return character-by-character input as an int, and the reason is that EOF is represented as -1. An unsigned char is unable to represent EOF. If you're using the wrong return value, you'll never detect this condition.

However, if you don't receive an EOF, then it should be perfectly fine to cast the value to unsigned char without loss of precision.
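
A minimal sketch of the two variants (illustrative, not taken from the man page):

    #include <cstdio>

    // Correct: keep the result as int so EOF (-1) stays distinguishable
    // from every valid unsigned char value (0..255).
    void copy_stream(std::FILE* in, std::FILE* out) {
        int c;
        while ((c = std::fgetc(in)) != EOF) {
            std::fputc(c, out);
        }
    }

    // Buggy: if char is unsigned, (char)EOF becomes 255 and the loop never
    // terminates; if char is signed, a legitimate 0xFF byte is mistaken for EOF.
    void copy_stream_buggy(std::FILE* in, std::FILE* out) {
        char c;
        while ((c = std::fgetc(in)) != EOF) {  // narrowing loses the EOF sentinel
            std::fputc(c, out);
        }
    }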