Now you have me wondering what, theoretically, the most compact and efficient language would be, without using compression.
Claude Shannon talks about this in A Mathematical Theory of Communication. He defines redundancy as one minus relative entropy, where relative entropy is the ratio of the language's actual average uncertainty per symbol to the maximum possible uncertainty if all alphabet symbols were completely random and equally likely.
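
To make the definition above concrete, here is a rough sketch that estimates redundancy from single-character frequencies. The function name `redundancy` and the `alphabet_size` parameter are my own, not Shannon's, and counting only single symbols ignores the longer-range structure (digrams, words, grammar) that Shannon's figure for English accounts for, so this will understate the redundancy of real text.

```python
import math
from collections import Counter


def redundancy(text: str, alphabet_size: int) -> float:
    """First-order estimate of redundancy: 1 - H / H_max.

    Assumption: symbols are treated as independent, so only
    single-character frequencies contribute to the entropy H.
    """
    if not text or alphabet_size < 2:
        raise ValueError("need non-empty text and an alphabet of at least 2 symbols")

    counts = Counter(text)
    total = sum(counts.values())

    # Actual average uncertainty per symbol, in bits.
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())

    # Maximum uncertainty: every symbol of the alphabet equally likely.
    max_entropy = math.log2(alphabet_size)

    # Relative entropy is entropy / max_entropy; redundancy is one minus that.
    return 1.0 - entropy / max_entropy
```

A text that uses all four symbols of a four-symbol alphabet equally often comes out at zero redundancy, while a text that repeats one symbol comes out at one.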

He gives some rather cute examples, like the language of Finnegans Wake by Joyce having very low redundancy (high efficiency, in your words). He also notes that crosswords become trivial in a perfectly efficient language, since with zero redundancy any grid of letters is valid; that large 2-d crossword puzzles are just possible at about 50% redundancy; and that 3-d puzzles should be possible at about 33%. This has always been one of my favorite, and to my mind most random, corollaries in a paper.

https://people.math.harvard.edu/~ctm/home/text/others/shanno...

I feel like you're going to run up against the definitions of "efficient" and "compression".

For example, a language with a larger alphabet will be able to express more in fewer characters. Is that more efficient?

Similarly, you could think of each word as a sort of lookup table for information in the mind of the reader. We don't define words as we're writing; we expect the reader to know them already. If a language has more words, each word is more precise, and fewer words can be used to express an idea, but is that efficiency? You're just relying on the reader having more preexisting knowledge.
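
As a back-of-the-envelope illustration of the trade-off described above, here is a small counting sketch. It assumes every message is equally likely and every string of symbols is a valid message, which no real language satisfies; `symbols_needed` is a made-up helper name.

```python
def symbols_needed(num_messages: int, alphabet_size: int) -> int:
    """Smallest n such that alphabet_size ** n >= num_messages.

    Assumption: all messages are equally likely and every string of
    n symbols is usable, so this is a pure counting bound.
    """
    if num_messages < 1 or alphabet_size < 2:
        raise ValueError("need at least 1 message and at least 2 symbols")

    n, capacity = 0, 1
    # Integer arithmetic avoids floating-point error from logarithms.
    while capacity < num_messages:
        capacity *= alphabet_size
        n += 1
    return n


# One million distinct ideas, with three different "alphabets".
print(symbols_needed(1_000_000, 2))     # binary digits
print(symbols_needed(1_000_000, 26))    # letters
print(symbols_needed(1_000_000, 1000))  # a 1000-word vocabulary
```

The message gets shorter as the symbol set grows, but the table the reader has to carry around grows with it, which is the point about preexisting knowledge.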

> a language with a larger alphabet will be able to express more in fewer characters.

True, although it's not really the alphabet that determines this, it's the number of phonemes (distinctive sounds) in the language. For example, writing /s/ (the sound) sometimes with 's' and sometimes with 'c' does nothing to shorten words in English or Spanish.

But in general, languages with fewer phonemes tend to have longer words (and tone languages often have very short words; in a sense, they have more phonemes than non-tone languages). Morphology (particularly compounding) often obscures this.

It's not a real language, and I don't know what "compression" means in this context, but I'll throw Ithkuil against the wall and see if it sticks [0,1].

[0] https://en.wikipedia.org/wiki/Ithkuil

[1] https://news.ycombinator.com/item?id=29036441

And now this reminds me of Kolmogorov complexity.