Softmax forever, or why I like softmax
https://kyunghyuncho.me/softmax-forever-or-why-i-like-softmax/
I'm all for Graham's pyramid of disagreement: we should focus on the core argument, rather than superfluous things like tone, or character, or capitalisation.
But this is too much for me personally. I just realised I consider the complete lack of capitalisation on a piece of public intellectual work to be obnoxious. Sorry, it's impractical, distracting and generates unnecessary cognitive load for everyone else.
You're the top comment right now, and it's not about the content of the article at all, which is a real shame. All the wasted thought cycles across so many people :(
It's the new black turtleneck that everyone is wearing, but will swear upon their mother's life isn't because they're copying Steve Jobs.
wasn't aware that this makes me a steve jobs copier :(
EDIT: people are seriously so emotionally invested in capitalization that i get downvoted into minus, jeez.
does it make my comment so hard to read just because i don't start my sentences with big letters and don't capitalize myself (i)? really don't get the fuss.
of course i capitalize letters in "official" texts, but we're in a comment section.
i find it doubly funny because english doesn't capitalize lots of things, anyways.
I find it weird that you would be surprised that people care about the quality of textual communication
I know this is true, but does anyone understand why they do it? It is actually cognitively disruptive when reading content, because many of us are trained to proofread while we read.
So I also consider it a type of cognitive attack vector and it annoys me extremely as well.
I'm a bit confused about this. Do people turn off auto capitalisation on their phones? I very rarely have to press shift on my phone
Using the chat/IM style outside of that context just doesn't work and looks really odd, like it's obviously someone who didn't learn those norms and is now mimicking them without understanding them.
I 100% agree lowercase in longform essays is ridiculous, but I think for everything aside from essays, articles, papers, long emails, and some percentage of multi-paragraph site comments, lowercase is absolutely going to be the default online in 20 years.
That’s already the only stuff worth reading and always has been. No loss then
call me old-fashioned, but two spaces after a period will solve this problem if people insist on all-lower-case. this also helps distinguish between abbreviations such as st. martin's and the ends of sentences.
i'll bet that the linguistics experimentalists have metrics that quantify reading speed measurements as determined by eye tracking experiments, and can verify this.
( or alternatively use nested sexp to delineate paragraphs, square brackets for parentheticals [( this turned out to be an utterly cursed idea, for the record )] )
You appear to be trolling for the sake of trolling, but for reference: reading speed is determined by familiarity with the style of the text. Diverging from whatever people are used to will make them slower.
There is no such thing as "two spaces" in HTML, so good luck with that.
Code point 160 followed by 32. In other words ` ` will do it.
edit: well I tried to give an example, but hn seems to replace it with regular space. Here's a copy paste version: https://unicode-explorer.com/c/3000
I'll likely continue using capitalization, both as a preference and because we use it to express conventions in programming, but I totally understand the movement to drop it, and frankly it's logical enough.
This is merely showing off your personal style which, when writing a technical article, I don't care about.
Interestingly programming is the one place where I ditch it almost entirely (at least in my personal code bases).
In contrast, softmax has a very deep grounding in statistical physics - where it is called the Boltzmann distribution. In fact, this connection between statistical physics and machine learning was so fundamental that it was a key part of the 2024 Nobel Prize in Physics awarded to Hopfield and Hinton.
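To make the softmax/Boltzmann correspondence above concrete, here is a minimal sketch in plain Python. It assumes the usual identification of energy with the negated logit and sets Boltzmann's constant to 1; the function names `softmax` and `boltzmann` and the example values are illustrative, not taken from the article.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax with a temperature, computed stably by subtracting the max."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def boltzmann(energies, temperature=1.0):
    """Boltzmann distribution p_i = exp(-E_i / T) / Z, with k_B set to 1."""
    weights = [math.exp(-e / temperature) for e in energies]
    z = sum(weights)  # partition function
    return [w / z for w in weights]

# Illustrative values (assumption): any real-valued logits would do.
logits = [2.0, 0.5, -1.0]
energies = [-z for z in logits]  # energy = negated logit

p_softmax = softmax(logits, temperature=0.7)
p_boltzmann = boltzmann(energies, temperature=0.7)

# The two distributions agree up to floating-point error.
max_gap = max(abs(a - b) for a, b in zip(p_softmax, p_boltzmann))
```

Under that identification the two functions are the same formula written two ways; the only practical difference in this sketch is the max-subtraction used for numerical stability.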
Thermodynamics can absolutely be studied through both a statistical mechanics and an information theory lens, and many physicists have found this to be quite productive and enlightening. Especially when it gets to tricky cases involving entropy, like Maxwell's demon and Landauer's principle of information erasure, one struggles not to do so.
Note: I am the author
The author gives a really clean explanation for why that’s hard for a network to learn, starting from first principles.
In particular, the assumption that |a_k| ≈ 0 initially is incorrect, since in the original paper https://arxiv.org/abs/2502.01628 the a_k are distances from one vector to multiple other vectors, and they're unlikely to be initialized in such a way that the distance is anywhere close to zero. So while the gradient divergence near 0 could certainly be a problem, it doesn't have to be as fatal as the author seems to think it is.
But in machine learning, it has no significance at all. In particular, to fix the average weight, you need to vary the temperature depending on the individual weights, but machine learning practitioners typically fix the temperature instead, so that the average weight varies wildly.
So softmax weights (logits) are just one particular way to parameterize a categorical distribution, and there's nothing precluding another parameterization from working just as well or better.
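One possible reading of the two points above, sketched in plain Python. It assumes "average weight" means the expected logit under the softmax distribution (the analogue of mean energy, up to sign). The helper names `mean_logit`, `temperature_for_mean`, and `squared_normalize` are hypothetical, and the squared-and-normalised parameterization is just one illustrative alternative, not something proposed in the thread.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax with a temperature, computed stably by subtracting the max."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def mean_logit(logits, temperature):
    """Expected logit under the softmax distribution at this temperature."""
    probs = softmax(logits, temperature)
    return sum(p * z for p, z in zip(probs, logits))

def temperature_for_mean(logits, target, lo=1e-3, hi=1e3, iters=200):
    """Bisect (in log space) for the T at which mean_logit equals target.

    The expected logit falls monotonically as T rises, from the max logit
    (T -> 0) toward the plain average (T -> infinity), so target must lie
    strictly between those two values.
    """
    for _ in range(iters):
        mid = math.sqrt(lo * hi)  # geometric midpoint
        if mean_logit(logits, mid) > target:
            lo = mid  # distribution too peaked: raise the temperature
        else:
            hi = mid
    return math.sqrt(lo * hi)

def squared_normalize(params):
    """Alternative parameterization (assumption): p_i = x_i^2 / sum_j x_j^2."""
    squares = [x * x for x in params]
    total = sum(squares)
    return [s / total for s in squares]

# Two illustrative inputs (assumption); the second is the first scaled by 3.
logits_a = [1.0, 0.0, -1.0]
logits_b = [3.0, 0.0, -3.0]

# With a fixed temperature, the expected logit differs between inputs.
fixed_a = mean_logit(logits_a, 1.0)
fixed_b = mean_logit(logits_b, 1.0)

# Holding the expected logit fixed instead requires a per-input temperature.
target = 0.5
t_a = temperature_for_mean(logits_a, target)
t_b = temperature_for_mean(logits_b, target)

# The same categorical distribution can be reached without any exponentials.
probs = softmax(logits_a)
recovered = squared_normalize([math.sqrt(p) for p in probs])
```

The last two lines only show that the alternative can represent the same distribution; whether it trains as well as softmax is a separate question that this sketch does not address.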
But it's 2025, and HTML and Word and the APA and MLA and basically everyone agree that times and style guides have changed.
I agree that not capitalizing the first letter in a sentence is a step too far.
For a counter-example, I personally don't care whether they use the proper em-dash, en-dash, or hyphen--I don't even know when or how to insert the right one with my keyboard. I'm sure there are enthusiasts who care very deeply about using the right ones, and feel that my lack of concern for using the right dash is lazy and unrefined. Culture is changing as more and more communication happens on phone touchscreens, and I have to ask myself - am I out of touch? No, it's the children who are wrong. /s
But I strongly disagree that the author should pass everything they write through Grammarly or worse, through ChatGPT.
/s