In statistical mechanics, fixing the average weight has physical significance: the average weight, i.e. the average energy, determines the total energy of a large collection of identical systems, and is therefore macroscopically observable.
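
(To spell out the connection, in my own notation: maximizing entropy subject to a fixed average energy is exactly what produces the Boltzmann/softmax form, with the temperature appearing as the Lagrange multiplier for that constraint.)

    \max_p \; -\sum_i p_i \log p_i
    \quad \text{s.t.} \quad \sum_i p_i E_i = \bar{E}, \;\; \sum_i p_i = 1
    \;\;\Longrightarrow\;\;
    p_i = \frac{e^{-E_i/T}}{\sum_j e^{-E_j/T}}

where T is chosen so that the average-energy constraint actually holds.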

But in machine learning it has no significance at all. In particular, to fix the average weight you would have to vary the temperature depending on the individual weights, whereas machine learning practitioners typically fix the temperature instead, so the average weight varies wildly.
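
Here's a minimal numpy/scipy sketch of that point (the energies, target, and function names are all mine): pinning the average energy at the same value forces a different temperature for each set of weights.

    import numpy as np
    from scipy.optimize import brentq

    def avg_energy(E, T):
        # Boltzmann/softmax weights at temperature T: p_i ~ exp(-E_i / T)
        p = np.exp(-(E - E.min()) / T)  # shift for numerical stability
        p /= p.sum()
        return float(p @ E)

    def temperature_for(E, target):
        # solve for the T that pins the average energy at `target`
        return brentq(lambda T: avg_energy(E, T) - target, 1e-9, 1e9)

    E1 = np.array([0.0, 1.0, 2.0])
    E2 = np.array([0.0, 2.0, 4.0])   # different individual weights
    print(temperature_for(E1, 0.5))  # one temperature
    print(temperature_for(E2, 0.5))  # a different one, for the same average

Fix T = 1 instead, as ML usually does, and the average energy of those two systems comes out different.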

So softmax weights (logits) are just one particular way to parameterize a categorical distribution, and there's nothing precluding another parameterization from working just as well or better.
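
To make that concrete, here's a toy sketch (sq_norm is my own made-up name, not a standard function): any smooth map from real vectors onto the probability simplex is a candidate parameterization, softmax being just the exp-then-normalize one.

    import numpy as np

    def softmax(z):
        # the usual parameterization: p_i ~ exp(z_i)
        e = np.exp(z - z.max())  # shift for numerical stability
        return e / e.sum()

    def sq_norm(z):
        # one alternative: p_i ~ z_i^2 (valid for any nonzero z)
        s = z * z
        return s / s.sum()

    z = np.array([1.0, -2.0, 0.5])
    print(softmax(z))   # -> [0.60 0.03 0.37], sums to 1
    print(sq_norm(z))   # -> [0.19 0.76 0.05], also a valid distribution

Sparsemax is a published example of exactly this kind of swap.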
