
Show HN: Value likelihoods for OpenAI structured output

https://arena-ai.github.io/structured-logprobs/
This looks super valuable!

That said, it's concerning to see the reported probability for getting a 4 on a die roll is 65%.

Hopefully OpenAI isn't that biased at generating die rolls, so is that number actually giving us information about the accuracy of the probability assessments?
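For context on where a number like 65% comes from: the reported probability for a field value is recovered from the model's token logprobs by exponentiating. A minimal sketch (the logprob value here is hypothetical, chosen only to reproduce the 65% figure):

```python
import math

# Hypothetical per-token logprobs for the characters of a field value.
# A value's probability is exp(sum of its token logprobs), so a single
# token emitted at logprob -0.43 already corresponds to ~65%.
token_logprobs = [-0.43]
prob = math.exp(sum(token_logprobs))
print(f"{prob:.0%}")  # roughly 65%
```

So a 65% on "4" means the model put 65% of its next-token mass on that digit, which is a statement about the sampler's distribution, not about fair dice.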

I was under the impression that log probabilities don't work like that, i.e. they aren't really meaningful when interpreted as probabilities?

https://news.ycombinator.com/item?id=42684629

> the logits aren't telling you anything like 'what is the probability in a random sample of Internet text of the next token', but are closer to a Bellman value function, expressing the model's belief as to what would be the net reward from picking each possible BPE as an 'action' and then continuing to pick the optimal BPE after that (ie. following its policy until the episode terminates). Because there is usually 1 best action, it tries to put the largest value on that action, and assign very small values to the rest (no matter how plausible each of them might be if you were looking at random Internet text)

This is really brilliant stuff! Somehow I didn't realize that logprobs were being returned as part of the OAI responses, and I really like this application of it.

Any interest in seeing this sort of thing being added to llama.cpp?

This looks great; very useful for, e.g., ranking outputs by confidence so you can route the less-confident ones to human review.

Any chance we can get Pydantic support?
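The confidence-ranking workflow mentioned above can be sketched in a few lines. The data structure here is an assumption (pairs of parsed output and per-field probabilities), not this library's actual return type:

```python
# Hypothetical (output, per-field probability) pairs; route the
# least-confident outputs to human review.
outputs = [
    ({"roll": 4}, {"roll": 0.65}),
    ({"roll": 2}, {"roll": 0.98}),
]

# Rank by the weakest field in each output, then flag anything below
# an arbitrary 0.9 review threshold.
ranked = sorted(outputs, key=lambda pair: min(pair[1].values()))
needs_review = [out for out, probs in ranked if min(probs.values()) < 0.9]
print(needs_review)  # [{'roll': 4}]
```

Using the minimum field probability as the sort key is a conservative choice: one uncertain field is enough to send the whole record to a human.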

BTW - structured/constrained generation is the KEY to making AI agents better/scary good. Without it, you're leaving so much on the table. This library is awesome for augmenting that capability!

Also, if you're "studying LLM based chess" and you don't use dynamic grammars to enforce that models can only make "valid" moves at each time step, your research is basically invalid.

And don't meme me with claims that structured/constrained generation harms creativity. The devs of outlines debunked that FUD already: https://blog.dottxt.co/say-what-you-mean.html

Similarly, if you think that RLHF/DPO or LoRA or any of that harms creativity, you're really outing yourself as not having played with high-temperature sampling.

I briefly took a look at the code. What is the reason to use Lark rather than Python's native JSON parser? Is it to handle cases where the structured output is not JSON-compatible?
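One guess (this is an assumption about the library's motivation, not something confirmed by the source): `json.loads` returns only the decoded values, so the character offsets of each value are lost, and per-token logprobs can no longer be aligned to fields. A position-aware parse, which a grammar library like Lark can provide, keeps the spans:

```python
import json

raw = '{"roll": 4}'
data = json.loads(raw)
# json.loads returns only the values -- the character offset of the "4"
# is gone, so token logprobs can't be mapped back to the field.
print(data)  # {'roll': 4}

# A position-aware parse would retain the span; here we recover it
# manually just for illustration.
start = raw.index("4")
print(start, raw[start])
```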
How does the token usage compare to vanilla structured output? Many of these libraries make multiple requests to constrain output and measure logprobs.