
Show HN: Value likelihoods for OpenAI structured output

https://arena-ai.github.io/structured-logprobs/
This looks super valuable!

That said, it's concerning to see the reported probability for getting a 4 on a die roll is 65%.

Hopefully OpenAI isn't that biased at generating die rolls, so is that number actually giving us information about the accuracy of the probability assessments?
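For context on where a number like 65% comes from: the reported probability for a field value is recovered from the model's token logprobs by exponentiating. A minimal sketch (the logprob value here is hypothetical, chosen only to reproduce the 65% figure):

```python
import math

# Hypothetical per-token logprobs for the characters of a field value.
# A value's probability is exp(sum of its token logprobs), so a single
# token emitted at logprob -0.43 already corresponds to ~65%.
token_logprobs = [-0.43]
prob = math.exp(sum(token_logprobs))
print(f"{prob:.0%}")  # roughly 65%
```

So a 65% on "4" means the model put 65% of its next-token mass on that digit, which is a statement about the sampler's distribution, not about fair dice.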

I was under the impression that log probabilities don't work like that, i.e. they aren't really meaningful when interpreted as probabilities?

https://news.ycombinator.com/item?id=42684629

> the logits aren't telling you anything like 'what is the probability in a random sample of Internet text of the next token', but are closer to a Bellman value function, expressing the model's belief as to what would be the net reward from picking each possible BPE as an 'action' and then continuing to pick the optimal BPE after that (ie. following its policy until the episode terminates). Because there is usually 1 best action, it tries to put the largest value on that action, and assign very small values to the rest (no matter how plausible each of them might be if you were looking at random Internet text)

This is really brilliant stuff! Somehow I didn't realize that logprobs were being returned as part of the OAI responses, and I really like this application of it.

Any interest in seeing this sort of thing being added to llama.cpp?

This looks great; very useful for, e.g., ranking outputs by confidence so you can route the less-confident ones to human review.

Any chance we can get Pydantic support?
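The confidence-ranking workflow mentioned above can be sketched in a few lines. The data structure here is an assumption (pairs of parsed output and per-field probabilities), not this library's actual return type:

```python
# Hypothetical (output, per-field probability) pairs; route the
# least-confident outputs to human review.
outputs = [
    ({"roll": 4}, {"roll": 0.65}),
    ({"roll": 2}, {"roll": 0.98}),
]

# Rank by the weakest field in each output, then flag anything below
# an arbitrary 0.9 review threshold.
ranked = sorted(outputs, key=lambda pair: min(pair[1].values()))
needs_review = [out for out, probs in ranked if min(probs.values()) < 0.9]
print(needs_review)  # [{'roll': 4}]
```

Using the minimum field probability as the sort key is a conservative choice: one uncertain field is enough to send the whole record to a human.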

BTW - structured/constrained generation is the KEY to making AI agents better/scary good. Without it, you're leaving so much on the table. This library is awesome for augmenting that capability!

Also, if you're "studying LLM based chess" and you don't use dynamic grammars to enforce that models can only make "valid" moves at each time step, your research is basically invalid.

And don't meme me with claims that structured/constrained generation harms creativity. The devs of outlines debunked that FUD already: https://blog.dottxt.co/say-what-you-mean.html

Similarly, if you think that RLHF/DPO or LoRA or any of that harms creativity, you're really outing yourself as not having played with high-temperature sampling.

I briefly took a look at the code. What is the reason to use Lark rather than Python's native JSON parser? Is it to handle cases where the structured output is not JSON-compatible?
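One guess (this is an assumption about the library's motivation, not something confirmed by the source): `json.loads` returns only the decoded values, so the character offsets of each value are lost, and per-token logprobs can no longer be aligned to fields. A position-aware parse, which a grammar library like Lark can provide, keeps the spans:

```python
import json

raw = '{"roll": 4}'
data = json.loads(raw)
# json.loads returns only the values -- the character offset of the "4"
# is gone, so token logprobs can't be mapped back to the field.
print(data)  # {'roll': 4}

# A position-aware parse would retain the span; here we recover it
# manually just for illustration.
start = raw.index("4")
print(start, raw[start])
```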
How does the token usage compare to vanilla structured output? Many of these libraries make multiple requests to constrain output and measure logprobs.