"accurate document extraction is becoming a commodity with powerful VLMs"

Agree.

The capability is fairly trivial for orgs with decent technical talent. The tech / processes all look similar:

User uploads file --> Azure prebuilt-layout returns .MD --> prompt + .MD + schema sent to LLM --> JSON returned. Do whatever you want with it.
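For anyone who wants that flow in code, here is a rough sketch, not anyone's actual implementation. The layout step and the LLM call are passed in as plain callables (`to_markdown` and `complete`, both hypothetical names) so that no particular SDK signature is assumed. `build_prompt`, `extract`, and the check against the schema's `required` list are my own illustrative additions.

```python
import json
from typing import Callable


def build_prompt(instructions: str, markdown: str, schema: dict) -> str:
    """Combine instructions, the target schema, and the document Markdown."""
    return (
        f"{instructions}\n\n"
        "Return only JSON matching this schema:\n"
        f"{json.dumps(schema, indent=2)}\n\n"
        "Document (Markdown):\n"
        f"{markdown}"
    )


def extract(
    file_bytes: bytes,
    to_markdown: Callable[[bytes], str],  # hypothetical: wraps the layout/OCR service
    complete: Callable[[str], str],       # hypothetical: wraps the LLM call
    instructions: str,
    schema: dict,
) -> dict:
    """Run file -> Markdown -> prompt -> LLM -> parsed JSON."""
    markdown = to_markdown(file_bytes)
    raw = complete(build_prompt(instructions, markdown, schema))
    data = json.loads(raw)  # raises ValueError if the model did not return JSON

    # Minimal sanity check (an assumption, not full JSON Schema validation):
    # every top-level key listed under "required" must be present.
    missing = [key for key in schema.get("required", []) if key not in data]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    return data
```

In a real setup `to_markdown` would wrap the layout service and `complete` would wrap whichever model you use. Keeping them as parameters also makes the pipeline easy to test with stubs.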

Totally agree that this is becoming the standard "reference architecture" for this kind of pipeline. The only thing that complicates this a lot today is complex inputs. For simple 1-2 page PDFs, what you describe works quite well out of the box, but for 100+ page docs it starts to fall over in ways I described in another comment.
Why all those steps? Why not just file + prompt to JSON directly?