"accurate document extraction is becoming a commodity with powerful VLMs"
Agree.
The capability is fairly trivial for orgs with decent technical talent. The tech / processes all look similar:
User uploads file --> Azure prebuilt-layout returns .MD --> prompt + .MD + schema sent to LLM --> JSON returned. Do whatever you want with it.
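A minimal sketch of that flow, not anyone's actual implementation. The layout step and the LLM call are passed in as plain callables (`to_markdown` and `call_llm`, both hypothetical names) so nothing here pretends to be a real Azure or LLM client API; the stubs at the bottom stand in for them. The required-field check is an added assumption, not part of the pipeline described above.

```python
import json
from typing import Callable


def build_prompt(markdown: str, schema: dict) -> str:
    """Combine instructions, the target schema, and the document text."""
    return (
        "Extract the fields described by this JSON schema from the document.\n"
        "Return only JSON.\n\n"
        f"Schema:\n{json.dumps(schema, indent=2)}\n\n"
        f"Document:\n{markdown}"
    )


def extract(
    file_bytes: bytes,
    schema: dict,
    to_markdown: Callable[[bytes], str],  # hypothetical: wraps the layout service
    call_llm: Callable[[str], str],       # hypothetical: wraps the LLM client
) -> dict:
    """file -> markdown -> prompt + markdown + schema -> LLM -> parsed JSON."""
    markdown = to_markdown(file_bytes)
    raw = call_llm(build_prompt(markdown, schema))
    data = json.loads(raw)  # raises ValueError if the model returned non-JSON
    if not isinstance(data, dict):
        raise ValueError("LLM output is not a JSON object")
    # Assumption: only top-level "required" keys are checked, not full schema validation.
    missing = [key for key in schema.get("required", []) if key not in data]
    if missing:
        raise ValueError(f"LLM output missing required fields: {missing}")
    return data


# Stubs standing in for the real services, so the sketch runs offline.
def fake_layout(file_bytes: bytes) -> str:
    return "# Invoice\n\nTotal: 42.00"


def fake_llm(prompt: str) -> str:
    return '{"total": 42.0}'


schema = {
    "type": "object",
    "properties": {"total": {"type": "number"}},
    "required": ["total"],
}
result = extract(b"%PDF-stub", schema, fake_layout, fake_llm)
print(result)
```

Swapping the stubs for real clients should only touch the two callables; the rest stays the same.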
Totally agree that this is becoming the standard "reference architecture" for this kind of pipeline. The main thing that still complicates it today is complex inputs. For simple 1-2 page PDFs, what you describe works quite well out of the box, but for 100+ page docs it starts to fall over in ways I described in another comment.