Story Detail of id 42067560 | Liveview Hacker News

tlofreso 8 hours ago | on: Launch HN: Midship (YC S24) – Turn PDFs, docs, and images into usable data

"accurate document extraction is becoming a commodity with powerful VLMs"

Agree.

The capability is fairly trivial for orgs with decent technical talent. The tech / processes all look similar:

User uploads file --> Azure prebuilt-layout returns Markdown (.md) --> prompt + Markdown + schema sent to LLM --> JSON returned. Do whatever you want with it.
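For anyone who wants to see the shape of that flow, here is a minimal sketch, not anyone's production code. The names `to_markdown` and `call_llm` are hypothetical stand-ins that you would wire up yourself: the first for the layout step (e.g. a call to Azure prebuilt-layout), the second for whatever LLM client you use. No real SDK calls are shown, and the schema check is an assumed, deliberately shallow one (required top-level keys only).

```python
import json
from typing import Callable


def build_prompt(instructions: str, markdown: str, schema: dict) -> str:
    """Combine the prompt, the target schema, and the layout Markdown into one message."""
    return (
        f"{instructions}\n\n"
        f"Return only JSON matching this schema:\n{json.dumps(schema, indent=2)}\n\n"
        f"Document:\n{markdown}"
    )


def extract(
    file_bytes: bytes,
    to_markdown: Callable[[bytes], str],  # hypothetical: wraps the layout service
    call_llm: Callable[[str], str],       # hypothetical: wraps the LLM client
    instructions: str,
    schema: dict,
) -> dict:
    """Run file -> Markdown -> LLM -> JSON and do a shallow sanity check."""
    markdown = to_markdown(file_bytes)
    raw = call_llm(build_prompt(instructions, markdown, schema))

    data = json.loads(raw)  # raises json.JSONDecodeError if the reply is not valid JSON
    if not isinstance(data, dict):
        raise ValueError("LLM output is not a JSON object")

    # Assumed check: only the schema's top-level "required" keys are verified.
    missing = [key for key in schema.get("required", []) if key not in data]
    if missing:
        raise ValueError(f"LLM output is missing required fields: {missing}")
    return data
```

Passing the two external steps in as plain callables keeps the sketch runnable with stubs, and lets you swap the layout service or the model without touching the pipeline itself.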

kietay 7 hours ago | parent | next

Totally agree that this is becoming the standard "reference architecture" for this kind of pipeline. The main thing that complicates it today is complex inputs. For simple 1-2 page PDFs, what you describe works quite well out of the box, but for 100+ page docs it starts to fall over in ways I described in another comment.

Kiro 6 hours ago | parent

Why all those steps? Why not just file + prompt to JSON directly?
