I found a LOT more value in personal Python-based API tools once I employed well-described JSON schemas.

One of my clients must comply with a cyber risk framework with ~350 security requirements, many of which are so poorly written that misinterpretation is both common and costly.

But there are other, better-written and better-described frameworks that include "mappings" between the two frameworks.

In the past I would take one of the vague security requirements and read the mapping to the well-described framework to understand the underlying risk, the intent of the question, and the likely mitigating measures (security controls). On average, that took 45-60 minutes per question. Multiplied out, that's ~350 * 45 minutes, or around 262 hours at the low end.
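For anyone checking the estimate, here is the arithmetic as a quick sketch. The only inputs are the ~350 requirements and the 45-60 minute range quoted above; the variable names are just for illustration.

```python
QUESTIONS = 350  # approximate number of requirements in the framework

# Convert per-question minutes into total hours for both ends of the range.
low_hours = QUESTIONS * 45 / 60
high_hours = QUESTIONS * 60 / 60

print(low_hours, high_hours)  # 262.5 350.0
```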

My first attempts to use AI for this yielded results that had some value, but lacked the quality to provide to the client.

This past weekend, using Python, Sonnet 3.5, and JSON schemas, I managed to get the entire set of ~350 questions documented at a quality level exceeding what I could achieve manually.

It cost $10 in API credits and approx 14 hrs of my time (I'm sure a pro could easily achieve this in under 1 hour). The code itself was easy enough, but the big improvements came from the schema descriptions. That was the change that gave me the 'aha' moment.
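To illustrate what "schema descriptions" can mean in practice, here is a minimal sketch. It is not the schema actually used for this job: every field name and description below is hypothetical, and the checker is a deliberately small stand-in for a real JSON Schema validator. The idea is that each `description` tells the model what belongs in the field, and the same schema is then used to sanity-check the reply.

```python
import json

# Hypothetical schema: field names and descriptions are illustrative only.
REQUIREMENT_SCHEMA = {
    "type": "object",
    "properties": {
        "requirement_id": {
            "type": "string",
            "description": "Identifier of the vague requirement, copied "
                           "verbatim from the source framework.",
        },
        "underlying_risk": {
            "type": "string",
            "description": "The risk the requirement addresses, inferred "
                           "from the mapped, better-described framework. "
                           "One to three sentences.",
        },
        "intent": {
            "type": "string",
            "description": "What the question is really asking the "
                           "organisation to demonstrate, in plain language.",
        },
        "mitigating_controls": {
            "type": "array",
            "items": {"type": "string"},
            "description": "Concrete security controls that would likely "
                           "satisfy the requirement, one per item.",
        },
    },
    "required": ["requirement_id", "underlying_risk", "intent",
                 "mitigating_controls"],
}

# Maps the two JSON Schema types used above to Python types.
_TYPE_MAP = {"string": str, "array": list}


def find_problems(response_text, schema=REQUIREMENT_SCHEMA):
    """Return a list of problems in a JSON reply; an empty list means it
    passed this minimal check (required keys present, top-level types ok)."""
    data = json.loads(response_text)
    problems = [f"missing: {key}" for key in schema["required"]
                if key not in data]
    for key, spec in schema["properties"].items():
        if key in data and not isinstance(data[key], _TYPE_MAP[spec["type"]]):
            problems.append(f"wrong type: {key}")
    return problems
```

The schema dict can be serialised with `json.dumps` and passed along with the prompt, and `find_problems` run on each reply before it is accepted. A full validator would also check array item types and nested objects.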

I read over the final results for dangerous errors (and ended up changing nothing at all). Just in case, I also ran the results through GPT-4o, which found no issues that would prevent sending them to the client.

I would never have gotten that job done manually; it's simply too much of a grind for a human to do cheaply or reliably.

Have you tried BAML (https://github.com/boundaryml/baml)? It's really good at structured output parsing. We integrated it directly into our pipeline builder.
Not yet, but the weekend is just beginning. Thanks for the tip.
(BAML founder here) Feel free to jump on our Discord or email us if you have any issues with BAML! Here's our repo (with docs links): https://github.com/BoundaryML/baml and a demo: https://boundaryml.wistia.com/medias/5fxpquglde

People have used it to do anything from simple classifications to extracting giant schemas.

You are welcome! The easiest way to get started with BAML on Laminar is with our pipeline builder and Structured Output template. Check out the docs here (https://docs.lmnr.ai/pipeline/introduction)