These questions are not even about AI: if I were to give money to a human agency and were given something they tell me works, I would ask the same questions. If I did not know how to evaluate it, I would hire people who do. With LLMs the verification part is what bothers me the most.
The only decent software engineering perspective I’ve seen has been from Mitchell Hashimoto.
They can just summon bespoke software out of the ether that only handles the use cases of themselves and a few of their collaborators.
Making “side projects” was not possible for non-developers before powerful LLMs. Now it is.
Imagine not being an architect and using Claude to put together a building plan, then concluding it’s basically done but we might need a real architect to double check the measurements. It may even be true but I’d be skeptical if it’s always non-architects saying this.
Building in the physical world has physical and time constraints that cannot be overcome, which is one of the reasons architecture (and engineering) are so important in this domain. In software development these constraints were only inherent when people were writing the majority of the software. I feel like I’m seeing what I thought were fundamental constraints being eroded by the increasing speed and correctness of these tools and it’s making me reconsider the importance of some of the values that are held by software engineering.
It’s obviously dependent on the domain and solution, but if your software can be rearranged extremely rapidly, bugs found and fixed with little effort, and features added with only a minimal prompt, I think the entire definition of technical debt has changed. I’ve been sceptical of these tools and still approach their output with caution. I also worry that, as a software developer, if more can be accomplished in less time there will be less room on this planet for software developers.
> With Fable the spell has gotten powerful enough that I am no longer sure I am the wizard. I am closer to a patron. I describe what I want, I pay for it, and I judge the result. The conjuring happens somewhere I cannot watch, in hundreds of small choices I never get a vote on. The work has shifted from process to outcome. I no longer steer; I commission.
have a very different meaning coming from a non-technical researcher than they would from someone who builds software for a living.
The trick to getting good at using LLMs for software is to learn how to make _all_ projects low-stakes.
Like, an AI coaching session for executives at the yearly executive retreat. You show up, spend a few hours going through some nonsense slides ChatGPT put together for you, and charge an eye-watering fee for it. HR or whoever organizes it will gladly pay because it makes them look all cutting edge in front of the CEO, and by the next day everyone will have forgotten about it. No accountability at all!
Monoliths vs micro-services.
This doesn't really work in the real world. Many things actually matter, and engineering is fundamentally about handling them.
Intrigued, I clicked one of his examples, "a snake game where the snake is self-aware and crazy things happen;". I played for 1-2 minutes, and it's the classic 1980s snake game. Am I missing something? What is "self-aware" about it? Some funny messages at the bottom of the screen? And what are the "crazy things"?
I will say, the way the act of eating creates a "bulge distortion" that flows down the length of the snake is a nice touch, though.
the quality of produced code and the medium
A thought I have been tossing around in my head as the models get better is that it really may not matter what the code looks like. If the observed behavior of the software is good, then the software is good. If a bug, of whatever kind, can be fixed by a model on a vibe-coded codebase, then that's a fixable bug. If there are no exploitable vulnerabilities, then the code is secure. If the performance is adequate, then the code is performant.
It simply does not matter what the code looks like if, from the outside, it does what it's supposed to, and, from the inside, a model can fix the issue if one is found.
More than ever, software engineering is now really a job about making sure the code is doing what it's supposed to.
And even if it DOES matter what the code looks like, you can have a model fix that too.
But all of those notions of correctness are imaginary. The hardware only enforces a few (and it may be buggy). The OS adds some more (and it’s buggy). The compiler/interpreter may have bugs (but that’s rarely a nuisance) and the libraries are often brittle. There are cracks everywhere in the tower of abstractions.
The code has never mattered. What has always mattered is knowing the software's model of correctness (see Naur's "Programming as Theory Building"), so that you can discern where a program is wrong.
The thing is, a crash or some other immediate error is actually nice to have. You get to react immediately and can have a core dump or a stack trace that points you to the error. What is truly a terror is silent corruption (wrong order of operations, wrong values for a comparison that has expanded the idea of correctness, security issues that have been backdoored for years,…).
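To make the crash-versus-silent-corruption point concrete, here is a minimal Python sketch. The function names `average_loud` and `average_silent` are made up for illustration and don't come from any real codebase; the second one just stands in for the kind of "defensive" default that hides a bad input.

```python
def average_loud(values):
    # Empty input raises ZeroDivisionError right here, with a stack trace
    # that points at the caller who passed bad data.
    return sum(values) / len(values)


def average_silent(values):
    # Hypothetical "defensive" variant: the empty case is papered over,
    # so the caller gets a plausible-looking 0.0 and nothing is reported.
    return sum(values) / max(len(values), 1)
```

Both agree on valid input. On an empty list the first one fails at the point of the mistake, while the second hands back a number that can flow into reports or comparisons for a long time before anyone notices.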
As Hoare said:
There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies and the other way is to make it so complicated that there are no obvious deficiencies.
The first method is far more difficult.
LLMs are very much the second kind. You write a lot of complicated code, and then you can no longer reason about its correctness.
The lack of downvotes on posts on HN has always felt like more of a bug than a feature to me.
But yes, you are right - I don't build roads, and I don't know what it costs to build a road or how to judge the quality of a correctly built one, nor will I ever care or learn.
In fact, that's the entire reason we care about "quality code", because we assume that quality code is code that does what you expect well and consistently.
I say this as someone who hand-writes code pretty much every night for fun, just to experiment with computation. Which, oddly, is more fun than ever because I don't feel like there's any need to connect this type of programming with "real world software", and I can really enjoy code for its own sake, meanwhile my job is mostly just running agent loops (which I quite like as well).
Yet, I can't deny the reality that I observe working with LLMs every day. If this truly is a step-function (as some are suggesting), then I have absolutely zero concern for the quality of the code.