Like, isn't this announcement a terrible indictment of how inexperienced their engineers are, or how trivial the problems they solve are, or both?
This bothers me. I completely understand the conversational aspect - "what approach might work for this?", "how could we reduce the crud in this function?" - it worked well for me last year when I was trying to learn C.
But the vast majority of AI use that I see is...not that. It's just glorified, very expensive search. We are willing to burn far, far more fuel than necessary because we've decided we can't be bothered with traditional search.
A lot of enterprise software is poorly cobbled together from Stack Overflow-gathered code as it is. It's part of the reason why MS Teams makes your laptop run so hot. We've decided that power-inefficient software is the best approach. Now we want to amplify that effect by burning more fuel to get the same answers, but from an LLM.
It's frustrating. It should be snowing where I am now, but it's not. Because we want to frivolously chase false convenience and burn gallons and gallons of fuel to do it. LLM usage is a part of that.
In other words, it's a skill issue. LLMs can only make this worse. Hiring unskilled programmers and giving them a machine for generating garbage isn't the way. Instead, train them, and reject low quality work.
I don't think finding such programmers is really difficult. What is difficult is finding such people if you expect them to be docile to incompetent managers and other incompetent people involved in the project who, for example, got their position not by merit and competence, but by playing political games.
In my opinion the reason we get enterprise spaghetti is largely due to requirement issues and scope creep. It's nearly impossible to create a streamlined system without knowing what it should look like. And once the system gets to a certain size, it's impossible to get business buy-in to rearchitect or refactor to the degree that is necessary. Plus the full requirements are usually poorly documented and long forgotten by that time.
Without redoing their work or finding a way to have deep trust (which is possible, but uncommon at a bigcorp), it's hard enough to tell who is earnest and who is faking it (or buying their own baloney) when it comes to propositions like "investing in this piece of tech debt will pay off big time".
As a result, if managers tend to believe such plans, bad ideas drive out good and you end up investing in tech-debt proposals that just waste time. Burned managers therefore cope by undervaluing all such proposals, preferring the crappy car that you at least know is crappy over the car that allegedly has a brand-new zero-mile motor but that you have no way of distinguishing from a car with a rolled-back odometer. They take the locally optimal path because it's the best they can do.
It's taken me 15 years of working in the field and thinking about this to figure it out.
The only way out is an organization where everyone is trusted and competent and is worthy of trust, which again, hard to do at most random bigcorps.
This is my current theory anyway. It's sad, but I think it kind of makes sense.
The way I explain this to managers is that software development is unlike most work. If I'm making widgets and I fuck up, that widget goes out the door never to be seen again. But in software, today's outputs are tomorrow's raw materials. You can trade quality for speed in the very short term at the cost of future productivity, so you're really trading speed for speed.
I should add, though, that one can do the rigorous thinking before or after the doing, and ideally one should do both. That was the key insight behind Martin Fowler's "Refactoring: Improving the Design of Existing Code". Think up front if you can, but the best designs are based on the most information, and there's a lot of information that is not available until later in a project. So you'll want to think as information comes in and adjust designs as you go.
That's something an LLM absolutely can't do, because it doesn't have access to that flow of information and it can't think about where the system should be going.
This is an important point. I don't remember where I read it, but someone said something similar about taking a loss on your first few customers as an early stage startup--basically, the idea is you're buying information about how well or poorly your product meets a need.
Where it goes wrong is if you choose not to act on that information.
> ...
> train them, and reject low quality work.
I agree very strongly with both of these points.
But I've observed a truth about each of them over the last decade-plus of building software.
1) very few people approach the field of software engineering with anything remotely resembling rigor, and
2) there is often little incentive to train juniors and reject subpar output (move fast and break things, etc.)
I don't know where this takes us as an industry? But I feel your comment on a deep level.
Ahh, well, in order to save money, training is done via an online class with multiple choice questions, or, if your company is like mine and really committed to making sure you know they take your training seriously, they put portions of a generic book on 'tech Z' into PDFs spread over DRM-ridden web pages.
As for code, that is reviewed, commented on, and rejected by LLMs as well. It used to be turtles. Now it truly is LLMs all the way down.
That said, in a sane world, this is what should be happening at a company that actually wants to get good results over time.
There is no incentive to do it. I worked that way, focused on quality and testing and none of my changes blew up in production. My manager opined that this approach is too slow and that it was ok to have minor breakages as long as they are fixed soon. When things break though, it's blame game all around. Loads of hypocrisy.
Traditional search (at least on the web) is dying. The entire edifice is drowning under a rapidly rising tide of spam and scam sites. No one, including Google, knows what to do about it so we're punting on the whole project and hoping AI will swoop in like deus ex machina and save the day.
Google wasn’t crushed by spam; they decided to stop doing text search and instead build user-specific, location-specific search bubbles, surface pages that mention search terms in metadata rather than in text users might actually read, etc. Oh yeah, and about a decade before LLMs were actually usable, they started to sabotage simple substring searches and kind of force this more conversational interface. That’s when simple search terms stopped working very well, and you had to instead ask yourself “hmm, how would a very old person or a small child phrase this question for a magic oracle?”
This is how we get stuff like: Did you mean “when did Shakespeare die near my location”? If anyone at google cared more about quality than printing money, that thirsty gambit would at least be at the bottom of the page instead of the top.
Google results are polluted with spam because it is more profitable for Google. This is a conscious decision they made five years ago.
Then why are DuckDuckGo results also (arguably even more so) polluted with spam/scam sites? I doubt DDG is making any profit from those sites since Google essentially owns the display ad business.
Google not only has multiple monopolies, but a cut and dry perverse incentive to produce lower quality results to make the whole session longer instead of short and effective.
For example: a beginner developer is possibly better served by some SEO-heavy tutorial blog post; an experienced developer would prefer results weighted towards the official docs, the project’s bug tracker and mailing list, etc. But since less technical and non-technical people vastly outnumber highly technical people, Google and Bing end up focusing on the needs of the former, at the cost of making search worse for the latter.
One positive about AI: if an AI is doing the search, it likely wants the more advanced material, not the more beginner-focused one. It can take more advanced material and simplify it for the benefit of less experienced users. It is (I suspect) less likely to make mistakes if you ask it to simplify the more advanced material than if you just gave it more beginner-oriented material instead. So if AI starts to replace humans as the main clients of search, that may reverse some of the pressure to “dumb it down”.
That's not my experience at all. While there are scammy sites, using search engines as an index instead of an oracle still yields useful results. It only requires learning the keywords, which you can do by reading the relevant material.
The problem with Google search is that it indexes all the web, and there's (as you say) a rising tide of scam and spam sites.
The problem with AI is that it scoops up all the web as training data, and there's a rising tide of scam and spam sites.
Tailoring/retraining the main search AI will be so much more expensive than retraining the special-purpose spam AIs.
You make this claim with such confidence, but what is it based on?
There have always been hordes of spam and scam websites. Can you point to anything that actually indicates that the ratio is now getting worse?
No, there haven't always been hordes of spam and scam websites. I remember the web of the 90s. When Google first arrived on the scene every site on the results page was a real site, not a spam/scam site.
I'm sure they can. But they have no incentive. Try to Google an item, and it will show you a perfect match of sponsored ads and some other not-so-relevant non-sponsored results.
Honestly, I think it will become a better Intellisense but not much more. I'm a little excited because there's going to be so many people buying into this, generating so much bad code/bad architecture/etc. that will inevitably need someone to fix after the hype dies down and the rug is pulled, that I think there will continue to be employment opportunities.
We also have the ceremonial layers of certain forms of corporate architecture, where nothing actually happens, but the steps must exist to match the holy box, box cylinder architecture. Ceremonial input massaging here, ceremonial data transformation over there, duplicated error checking... if it's easy for the LLM to do, maybe we shouldn't be doing it everywhere in the first place.
When I hear that most code is trivial, I think of this as a language design or a framework related issue making things harder than they should be.
Throwing AI or code generators at the problem just to claim it's fixed is frustrating.
This was one of my thoughts too. If the pain of using bad frameworks and clunky languages can be mitigated by AI, it seems like the popular but ugly/verbose languages will win out, since there's almost no point to better-designed languages/frameworks. I would rather have a good language/framework/etc. where it is just as easy to write the code directly: similar time to implement as an LLM prompt, but more deterministic.
If people don't feel the pain of AI slop why move to greener pastures? It almost encourages things to not improve at the code level.
Just as an example, I have "service" functions. They're incredibly simple, a higher order function where I can inject the DB handler, user permissions, config, etc. Every time I write one of these I have to import the ServiceDependencies type and declare which dependencies I need to write the service. I now spend close to zero time doing that and all my time focusing on the service logic. I don't see a downside to this.
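Roughly, the shape is something like this - a minimal TypeScript sketch with illustrative types and field names, not my real ones:

    // Hypothetical dependency bag; my real ServiceDependencies type is richer.
    interface ServiceDependencies {
      db: { query: (sql: string, params?: unknown[]) => Promise<unknown[]> };
      permissions: { can: (action: string) => boolean };
      config: Record<string, string>;
    }

    // A "service" is a higher-order function that closes over its dependencies.
    type Service<Args, Result> =
      (deps: ServiceDependencies) => (args: Args) => Promise<Result>;

    // Example service: the only boilerplate is the deps parameter and the
    // ServiceDependencies import; everything else is the actual service logic.
    const getUserEmail: Service<{ userId: number }, string | null> =
      (deps) => async ({ userId }) => {
        if (!deps.permissions.can("read:users")) return null;
        const rows = await deps.db.query(
          "SELECT email FROM users WHERE id = $1",
          [userId],
        );
        return rows.length > 0 ? (rows[0] as { email: string }).email : null;
      };

The autocomplete fills in the deps plumbing and the import line for me; I only write the body.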
Most of my business logic is done in raw SQL, which can be complex, but the autocomplete often helps there too. It's not helping me figure out the logic, it's simply cutting down on my typing. I don't know how anyone could be offered "do you want to type significantly fewer characters on your keyboard to get the same thing done?" and say "no thanks". The AI is almost NEVER coding for me, it's just typing for me and it's awesome.
I don't care how lean your system is, there will at least be repetition in how you declare things. There will be imports, there will be dependencies. You can remove 90% of this repetitive work for almost no cost...
I've tried to use ChatGPT to "code for me", and I agree with you that it's not a good option if you're trying to do anything remotely complex and want to avoid bugs. I never do this. But integrated code suggestions (with Supermaven, NOT CoPilot) are incredibly beneficial and maybe you should just try it instead of trying to come up with theoretical arguments. I was also a non-believer once.
I've been using LLMs for about a month now. It's a nice productivity gain. You do have to read the generated code and understand it. Another useful strategy is pasting in a buggy function and asking for revisions.
I think most programmers who claim that LLMs aren't useful are reacting emotionally. They don't want LLMs to be useful because, in their eyes, that would lower the status of programming. This is a silly insecurity: ultimately programmers are useful because they can think formally better than most people. For the foreseeable future, there's going to be massive demand for that, and people who can do it will be high status.
I don't think that's true. Most programmers I speak to have been keen to try it out and reap some benefits.
The almost universal experience has been that it works for trivial problems, starts injecting mistakes for harder problems and goes completely off the rails for anything really difficult.
I’ve been seeing the complete opposite. So it’s out there.
That's a bold statement, and incorrect, in my opinion.
At a junior level software development can be about churning out trivial code in a previously defined box. I don't think it's fair to call that 'most programming'.
Even with the debugging example, if I just read what I wrote I'll find the bug because I understand the language. For more complex bugs, I'd have to feed the LLM a large fraction of my codebase and at that point we're exceeding the level of understanding these things can have.
I would be pretty happy to see an AI that can do effective code reviews, but until that point I probably won't bother.
As an example of not being production ready: I recently tried to use ChatGPT-4 to provide me with a script to manage my gmail labels. The APIs for these are all online, I didn't want to read them. ChatGPT-4 gave me a workable PoC that was extremely slow because it was using inefficient APIs. It then lied to me about better APIs existing and I realized that when reading the docs. The "vibes" outcome of this is that it can produce working slop code. For the curious I discuss this in more specific detail at: https://er4hn.info/blog/2024.10.26-gmail-labels/#using-ai-to...
I think revealing the domain each programmer works in and asking in those domains would reveal obvious trends. I imagine if you work in web development you'll get workable enough AI-generated code, but something like high-performance computing would get slop worse than copying and pasting the first result from Stack Overflow.
A model is only as good as its training set, and not all kinds of code are readily indexable.
I think that’s exactly right. I used to have to create the puzzle pieces and then fit them together. Now, a lot of the time something else makes the piece and I’m just doing the fitting together part. Whether there will come a day when we just need to describe the completed puzzle remains to be seen.
(Or if you’re being paid to waste time, maybe consider coding in assembly?)
So don’t be afraid. Learn to use the tools. They’re not magic, so stop expecting that. It’s like anything else, good at some things and not others.
LLMs are great at translating already-rigorously-thought-out pseudocode requirements, into a specific (non-esoteric) programming language, with calls to (popular) libraries/APIs of that language. They might make little mistakes — but so can human developers. If you're good at catching little mistakes, then this can still be faster!
For a concrete example of what I mean:
I hardly ever code in JavaScript; I'm mostly a backend developer. But sometimes I want to quickly fix a problem with our frontend that's preventing end-to-end testing; or I want to add a proof-of-concept frontend half to a new backend feature, to demonstrate to the frontend devs by example the way the frontend should be using the new API endpoint.
Now, I can sit down with a JS syntax + browser-DOM API cheat-sheet, and probably, eventually write correct code that doesn't accidentally e.g. incorrectly reject zero or empty strings because they're "false-y", or incorrectly interpolate the literal string "null" into a template string, or incorrectly try to call Element.setAttribute with a boolean true instead of an empty string (or any of JS's other thousand warts.) And I can do that because I have written some JS, and have been bitten by those things, just enough times now to recognize those JS code smells when I see them when reviewing code.
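To make those warts concrete, here's roughly the kind of thing I mean - a throwaway TypeScript sketch, hypothetical snippets rather than code from our actual codebase:

    // Falsy-check wart: this "is it missing?" check also rejects 0 and "",
    // which may be perfectly valid values.
    function isMissingLoose(value: unknown): boolean {
      return !value; // wrong: 0, "", false, and NaN all count as "missing"
    }
    function isMissing(value: unknown): boolean {
      return value === null || value === undefined; // what I actually meant
    }

    // Template-string wart: interpolating a possibly-null value silently
    // becomes the literal string "null".
    function currentUserName(): string | null {
      return null; // pretend this sometimes has no value
    }
    const userName = currentUserName();
    const greetingBad = `Hello, ${userName}`;             // "Hello, null"
    const greetingGood = `Hello, ${userName ?? "guest"}`; // "Hello, guest"

    // Boolean-attribute wart: setAttribute takes a string, so the presence of
    // the attribute is what matters, not a true/false value.
    const button = document.createElement("button");
    button.setAttribute("disabled", "");  // right: empty string marks it disabled
    button.removeAttribute("disabled");   // right: removing it re-enables it
    // button.setAttribute("disabled", true); // the mistake I keep wanting to make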
But just because I can recognize bad JS code, doesn't mean that I can instantly conjure to mind whole blocks of JS code that do everything right and avoid all those pitfalls. I know "the right way" exists, and I've probably even used it before, and I would know it if I saw it... but it's not "on the tip of my tongue" like it would be for languages I'm more familiar with. I'd probably need to look it up, or check-and-test in a REPL, or look at some other code in the codebase to verify how it's done.
With an LLM, though, I can just tell it the pseudocode (or equivalent code in a language I know better), get an initial attempt at the JS version of it out, immediately see whether it passes the "sniff test"; and if it doesn't, iterate just by pointing out my concerns in plain English — which will either result in code updated to solve the problem, or an explanation of why my concern isn't relevant. (Which, in the latter case, is a learning opportunity — but one to follow up in non-LLM sources.)
The product of this iteration process is basically the same JS code I would have written myself — the same code I wanted to write myself, but didn't remember exactly "how it went." But I didn't have to spend any time dredging my memory for "how it went." The LLM handled that part.
I would liken this to the difference between asking someone who knows anatomy but only ever does sculpture, to draw (rather than sculpt) someone's face; vs sitting the sculptor in front of a professional illustrator (who also knows anatomy), and having the sculptor describe the person's face to the illustrator in anatomical terms, with the sketch being iteratively improved through conversation and observation. The illustrator won't perfectly understand the requirements of the sculptor immediately — but the illustrator is still a lot more fluent in the medium than the sculptor is; and both parties have all the required knowledge of the domain (anatomy) to communicate efficiently about the sculptor's vision. So it still goes faster!
They don't have high status even today, imagine in a world where they will be seen as just reviewers for AI code...
Try putting on a dating website that you work at Google vs. that you work in agriculture, and tell us which yields more dates.
With so many hits, it's about hitting all the checkmarks instead of minmaxing on one check.
Surely yes.
I (not at Google) rarely use the LLM for anything more than two lines at a time, but it writes/autocompletes 25% of my code no problem.
I believe Google have character-level telemetry for measuring things like this, so they can easily count it in a way that can be called "writing 25% of the code".
Having plenty of "trivial code" isn't an indictment of the organisation. Every codebase has parts that are straightforward.