This single headline perfectly captures what I have been thinking. It's not that I reject AI content, but it takes _effort_ to review and weed out any mistakes. When your thoughtful reviews that take an hour(because the PR is typically large, and you want to be _right_ when you're pointing out a hallucination) gets an AI-generated response with AI-generated amendments, It doesn't feel _nice_. I feel dismissed and it has continuously trained me to subconsciously avoid his PRs. After all, the team is fully onboarded with AI, so it's not like there is a lack of PRs to review.
It looks like the sentiment isn't just isolated for me.
And the exact same things you would need to safely give up on PRs for human developers (auto-formatters, linters, comprehensive end-to-end tests, continuous deployment pipelines, etc), are also things that place meaningful guardrails on LLMs, and help them maintain a reasonable quality bar.
If that's genuinely your attitude then your org has a problem.
Code review is slow and less fun, for the average sw eng. But for high quality work it's indispensable. So treat code reviews as a scarce resource. Optimize for code reviewer time and attention. Have your PRs the right size? Are they well described? Do you give context? Do they fit in the bigger story? Do you mix in unrelated drive-by fixes? How easy is it to deal with you once you have received comments? Do you address them promptly? Do you give your reviewers credit (if not praise) for their help? Do you give back by doing code reviews yourself with high quality feedback? There are lot of things you can do to streamline things and give code reviews the place in a teams workflow that it deserves.
And that's not rare in teams. Lots of teams and developers do code review wrong.
I even hear other people complain that I "block" their code review. I mean, if there are issues in your code, of course I am going to flag them, what do you think the purpose of code review is?
In this sense, I'm not sure I've ever seen a team that does codereview "right".
In the before times, most PR feedback was stylistic, with the occasional bug identified. Now that we have ubiquitous auto-formatters/linters/CI, most PR review falls into either "you misunderstood the spec", or "I disagree with your architectural choices" - and my personal feeling is that your process ought to catch both of those well before the PR stage
The argument here is that all code reviews are done with attention and care, but quality of a code review is highly dependent on the reviewer and the team’s review process, and in the real world the quality of reviews pretty much follow the same distribution curve as, say, agile project management: For the time invested in reviewing, a handful of teams get excellent utility from them, most teams get little benefit, and a sad few actually cause harm.
If most code reviews provide only a little benefit at base for most teams, recommending that most teams should also delay shipping quality work is going to sound a lot like bad advice.
I’ve noticed that large PRs aren’t just a problem for human reviewers: they’re a problem for AI reviewers too.
If I submit a 100 line PR I’m likely to get some useful comments back from both humans and LLMs. In fact the LLM is likely to come back with so much feedback it gets down to the nitpicky/annoying level.
If I submit 1000+ lines in my PR, the humans either don’t have time and/or get scrolling blindness, and the AI reviewer is likely to give me a response that amounts to, “<<slaps roof>> Looks good to me bro: ship it!”
I guess they have a limited token budget for reviews so you can bamboozle them simply by blowing most or all of that budget.
I think the ultimate answer to this is a stacked PR workflow (which we had at Meta), where I can cheaply maintain/review a 1,000 line PR as a stack of 10 incremental PRs. But unfortunately GitHub et al are still not quite there on this one.
It is definitely very grunt-like for an LLM.
Code reviews, documentation, static analysis, only retrieving deps from internal repos, unit tests, integration tests, ....
Especially in domains where shipping software is not the main product, and a plain cost center to the main business of physical goods.
That said, sometimes low-trust environments are the issue, not PRs. In a higher trust environment, PR review is a helpful thing you usually desire, not dread.
Respectfully, in a high-trust environment, feedback should be delivered well before the PR stage. If you've let someone write a whole bunch of code without having a shared understanding of how the solution should work, you may have earlier process issues that PRs are papering over
With human reviewers, I find that by the time someone has churned out enough of a solution to post a PR, they are already quite invested in specifics of the solution, and it makes it emotionally costly (to both author and reviewer) when someone says "hey, I'm not a fan of this whole approach, lets start over and do it this other way"
If you have these parties review each other's code, I agree that rarely brings much value.
I think the best way to understand our experience with reviews is to stop and say: in a few sentences, what do you expect out of a quality code review? (sounds like nothing in your case, but I am curious)
For bigger changes, of course you need feedback on designs. But that could easily be in the form of draft PRs.
I definitely would push back on anything that required feedback before PRs. That's way too much process. Just going to slow you down for no benefit.
I've worked with people who consider themselves 'prolific humans'. Someone always has to tidy upp later, and its never them
I run both infrastructure and security - that means a lot of relatively self-contained PRs to infrastructure-as-code and dependency management systems. I'm also the team lead, which makes me responsible for a lot of throwaway prototyping, as well as cleaning up anyone else's mess...
Yes, the prolific-but-damaging engineers are all too common in corporate. But particularly in startup land, you tend to find your high-performers wearing a lot of hats at once.
So in essence you have one guy working at 4x and e.g. four other getting just 0.7x - net effect is still positive, but everyone save for that one person is miserable.
Mind you, the 4x dev doesn't necessarily have to be particularly talented - they only need to get their foot in the door before anyone else.
Back during the ZIRP days you could immediately tell that this is the case in a team by staff rotation alone. Nowadays people understandably cling to their jobs, so you might now know until it's too late.
IME, it’s because they lack the experience to have the Taste one develops as a senior engineer. “This works, and is somewhat understandable” is as far as they get. Little to no understanding of how this solution could fit better in the codebase.
I've managed some incredibly prolific developers and some very slow ones, and the prolific ones are pretty much always the ones more available, more willing to fix things, more willing to take feedback.
And also: they make less mistakes because their skills are sharp. This anecdote comes to mind: https://austinkleon.com/2020/12/10/quantity-leads-to-quality...
If you have to constantly rationalize performance differences by demeaning others, this says more about you than the prolific people.
Others are just trying to get code done, and don't care about quality. These are the types that are upset that their code gets rejected because their goal is advancement and money, and not doing a good job.
FWIW, it's okay to care about both. But if you don't care about doing a good job, you're going to drive everyone around you insane.
Prolific bad coders are a bane on the company, and AI is only going to make them worse.
Code reviews are a productivity tax. No truly effective team would rely on them. The fact that so many software teams view them as indispensable just shows how few effective software teams there are in our industry.
They are akin to a quality check step in manufacturing. Part of what Deming did in revolutionizing manufacturing was eliminating the step in favor of a holistic quality metric owned by all participants and enforced with rigorous statistical process controls. As you say, we in the software industry have all the pieces (autoformatters, tests, benchmarks, etc) to operate this way, but it seems our organizational and management dynamics combat this shift at every turn.
Relevant: When this conversation comes up at work, I like to share Avery Pennarun's post about the review tax: https://apenwarr.ca/log/20260316
1. Your skills are >2 standard deviations above everyone else's.
2. You're fast at producing a lot of half-baked garbage, and your coworkers are too shy to confront you, so they just try to ignore it.
(one of these scenarios is much more likely)
We had better management for a few months, and many on the team were actually quickly closing the skill gap with me, but we had another shuffle and things are stupid once more.
So I'd offer that's option 3. (There's always a third option to any suggested either-or fallacy.)
But of course HN has to with the most uncharitable interpretation.
Right? Right?!
Otherwise you place all burden on high performers to not only push PRs but babysit the rest of the team.
It's not an easy fix, especially with AI letting people cosplay as high performers.
I am not sure there is a good analogue for reviews in the AI world. The human operating the AI should obviously review everything produced but that is clearly not as good as a second pair of human MK1 eye balls from pair programming.
Code review is not for feedback, it’s for ensuring quality (many eyes on the output) and have a shared involvement in the evolution of the code. The time for feedback is earlier, once you have an idea of the solution.
1. Solution under-engineered/over-engineered. 2. Code is hard to read or comprehend. 3. Design/Archtecture lacking. 4. Principles decided upon by team not adhered to.
These are just some of the reasons I've rejected functionally correct code before.
To summarize, in any software engineering course you learn that there are other metrics used to evaluate code other than correctness (maintainability, readability, scalability, portability, efficiency etc.)
While PRs may have been used to correct style, that shouldn't have been their only or even main purpose. That's on whoever was using it that way, not on the concept of reviews.
Then don't review the code. Ask Agents to review and merge it, also shift the responsibilities to the AI agents as well.
If you think human is a bottleneck, then either optimize for humans, or remove humans. What's the problem?
Sadly, in my case, it is the auditor. Our SOC2 documents have this lovely "every change has been reviewed by at least one other human", and it's going to be a fun battle to get that reworded
If a coworker is creating a ton of AI-made PRs, I think the first step should always be to run an AI against them with the "assume this is low quality code and find all problems, big and small" text that was suggested in a comment here, and let that be the first line of defense.
To keep the dev on their toes, each dev should come up with their own prompt for AI PR review and they can switch off who reviews it each time, until there are no problems remaining.
Then a human can start to review it.
It will quickly show the low quality code being produced and the massive waste of time it is for everyone, not to mention all the money spent on tokens for the whole process.
Or it'll work, and everyone will have their way, and only have to review code that's pretty decent.
> first step should always be to run an AI against them
What if they write an agent which takes the feedback and resolves them with a new commit. Which again didn't do anything other than offloading more to humans who are reviewing.
> each dev should come up with their own prompt for AI PR review and they can switch off who reviews it each time, until there are no problems remaining.
This assumes AI reviews are correct most of the time, if so, why do we need even humans. Why not have repository level code reviewer which is run immediately after code has been created?
regardless of where you move it, there is still a bottleneck: humans.
If you don't remove them, you will just pass the ball between agents and at the end of the day human still needs to review it.
Change your auditor and compliance, SOC2 is created for a trust between organizations employing humans, if you think agents can own the things, lead the way, introduce a new compliance, if companies sign up for it, then you will be the first who is removing the human bottleneck.
Prolific humans should scale to the review/test/QA/staging backpressure - not just push to have whatever they produce accepted.
Prolific is not a badge of honor, and "lines of code" is not a quality metric.
I would always feel bad in those cases, because it's clear they spent a lot of time, and I'm going to have to say "no" and they will feel like they wasted a ton of effort.
The thought process around this has started shifting for me in the last few weeks. I'm a lot more comfortable saying "no" with a list of concerns when I suspect the code is AI-generated, and I see others doing the same. CLs that would be sitting around for days because no one wants to be the first to say, "this is bad, don't do this" now get quicker feedback.
The good thing is this feedback doesn't feel like as big a deal as it used to because people are less personally attached to code they generated in 30 minutes vs. code they hand crafted over a week. I had at least 2 LLM-generated PRs that were complete, correct, tested, and pre-reviewed by me, but I got feedback that they were going in the wrong direction. This would have been 8 hours of wasted effort a year ago, but now it's just an extra 30 minutes to rework the direction with LLM assistance.
I get this feeling, too. I do however think the onus is on the developer to make something reviewable by their team members if they want a speedy review. Stacked PRs, scoping things down, properly structuring commits so you can review commit-by-commit for example.
I also think that "I spent a bunch of time on this" is not a valid reason for expecting an approval. It should hurt if you've produced a bunch of code that is way off target, even if it ends up implementing the feature. That's how I learned at least.
A proper way to go about large projects, in my opinion, is the same as with software development at large. Fail fast if possible. Draw up a crude boxes and arrows sketch or just discuss how you want the code to integrate with whatever already exists and invite the team to comment. If no one has anything to say, well then they can't complain later when you implement that approach. But if anyone cares then most likely valueable input will come that makes the end result better.
If they continue to do that, then someone has to tell them they're doing a bad job.
And a some of them never did improve, and got fired for it.
I think slowly opening their eyes to the actual scope of the ticket is a lot easier on them than saying "no".
Like : Here is the ticket, this was the goal. I set out by beginning here- but encountered problems x y z I then refactored to accomplish. Finally..
You just dont drop a blob from orbit.
Ironically, ai could generate that quite well from existing documentation (ticket, tasks and prompts) + https://marketplace.visualstudio.com/items?itemName=vsls-con....
Run several rounds of such reviews until the clanker fails to find problems.
Because the problems AI causes are fundamentally problems of good design. It has the same problems of large teams, but less politics. Do your design well ahead of time, and AI review, or a large team, will amplify what you can do. Potentially by a lot.
Do it badly (or like most companies: do it with bad knowledge of the problem or just don't do it at all) and both team and AI will make a mess of things. If the team is made up of inexperienced programmers, they won't even complain, in fact I've seen teams that like this to be happening. At least in AI reviews I've always seen "grumbling" (in the sense of what you might call mean comments)
I wonder if that's occurred to him.
The fix, imho, is for the reviewers to also use ai to review the code. However, the ultimate responsibility for the outcome(s) should be on the committer - you commit it, you own it, so to speak. If there's an incident, they need to be the one paged in the middle of the night. Bugs resulting from it will land on their desk.
The reviewers aren't a shield/safety net.
Doing something that wastes other people’s time or makes more work for them than necessary makes me feel awful.
I’ve always worked in a way that respects other people’s time and I always tried to make sure I did everything I could to minimize the work I’m asking someone to do for me.
Is it even actually good to get to a point of blaming someone for an incident?
This is false, you’re just oblivious to people who grew up in conditions that would make them that way.
The solution is in the title - he wants human attention, he needs to demonstrate human effort.
This is the complaint:
> he doesn't make it easy for the team to look at.
He has traded readability for volume. The lack of readability is causing him to ship less. This was a bad trade because the readability is the bottleneck not the code creation. He should improve readability.
See this is where I think LLMs can actually improve software engineering. Use them to write better code not more code. The most useful LLM at work so far is the code review bot that occasionally finds things that I missed even with a careful self review and good test coverage.
We should be prompting the LLMs to review our hand written code for security, correctness, style, maintainability, etc., and then use human review for good design and sanity checking. The bots can do things like hold all the C++ correctness rules in their context and apply them sometimes better than even a human expert.
OR - he gets a review for every review he does.
I've been thinking about this in art. Is it the end result that matters, or the process of creating it?
I once saw a hideous sculpture. Didn't like it at all. Then the video zoomed and I saw that the whole thing (quite massive) had been hand-built out of individual toothpicks, and suddenly I thought it was amazing.
Perhaps an even better example: I read a story of a man in india who carved a passage through a mountain, so there would be a shorter route from his remote village to the city. He did it by hand and it took him 20 years. We seem to have an instinctive admiration for heroic effort.
In business, generally only the end result matters. Although, the end result also includes the client's perception of how the product was made... (see also: fake fairtrade etc.) In a meaningful way, the perception, the story, is reality.
If a human put some effort into it, that's a signal.
One of the main reasons that art is valuable is in its ability to communicate emotions. Good art has the ability to serialize emotions within the artist and deserialize them within the mind of the viewer. It's not just "wow, this is a pretty picture", it's "wow, this is how another person sees the world, and now that I understand that, I feel an intimate connection with them".
I think this comment misses the point. Let's forget about AI and assume that there are three developers: A, B, and C. Now, A is supposed to make a PR, but instead they describe it to B, and B writes the code. C reviews the PR and gives feedback. A passes the feedback and the responses between B and C.
As you see, this is not easy for either B or C, and A is totally useless in this scenario. When you replace B with an LLM that doesn't get tired or bored, only C complains about the process.
This was true long before AI. With AI the difference is just a lot bigger. It exposes team inefficiencies quite mercilessly. We have a big glaring issue with the current AI tools not being to suitable for usage by multiple users. All interactions are one on one. Which means hand offs between tools and people are bottle necked on people communicating with each other. So, any issues there with people delaying, gate keeping, etc. become very visible.
The sentiment of pushing back on AI is understandable but probably not a productive reflex. We need to find more effective ways on staying on top of massive amounts of changes. It's not going to slow down and insisting on manually reviewing all code is not going to be a long term sustainable way of developing software. It simply does not scale. I'd question the added value of manual PR reviews at this point. Are they finding real issues? Are we valuing those issues correctly? Could we come up with automated ways to find and fix those same issues? There are a lot of open questions about how we are going to do this. But no question about the notion that we need to up our game on this front.
On top of that, we have been running multi-model AI reviews on every PR through their respective GitHub integrations (Codex, Gemini, Copilot, Greptile, CodeRabbit).
They never fully overlap, and yet they somehow usually all miss the same things. The most significant improvement came from having agents commit their plan along with their work.
On the upside, it means I get to focus my reviews on different things.
Yeah, why not reduce the team size to zero while you are at it?
These generalizations about software engineering have never been useful, IMO. Context is everything, there is no flow chart for building a perfect software process.
Although, I'd say you are absolutely delusional if you think we are universally beyond the point where manual review of pull requests is required.
A bit sick and tired of arguments like yours
What irks me the most with this new trend is when people don't review the code themselves thoroughly enough and you're pointing out obvious flaws that you know that they should be aware of. LLMs can be such a great tool, but it's unfair to make people review your code before you've even seemingly looked at it yourself.
That might teach those people a lesson.
Ultimately what it means to be a professional is that you are responsible for your work. That’s why you get a salary instead of being paid by the token.
This won't help. Your wall of text will just get fed right back into the LLM.
Do you want to put the same effort into your job when nobody else does, or should you reserve your thoughts and just feed it back into the LLM?
The LLMs are being advertised as output increasers but companies so far are using them as excuses to fire people instead of creating previously unbelievable things. It might be better to feed your coworkers output back in and use your thoughts to start the company you thought you never had time for.
If they can't explain their own code then it is by default a bad pull request.
At the end of the day, everyone's time is being wasted on tokens and on the increasing cognitive complexity of AI generated code.
Not necessary. Use Haiku.
The response doesn't need to be good, it just needs to be substantial. Presumably the goal here is basically DoS of the problematic colleague through token limits.
I asked other team members to run my custom instructions to perform a review with copilot before they submit...
Of course no one is doing it. It looks like the PRs I get are still straight from copilot. So I tend to run my review prompt. Cut out the 30% BS issues it "finds" and the rest is good.
why not speak directly to the bot yourself instead? then you can drop pretenses and get to the point
I find this to be a new variant of the old behavior where a colleague comments on a typo in a PR, and the team later moans about laborious back and forth for small nitpicks, instead of simply editing the typo right there (and perhaps leaving a note that they did so)
There's no justice in this world.
The bot isn't making decisions. It's not choosing to submit extensive PRs with bad code. The colleague is the one who needs to actually learn something here, and the problem is that confronting him about it directly is widely considered to be bad form. This is, of course, a deeply unhealthy aspect of our corporate culture. We need to be more open to honest communication, even when it's either uncomplimentary of one of the people involved, or counter to the prevailing opinions within the company.
"I'm writing tons of code, and the process is stumbling where the guy whose job it is to review code isn't reviewing it."
"I'm not reviewing code."
Sometimes I wonder: how does someone go and think so much about their coworkers, and never once think about how they themselves look?
Even if I sympathize with the people complaining about their poorly chosen GitHub-based workflow - whose purpose is to let pull requests languish, for the most part - and how they stumble when overwhelmed with solutions. It's obvious to me, that the people who complain the loudest about the anti-sociality of LLM authored code in their precious harmonious low-effort workplace status quo: they are projecting.
You go to a new restaurant, and order some dishes, and one of the plates your server brings out is a big ol pile of dog shit.
Who's being anti-social in this situation? The restaurant is doing its job and all they're asking is that you do yours. On the other hand, you have certain expectations about what you order from the restaurant and they're not being met. Who's anti-social?
If someone’s using AI to generate a large quantity of actually-tested, actually-good code then that’s one thing. If they’re generating a fire hose of slop and demanding that others do the actual human time-consuming work of validating that code then that person is the problem. It’s hard to tell which is the case here.
One of two things will happen:
1. Things start breaking, proving AI generated code sucks and the individual spamming these PRs is incompetent.
2. The code works fine and reviews are unnecessary for anything other than liability concerns.
That includes taking responsibility and accountability so that the software doesn't become a sad and dangerous mess.
If we want to be an engineering discipline, just yoloing in production is not going to cut it.
The thing that makes it scale is to default to "no" and require the other party to convince you of "yes". Just put the burden of proof where it belongs. If they don't manage, then that's their problem.
Communicating this in a way that is viable for a business scenario certainly comes with its own difficulties, but that is a solvable problem.
In fact, you can use AI to stress test your communication there. Just throw what you want to say at the AI but don't tell it that it is you who wrote it. Then tune the input until it stops saying that you're the problem and starts agreeing with you.
Highly recommend. It's a perfect emotion-driven cargo-culting normie simulator that never calls HR on you.