Hacker News new | past | comments | ask | show | jobs | submit
As someone who pushed ~4x the median PRs on my team before LLMs were a thing, I kind of think the problem here is PRs as a concept. Code review doesn't scale to prolific humans, it definitely can't scale to agents.

And the exact same things you would need to safely give up on PRs for human developers (auto-formatters, linters, comprehensive end-to-end tests, continuous deployment pipelines, etc), are also things that place meaningful guardrails on LLMs, and help them maintain a reasonable quality bar.

> Code review doesn't scale to prolific humans

If that's genuinely your attitude then your org has a problem.

Code review is slow and less fun, for the average sw eng. But for high quality work it's indispensable. So treat code reviews as a scarce resource. Optimize for code reviewer time and attention. Have your PRs the right size? Are they well described? Do you give context? Do they fit in the bigger story? Do you mix in unrelated drive-by fixes? How easy is it to deal with you once you have received comments? Do you address them promptly? Do you give your reviewers credit (if not praise) for their help? Do you give back by doing code reviews yourself with high quality feedback? There are lot of things you can do to streamline things and give code reviews the place in a teams workflow that it deserves.

It's clear they consider code review a personal activity than team activity, in the sense that they think "code review is a gate before my code can be merged" rather than "code review is a process where the team discusses, understands and improves the code".

And that's not rare in teams. Lots of teams and developers do code review wrong.

I even hear other people complain that I "block" their code review. I mean, if there are issues in your code, of course I am going to flag them, what do you think the purpose of code review is?

loading story #48503166
> But for high quality work [a code review is] indispensable.

The argument here is that all code reviews are done with attention and care, but quality of a code review is highly dependent on the reviewer and the team’s review process, and in the real world the quality of reviews pretty much follow the same distribution curve as, say, agile project management: For the time invested in reviewing, a handful of teams get excellent utility from them, most teams get little benefit, and a sad few actually cause harm.

If most code reviews provide only a little benefit at base for most teams, recommending that most teams should also delay shipping quality work is going to sound a lot like bad advice.

> Have your PRs the right size?

I’ve noticed that large PRs aren’t just a problem for human reviewers: they’re a problem for AI reviewers too.

If I submit a 100 line PR I’m likely to get some useful comments back from both humans and LLMs. In fact the LLM is likely to come back with so much feedback it gets down to the nitpicky/annoying level.

If I submit 1000+ lines in my PR, the humans either don’t have time and/or get scrolling blindness, and the AI reviewer is likely to give me a response that amounts to, “<<slaps roof>> Looks good to me bro: ship it!”

I guess they have a limited token budget for reviews so you can bamboozle them simply by blowing most or all of that budget.

loading story #48503227
I agree about how you can reciprocate for a good code review, but I'd just add that for me, code review is also fun — when done for a fellow human who I might be teaching.

It is definitely very grunt-like for an LLM.

Most orgs have a problem with quality unless it is enforced by government requirements for certifications and such.

Code reviews, documentation, static analysis, only retrieving deps from internal repos, unit tests, integration tests, ....

Especially in domains where shipping software is not the main product, and a plain cost center to the main business of physical goods.

Gently, as long as you work with humans, you should consider yourself working _for_ those humans. Everyone needs shared state to work from, and that's just the cost of doing business.

That said, sometimes low-trust environments are the issue, not PRs. In a higher trust environment, PR review is a helpful thing you usually desire, not dread.

> In a higher trust environment, PR review is a helpful thing you usually desire, not dread

Respectfully, in a high-trust environment, feedback should be delivered well before the PR stage. If you've let someone write a whole bunch of code without having a shared understanding of how the solution should work, you may have earlier process issues that PRs are papering over

You cannot deliver feedback on something that doesn't exist. If you mean a review in the style of "all of this is wrong and needs to be rewritten differently" then yes, that's something to be discussed beforehand. But I don't imagine this is what people think of when discussing a review.
Depends on how PRs function within teams. For some, the PR is a lightweight thing that is the preferred method of communication. It sounds like you are imagining a case where face to face communication, or communication over chat, is preferred for early stages, with the PR being a nearly final artifact. But it doesn't have to work like that.
loading story #48503276
A discussion ahead of the implementation can also bias the two parties to that discussion and have them overlook the same implementation issue: many things you only understand once you start implementing.

If you have these parties review each other's code, I agree that rarely brings much value.

I think the best way to understand our experience with reviews is to stop and say: in a few sentences, what do you expect out of a quality code review? (sounds like nothing in your case, but I am curious)

Agreed. But those things are not mutually exclusive.
Agree. All the subtleties of how a high trust environment work are hard to enumerate
Depends on the change. Certainly most PRs don't need feedback before the PR is ready - the task is too obvious, and there's little to feed back on before there's any code.

For bigger changes, of course you need feedback on designs. But that could easily be in the form of draft PRs.

I definitely would push back on anything that required feedback before PRs. That's way too much process. Just going to slow you down for no benefit.

> Code review doesn't scale to prolific humans

I've worked with people who consider themselves 'prolific humans'. Someone always has to tidy upp later, and its never them

> I've worked with people who consider themselves 'prolific humans'. Someone always has to tidy upp later, and its never them

I run both infrastructure and security - that means a lot of relatively self-contained PRs to infrastructure-as-code and dependency management systems. I'm also the team lead, which makes me responsible for a lot of throwaway prototyping, as well as cleaning up anyone else's mess...

Yes, the prolific-but-damaging engineers are all too common in corporate. But particularly in startup land, you tend to find your high-performers wearing a lot of hats at once.

My experience is that it's even worse: they've already produced enough code that the codebase matches their taste and theirs alone.

So in essence you have one guy working at 4x and e.g. four other getting just 0.7x - net effect is still positive, but everyone save for that one person is miserable.

Mind you, the 4x dev doesn't necessarily have to be particularly talented - they only need to get their foot in the door before anyone else.

Back during the ZIRP days you could immediately tell that this is the case in a team by staff rotation alone. Nowadays people understandably cling to their jobs, so you might now know until it's too late.

> … and its never them

IME, it’s because they lack the experience to have the Taste one develops as a senior engineer. “This works, and is somewhat understandable” is as far as they get. Little to no understanding of how this solution could fit better in the codebase.

There's also those that burn themselves out, and John Carmack!
That's such bullshit.

I've managed some incredibly prolific developers and some very slow ones, and the prolific ones are pretty much always the ones more available, more willing to fix things, more willing to take feedback.

And also: they make less mistakes because their skills are sharp. This anecdote comes to mind: https://austinkleon.com/2020/12/10/quantity-leads-to-quality...

If you have to constantly rationalize performance differences by demeaning others, this says more about you than the prolific people.

loading story #48503119
loading story #48506843
Well, it's either:

1. Your skills are >2 standard deviations above everyone else's.

2. You're fast at producing a lot of half-baked garbage, and your coworkers are too shy to confront you, so they just try to ignore it.

(one of these scenarios is much more likely)

As someone who often submits significantly more PRs (without using AI) than teammates, it's not exactly a skill delta. Yes that helps but it's often only a piece of the puzzle. The other ingredients include motivations and culture. In such cases, something else is the driving force, such as posturing for promotion, stability, etc. My current team is massively low performing. Management pays some lip service to all the problems, but also runs things in a way that discourages high performance. It's not a good fit for me, as I want to tackle challenges head on, improve the environment, be productive, embrace change. I'm also very comfortable with the code base as well as the code review process, but I'm surrounded by "seniors" who do not know how to code review, and who are happy to drag their feet and spin their wheels for months before pushing out small PRs that hurt my brain. How can that little work be shown after months, barely functional at best?

We had better management for a few months, and many on the team were actually quickly closing the skill gap with me, but we had another shuffle and things are stupid once more.

So I'd offer that's option 3. (There's always a third option to any suggested either-or fallacy.)

It could also potentially be that GP is making atomic PRs, while everyone else is just making 5000-line PRs with multiple responsibilities that just gets merged with "LGTM".

But of course HN has to with the most uncharitable interpretation.

Are PRs honestly helping with either case? Either you severely rate-limit your high-performers, or you drown everyone else in review, and both outcomes are bad for the overall team
The latter has an easy fix: the perpetrator is not allowed to take new work while there are pending review comments left unaddressed.
loading story #48503101
As a prolific PR author, I've found how I communicate has a major factor on how well and quickly people respond to PRs. I've recorded my lessons at https://epage.github.io/dev/pr-style/.
I have always considered Kent Beck understood this the best, the scaling for code reviews as you go to reduced release timeframes is to pair program, that brings the number of people reviewing it down but also increases the understanding for the reviewer. Comprehensive end to end tests are more a replacement for manual quality assurance for regressions.

I am not sure there is a good analogue for reviews in the AI world. The human operating the AI should obviously review everything produced but that is clearly not as good as a second pair of human MK1 eye balls from pair programming.

No need to pair program, you can always send a message to your colleague about the design of the upcoming code, especially if it’s going to impact them or if it’s an area that they’re more familiar with. Waiting till a PR for feedback is wrong IMO.

Code review is not for feedback, it’s for ensuring quality (many eyes on the output) and have a shared involvement in the evolution of the code. The time for feedback is earlier, once you have an idea of the solution.

Comprehensive end-to-end tests and CI can only attest to correctness, most engineers worth their salt won't review code only in regards to that aspect though.
In the bad old days before auto-formatters and linters, PRs were heavily used to enforce style guidelines. If we can enforce both style and correctness in our CI pipeline, what is actually left?
The functionally correct code could be rejected in PR for many reasons other than style:

1. Solution under-engineered/over-engineered. 2. Code is hard to read or comprehend. 3. Design/Archtecture lacking. 4. Principles decided upon by team not adhered to.

These are just some of the reasons I've rejected functionally correct code before.

To summarize, in any software engineering course you learn that there are other metrics used to evaluate code other than correctness (maintainability, readability, scalability, portability, efficiency etc.)

As said already: readability and maintainability of the code (closely related) are two most important values a code review can get you.
If the correctness check was vibecoded there's a good chance it was cheated. So maybe that, on top of the, you know, code review (see the sibling comment).

While PRs may have been used to correct style, that shouldn't have been their only or even main purpose. That's on whoever was using it that way, not on the concept of reviews.

Code architecture and technical design. You can have a solution that works fine, but are too complex or will impede future changes. Maybe you have code that has already been solved or your variables’ name are too generic. Maybe your modules are messy and your data structures are not modeled well.
> Code review doesn't scale to prolific humans, it definitely can't scale to agents.

Then don't review the code. Ask Agents to review and merge it, also shift the responsibilities to the AI agents as well.

If you think human is a bottleneck, then either optimize for humans, or remove humans. What's the problem?

> If you think human is a bottleneck, then either optimize for humans, or remove humans. What's the problem?

Sadly, in my case, it is the auditor. Our SOC2 documents have this lovely "every change has been reviewed by at least one other human", and it's going to be a fun battle to get that reworded

I think the "and merge it" is the problem in the above comment.

If a coworker is creating a ton of AI-made PRs, I think the first step should always be to run an AI against them with the "assume this is low quality code and find all problems, big and small" text that was suggested in a comment here, and let that be the first line of defense.

To keep the dev on their toes, each dev should come up with their own prompt for AI PR review and they can switch off who reviews it each time, until there are no problems remaining.

Then a human can start to review it.

It will quickly show the low quality code being produced and the massive waste of time it is for everyone, not to mention all the money spent on tokens for the whole process.

Or it'll work, and everyone will have their way, and only have to review code that's pretty decent.

loading story #48503937
> Sadly, in my case, it is the auditor.

Change your auditor and compliance, SOC2 is created for a trust between organizations employing humans, if you think agents can own the things, lead the way, introduce a new compliance, if companies sign up for it, then you will be the first who is removing the human bottleneck.

>As someone who pushed ~4x the median PRs on my team before LLMs were a thing, I kind of think the problem here is PRs as a concept. Code review doesn't scale to prolific humans

Prolific humans should scale to the review/test/QA/staging backpressure - not just push to have whatever they produce accepted.

Prolific is not a badge of honor, and "lines of code" is not a quality metric.

Either you were a head above the rest of the team and had the intellect to produce high quality value adding work, or then you were the "move fast break things" type of guy producing a lot of extra liability and work for others.