To me the most interesting part of this is the claim that you can accurately and meaningfully measure software engineering productivity.
You can, but not at the level of a single developer, and you cannot use those measures to manage the productivity of a specific dev.

For teams you can measure meaningful outcomes and improve team metrics.

You shouldn't really compare teams, but even that is possible if you know what each team is doing.

If you are some disconnected manager who thinks he can make decisions or improvements by reducing things to single numbers, then no, that's not possible.

> For teams you can measure meaningful outcomes and improve team metrics.

How? Which metrics?

My company uses the DORA metrics to measure the productivity of teams, and those metrics are incredibly good.
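For context, the four DORA metrics are deployment frequency, lead time for changes, change failure rate, and time to restore service. Here's a minimal sketch of how a team might compute them, assuming hypothetical deployment records (the record shape and field names below are invented for illustration, not any real API):

    from dataclasses import dataclass
    from datetime import datetime
    from statistics import median

    @dataclass
    class Deployment:
        commit_time: datetime                  # first commit of the change
        deploy_time: datetime                  # when it reached production
        failed: bool                           # did it cause an incident?
        restored_time: datetime | None = None  # recovery time, if it failed

    def dora_metrics(deploys: list[Deployment], window_days: int = 30) -> dict:
        hours = lambda delta: delta.total_seconds() / 3600
        failures = [d for d in deploys if d.failed]
        return {
            "deploys_per_day": len(deploys) / window_days,
            "lead_time_hours": median(hours(d.deploy_time - d.commit_time)
                                      for d in deploys),
            "change_failure_rate": len(failures) / len(deploys),
            "time_to_restore_hours": median(hours(d.restored_time - d.deploy_time)
                                            for d in failures) if failures else 0.0,
        }

Note that all four are team-level throughput and stability measures; none of them says anything about an individual developer, which fits the point above.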
There's only one metric that matters at the end of the day, and that's $. Revenue.

Unfortunately, there's a lot of lag between the engineering work and the revenue it eventually produces.

That is what we pay managers to figure out. They should work out which metrics to use, and how, by knowing the team, being familiar with the domain, and understanding company dynamics, the customer, and market dynamics.
At scale you can do this in a bunch of interesting ways. For example, you could measure "amount of time between opening a crash log and writing the first character of a new change" across 10,000s of engineers. Yes, each individual data point is highly messy. Alice might start coding as a means of investigation. Bob might like to think about the crash over dinner. Carol might get a really hard bug while David gets a really easy one. But at scale you can see how changes in the tools change this metric.

None of this works to evaluate individuals or even teams. But it can be effective at evaluating tools.
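As a sketch of what that aggregate comparison could look like, here is a toy version with synthetic numbers standing in for real telemetry (the distributions and the before/after-a-tool-rollout framing are invented for illustration):

    import random
    from statistics import median

    random.seed(0)

    # seconds from "crash log opened" to "first character typed",
    # one sample per engineer-incident; individual points are noisy
    before = [random.lognormvariate(6.0, 1.0) for _ in range(10_000)]
    after = [random.lognormvariate(5.8, 1.0) for _ in range(10_000)]

    # no single sample means much, but a shift in the median across
    # tens of thousands of engineers is a tool-level signal
    print(f"median before: {median(before):.0f}s")
    print(f"median after:  {median(after):.0f}s")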

There's lots of stuff you can measure. It's not clear whether any of it is correlated with productivity.

To use your example, a user with an LLM might say "LLM please fix this" as a first line of action, drastically improving this metric, even if it ruins your overall productivity.

You can come up with measures for it and then watch them, that’s for sure.
When a metric becomes a target, it ceases to be a good metric. Once developers discover how it works, they will type the first character immediately after opening the log.

Only if the developer is being judged on the thing. If the tool is being judged on the thing, it's much less relevant.

That is, I, personally, am not measured on how much AI-generated code I create, and while the number is non-zero, I can't tell you what it is because I don't care and have no incentive to care. And I'm someone who is personally fairly bearish on the value of LLM-based codegen/autocomplete.

That was my point, veiled in an attempt to be cute.