Ha, I thought this would be a really useful resource, but the people the author is complaining about actually do much better than most of the benchmarking I see in the industry.
Almost all the benchmarking results I see are just a percentage difference between two arithmetic means, with no statistical analysis whatsoever.
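For concreteness, here's a minimal sketch of the kind of analysis that's usually missing: instead of one percentage difference between means, compare the two sets of run times with a significance test and a confidence interval. (This is my own illustrative Python, assuming numpy/scipy; the function name and inputs are hypothetical, not anything from the linked repo.)

    # Hypothetical sketch: compare baseline vs candidate benchmark samples
    # with Welch's t-test and a bootstrap CI, rather than a single % delta.
    import numpy as np
    from scipy import stats

    def compare_benchmarks(baseline, candidate, n_boot=10_000, seed=0):
        baseline = np.asarray(baseline, dtype=float)
        candidate = np.asarray(candidate, dtype=float)

        # Welch's t-test: doesn't assume the two runs have equal variance.
        _, p_value = stats.ttest_ind(candidate, baseline, equal_var=False)

        # Bootstrap the relative difference of means to get an interval,
        # not just a point estimate.
        rng = np.random.default_rng(seed)
        rel_diffs = []
        for _ in range(n_boot):
            b = rng.choice(baseline, size=baseline.size, replace=True)
            c = rng.choice(candidate, size=candidate.size, replace=True)
            rel_diffs.append((c.mean() - b.mean()) / b.mean())
        ci_low, ci_high = np.percentile(rel_diffs, [2.5, 97.5])

        return {"p_value": p_value, "rel_diff_ci_95": (ci_low, ci_high)}

If the 95% interval on the relative difference straddles zero, "improved some metrics, degraded others" is probably just noise.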
Very common interaction: QA folks say "your change degraded some of our metrics and improved some others". I know they're full of it, because it's impossible that my change improved any perf metrics. I ask for statistical details; they don't have any. The meeting was a waste of time, and the next one will be too.
The fact that I get these reactions suggests that everyone else just lets each other get away with it.
Yep. The most recent example that's stuck in my head is actually much worse: they didn't even take the mean! One sample!
https://github.com/denoland/pm-benchmark
Check the run bench shell script (there's not much else in the repo anyways)