Over time we will move further away. If the cost of an easily managed solution is low enough, why do the details matter?
Are we? We're constantly changing abstractions, but we don't keep adding them all that often. Operating systems and high-level programming languages emerged in the 1960s. Since then, the only fundamentally new layer of abstraction has been virtual machines (the JVM, browser JS, hardware virtualization, etc.). There are still plenty of hardware-specific APIs, you still debug assembly when something crashes, you still optimize databases for specific storage technologies and multimedia transcoders for specific CPU architectures...
The majority of software today is written without even knowing which architecture the processor is going to be, how much of the processor we are going to have, or whether anything will ever fit in memory... hell, we can write code that doesn't know which virtual machine it's going to run in, or even which family of virtual machine. I have written code that had no idea if it was running in a JVM, LLVM or a browser!
So when I compare my code from the 80s to what I wrote this morning, the distance from the hardware doesn't seem even remotely similar. I bet someone is writing hardware-specific bits somewhere, and the assembly someone is debugging might actually resemble what the hardware runs, maybe. But the vast majority of code is completely detached from anything.
Frankly though, when I bring stuff like this up, it feels more like I'm being mocked than the other way around - like we're the minority. And sadly, I'm not sure if anything can ultimately be done about it. People just don't know what they don't know. Some things you can't tell people no matter how much you try; they just won't get it.
And it wasn't redone in assembly, it was C++ with SIMD intrinsics, which might as well just be assembly.
https://www.youtube.com/watch?v=Ge3aKEmZcqY&list=PLEMXAbCVnm...
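For anyone who hasn't seen intrinsics code, here's roughly what it looks like (a minimal sketch I wrote for illustration, not the code from the talk): every _mm_* call maps more or less one-to-one onto a machine instruction, so you're effectively writing assembly with variable names and a register allocator.

    // Illustrative sketch only (not the code from the talk). Sums an array of
    // floats with SSE intrinsics; each _mm_* call corresponds roughly to one
    // machine instruction, so this is effectively typed assembly.
    #include <immintrin.h>
    #include <cstddef>

    float sum_sse(const float* data, std::size_t n) {
        __m128 acc = _mm_setzero_ps();                      // four packed accumulators
        std::size_t i = 0;
        for (; i + 4 <= n; i += 4)
            acc = _mm_add_ps(acc, _mm_loadu_ps(data + i));  // unaligned load + packed add

        // Horizontal reduction of the four lanes (SSE2 only, baseline on x86-64).
        __m128 hi   = _mm_movehl_ps(acc, acc);              // [a2, a3, a2, a3]
        __m128 sums = _mm_add_ps(acc, hi);                  // [a0+a2, a1+a3, ., .]
        __m128 odd  = _mm_shuffle_ps(sums, sums, _MM_SHUFFLE(1, 1, 1, 1));
        sums = _mm_add_ss(sums, odd);                       // lane 0 holds the total
        float total = _mm_cvtss_f32(sums);

        for (; i < n; ++i) total += data[i];                // scalar tail
        return total;
    }

Compile it at -O2 and compare the disassembly to the source; they line up almost instruction for instruction.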
most programmers are not able to solve a problem like that in 20 lines of assembly or whatever, and no amount of education or awareness is going to change that. acting as if they can is just going to come across as arrogant.
> Half the things they are fixing, if not more, are created by the abstractions in the first place
Unlike the above post though, in my experience, it's less often devs (at least the best ones) who want to keep moving away from the silicon, but more often management. Everywhere I have worked, management wants to avoid having control over the lower-level workings of things and instead outsource or abstract it away. They then proceed to wonder why we struggle with the issues that we have, despite people who deal with these things trying to explain it to them. They seem to automatically assume that higher-level abstractions are inherently better, and will lead to productivity gains, simply because you don't have to deal with the underlying workings of things. But the example I gave is a reason why that isn't always the case. Fact is, sometimes problems are better and more easily solved at a lower level of abstraction.
But as I said, in my experience, management often wants to go the opposite way and disallows us control over these things. So, as an engineer who wants to solve problems as much as management or customers want them solved, what I hope to achieve by "bringing it up", in cases which seem appropriate, is a change which empowers us to actually solve such problems.
Don't get me wrong though, I'm not saying lower-level is always the way to go. It always depends on the circumstances.
Hold on there a sec: WHAT?!
Engineers tend to solve their problems differently and the circumstances for those differences are not always clear. I'm in this field because I want to learn as many different approaches as possible. Have you never experienced a moment when you could share a simpler solution to a problem with someone and observe first-hand as they became one of today's lucky 10'000[0]? That's anything but arrogant in my book.
Sadly, what I increasingly observe is the complete opposite. Nobody wants to talk about their solutions, everyone wants to gatekeep and become indispensable, and criticism isn't seen as part of a productive environment because "we just need to ship that damn feature!". Team members should be aware when decisions have been made out of laziness, in good faith, out of experience, under pressure, etc.
You might, maybe, but an increasing proportion of developers:
- Don't have access to the assembly to debug it
- Don't even know what storage tech their database is sitting on
- Don't know or even control what CPU architecture their code is running on.
My job is debugging and performance profiling other people's code, but the vast majority of that is looking at query plans. If I'm really stumped, I'll look at the C++, but I've not yet once looked at assembly for it.
the only people that say this are people who don't work on compilers. ask anyone that actually does and they'll tell you most compilers are pretty mediocre (they tend to miss a lot of optimization opportunities), some compilers are horrendous, and a few are good in a small domain (matmul).
this is again just more brash confidence without experience. you're wrong. this is a post about GPUs and so i'll tell you that as a GPU compiler engineer i spend my entire day (work day) staring/thinking about asm in order to affect register pressure and ilp and load/store efficiency etc.
> rather than something that a fancy optimization of the loop
a fancy loop optimization (pipelining) can fix some problems (load/store efficiency) but create other problems (register pressure). the fundamental fact is the NFL (no free lunch) theorem applies here fully: you cannot optimize for all programs uniformly.
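to make the trade-off concrete, here's a toy CPU-side sketch (my own assumed example, not anything from the post): manually pipelining a dot product starts the next iteration's loads while the current multiply is still going, which hides load latency but increases how many values have to stay live in registers at once. a GPU compiler is making exactly this kind of bet, just across thousands of threads.

    // Toy illustration of the pipelining trade-off (assumed example).
    #include <cstddef>

    // Plain loop: one load pair in flight, minimal live values.
    float dot_simple(const float* a, const float* b, std::size_t n) {
        float acc = 0.0f;
        for (std::size_t i = 0; i < n; ++i)
            acc += a[i] * b[i];
        return acc;
    }

    // Software-pipelined by hand: the loads for iteration i+1 are issued while
    // iteration i is still multiplying. Latency is hidden, but ca/cb/na/nb/acc
    // are all live at once -- that extra liveness is the register pressure.
    float dot_pipelined(const float* a, const float* b, std::size_t n) {
        if (n == 0) return 0.0f;
        float acc = 0.0f;
        float na = a[0], nb = b[0];                // operands "in flight"
        for (std::size_t i = 0; i + 1 < n; ++i) {
            float ca = na, cb = nb;                // operands for this iteration
            na = a[i + 1];                         // start next iteration's loads early
            nb = b[i + 1];
            acc += ca * cb;                        // compute overlaps the loads above
        }
        return acc + na * nb;                      // drain the last in-flight pair
    }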
While yes, I/O is often the bottleneck, I'd be shy to really say that in a consumer space when we aren't installing flash buffers, performing in situ processing, or even pre-fetching. Hell, in many programs I barely even see any caching! TBH, most stuff can greatly benefit from asynchronous and/or parallel operations. Yeah, I/O is an issue, but I really would not call anything I/O bound until you've actually gotten into parallelism and optimized your code. And not even then, until you've applied this to your I/O operations! There is just so much optimization that a compiler can never do, and so much optimization that a compiler won't do unless you're giving it tons of hints (all that "inline", "const", and stuff you see in C. Not to mention the hell that is template metaprogramming). Things you could never get out of a dynamically typed language like Python, no matter how much of the backend is written in C.
That said, GPU programming is fucking hard. Godspeed you madman, and thank you for your service.
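On the asynchronous/parallel I/O point above, here's a minimal sketch of what I mean (assumed example; read_chunk and checksum_file are just illustrative names): overlap reading the next chunk of a file with processing the current one, instead of serializing I/O and compute and then declaring the program "I/O bound".

    // Minimal sketch (assumed example): prefetch the next file chunk on a
    // worker thread while the main thread processes the current one.
    #include <cstdint>
    #include <fstream>
    #include <future>
    #include <vector>

    static std::vector<char> read_chunk(std::ifstream& in, std::size_t bytes) {
        std::vector<char> buf(bytes);
        in.read(buf.data(), static_cast<std::streamsize>(bytes));
        buf.resize(static_cast<std::size_t>(in.gcount()));   // shrink on short read
        return buf;
    }

    std::uint64_t checksum_file(const char* path) {
        constexpr std::size_t kChunk = 1 << 20;               // 1 MiB chunks
        std::ifstream in(path, std::ios::binary);
        std::uint64_t sum = 0;

        auto pending = std::async(std::launch::async, read_chunk, std::ref(in), kChunk);
        while (true) {
            std::vector<char> chunk = pending.get();          // wait for the prefetched chunk
            if (chunk.empty()) break;
            pending = std::async(std::launch::async,          // kick off the next read...
                                 read_chunk, std::ref(in), kChunk);
            for (char c : chunk)                              // ...while we crunch this one
                sum += static_cast<unsigned char>(c);
        }
        return sum;
    }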
While modern compilers are great, you’d be surprised by the seemingly obvious optimizations compilers can’t do, either because of language semantics or because the code transformation would be infeasible to detect.
I type versions of functions into godbolt all the time and it’s very interesting to see what code is/isn’t equivalent after O3 passes
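A classic one you can paste into godbolt yourself (my example; results vary by compiler and flags): at -O3, compilers will typically vectorize the integer sum but leave the float sum as a serial chain of adds, because IEEE float addition isn’t associative and the language semantics forbid reordering it without -ffast-math / -fassociative-math.

    // My godbolt example (not from the parent). Compare the -O3 assembly:
    // sum_int usually gets SIMD; sum_float usually stays scalar because
    // reassociating float adds would change the result.
    #include <cstddef>

    int sum_int(const int* a, std::size_t n) {
        int s = 0;
        for (std::size_t i = 0; i < n; ++i) s += a[i];   // reassociation is legal here
        return s;
    }

    float sum_float(const float* a, std::size_t n) {
        float s = 0.0f;
        for (std::size_t i = 0; i < n; ++i) s += a[i];   // must remain a serial add chain
        return s;
    }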
I understand that if you write machine code and run it in your operating system, your operating system actually handles its execution (at least, I _think_ I understand that), but in what way does it have little to do with what the CPU is doing?
For instance, couldn't you still run that same code on bare metal?
Again, sorry if I'm misunderstanding something fundamental here, I'm still learning lol
Not sure virtual machines are fundamentally different. In the end, if you have 3 virtual or 3 physical machines, the most important difference is how fast you can change their configuration. They will still have all the other concepts (network, storage, etc.). The automation that comes with VMs is better than it was for physical machines (probably), but then automation for everything got better (not only for machines).
At my job, a decade ago our developers understood how things worked, what was running on each server, where to look if there were problems, etc. Now the developers just put magic incantations given to them by the "DevOps team" into their config files. Most of them don't understand where the code is running, or even what much of it is doing. They're unable or unwilling to investigate problems on their own, even if they were the cause of the issue. Even getting them to find the error message in the logs can be like pulling teeth. They rely on this support team to do the investigation for them, but continually swiveling back-and-forth is never going to be as efficient as when the developer could do it all themselves. Not to mention it requires maintaining said support team, all those additional salaries, etc.
(I'm part of said support team, but I really wish we didn't exist. We started to take over Ops responsibilities from a different team, but we ended up taking on Dev ones too and we never should've done that.)
This blog has a brilliant insight that I still remember more than a decade later: we live in a fantasy setting, not a sci-fi one. Our modern computers are so unfathomably complex that they are demons: ancient magic that can be tamed and barely manipulated, but not engineered. Modern computing isn't Star Trek TNG, where Captain Picard and Geordi La Forge each hold every layer of their starship in their heads with full understanding and can manipulate each layer independently. We live in a world where the simple cell phone in our pocket contains so much complexity that it is beyond any 10 human minds combined to fully understand how the hardware, the device drivers, the OS, the app layer, and the internet all interact with each other.
Try tens of thousands of people. A mobile phone is immensely more complicated than people realize.
Thank you for writing it so eloquently. I will steal it.
There will always be work for people like us. It's not so bad. We're not totally immune to layoffs but for us they come several rounds in.
This statement encapsulates nearly everything that I think is wrong with software development today. Captured by MBA types trying to make a workforce that is as cheap and replaceable as possible. Details are simply friction in a machine that is obsessed with efficiency to the point of self-immolation. And yet that is the direction we are moving in.
Details matter, process matters, experience and veterancy matters. Now more than ever.
My comment elsewhere goes into a bit more detail, but basically silicon stopped being able to make single-threaded code faster in about 2012 - we have just been getting “more parallel cores” since. And now at wafer scale we see 900,000 cores on a “chip”. When 100% parallel code runs a million times faster than your competitors’, when following one software engineering path leads to code that can run 1M X, then we will find ways to use that excess capacity - and the engineers who can do it get to win.
I’m not sure how LLMs face this problem.
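To put rough numbers on the parallelism point (my back-of-the-envelope, not the parent’s figures): Amdahl’s law, speedup = 1 / ((1 - p) + p / N), says that even with 900,000 cores a 1% serial fraction caps you below 100x; only code that is essentially 100% parallel can turn that core count into anything like a 1M X win.

    // Back-of-the-envelope Amdahl's law (my numbers, not the parent's).
    #include <cstdio>
    #include <initializer_list>

    double amdahl_speedup(double parallel_fraction, double cores) {
        return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores);
    }

    int main() {
        const double cores = 900000.0;                 // wafer-scale core count from above
        for (double p : {0.90, 0.99, 0.999999}) {
            std::printf("parallel fraction %.6f -> speedup %.1fx on %.0f cores\n",
                        p, amdahl_speedup(p, cores), cores);
        }
        return 0;
    }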
As soon as the abstractions leak or you run into an underlying issue you suddenly need to understand everything about the underlying system or you're SOOL.
I'd rather have a simpler system where I already understand all the underlying abstractions.
The overhead of this is minimal when you keep things simple and avoid shiny things.
I think that if the development side knew a little bit of the rest of the stack they'd write better applications overall.
A fantastic talk.
I’ll use my stupid hobby home server stuff as an example. I tossed the old VMware box years ago. You know what I use now? Little HP t6x0 thin clients. They are crappy little x86 SoCs with m2 slots, up to 32GB memory and they can be purchased used for $40. They aren’t fast, but perform better than the cheaper AWS and GCP instances.
Is that a trivial use case? Absolutely. Now move from $30 to about $2000. Buy a Mac Mini. It’s a powerful ARM SoC with ridiculously fast storage and performance. Probably more compute than a small/mid-size company computer room a few years ago, and more performant than a $1M SAN a decade ago.
6G will bring 10gig cellular.
Hyperscaler datacenters are the mainframes of 2025.
Have you ever had a plumber, HVAC tech, electrician, etc. come out to your house for something, and had them explain it to you? Have you had the unfortunate experience of that happening more than once (with separate people)? If so, you should know why this matters: because if you don’t understand the fundamentals, you can’t possibly understand the entire system.
It’s the same reason why the U.S. Navy Nuclear program still teaches Electronics Technicians incredibly low-level things like bus arbitration on a 386 (before that, it was the 68000). Not because they expect most to need to use that information (though if necessary, they carry everything down to logic analyzers), but because if you don’t understand the fundamentals, you cannot understand the abstractions. Actually, the CPU is an abstraction, I misspoke: they start by learning electron flow, then moving into PN junctions, then transistors, then digital logic, and then and only then do they finally learn how all of those can be put together to accomplish work.
Incidentally, former Navy Nukes were on the initial Google SRE team. If you read the book [0], especially Chapter 12, you’ll get an inkling about why this depth of knowledge matters.
Do most people need to understand how their NIC turns data into electrical signals? No, of course not. But occasionally, some weird bug emerges where that knowledge very much matters. At some point, most people will encounter a bug that they are incapable of reasoning about, because they do not possess the requisite knowledge to do so. When that happens, it should be a humbling experience, and ideally, you endeavor to learn more about the thing you are stuck on.
The more the big cloud providers can abstract cpu cycles, memory, networking, storage etc, the more they don’t have to compete with others doing the same.
If that were true, you might be right.
What happens in reality is that things are promised to work and (at best) fulfill that promise so long as no developers or deployers or underlying systems or users deviate from a narrow golden path, but fail in befuddling ways when any of those constraints introduce a deviation.
And so what we see, year over year, is continued enshittening, with everything continuously pushing the boundaries of unreliability and inefficiency, and fewer and fewer people qualified to actually dig into the details to understand how these systems work, how to diagnose their issues, how to repair them, or how to explain their costs.
> If the cost of an easily managed solution is low enough, why do the details matter?
Because the patience that users have for degraded quality, and the luxury that budgets have for inefficiency, will eventually be exhausted and we'll have collectively led ourselves into a dark forest nobody has the tools or knowledge to navigate out of anymore.
Leveraging abstractions and assembling things from components are good things that enable rapid exploration and growth, but they come with latent costs that eventually need to be revisited. If enough attention isn't paid to understanding, maintaining, refining, and innovating on the lowest levels, the contraptions built through high-level abstraction and assembly will eventually either collapse upon themselves or be flanked by competitors who struck a better balance and built on more refined and informed foundations.
As a software engineer who wants a long and satisfying career, you should be seeking to understand your systems to as much depth as you can, making informed, contextual choices about what abstractions you leverage, exactly what they abstract over, and what vulnerabilities and limitations are absorbed into your projects by using them. Just making naive use of the things you found a tutorial for, or that are trending, or that make things look easy today, is a poison to your career.
Because vertical scaling is now large enough that I can run all of twitter/amazon on one single large server. And if I'm wrong now, in a decade I won't be.
Compute power grows exponentially, but business requirements do not.