Story Detail of id 48490625 | Liveview Hacker News

epolanski6 hours ago | on: Workers are spending over 6 hours a week botsitting AI, fueling job frustration

It doesn't change the premise.

AI should be assisting us, instead it's doing the job and it's us being an assistant to it. This is a monumental shift that people seem to be missing in how knowledge working is changing and it's going beyond mere coding.

Guardrails, prompts, whatever, it's us helping it doing the job, not the other way around.

Opus 4.6 was the last genuinely good assistant LLM, but since then it's quite clear that the training/reinforcement is focused "given prompt -> do task" so it's behavior is more and more about doing it itself, not helping you. If you try to use it as an assistant it just sucks and is perma wired into finding the solution. Many times I want it to help me investigate, and his answer will still be focused on the fix, not answering my questions.

4.7 first, 4.8 later and fable are absolute disasters as assistants.

Fable in particular is so "intelligent" that it will push with very strong and intelligent takes even if it is completely wrong.

I have never disliked our job more.

kstenerud6 hours ago | parent | next

Wow... Our experiences have been very different, then. I've found each upgrade of Opus to be a noticeable improvement in its complex reasoning and delegation capabilities over its predecessor.

To me, this feels in many ways like a technical manager or team lead's job, where I guide the process along using my knowledge and experience, and then let the agent fill in the rest (to the best of its ability).

The agent can't really learn from its mistakes (at least, not without consuming precious context), so I apply a blameless postmortem process, updating the guardrails whenever it goes astray in the same way more than once.

And really, I'd rather be contemplating the more difficult and interesting questions of architecture, environment, ergonomics and market fit, so it suits me fine.

loading story #48491226

loading story #48491619

taeric6 hours ago | parent | next

I think this is just a misunderstanding of how most technology has always worked?

Consider what is happening in most construction sites. The heavy work is absolutely from the technology on site. But without people there to oversee it and keep it working, it would fail.

And that is almost certainly true at any industrial site. Indeed, look up videos of high tech looms. A large portion of the technology added to them are so that the operators can locate the fault and fix it.

senordevnyc5 hours ago | parent | next

AI should be assisting us, instead it's doing the job and it's us being an assistant to it.

If you're a manager and you ask a report to do something and they come back with a question, does that mean you're now their assistant?

I give agents the tasks, I answer their questions, I make choices about the tradeoffs in their plan, I supervise their implementation, I review their output, I have them walk me through things. In what way is this not delegating to them and managing their work, just like a more junior employee?

rmunn6 hours ago | parent | next

The problem (okay, one of the problems) with renting other people's models is, as you mentioned, that they can and will change out the model without notifying you ahead of time, and you don't always get to control which model you use. (They might decide to retire it, and you won't be able to get it back if they do).

Which is why (well, part of why) I think the long-term trend will be towards self-hosting models. Right now the frontier models are far enough ahead of the self-hosted ones that there are lots of people willing to pay by the token to rent someone else's model, because they get more value for money from that than from self-hosting models.

But the frontier companies won't be able to keep up their current levels of expenditure forever. At some point the investors are going to say "Hey, so, um, when am I going to see some return on my investment?" and then the current subsidized subscriptions (including the one my employer uses) are going to go away, much like what happened with Copilot this month.

And then the locally-hosted models are going to suddenly look like a more attractive picture. Because where you might have been willing to spend $100/month/employee to rent time on models in someone else's data center, you might suddenly balk at spending $500/month/employee. You might say "Hey, you know what? A $50,000 up-front capital investment is only, what, one month's worth of subscriptions for our 100 employees? Yeah, okay, I'll approve the hardware purchase. Get that self-hosted model set up and then we'll cancel the subscription and switch over."

Not everyone is going to do that. But once the locally-hosted models are good enough, the first few people who do so and report success are going to start a snowball effect. And it will likely be driven by money first, but it will also have the effect, that people will slowly discover, of meaning that you can better predict the model you're using. It will continue to work the same way next year that it is working this year; or if it doesn't, it's because you chose to install the new version.

And when that happens (I'm saying "when", not "if" because although it might take some time, I think it's inevitable in the long run), the frontier-model rental companies are going to struggle to stay afloat. Except for the ones who saw this coming and transitioned to a non-subscription income source somehow (maybe by selling licenses to self-host their frontier models for $$BIGNUM), or who have some other revenue stream besides renting out models.

Applejinx6 hours ago | parent | next

That sounds weirdly gendered even though there's no reason it should be.

Are you getting LLMsplained? :)

AnimalMuppet6 hours ago | parent

Well... as a human software engineer, I've been the one with very strong, intelligent, completely wrong takes. The question is, are the LLMs improving faster than you can improve a junior dev? And is their ceiling as high?

#visit	13,750,738
#session	74,665
#live-session	0