I'm confused and a bit disturbed; honestly having a very difficult time internalizing and processing this information. This announcement is making me wonder if I'm poorly calibrated on the current progress of AI development and the potential path forward. Is the key idea here that current AI development has figured out enough to brute force a path towards AGI? Or I guess the alternative is that they expect to figure it out in the next 4 years...

I don't know how to make sense of this level of investment. I feel that I lack the proper conceptual framework to make sense of the purchasing power of half a trillion USD in this context.

"There are maybe a few hundred people in the world who viscerally understand what's coming. Most are at DeepMind / OpenAI / Anthropic / X but some are on the outside. You have to be able to forecast the aggregate effect of rapid algorithmic improvement, aggressive investment in building RL environments for iterative self-improvement, and many tens of billions already committed to building data centers. Either we're all wrong, or everything is about to change." - Vedant Misra, Deepmind Researcher.

Maybe your calibration isn't poor. Maybe they really are all wrong. But there's a tendency here to think these people behind the scenes are all charlatans, fueling hype without equal substance, hoping to make a quick buck before it all comes crashing down, and I don't think that's true at all. I think these people really genuinely believe they're going to get there. And if you genuinely believe that, then this kind of investment isn't so crazy.

The problem is, they are hugely incentivised to hype to raise funding. It’s not whether they are “wrong”, it’s whether they are being realistic.

The argument presented in the quote there is: “everyone at the AI foundation companies is putting money into AI, therefore we must be near AGI.”

The best evaluation of progress is to use the tools we have. It doesn’t look like we are close to AGI. It looks like amazing NLP with an enormous amount of human labelling.

> there's a tendency here to think these people behind the scenes are all charlatans, fueling hype without equal substance, hoping to make a quick buck before it all comes crashing down, and I don't think that's true at all. I think these people really genuinely believe they're going to get there.

I don't immediately disagree with you but you just accidentally also described all crypto/NFT enthusiasts of a few years ago.

Motivated reasoning sings nicely to the tune of billions of dollars. None of these folks will ever say, "don't waste money on this dead end". However, it's clear that there is still a lot of productive value to extract from transformers and certainly there will be other useful things that appear along the way. It's not the worst investment I can imagine, even if it never leads to "AGI"
I am not qualified to make any assumptions but I do wonder if a massive investment into computing infrastructure serves national security purposes beyond AI. Like building subway stations that also happen to serve as bomb shelters.

Are there computing and cryptography problems that the infrastructure could be (publicly or quietly) reallocated to address if the United States found itself in a conflict? Any cryptographers here have a thought on whether hundreds of thousands of GPUs turned on a single cryptographic key would yield any value?

>Maybe they really are all wrong

All? Quite a few of the best minds in the field, like Yann LeCun for example, have been adamant that 1) autoregressive LLMs are NOT the path to AGI and 2) that AGI is very likely NOT just a couple of years away.

I think it will be in between, like most things end up being. I don't think they are charlatans at all, but I think they're probably a bit high on their own supply. I think it's true that "everything is about to change", but I think that change will look more like the status quo than the current hype cycle suggests. There are a lot of periods in history when "everything changed", and I believe we're already a number of years into one of those periods now, but in all those cases, despite "everything" changing, a perhaps surprising number of things remained the same. I think this will be no different than that. But it's hard, impossible really, to accurately predict where the chips will land.
My prediction is that Apple loses to OpenAI, who releases an H.E.R.-like phone (as in the movie). She is seen on your lock screen, a la a FaceTime call UI/UX, and she can be skinned to look like whoever you want, e.g. a deceased loved one.

She interfaces with the AI agents of companies, organizations, friends, family, etc. to get things done for you (or to learn from: what's my friend's birthday? his agent tells yours) automagically, and she is like a friend, always there for you at your beck and call, like in the movie H.E.R.

Zuckerberg's glasses, which cannot take selfies, will only be complementary to our AI phones.

That's just my guess and desire as a fervent GPT user, as well as a Meta Ray-Ban wearer (can't take selfies with glasses).

I am hoping it is just the usual ponzi thing.
So they're either wrong or building Skynet.
Let me avoid the use of the word AGI here because the term is a little too loaded for me these days.

1) reasoning capabilities in latest models are rapidly approaching superhuman levels and continue to scale with compute.

2) intelligence at a certain level is easier to achieve algorithmically when the hardware improves. There is also a larger space of paths to intelligence, often via simpler mechanisms.

3) most current-generation reasoning AI models leverage test-time compute and RL in training--both of which can readily make use of more compute. For example, RL on code against compilers and on proofs against verifiers (sketched below).

All of this points to compute now being basically the only bottleneck to massively superhuman AIs in domains like math and coding--on the rest, no comment (I don't know what superhuman means in a domain with no objective evals).
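To make point 3 concrete, here is a minimal sketch of an RL-against-a-verifier loop on a toy "double the input" task; the model call is stubbed out and every name here is hypothetical, not anyone's actual training code:

  def generate_candidates(prompt, n=4):
      # Stub standing in for sampling n candidate programs from the policy model.
      # A real setup would sample from the model being trained; here we hardcode toys.
      return [
          "def solve(x):\n    return x * 2",   # correct for the toy task
          "def solve(x):\n    return x + 2",   # runs, but wrong
          "def solve(x):\n    return x * 2",
          "def solve(x int):",                 # doesn't even parse
      ][:n]

  def verifier_reward(program_text, test_cases):
      # Reward 1.0 if the candidate runs and passes all tests, else 0.0.
      # The "compiler/verifier" provides the supervision instead of a human.
      namespace = {}
      try:
          exec(program_text, namespace)
          solve = namespace["solve"]
          return 1.0 if all(solve(x) == y for x, y in test_cases) else 0.0
      except Exception:
          return 0.0

  def collect_rl_batch(prompt, test_cases):
      # Each (candidate, reward) pair becomes training signal; more compute
      # means more candidates and more verified data, with no human labels.
      return [(c, verifier_reward(c, test_cases))
              for c in generate_candidates(prompt)]

  if __name__ == "__main__":
      tests = [(1, 2), (3, 6)]   # toy task: double the input
      for cand, reward in collect_rl_batch("double the input", tests):
          print(reward, repr(cand))

The scaling claim is that both the sampling and the verification here are pure compute, which is why, in verifiable domains, more GPUs translate fairly directly into more capability.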

You can't block AGI on a whim and then deploy 'superhuman' without justification.

A calculator is superhuman if you're prepared to put up with its foibles.

It is superhuman in a very specific domain. I didn't use AGI because its definitions come in one of two flavors.

One: capable of replacing some large proportion of global GDP (this definition has a lot of obstructions: organizational, bureaucratic, robotic)...

Two: it is difficult to find problems that an average human can solve but the model cannot. The problem with this definition is that the nature of AI intelligence is distinct enough, and the range of tasks broad enough, that this metric is probably only achievable after AI is already massively superhuman in aggregate. Compare this with Go AIs, which were massively superhuman yet often still failed to count ladders correctly--which was also fixed by more scaling.

All in all, I avoid the term AGI because for me AGI means comparing average intelligence on broad tasks relative to humans, and I'm already not sure whether that's achieved by current models, whereas superhuman research math is clearly not achieved because humans are still making all of the progress on new results.

> All of this points to compute now being basically the only bottleneck to massively superhuman AIs

This is true for brute force algorithms as well and has been known for decades. With infinite compute, you can achieve wonders. But the problem lies in diminishing returns[1][2], and it seems things do not scale linearly, at least for transformers.

1. https://www.bloomberg.com/news/articles/2024-12-19/anthropic...

2. https://www.bloomberg.com/news/articles/2024-11-13/openai-go...

{"deleted":true,"id":42791156,"parent":42788038,"time":1737540713,"type":"comment"}
> 1) reasoning capabilities in latest models are rapidly approaching superhuman levels and continue to scale with compute.

What would you say is the strongest evidence for this statement?

Well, the contrived benchmarks made up by the industry selling the models do seem to be improving.
>reasoning capabilities in latest models are rapidly approaching superhuman levels and continue to scale with compute

I still have a pretty hard time getting it to tell me how many sisters Alice has. I think this might be a bit optimistic.

They plugged the hole for "how many 'r's in 'strawberry'", but I just asked it how many "l"s are in "lemolade" (spelling intentional) and it told me 1. If you make it close to, but not exactly, a word it would be expecting, it falls over.
What is the evidence for 1) ? I thought that the latest models were getting "somewhere" with fairly trivial reasoning tests like ARC-1
It may be that you can just find the solution for these tests by interpolating from a very large dataset.
I see it somewhat differently. It is not that technology has reached a level where we are close to AGI, we just need to throw in a few more coins to close the final gap. It is probably the other way around. We can see and feel that human intelligence is being eroded by the widespread use of LLMs for tasks that used to be solved by brain work. Thus, General Human Intelligence is declining and is approaching the level of current Artificial Intelligence. If this process can be accelerated by a bit of funding, the point where Big Tech can overtake public opinion making will be reached earlier, which in turn will make many companies and individuals richer faster, also the return on investment will be closer.
> Is the key idea here that current AI development has figured out enough to brute force a path towards AGI?

My sense anecdotally from within the space is yes people are feeling like we most likely have a "straight shot" to AGI now. Progress has been insane over the last few years but there's been this lurking worry around signs that the pre-training scaling paradigm has diminishing returns.

What recent outputs like o1, o3, DeepSeek-R1 are showing is that that's fine, we now have a new paradigm around test-time compute. For various reasons people think this is going to be more scalable and not run into the kind of data issues you'd get with a pre-training paradigm.

You can definitely debate on whether that's true or not but this is the first time I've been really seeing people think we've cracked "it", and the rest is scaling, better training etc.
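As a concrete picture of what "test-time compute" buys, here is a minimal best-of-N / majority-vote sketch; the model is stubbed with a toy random answerer and all names are hypothetical:

  import random
  from collections import Counter

  def sample_answer(question):
      # Stub for one sampled answer from a fixed model: right 60% of the time.
      return "42" if random.random() < 0.6 else "41"

  def best_of_n(question, n):
      # Self-consistency: sample n answers and take a majority vote.
      # Spending more compute at inference time (larger n) raises the chance
      # the vote lands on the right answer, with no retraining at all.
      votes = Counter(sample_answer(question) for _ in range(n))
      return votes.most_common(1)[0][0]

  if __name__ == "__main__":
      for n in (1, 8, 64):
          print(n, best_of_n("what is 6 * 7?", n))

This is why people argue the new paradigm scales on compute rather than on scarce new pre-training data.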

> My sense anecdotally from within the space is yes people are feeling like we most likely have a "straight shot" to AGI now

My problem with this is that people making this statement are unlikely to be objective. Major players are in fundraising mode, and safety folks are also incentivised to be subjective in their evaluation.

Yesterday I repeatedly used OpenAI’s API to summarise a document. The first result looked impressive. However, comparing repeated results revealed that it was missing major points each time, in a way a human certainly would not. On the surface the summary looked good, but careful evaluation indicated a lack of understanding or reasoning.

Don’t get me wrong, I think AI is already transformative, but I am not sure we are close to AGI. I hear a lot about it, but it doesn’t reflect my experience in a company using and building AI.

I agree with your take, and actually go a bit further. I think the idea of "diminishing returns" is a bit of a red herring, and it's instead a combination of saturated benchmarks (and testing in general) and expectations of "one llm to rule them all". This might not be the case.

We've seen with oAI and Anthropic, and rumoured with Google, that holding your "best" model and using it to generate datasets for smaller but almost as capable models is one way to go forward. I would say that this shows the "big models" are more capable than it would seem and that they also open up new avenues.

We know that Meta used L2 to filter and improve its training sets for L3. We are also seeing how "long form" content + filtering + RL leads to amazing things (what people call "reasoning" models). Semantics might be a bit ambitious, but this really opens up the path towards -> documentation + virtual environments + many rollouts + filtering by SotA models => new dataset for next gen models.

That, plus optimisations (early exit from meta, titans from google, distillation from everyone, etc) really makes me question the "we've hit a wall" rhetoric. I think there are enough tools on the table today to either jump the wall, or move around it.

Yeah, that's called wishful thinking when it's not straight-up pipe dreams. All these people have horses in the race.
The largest GPU cluster at the moment is X.ai's 100K H100s, which is ~$2.5B worth of GPUs. So something 10x bigger (1M GPUs) is $25B, and add $10B for a 1GW nuclear reactor.

This sort of $100-500B budget doesn't sound like training cluster money, more like anticipating massive industry uptake and multiple datacenters running inference (with all of corporate America's data sitting in the cloud).
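A quick back-of-envelope with the figures in this comment (the commenter's estimates, not verified prices), showing why $100-500B buys far more than one training cluster:

  h100_cluster_cost = 2.5e9                            # ~$2.5B for X.ai's ~100K H100 cluster (figure above)
  cost_per_gpu = h100_cluster_cost / 100_000           # ~$25K per GPU

  million_gpu_capex = 1_000_000 * cost_per_gpu         # ~$25B for a 10x larger cluster
  reactor_capex = 10e9                                 # ~$10B for ~1GW of generation
  one_site = million_gpu_capex + reactor_capex         # ~$35B per giant training site

  print(f"One 1M-GPU site: ~${one_site / 1e9:.0f}B")
  print(f"Sites a $500B budget could fund: ~{500e9 / one_site:.0f}")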

Shouldn't there be a fear of obsolescence?
It seems you'd need to figure periodic updates into the operating cost of a large cluster, as well as replacing failed GPUs - they only last a few years if run continuously.

I've read that some datacenters run mixed generation GPUs - just updating some at a time, but not sure if they all do that.

It'd be interesting to read something about how updates are typically managed/scheduled.

Don't they say in the article that it is also for scaling up power and datacenters? That's the big cost here.
There's the servers and data center infrastructure (cooling, electricity) as well as the GPUs of course, but if we're talking $10B+ of GPUs in a single datacenter, it seems that would dominate. Electricity generation is also a big expense, and it seems nuclear is the most viable option although multi-GW solar plants are possible too in some locations. The 1GW ~ $10B number I suggested is in the right ballpark.
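For a sanity check on the 1GW ~ $10B pairing: H100-class accelerators draw roughly 700W each, so a million of them plus assumed datacenter overhead lands right around a gigawatt (a sketch with assumed numbers):

  gpu_watts = 700          # approximate board power of an H100 SXM
  num_gpus = 1_000_000
  pue = 1.3                # assumed overhead for cooling, networking, storage

  total_gw = gpu_watts * num_gpus * pue / 1e9
  print(f"~{total_gw:.2f} GW")   # ~0.9 GW, i.e. roughly one 1GW plant's worth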
>AI development has figured out enough to brute force a path towards AGI?

I think what's been going on is that compute/$ has been rising exponentially for decades in a steady way and has recently passed the point where you can get human-brain-level compute for modest money. The tendency has been that once the compute is there, lots of bright PhDs get hired to figure out algorithms to use it, so that bit gets sorted in a few years (as written about by Kurzweil, Wait But Why and similar).

So it's not so much brute-forcing AGI as that exponential growth makes it inevitable at some point, and that point is probably quite soon. At least that seems to be what they are betting.

The annual global spend on human labour is ~$100tn so if you either replace that with AGI or just add $100tn AGI and double GDP output, it's quite a lot of money.
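A sketch of that bet in numbers; every input below is an assumption for illustration (brain-compute estimates span orders of magnitude, as do price-performance figures):

  import math

  # Assumptions, not measurements.
  brain_flops = 1e16          # one common (contested) estimate of brain-equivalent compute
  flops_per_dollar = 1e10     # assumed current price-performance, order of magnitude
  doubling_years = 2.0        # assumed compute-per-dollar doubling time
  budget = 1e7                # "modest money": ~$10M of hardware

  gap = brain_flops / (flops_per_dollar * budget)
  years = doubling_years * math.log2(gap)
  status = "already crossed" if years <= 0 else f"~{years:.0f} years out"
  print(f"Brain-level compute for ${budget:,.0f}: {status}")

  # The payoff side: global labour spend is ~$100tn/yr, so a $500B outlay
  # is about half a percent of a single year's labour bill.
  print(f"$500B as a share of annual labour spend: {500e9 / 100e12:.1%}")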

This has nothing to do with technology; it is a purely financial and political exercise...
But why drop $500B (or even $100B short term) if there is not something there? The numbers are too big
This is an announcement, not a cut check. Who knows how much they'll actually spend; plenty of projects never get started, let alone massive inter-company endeavors.
The $100B check is already cut, and they are currently building 10 new data centers in Texas.
A state with famously stable power infrastructure.
$50B is to pay miners not to mine.
because you put your own people on the receiving end too AND invite others to join your spending spree.
{"deleted":true,"id":42789609,"parent":42788524,"time":1737526328,"type":"comment"}
> I don't know how to make sense of this level of investment.

The thing about investments, specifically in the world of tech startups and VC money, is that speculation is not something you merely capitalize on as an investor, it's also something you capitalize on as a business. Investors desperately want to speculate (gamble) on AI to scratch that itch, to the tune of $500 billion, apparently.

So this says less about, 'Are we close to AGI?' or, 'Is it worth it?' and more about, 'Are people really willing to gamble this much?'. Collectively, yes, they are.

Note that the $500 billion number is bravado/aspirational. MSFT's Nadella said he has $80B to contribute? It is a live auction / horse race in some ways, it seems.
> Is the key idea here that current AI development has figured out enough to brute force a path towards AGI? Or I guess the alternative is that they expect to figure it out in the next 4 years...

Can't answer that question, but, if the only thing to change in the next four years was that generation got cheaper and cheaper, we haven't even begun to understand the transformative power of what we have available today. I think we've felt like 5-10% of the effects that integrating today's technology can bring, especially if generation costs come down to maybe 1% of what they currently are, and latency of the big models becomes close to instantaneous.

It's a typical Trump-style announcement -- IT'S GONNA BE HUUUGE!! -- without any real substance or solid commitments

Remember Trump's BIG WIN of Foxconn investing $10B to build a factory in Wisconsin, creating 13000 jobs?

That was in 2017. 7 years later, it's employing about 1000 people if that. Not really clear what, if anything, is being made at the partially-built factory. [0]

And everyone's forgotten about it by now.

I expect this to be something along those lines.

[0] https://www.jsonline.com/story/money/business/2023/03/23/wha...

I think the only way you get to that kind of budget is by assuming that the models are like 5 or 10 times larger than most LLMs, and that you want to be able to do a lot of training runs simultaneously and quickly, AND build the power stations into the facilities at the same time. Maybe they are video or multimodal models that have text and image generation grounded in a ton of video data which eats a lot of VRAM.
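Rough memory arithmetic for why "5 or 10 times larger" models need clusters this size (the parameter count and precision below are illustrative assumptions, not anything announced):

  params = 10e12             # assume a ~10T-parameter model (illustrative only)
  bytes_per_param = 2        # fp16/bf16 weights
  hbm_per_gpu = 80e9         # 80GB of HBM per H100

  weight_bytes = params * bytes_per_param
  gpus_for_weights = weight_bytes / hbm_per_gpu
  print(f"Weights alone: {weight_bytes / 1e12:.0f} TB, ~{gpus_for_weights:.0f} GPUs just to hold them")
  # Training needs several multiples of this again for optimizer states,
  # activations, and data-parallel replicas, before any video/multimodal context.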
> current AI development has figured out enough to brute force a path towards AGI? Or I guess the alternative is that they expect to figure it out in the next 4 years...

Or they think the odds are high enough that the gamble makes sense. Even if they think it's a 20% chance, their competitors are investing at this scale, their only real options are keep up or drop out.

This announcement is from the same office as the guy that xeeted:

“My NEW Official Trump Meme is HERE! It's time to celebrate everything we stand for: WINNING! Join my very special Trump Community. GET YOUR $TRUMP NOW.”

Your calibration is probably fine. Stargate is not a means to achieve AGI; it's a means to start construction on a few million square feet of datacenters, thereby "reindustrializing America".

FWIW Altman sees it as a way to deploy AGI. He's increasingly comfortable with the idea they have achieved AGI and are moving toward Artificial Super Intelligence (ASI).
https://xcancel.com/sama/status/1881258443669172470

  twitter hype is out of control again. 

  we are not gonna deploy AGI next month, nor have we built it.

  we have some very cool stuff for you but pls chill and cut your expectations 100x!
I realize he wrote a fairly goofy blog a few weeks ago, but this tweet is unambiguous: they have not achieved AGI.
Do you think Sam Altman ever sits in front of a terminal trying to figure out just the right prompt incantation to get an answer that, unless you already know the answer, has to be verified? Serious question. I personally doubt he is using openai products day to day. Seems like all of this is very premature. But, if there are gains to be made from a 7T parameter model, or if there is huge adoption, maybe it will be worth it. I'm sure there will be use for increased compute in general, but that's a lot of capex to recover.
To me it looks like a strategic investment in data center capacity, which should drive domestic hardware production, improvements in electrical grid, etc. Putting it all under AI label just makes it look more exciting.
> Is the key idea here that current AI development has figured out enough to brute force a path towards AGI?

It rather means that they see their only chance for substantial progress in Moar Power!

Yes, that is exactly what the big Aha! moment was. It has now been shown that doing these $100MM+ model builds is what it takes to have a top-tier model. The big moat is not just the software, the math, or even the training data; it's the budget to do the giant runs. Of course, having a team that iterates on these four regularly is where the magic is.
{"deleted":true,"id":42787962,"parent":42786892,"time":1737511512,"type":"comment"}