Hacker News

OpenAI o3 breakthrough high score on ARC-AGI-PUB

https://arcprize.org/blog/oai-o3-pub-breakthrough
The programming task they gave o3-mini high (creating a Python server that allows chatting with the OpenAI API and running some code in the terminal) didn't seem very hard? A strange choice of example for something that's claimed to be a big step forward.

YT timestamped link: https://www.youtube.com/watch?v=SKBG1sqdyIU&t=768s (thanks for the fixed link @photonboom)

Update: I gave the task to Claude 3.5 Sonnet and it worked on the first try: https://claude.site/artifacts/36cecd49-0e0b-4a8c-befa-faa5aa...

It's good that it works, since if you ask GPT-4o to use the OpenAI SDK it will often produce invalid and out-of-date code.
I would say they didn’t need to demo anything, because if you're gonna run the output code live in a demo, it may produce compile errors, and then you look stupid trying to fix it live.
If it was a safe-bet problem, then they should have said so. To me it looks like they faked excitement over something unexciting, which lowers the credibility of the whole presentation.
Models are predictable at temperature 0. They might have tested the output beforehand.
In practice, models haven't been deterministic at temperature 0, although nobody knows exactly why; it's either hardware or software bugs.
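A minimal Python sketch of one commonly suggested culprit, assuming temperature 0 is implemented as greedy (argmax) decoding: floating-point addition is not associative, so if the same numbers are reduced in a different order between runs (as can happen with parallel GPU reductions), two near-tied logits can swap and the chosen token changes. The contribution values and helper names below are hypothetical, picked only to make the effect visible; this is not a claim about how any particular model or inference stack works.

```python
def sequential_sum(values):
    # Plain left-to-right float accumulation (deliberately not sum() or
    # math.fsum, which may use compensated summation).
    total = 0.0
    for v in values:
        total += v
    return total


def greedy_token(logits):
    # Temperature-0 style decoding: pick the argmax.
    # max() returns the first index when there is a tie.
    return max(range(len(logits)), key=lambda i: logits[i])


# Hypothetical partial contributions to token 1's logit; in a real model
# these would be partial products reduced in an order that may vary.
contributions = [0.1, 0.2, 0.3]
logit_token0 = 0.6

run1 = [logit_token0, sequential_sum(contributions)]            # one order
run2 = [logit_token0, sequential_sum(reversed(contributions))]  # another order

print(run1, greedy_token(run1))  # [0.6, 0.6000000000000001] 1
print(run2, greedy_token(run2))  # [0.6, 0.6] 0
```

The mathematically identical logit differs in the last bit depending on summation order, and because the two candidates are tied to within that last bit, greedy decoding picks a different token.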
I'm 22 and have no clue what I'm meant to do in a world where this is a thing. I'm moving to a semi-rural, outdoorsy area where they teach data science and marine science, where I can enjoy my days hiking and the march of technology is a little slower. I know this will disrupt so much of our way of life, so I'm chasing whatever fun, innocent years are left before things change dramatically.
I'm the same age as you; I feel lost, erring on the side of being a little too pessimistic.

Feels like I hit the real world just a couple of years too late to get situated in a solid position. Years of obsession in an attempt to catch up to the wizards, chasing the tech dream. But this feels like it. Just watching the time bomb tick. I'd love to work on what feels like the final technology, but I'm not a freakshow like the people these labs are hiring. At least I get to spectate the creation of humanity's greatest invention.

This announcement is just another gut punch, but at this point I should expect it; it's inevitable. A Jason Voorhees AGI, slowly but surely devouring all the talents and skills information workers have to offer.

Apologies for the rambly and depressing post, but this is the reality for anyone recently out of or still in school.

At least you're disillusioned with the idea of a long-term career before a lot of other people. It's disturbing seeing how ready people are to go into a lifelong career and expect stability and happiness in the world we're heading into.

We are living in a world run by and for the soon-to-be dead, many of whom have dementia, so empathic policy and foresight are out of the question, and we're going to be picking up the incredibly broken scraps of our golden age.

And not to get too political, but the mass restructuring of public consciousness and intellectual society due to mass immigration for an inexplicable GDP squeeze, and due to social media, is happening at exactly the wrong time to handle these very serious challenges. The speed at which we've undone civil society is breakneck, and it will go even further, and it will get even worse. We've easily gone back 200 years in terms of emotional intelligence in the past 15.

Put another way, you have a deep conviction about a change that the vast majority of people have not even seen yet, never mind grokked, and you're young enough to spend a decent amount of time on education for "venn'ing" yourself into a useful tool in the future. If you have a baseline education, there are any number of orthogonal skills you could add, be it philosophy, fine art, medicine, whatever. You know how to skate and you know where the puck is going, while most people don't even see the rink.
Does anyone have prompts they like to use to test the quality of new models?

Please share. I’m compiling a list.

Why would they give a cost estimate per task on their low compute mode but not their high mode?

The "low compute" mode: uses 6 samples per task, uses 33M tokens for the semi-private eval set, costs $17-20 per task, and achieves 75.7% accuracy on the semi-private eval.

The "high compute" mode: uses 1024 samples per task (172x more compute), its cost data was withheld at OpenAI's request, and it achieves 87.5% accuracy on the semi-private eval.

Can we just extrapolate $3kish per task for high compute? (I'm wondering if they withheld it because this isn't the case.)
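The "$3kish" guess is just linear scaling of the low-compute figures by the quoted 172x compute factor. A minimal sketch of that back-of-the-envelope arithmetic, assuming (as the question does) that cost and token usage scale linearly with compute; the constant and function names are hypothetical:

```python
# Figures quoted above for the semi-private eval, low-compute mode.
LOW_COST_PER_TASK = (17.0, 20.0)  # USD per task, reported range
LOW_TOKENS = 33_000_000           # tokens for the whole semi-private set
COMPUTE_MULTIPLIER = 172          # "172x more compute" for high compute


def extrapolate(value, multiplier=COMPUTE_MULTIPLIER):
    # Naive assumption: cost is directly proportional to compute.
    return value * multiplier


low, high = (extrapolate(c) for c in LOW_COST_PER_TASK)
tokens = extrapolate(LOW_TOKENS)

print(f"high-compute cost per task: ${low:,.0f} to ${high:,.0f}")
# high-compute cost per task: $2,924 to $3,440
print(f"high-compute tokens: ~{tokens / 1e9:.1f}B")
# high-compute tokens: ~5.7B
```

For comparison, the ratio of sample counts, 1024 / 6 ≈ 170.7, is close to the quoted 172x, so under this linear assumption roughly $2.9k to $3.4k per task is where the "$3kish" figure comes from.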

Fun! The benchmarks are so interesting because real-world use is so variable. Sometimes 4o will nail a pretty difficult problem; other times o1 pro mode will fail 10 times on what I would think is a pretty easy programming problem, and I waste more time trying to do it with AI.
I just graduated college, and this was a major blow. I studied Mechanical Engineering and went into Sales Engineering because I love technology and people, but articles like this do nothing but make me dread the future.

I have no idea what to specialize in, what skills I should master, or where I should be spending my time to build a successful career.

Seems like we’re headed toward a world where you either automate someone else’s job or get automated yourself.

You are going through your studies just as a (potentially major) new class of tools is appearing. It's not the first time in history, although there's more hype this time: computing, personal computing, globalisation, smartphones, Chinese engineering... I'd suggest that (1) you still need to understand your field, and (2) you might as well try to figure out where this new class of tools is useful in your field. Otherwise... (3) carry on.

It's not encouraging from the point of view of studying hard, but the evolution of work over the past 40 years suggests that your field probably won't be exactly your field in just a few years. Not because your field will have been made irrelevant, but because you will have moved on. Most likely that will be fine: you will learn more as you go, hopefully moving from one relevant job to the next very different but still relevant job. Or, straight out of school, you will work in very multidisciplinary jobs anyway, where it will seem that not much of what you studied matters (it will, but not in obvious ways).

Certainly if you were headed into a very specific job which seems obviously automatable right now (as opposed to one where the tools will be useful), don't do THAT. Like, don't train as a typist as the core of your job in the middle of the personal computer revolution, or don't specialize in hand-drawing IC layouts in the middle of the CAD revolution unless you have a very specific plan (court reporting? DRAM?)

Yes, but it’s different this time. LLMs are a general solution to the automation of anything that can be controlled by a computer. You can’t just move from drawing ICs to CAD, because the AI can do that too. AI can write code. It can do management. It can even do diplomacy. What it can’t do on its own are the things computers can’t control yet. It has also shown little interest so far in jockeying for social status. The AI labs are trying their hardest to at least keep the politics around for humans to do, so you have that to look forward to.
"The proof is trivial and left as an exercise for the reader."

The technical act of solving well-defined problems has traditionally been considered the easy part. The role of a technical expert has always been asking the right questions and figuring out the exact problem you want to solve.

As long as AI just solves problems, there is room for experts with the right combination of technical and domain skills. If we ever reach the point where AI takes the initiative and makes human experts obsolete, you will have far bigger problems than career.

That's the sort of thing ideas guys think. I came up with a novel idea once, called Actually Portable Executable: https://justine.lol/ape.html It took me a couple of days studying binary formats to realize it's possible to compile binaries that run on Linux/Mac/Windows/BSD. But it took me years of effort to make the idea actually happen, since it needed a new C library to work. I can tell you it wasn't "asking questions" that organized five million lines of code. Now with these agents, everyone who has an idea will be able to will it into reality like I did, except in much less time. And since everyone has lots of ideas, and usually dislikes the ideas of others, we're all going to have our own individualized realities where everything gets built the way we want it to be.
Just give it a year for this bubble/hype to blow over. We have plateaued since GPT-4, and now most of the industry is hype-driven to get investor money. There is value in AI, but it's far from taking your job. Also, everyone seems to be investing in dumb compute instead of looking for the new theoretical paradigm that will unlock the next jump.
All those saying "AGI", read the article and especially the section "So is it AGI?"
When is this available? Which plans can use it?
Just curious, I know o1 is a model OpenAI offers. I have never heard of the o3 model. How does it differ from o1?
Great results. However, let's all just admit it.

It has already replaced journalists and artists, and it is on its way to replacing both junior and senior engineers. The ultimate intention of "AGI" is that it is going to replace tens of millions of jobs. That is it, and you know it.

It will only accelerate, and we need to stop pretending and coping. Instead, let's discuss solutions for those lost jobs.

So what is the replacement for these lost jobs? (It is not UBI or "better jobs" without defining them.)

Do you follow Jack Clark? I noticed he's been on the road a lot talking to governments and policy makers, and not just in the "AI is coming" way he used to talk.