Story Detail of id 48466355 | Liveview Hacker News

selfawareMammal 9 hours ago | on: What it feels like to work with Mythos

What are people working on that they see such a substantial difference between Mythos and Opus? I'd say I'm working with advanced stuff, and more often than not DeepSeek is more than enough. Why is everybody a genius in here?

jenniferhooley 8 hours ago | parent | next

Just depends what you are working on. If you are trying to make a video game that's at the level of a decent indie game (think Hades/Bazaar/etc), making UI elements/VFX/complex shaders/etc that are organic/interactive/animated and that don't feel like a little dogshit vibeslop web-game, then none of the models are even close to good enough to get it done easily. A huge percentage of the problems in top 3% games are really hard for any of the models to do with simple prompting.

Personally I don't really care, because I like coding and learning myself and DeepSeek Flash is all I really care about. But it's really easy to have a ton of benchmarks where the top models can't get anywhere close - and I like to test them on these problems to see how good they are getting.

Fable 5 is def a little better than 4.8 btw.

mervz 9 hours ago | parent | next

We see the same thing when new laptops are announced and every employee all of a sudden needs to upgrade, despite the fact that 90% of people would be able to make do with a Macbook Neo.

Our_Benefactors 8 hours ago | root | parent

> despite the fact that 90% of people would be able to make do with a Macbook Neo.

Myth. Total myth! I recently had to beg for more RAM after continually hitting swap space, which causes tools like dictation to stop working, certain websites to fail to load without a reboot, and so on. Devs do in fact need powerful machines, and the ~$500-1000 an employer saves upfront in machine costs is dwarfed by productivity losses.

Giving your engineering employees new machines on a 2-year cycle, somewhere between the middle and the high end, is one of the cheapest high-ROI decisions that a tech org can make.

oarsinsync 7 hours ago | root | parent

Surely devs could just uninstall Slack, and get the same combined RAM & productivity boost?

matheusmoreira 3 hours ago | parent | next

I'm working on my own programming language. I've also been exploring open source projects to contribute to. Maybe something that helps me pivot from hobbyist to professional. If such a thing is even possible in this day and age.

Fable 5 found quite a few issues Opus 4.8 missed on code review, even though the stupid cybersecurity nonsense downgraded it. I can't tell you more, I only get a single session per 5h window on Max 5x. Only ran two sessions so far.

ianm218 9 hours ago | parent | next

I’ve been working on implementing some common web infra type projects in Rust lately. Basically trying to use a lot of the great primitives in Rust like rustls (modern OpenSSL) and Tokio (async) to build memory-safe (or close to it) drop-in replacements for nginx.

A small portion of this effort is having a high quality Lua in Rust repo. I’m using Mythos to fix some of the performance issues with my Lua interpreter that GPT 5.5 / Opus 4.8 had stonewalled on.

Not sure if Mythos will be able to crack this but it has been running for a couple hours now with some promising results.

Performance charts linked here if you're curious: https://github.com/ianm199/lua-rs

mplanchard 6 hours ago | root | parent

What’s wrong with mlua?

ianm218 6 hours ago | root | parent

Mlua works for many use cases but is a wrapper around the C code, so you need to bundle C as part of the build. That is worse for cross compilation and means you can't easily use mlua projects in wasm32-unknown-unknown. An example is that it would be hard to run a game in the browser that exposes Lua scripting with mlua.

The other reason is that because mlua is just a wrapper around the C code, it has unsafe you can't really get around. So for example Lua is used in Redis, which has this critical CVE https://github.com/redis/redis/security/advisories/GHSA-4789... that a memory safe version of Lua wouldn't have to deal with.

Mlua is still fine or even better for many other cases though!

mplanchard 5 hours ago | root | parent

The WASM thing makes sense. Do you need unknown-unknown? Seems like support exists for emscripten and wasi: https://github.com/mlua-rs/mlua/issues/366

It just seems like a lot of hassle to write a Lua interpreter, although it would be nice to see a high quality one in Rust :)

Hematita was promising, but looks abandoned.

ianm218 5 hours ago | root | parent

Yeah an example is that currently you can't build Bevy games in the browser with scripting in Lua, so I've gotten a little traction there.

And yes, it seems like there have been many attempts to get a solid Rust Lua over the years and most never reached parity, so I'm hoping some people can find a use case for it! This one is at full parity in terms of behavior, and performance is getting to within striking distance.

mplanchard 5 hours ago | root | parent

Best of luck! We used mlua at $JOB for scripting support, and it worked great, but we’d have preferred a pure Rust solution if one existed with the right performance profile.

jstummbillig 7 hours ago | parent | next

I am sure you would not find it hard to exhaust any model, if you kept upping your ask enough times.

On the margins, suppose the prompt is literally: "Build a feature complete, high polish Facebook clone". Facebook is complex but likely not super complicated tech, and still I would assume that (after having burned through a substantial amount of tokens) you would find substantial enough differences in the outcomes between different models on that prompt on various fronts.

The above ask is obviously not useful, but what's preventing you from taking on bigger chunks until you approach the limit? At some point you would hit a boundary, where the diff will be obvious.

mohsen1 9 hours ago | parent

I had left a few of the benchmarks alone and was working on tech debt, knowing that a new model was going to be released soon. For my project (tsz.dev), Opus 4.8 had been running in circles on those tasks for a while without producing results.
