If I'm understanding correctly, the model code itself is only a tiny proportion of the challenge. The training compute and training data are far bigger parts.

Google has access to training compute on a scale perhaps nobody else has.

Is that really the case, though? Available compute seems unlikely to be the limiting factor here compared to data, which is far scarcer than what's used to train LLMs. I suspect Google trained mostly on publicly available data, unless they signed deals beforehand with biotechnology companies that have access to more. That's possible, of course, but it doesn't feel very Google-y.
Yes, all the data Google used was public. We have enough compute from YC (thanks YC!) to do this. The main thing is the technical infrastructure: processing the data, efficient loading at training time, proper benchmarking, etc. We are building that now.
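
For context, "efficient loading at training time" usually means something like: preprocess the raw data once into shards on disk, then stream them with background workers and prefetching so the accelerator never waits on I/O. Below is a minimal sketch of that pattern in PyTorch; the path, shard format (dicts of "inputs"/"targets" tensors saved as .pt files), and class name are all illustrative assumptions, not their actual pipeline.

    import glob
    import random
    import torch
    from torch.utils.data import IterableDataset, DataLoader, get_worker_info

    class ShardStream(IterableDataset):
        """Streams pre-processed .pt shards from disk, shuffling within each shard."""
        def __init__(self, shard_glob, seed=0):
            self.paths = sorted(glob.glob(shard_glob))
            self.seed = seed

        def __iter__(self):
            info = get_worker_info()
            # Split the shard list across DataLoader workers so they don't duplicate work.
            if info is None:
                paths = list(self.paths)
            else:
                paths = self.paths[info.id::info.num_workers]
            rng = random.Random(self.seed)
            rng.shuffle(paths)
            for path in paths:
                # Assumed shard layout: {"inputs": Tensor[N, ...], "targets": Tensor[N, ...]}
                shard = torch.load(path, map_location="cpu")
                order = list(range(shard["inputs"].shape[0]))
                rng.shuffle(order)  # shuffle within the shard to approximate global shuffling
                for j in order:
                    yield shard["inputs"][j], shard["targets"][j]

    loader = DataLoader(
        ShardStream("/data/shards/*.pt"),  # hypothetical path
        batch_size=256,
        num_workers=8,            # load shards in background processes, overlapping GPU compute
        pin_memory=True,          # faster host-to-GPU copies
        prefetch_factor=4,        # keep batches queued ahead of the training step
        persistent_workers=True,  # avoid re-spawning workers every epoch
    )

The design point is that the expensive work (parsing, tokenizing, filtering) happens once offline, and the training loop only ever sees cheap, sequential reads of preprocessed shards.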
Thanks for the answer! It's much better to have a definitive answer than to rely on gut feeling (even though the gut feeling was right in this case).

Keep up the good work!

How much compute does YC give you access to, btw? Is that just things like Azure credits, or does YC have actual hardware?