Yes, all data Google used was public. We have enough compute from YC (thanks YC!) to do this. The main thing is the technical infrastructure - processing the data, efficient loading at training time, proper benchmarking, etc. We are building these now.
loading story #41465645