I worked at a VPS competitor of Niantic.

I am conflicted on this report.

1) VPS is not new, the startup I worked at had a working public system in 2018.

2) The hard part about a VPS is not actually the navigation, it's generating and querying the map.

How does the VPS work?

You build a point cloud of features (we paid people to go and record videos in cities; Tesla/Waymo/Toyota/Google drove cars; Niantic got its players to take videos/pictures).

Align that point cloud to the 3D world and store it in a way that can be queried quickly (doing that quickly and at scale is still an area of research).

Then your client needs to extract the keypoints from an image and perform triangulation against the map to work out where the camera was when the image was taken (there are calibration issues, but we ain't got time for that).
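To make the "query the map" step concrete, here is a toy sketch of matching query descriptors against a stored map. Everything in it is made up for illustration: the `MAP_POINTS` layout, the 3-element descriptors, and the `match_to_map` helper are assumptions, not anyone's actual system. It uses a brute-force nearest-neighbour search with a ratio test to throw away ambiguous matches; a real system would use an approximate index and then feed the resulting 2D-3D pairs to a pose solver, which is omitted here.

```python
import math

# Hypothetical toy map: each entry pairs a feature descriptor with the
# 3D position of that feature in the world frame.
MAP_POINTS = [
    {"descriptor": (0.9, 0.1, 0.0), "xyz": (10.0, 2.0, 5.0)},
    {"descriptor": (0.1, 0.9, 0.0), "xyz": (12.0, 2.5, 5.0)},
    {"descriptor": (0.0, 0.1, 0.9), "xyz": (11.0, 4.0, 6.0)},
]


def match_to_map(query, map_points, ratio=0.8):
    """Return (pixel, xyz) pairs for query features that match unambiguously.

    query is a list of (pixel, descriptor) tuples. A match is kept only if
    the best map descriptor is clearly closer than the second best.
    """
    matches = []
    for pixel, desc in query:
        # Brute-force search: rank every map point by descriptor distance.
        ranked = sorted(map_points, key=lambda p: math.dist(desc, p["descriptor"]))
        d_best = math.dist(desc, ranked[0]["descriptor"])
        d_second = math.dist(desc, ranked[1]["descriptor"])
        if d_best < ratio * d_second:
            matches.append((pixel, ranked[0]["xyz"]))
    return matches


query_features = [
    ((320, 240), (0.88, 0.12, 0.01)),  # close to the first map point
    ((100, 50), (0.5, 0.5, 0.0)),      # equally close to two map points
]
correspondences = match_to_map(query_features, MAP_POINTS)
```

The second query feature is dropped because it sits halfway between two map descriptors. The sort over the whole map is the bit that doesn't scale, which is roughly why fast querying is the hard part.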

Now.

Niantic, from what I can see (and it's been a while), has a database of key landmarks, but not of the areas in between. For decent navigation I would say that this is a massive problem.

I know Niantic are pushing the whole "spatial world model" but frankly I don't think that scales. The stuff they have released is memory-bound on vGPUs, which isn't that useful for realtime querying.

I strongly suspect that they actually have a different system, much more traditional, along the lines of COLMAP or hloc, or something with a feedforward model in it.

However, for the drone use case, what you actually want is SLAM, which is a very different problem. For SLAM you need to build the map whilst you are moving, and then try to do loop closure or some other method to stop drift. Once you've gone there and back you can use that model for relocalisation.
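A toy sketch of what drift and loop closure mean, assuming a 2D position-only robot and a made-up constant odometry bias. The `integrate` and `close_loop` helpers are hypothetical, and spreading the error linearly along the path is a deliberate simplification; real SLAM systems typically solve a pose-graph optimisation instead.

```python
def integrate(odometry, start=(0.0, 0.0)):
    """Dead reckoning: accumulate relative (dx, dy) steps into positions."""
    poses = [start]
    for dx, dy in odometry:
        x, y = poses[-1]
        poses.append((x + dx, y + dy))
    return poses


def close_loop(poses):
    """Assume the last pose revisits the first one and spread the
    accumulated error linearly over the trajectory (simplified correction)."""
    n = len(poses) - 1
    err_x = poses[-1][0] - poses[0][0]
    err_y = poses[-1][1] - poses[0][1]
    return [
        (x - err_x * i / n, y - err_y * i / n)
        for i, (x, y) in enumerate(poses)
    ]


# Drive a 1 m square; every step picks up an assumed 5 cm bias, so the
# raw estimate does not end where it started.
bias = 0.05
odometry = [(1 + bias, 0.0), (0.0, 1 + bias), (-1 + bias, 0.0), (0.0, -1 + bias)]

raw = integrate(odometry)
corrected = close_loop(raw)
```

Here `raw[-1]` ends up about 10 cm off in both axes even though the robot is back at the start. Recognising "I've been here before" is what lets you pull the end of `corrected` back onto the origin.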

Yea, I find it a lot more believable that they're using this for human +/- AR stuff, not robots. E.g. walking instructions: they have photos of landmarks from many human-friendly angles, and precise positioning isn't important. That's undeniably useful for lookup purposes. Stuff in between can be estimated by GPS/3D building maps that already exist, and that's more than good enough for humans.

I just don't see how they'd convert weird-phone pictures to accurate-enough-for-SLAM purposes, especially with that in-between problem. Like, I could believe you can get a LOT of accuracy out of just photos (just watching occlusion probably gets you sub-centimeter), but in huge areas of the world they have nothing at all to stitch those high-precision areas together accurately. Existing maps and 3D building scans are wildly inaccurate on that scale. Like, it's more than a block between scannable things in my area, sometimes multiple blocks. They're not computing a precise world model from that, even with existing data on e.g. OpenStreetMap. Robots already have enough to know their world-position with on-board SLAM + GPS + existing maps; this won't eliminate (or even reduce) SLAM as a necessity for navigation.

I'm not sure they need the in-between areas, so long as the landmarks are inclusive of similar features. In fact, they probably only need high-quality 3D scans of primitive features to perform classification (walls, buildings, intersections, etc). I haven't played recently, so I'm unsure how distinct each landmark is.
(Visual Positioning System)