We fine-tune our own VLMs for this, though unfortunately we'd prefer not to share which ones specifically! ClickClickClick looks awesome. Have you heard of FerretUI (https://arxiv.org/pdf/2404.05719)? Pretty similar idea.