I don't know why people still mess with Tesseract in 2026. Attention-based OCR models (and, more recently, VLMs) have outperformed every LSTM-based approach since at least 2020.

My guess is that it's the entry point to OCR and the internet is flooded with tutorials for it, just like pandas for data processing.

Painful comparison haha

Leaving a comment so I can more easily find this

And for the people wondering about pandas: use Polars instead.
