Depends on what type of scraping you're trying to stop. For the dumb scrapers that try to scrape every page on a git forge (which has a bazillion pages even for a modest project, because of how the site works), yeah, it might deter them enough to stop. For anything high value (e.g. reddit comments or retail prices), 10 seconds of CPU time isn't going to stop them.
If it's high value, there isn't really much you can do that will be completely effective. Traditional captchas can often be beaten by AI, or by "captcha farms" where impoverished people are paid pennies to complete captchas. Fingerprinting can be beaten by using a full browser to make the requests. Basically anything you do is just a matter of making it more expensive for bots to access it.
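To put rough numbers on "more expensive", here is a back-of-the-envelope sketch. The function name `scrape_cost` and every input (page count, price per CPU-hour) are hypothetical placeholders I made up for illustration, not measured figures; only the 10 seconds per page comes from the thread.

```python
def scrape_cost(pages: int, cpu_seconds_per_page: float, usd_per_cpu_hour: float) -> tuple[float, float]:
    """Return (cpu_hours, usd) a scraper would spend on proof of work.

    All three inputs are assumptions supplied by the caller.
    """
    cpu_hours = pages * cpu_seconds_per_page / 3600
    return cpu_hours, cpu_hours * usd_per_cpu_hour


# Hypothetical example: one million pages at 10 s each,
# with an assumed price of 0.05 USD per CPU-hour.
hours, usd = scrape_cost(1_000_000, 10, 0.05)
print(f"{hours:.0f} CPU-hours, about {usd:.2f} USD")
```

Plug in your own page counts and prices; the point is only that the cost scales linearly with the number of pages, so whether it matters depends on what each page is worth to the scraper.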
Sure, the whole premise is exactly that proof of work reduces the value of scraping while having negligible impact on users. If the data is so valuable that bot operators are willing to pay 10 seconds of CPU time per request, then other measures are necessary.

Nevertheless, even for these high-value cases you can still argue that it disincentivizes the business model, because scraping becomes less efficient.
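For anyone unfamiliar with the mechanism, here is a minimal hashcash-style sketch of proof of work. It is not the scheme of any particular product; the names `solve` and `verify`, the 8-byte nonce encoding, and the difficulty value are all assumptions chosen for illustration.

```python
import hashlib
import itertools


def _meets_target(challenge: bytes, nonce: int, difficulty_bits: int) -> bool:
    # The hash, read as a 256-bit integer, must have `difficulty_bits` leading zero bits.
    digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") < (1 << (256 - difficulty_bits))


def solve(challenge: bytes, difficulty_bits: int) -> int:
    """Client side: brute-force the smallest nonce that meets the target."""
    for nonce in itertools.count():
        if _meets_target(challenge, nonce, difficulty_bits):
            return nonce


def verify(challenge: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Server side: a single hash checks the client's work."""
    return _meets_target(challenge, nonce, difficulty_bits)


# Hypothetical usage: the server would issue a fresh random challenge per visitor.
nonce = solve(b"example-challenge", 12)
print(verify(b"example-challenge", nonce, 12))
```

The asymmetry is the whole trick: solving takes about 2^difficulty_bits hashes on average, verifying takes one. Each extra bit of difficulty roughly doubles the client's expected work, which is the knob that trades scraper cost against user wait time.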

It won't scare away bots, but a 10-second wait (CPU or just a sleep) will turn away many real users. "This site is so slow, I'll use something else." A kind of reverse captcha.
Maybe the proof of work can run in the background.
Or it can run as part of a checkout wizard's "verifying your browser and processing your payment, don't close your tab" step.