Story Detail of id 48346277 | Liveview Hacker News

adamtaylor_13 2 hours ago | on: Cloudflare Turnstile requiring fingerprintable WebGL

So if you need to prevent bot abuse, but also don't want an ugly captcha every time someone goes to sign up, is there a better option?

ribtoks 2 hours ago | parent | next

Use proof-of-work captchas, many are private by default. Look into Private Captcha or Cap captcha.

phoronixrly 2 hours ago | root | parent

How does proof of work stop bots?

stephantul 2 hours ago | root | parent | next

Because it destroys the economics of scraping. It’s too expensive with proof of work, or at least not as economically viable.

gruez 2 hours ago | root | parent

Depends on what type of scraping you're trying to stop. For the dumb scrapers that would try to scrape every page on a git forge (for which there are a bazillion pages for a modest project, because of how the site works), yeah it might deter them enough to stop. For anything high value (e.g. reddit comments or retail prices), 10s of CPU time isn't going to stop them.

stephantul 1 hour ago | root | parent | next

Sure, the whole premise is exactly that proof of work reduces the value of scraping, while having negligible impact on users. If the data is so valuable that bot operators are willing to pay 10s of cpu, then other measures are necessary.

Nevertheless, even for these high-value cases, you can still argue that it disincentivizes the business model, since scraping becomes less efficient.
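
A rough back-of-envelope sketch of the economics argument above. The function name and the price of 0.04 USD per CPU-hour are assumptions picked for illustration, not figures from this thread; only the 10 seconds of CPU per page comes from the discussion.

```python
def scrape_cost_usd(pages: int, cpu_seconds_per_page: float, usd_per_cpu_hour: float) -> float:
    """Total compute cost of solving one proof-of-work challenge per page."""
    cpu_hours = pages * cpu_seconds_per_page / 3600
    return cpu_hours * usd_per_cpu_hour


if __name__ == "__main__":
    # Assumed numbers: 1,000,000 pages, 10 s of CPU each, 0.04 USD per CPU-hour.
    cost = scrape_cost_usd(1_000_000, 10, 0.04)
    print(f"{cost:.2f} USD")  # roughly 111 USD
```

Under these assumed prices, a million pages adds on the order of a hundred dollars of compute. That is a real cost for a scraper crawling a bazillion low-value pages, and a small one for anyone after high-value data, which is the split gruez describes.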

pmontra 2 hours ago | root | parent

It will not scare away bots but 10 seconds of wait (CPU or only a sleep) will turn away many real users. "This site is so slow, I'll use something else." A kind of reverse captcha.

Hnrobert42 1 hour ago | root | parent | next

Maybe the proof of work can run in the background.

ray_v 2 hours ago | root | parent

If it gets too expensive/time-consuming to scrape then it won't happen at scale (as much)?

ImPostingOnHN 2 hours ago | parent

The tool "Anubis" uses proof of work instead

BetterThanSober 2 hours ago | root | parent | next

With a tuned cool-down period this isn't a problem, especially if you frequent the sites. OpenWRT uses Anubis, and usually when I need to peruse their site I'm on a very low-end device. I much prefer waiting over finding Waldos.

But in principle I agree that there's no good answer to this. Scraping _is_ useful, and I bet most of us here have scraped something. It is AI companies and their use of people's material for training, without consent or anything in return, that led us to this. (I know botting has existed on forums for as long as forums have been a thing, but it is easily handled by human moderators and keyword filters.)

timpera 2 hours ago | root | parent | next

Anubis often takes more than 60 seconds to complete on low-end devices (especially old smartphones). It seems like there's no good solution.

QuantumNomad_ 1 hour ago | root | parent | next

But after you’ve completed the Anubis PoW challenge for a site, it remains valid for some amount of time.

So it’s not quite as horrible as it sounds.

I have setting up Anubis for my own sites on my todo list. And I wish more people did it too. I don’t really mind waiting a little bit extra every now and then before the page loads. What I do mind is ReCaptcha asking me to click all the pictures with buses in them etc. And especially when I have to do it several times over before it’s happy. I’d rather wait a minute for a page to load than to ever solve a ReCaptcha again, if given the choice.

dangus 2 hours ago | root | parent | next

That must be really low end then. I’ve never seen it complete in a timeframe that was slower than “I can’t even read the page before it redirects”

ImPostingOnHN 2 hours ago | root | parent

There's not an easy, perfect solution, for sure. Newer phones get faster, but spammer compute gets cheaper.

Some sort of decentralized trust web seems like another option, though less viable.

WesolyKubeczek 2 hours ago | root | parent

One of the unexpected outcomes of the AI-induced hardware shortage may be that compute won’t be getting cheaper and may in fact get more expensive…

phoronixrly 2 hours ago | root | parent

How does Anubis stop bots?

redwall_hp 14 minutes ago | root | parent | next

Anubis is designed to stop a certain class of badly behaved bots. It intentionally doesn't run if a bot identifies itself with a UA, such as Googlebot, because then you can rate limit it or block by UA and with other tools.

Anubis is active when a user agent looks like a web browser (e.g. contains the "Mozilla" substring every major browser uses). The reverse proxy serves an interstitial page that does a proof-of-work check, validated server side, setting a cookie if it passes.

This means a legitimate user won't constantly get the proof of work check, because they already passed it. But AI bots rotating through tons of residential IPs to scrape your forum or git forge or whatever will be slowed down.

Overall, I like the idea. It's unobtrusive, privacy preserving, and seems to be working out well for a lot of sites.
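
A minimal sketch of the general proof-of-work idea described above: the server hands out a random challenge, the client searches for a nonce whose hash meets a difficulty target, and the server checks the answer with a single hash. This is a generic hashcash-style illustration, not Anubis's actual code; the function names and the "leading zero hex digits" difficulty rule are assumptions made for the example.

```python
import hashlib
import secrets


def make_challenge() -> str:
    """Server side: random challenge string (hypothetical format)."""
    return secrets.token_hex(16)


def _meets_target(challenge: str, nonce: int, difficulty: int) -> bool:
    # Assumed rule: the SHA-256 hex digest must start with `difficulty` zeros.
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)


def solve(challenge: str, difficulty: int) -> int:
    """Client side: brute-force the smallest nonce that meets the target."""
    nonce = 0
    while not _meets_target(challenge, nonce, difficulty):
        nonce += 1
    return nonce


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: one hash is enough to check the client's work."""
    return _meets_target(challenge, nonce, difficulty)


if __name__ == "__main__":
    challenge = make_challenge()
    nonce = solve(challenge, difficulty=4)  # about 16**4 hashes on average
    print(verify(challenge, nonce, difficulty=4))
```

The asymmetry is the point: solving costs many hashes, verifying costs one. After a successful verify, the server would set a cookie so a returning visitor skips the challenge for a while, whereas a client that discards cookies on every request pays the cost every time.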

basilikum 55 minutes ago | root | parent | next

The real answer is that it makes sites behave differently, requiring the bots to make slight adjustments.

And there are just not enough sites using Anubis for the people and companies running the bots to care to do that.

If you do care, bypassing Anubis is trivial.

xena 2 hours ago | root | parent

Bots don't execute JavaScript or follow complicated redirects.

pwg 1 hour ago | root | parent

Bots don't [currently] execute JavaScript or follow complicated redirects.

They don't now, but enough "high value to the bots" pages turning on JS or complicated redirects will simply result in the bot authors adding JS execution or redirect following so they can continue "botting" the sites they want to scrape.

It's a hole with no bottom. Each one-up on the anti-bot side will eventually be handled on the bot side.
