The only solution is regulation. If all content created by anyone is copyrighted, how does an implicit opt-in for scraping (which is what you get if you don't create a robots.txt file for your website) make any sense? Moreover, even if you have a robots.txt, AI (or whatever) bots often don't respect it, or they use workarounds: they outsource the scraping of such "restricted" sites to unethical third parties to get the data. Meta has even resorted to piracy, openly! So clearly, both the logic and the "honour system" have failed.

Cloudflare, Google Captcha, HCaptcha etc. are all shitty technical solutions because, as we are all discovering, they come at the cost of our privacy (i.e. our personal data may be used to monetise these services) and / or our computing resources and time. If current copyright laws aren't sufficient to prevent this, we have to acknowledge the system is broken. The answer could be to strengthen it with laws modelled on the Digital Millennium Copyright Act (DMCA), but in favour of creators and against BigTech or rogue actors.

- Web-scraping and copyright law - https://www.neudata.co/blog/web-scraping-and-copyright-law

- Why DMCA Claims Against Web Scrapers Face Long Odds - https://capstonedc.com/insights/why-dmca-claims-against-web-...
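
To make the "implicit opt-in" point concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The bot names and URL are made up for illustration, and a missing robots.txt is modelled here as an empty rule set, which is a simplifying assumption. The check only matters if the crawler chooses to run it, which is the "honour system" part.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical page and crawler names, for illustration only.
PAGE = "https://example.com/articles/1"

# No rules at all (modelled as an empty file): everything is allowed by default.
no_rules = ""

# An explicit opt-out for one crawler.
opt_out = "User-agent: ExampleBot\nDisallow: /"

print(is_allowed(no_rules, "ExampleBot", PAGE))  # allowed: no rule says otherwise
print(is_allowed(opt_out, "ExampleBot", PAGE))   # disallowed by the explicit rule
print(is_allowed(opt_out, "OtherBot", PAGE))     # allowed: the rule names only ExampleBot
```

Note that the opt-out has to name each crawler (or use `*`), and nothing in the protocol stops a bot from skipping this check entirely.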

Or you could let information be free, at least the stuff that’s on the public net.

As for issues like bots overloading websites or using too many resources, scaling laws will take care of that quickly; it’s not like you can’t serve thousands of RPS from a Raspberry Pi these days.

I don't think regulation will stop web scraping, not least because it can be done from locations outside the jurisdiction of the regulations.

> we have to acknowledge the system is broken

The system is broken. It probably takes, what, 10 seconds or less to switch to a residential or foreign proxy, versus 6+ months to track and prosecute a single offender internationally? So something like a million times more effort going the regulatory route.
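
As a rough sanity check on the "million times" figure, here is the arithmetic, taking the 10-second and 6-month numbers above at face value and assuming 30-day months:

```python
SECONDS_PER_DAY = 24 * 60 * 60

proxy_seconds = 10                                # time to switch proxy (figure from the comment)
prosecution_seconds = 6 * 30 * SECONDS_PER_DAY    # 6 months, assuming 30-day months

ratio = prosecution_seconds // proxy_seconds
print(f"{ratio:,}")  # 1,555,200
```

So the estimate holds up as an order of magnitude, given those inputs.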

Just as criminal laws don't end all crime, copyright laws and anti-scraping regulation won't end all scraping. But they will greatly reduce it and limit it to rogue actors. Two examples I can cite here are the laws against email spam and the laws against unsolicited marketing calls: both had a definite impact in reducing the problem (even in India, where I am from and where enforcement of laws is often lax).

Exactly. Bot activity is a problem of volume, not all-or-nothing. Solving 95% of it would be a win.