Story Detail of id 47393130 | Liveview Hacker News

halJordan 20 hours ago | on: Chrome DevTools MCP (2025)

I love how HN is loving this idea when it's the exact same thing Anthropic and OpenAI (and every other LLM maker) did.

It's God's gift to them when it lets them bypass ads and dl copyrighted material. But it's Satan's curse on humanity when the Zuck does it to train his LLM and dl copyrighted material.

deaux 18 hours ago | parent | next

Both scale and purpose make them completely different things. You're acting as if they're the same when they're not.

eipi10_hn 17 hours ago | parent | next

I won't comment on the dl part, but to me ads are trackers and spyware. I don't spy on website owners; I have a human right to stop those trackers.

Zuck serves ads/spyware to other users; he deserves a taste of his own medicine, not me.

coldtea 8 hours ago | parent | next

Yes, it's God's gift when the average user can do it, and Satan's curse when a hated fucking mega-corp is doing it.

Where's the contradiction?

friendzis 12 hours ago | parent | next

You can see this pattern in many different topics: updoots are highly correlated with a positive answer to "do I personally get to profit?"

achierius 12 hours ago | root | parent

Yes, and? People need to eat. Billionaires are generally not interested in whether or not the average Joe gets to eat.

cyberax 15 hours ago | parent | next

I would love to pay for content. I'm _paying_ for YouTube Premium.

But heck, do I hate the YouTube interface. It has degraded far past usability.

zx8080 15 hours ago | root | parent

Write to their support. Oh, wait.

tclancy 19 hours ago | parent | next

So you’re that Hal Jordan then? Why would a Green Lantern feel the need to defend either? I feel like the Guardians would not accept your arguments as soon as you got to Oa, poozer. I guess what I am saying is don’t have a famous name. Seems obvious.

llbbdd 18 hours ago | root | parent

OP appears to be talking about real life. What are you on about?

miki123211 10 hours ago | parent

You conflate web crawling for inference with web crawling for training.

Web crawling for training is when you ingest content on a mass scale, usually indiscriminately, usually with a dumb crawler for scale's sake, for the purposes of training an LLM. You don't really care whether one particular website is in the dataset (unless it's the size of Reddit), you just want a large, diverse, high-quality data mix.

Web crawling for inference is when a user asks a targeted question, you do a web search, and fetch exactly those resources that are likely to be relevant to that search. Nothing ends up in the training data, it's just context enrichment.

People have a much larger issue with crawling for training than for inference (though I personally think both are equally ok).
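
To make the two modes concrete, here is a rough sketch in Python. Everything in it is hypothetical and only for illustration: `FAKE_WEB` is an in-memory stand-in for the web, `search` is a toy keyword matcher rather than a real search engine, and no real crawler or LLM API is involved.

```python
from typing import Dict, List

# Hypothetical in-memory stand-in for the web: URL -> page text.
FAKE_WEB: Dict[str, str] = {
    "https://example.com/devtools": "Chrome DevTools lets you inspect network requests.",
    "https://example.com/cooking": "A recipe for tomato soup with basil.",
    "https://example.com/mcp": "MCP servers expose tools such as DevTools to an LLM agent.",
}


def crawl_for_training(web: Dict[str, str]) -> List[str]:
    """Bulk, indiscriminate ingestion: every page ends up in the corpus."""
    return list(web.values())


def search(web: Dict[str, str], query: str, k: int = 2) -> List[str]:
    """Toy keyword search: rank URLs by how many query words the page contains."""
    words = set(query.lower().split())
    scored = []
    for url, text in web.items():
        page_words = set(text.lower().replace(".", "").split())
        score = len(words & page_words)
        if score > 0:
            scored.append((score, url))
    # Highest score first; ties broken by URL so the order is deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [url for _, url in scored[:k]]


def crawl_for_inference(web: Dict[str, str], query: str, k: int = 2) -> str:
    """Targeted fetch: only pages relevant to this query, used as prompt context."""
    urls = search(web, query, k)
    context = "\n".join(web[url] for url in urls)
    # The fetched text lives only in this prompt; nothing is added to a corpus.
    return f"Context:\n{context}\n\nQuestion: {query}"


if __name__ == "__main__":
    print(len(crawl_for_training(FAKE_WEB)), "pages ingested for training")
    print(crawl_for_inference(FAKE_WEB, "what is devtools"))
```

The training path touches every page regardless of what anyone asked; the inference path touches only the pages a single query turns up and keeps them just for that one answer.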
