Show HN: A blocklist to remove spam and bad websites from search results

https://github.com/popcar2/BadWebsiteBlocklist
I'm fed up too. Spammy, AI-looking sites are showing up more and more. For some reason, many of them use the same WordPress theme with a light gray table of contents - they look like this: https://imgur.com/a/totally-not-ai-generated-efsumgZ

The problem seems worse on "alternative" search engines, e.g. DuckDuckGo and Kagi, which both use Bing. It's been driving me back to Google.

A blocklist seems like a losing proposition, unless, like adblock filter lists, it balloons to tens of thousands of entries and gets updated constantly.

Unfortunately, this kind of blocklist is highly subjective. This list blocks MSN.com! That's hardly what I would have chosen.

It's not going to be long before we need to move to a whitelist model, rather than a blacklist model.

It ironically makes me think of the Yahoo Web Directory in the 90s.

Time is a flat circle.

Installed! This shouldn't be a function of the search engine or a plugin; it should be integrated into the browser.

Another great function (not for this plugin) would be the option to "bundle" all search results from the same domain. Stuff them under one collapsible entry. I hate going through lists and pages of apple/google/synology/sonos/crab URLs when I already know that I have to search somewhere else.
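The bundling idea above is easy to sketch. Here is a minimal illustration in Python; `bundle_by_domain` is a made-up name, and the "keep the last two labels" heuristic is a deliberate simplification (it mishandles ccTLDs like .co.uk, where a real implementation would consult the Public Suffix List):

```python
from collections import defaultdict
from urllib.parse import urlparse

def bundle_by_domain(urls):
    """Group a flat list of result URLs under their registrable domain."""
    groups = defaultdict(list)
    for url in urls:
        host = urlparse(url).hostname or ""
        # Naive heuristic: keep the last two labels
        # ("support.apple.com" -> "apple.com").
        domain = ".".join(host.split(".")[-2:])
        groups[domain].append(url)
    return dict(groups)

results = [
    "https://support.apple.com/en-us/HT201263",
    "https://discussions.apple.com/thread/1",
    "https://www.sonos.com/help",
]
print(bundle_by_domain(results))
```

A UI would then render each key as one collapsible entry instead of repeating the domain down the page.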

So, if you already run uBlock Origin (and of course you do), you can use this list without installing any additional extensions by going to 'Filter lists' in the uBlock settings, then Import, then entering https://raw.githubusercontent.com/popcar2/BadWebsiteBlocklis... as the URL.

Not saying you should, just that you could...
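If the list you want to import is published as bare domain names rather than in uBlock's filter syntax (I'm assuming a bare-domain format here; the repo's actual file may already be importable as-is), a tiny conversion gets you there. `||domain^` is uBlock Origin's standard static filter for blocking a whole domain:

```python
def to_ublock_filters(domains):
    """Turn bare domain names into uBlock Origin static network filters."""
    filters = []
    for line in domains:
        line = line.strip()
        if not line or line.startswith("!") or line.startswith("#"):
            continue  # skip blank lines and comments
        filters.append(f"||{line}^")
    return filters

print("\n".join(to_ublock_filters(["spam-blog.example", "# a comment", "slop.example"])))
```

Note that importing the list this way blocks visits to the sites themselves; it doesn't hide them from the search results page the way the extension does.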

Hi @popcar2 — how are you sourcing the domains for the blocklist? We'd like to evaluate those domains and consider whether they should be removed from DuckDuckGo as spam. You can also report a site directly in the search results by clicking the three-dot menu next to the link and selecting "Share Feedback about this Site".
The Kagi search engine has a way in its settings to bulk-upload lists of domains to block (or upvote). Has anyone uploaded a list like this to it?

I may do that.

The problem with a list like this is that a “bad website” is in the eye of the beholder. I’m not saying that there’s anything wrong with you personally not liking the Shopify or the Semrush blog. But I think that everyone else has their own calculus.

It’s the same reason why social media blocklists can be problematic—everyone’s calculus is different.

My suggestion is that you promote it as a starter and suggest that users fork it for their own needs.

I recently started a crypto scam/phishing blocklist if you want to roll those into your list as well.

It also works well with Pi-hole and other platforms.

https://github.com/spmedia/Crypto-Scam-and-Crypto-Phishing-T...

This is one of those features a proper search engine (i.e., not a thinly-veiled advertising network) should have. If users can customize their search results and share their sorting/filtering methods, then that presents a large number of constantly-moving targets that greatly drives up the cost of SEO. There's no "making the Google algorithm happy." Instead, it becomes more "making the users happy."
I don't understand why so many corporate blogs are blocked. Most of them are about their product, or about the industry in general.

- For example, Kaspersky's blog doesn't look bad.

- CCleaner's blog is just a list of updates.

Who’s going to be the first to make the PR for Medium and “dev.to”?
Every time I search for content about Supabase, some trash AI-generated content website like Restack shows up and wastes my time. I'm not saying Restack is bad, but a customizable blocker that blocks a site for a specific topic might be good for me.
Related: Freya Holmér - "Generative AI is a Parasitic Cancer" https://www.youtube.com/watch?v=-opBifFfsMY (1h19m54s) [2025-01-02].

She talks at length about how pages of AI-generated nonsense text are cluttering search results on Google and all other search engines.

DuckDuckGo and Kagi allow you to remove entire sites from search results and it is the best feature of these websites.
I've been using GoogleHitHider, which also works on other search engines like DDG. It's worked well for many years. It's a list I curated myself for personal use, though; I definitely wouldn't mind seeing what other people have.
I love that it just includes all of msn.com.
This is cool. It would be pretty easy to add the domains from this list to Kagi's blocked domain list and have it integrated in the search without a plugin. The downside obviously is having to update that list from the repo, but still, as OP says, even with just a hundred domains blocked it's already a big improvement.
I think there's big potential in using DNS blacklists for this: they have the advantage of being massively scalable and simple to maintain, and client configuration to use them is also easy.

The scalability comes from the caching inherent in DNS; instead of millions of people having to download text files from a website over HTTP on a regular basis, the data is in effect lazily uploaded into the cloud of caching DNS resolvers, with no administration cost on the part of the DNSBL operator.

Reputation whitelists (or other scoring services) would also be just as easy to implement.
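The client side of this is genuinely trivial. Here is a minimal sketch of a domain-based DNSBL lookup in the style established by email blocklists such as Spamhaus DBL: you query `<domain>.<zone>` and an answer in 127.0.0.0/8 means "listed". The zone `dnsbl.example` is hypothetical; no such service exists for web spam today, which is the commenter's point:

```python
import socket

def is_listed(domain, zone="dnsbl.example"):
    """Check a domain against a (hypothetical) domain-based DNSBL zone.

    Listed domains resolve to an address in 127.0.0.0/8; anything else,
    including NXDOMAIN, is treated as "not listed".
    """
    try:
        answer = socket.gethostbyname(f"{domain}.{zone}")
        return answer.startswith("127.")
    except socket.gaierror:
        return False  # no DNS record -> not on the blocklist

print(is_listed("some-spammy-site.example"))  # False here, since the zone is hypothetical
```

Every caching resolver between the client and the DNSBL operator absorbs repeat lookups for free, which is exactly the scaling property described above.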

This is cool! Not entirely sure whether I think it's a good idea, but I wonder if it'd be useful to come up with a way to tranche websites.

Some sites are complete garbage and should be blocked, of course. Others (e.g., in my experience, Quora) are sometimes quite good and sometimes quite bad. Wouldn't be my first choice, but I've found them useful at times.

For a given search, maybe you try with the most aggressive blocking / filtering. If you fail to find what you're looking for, maybe soften the restriction a bit.

Maybe this is overwrought...

One enraging thing: if some guy on GitHub can do this, why the F** can't billion-dollar search giants put in a little human effort to do it too, right in their search engines?

SEO spam and AI slop are easy to spot at the human level. Google has hundreds of thousands of employees. Just put ONE of them on this f**ing job!

It's criminal what these companies have let happen to the web.

Tangent: I may laughably still use Malwarebytes, but when I'm image searching on Google and it stops me from opening a picture with an adware alert, I'm like "oh damn"... I use an adblocker and generally don't do anything sus on my main OS, but yeah, I'm still unsure: am I safe? (paranoia ensues)

I use a VM in other scenarios, but even that, is it properly separated?

What on earth are people still searching for using search engines? I've found ChatGPT to be significantly better at answering questions I have than Google or DDG or any other search engine. It's still AI slop, but at least it's a bit more succinct, and I can ask follow-up questions.
Brave has Goggles, which do exactly this. You can even share the list with others.

https://search.brave.com/goggles/discover

A hosts file with tens of thousands of entries, Kagi for search, and recipes from the spammer-godsend LLM in LibreWolf is still an option, but no idea for how long.
does the msn.com one block their news site?
How do you ensure good contributors and good contributions?

Do you have a forum where you discuss prospective contributions etc?

Does anybody know if it is possible to apply a similar configuration in a SearXNG instance?
I think it could also be accomplished using SearXNG, by blocking the domains there.
download.cnet.com serves up spam nowadays? How far the mighty have fallen.
Does Google still allow that in an add-on?
Thank you for your service
What about just using perplexity? It's already doing that I think.