I'm working on a self-hosted search service called Hister with the same goal I had when I started Searx development: reduce dependence on online search engines.
Hister is a full-text indexer for websites and local files that automatically saves every page you visit, as rendered by your browser. It provides a flexible web (and terminal) search interface and query language for exploring saved content with ease, or for quickly falling back to traditional search engines. This is a fundamentally different approach from the one Searx follows, and it solves most of the weaknesses of metasearch engines. Of course it has its own weaknesses as well, but most of them are not conceptual and can be resolved by improving the software (and datasets).
I've been using it for a few months, and as my local index grows I can avoid relying on external search engines (and even on the websites listed in their results) more and more often.
The initial reception has been overwhelmingly positive, with more than 30 contributors and hundreds of contributions already. Currently it mainly helps with "recall"-type searches, but I'm planning to provide pre-indexed thematic datasets, and I'm drafting a peer-to-peer index sharing concept. Maybe you'll find it useful as well (or at least have some constructive criticism =]).
Links:
- https://hister.org/
- https://github.com/asciimoo/hister
- Background/motivation/beginnings: https://hister.org/posts/how-i-cut-my-google-search-dependen...
- Small read-only demo: https://demo.hister.org/
Assuming it indexes everything locally and falls back to traditional search engines if nothing is found, how do you feel about adding a shared middle layer? A layer that indexes only canonical data containing no personal info. This way, contributors would automatically contribute the pages they index, building a shared search engine over time! The whole thing could work without a crawler of its own (under an appropriate license so people can trust it).
Hister sounds like an idea I had years ago but gave up on after the index grew to take up way too much storage.
Long ago I used Searx and really liked it, but at some point I stopped seeing the benefit over just using Google directly. Lately, though, I've been thinking in the back of my mind about checking in on it again.
The only feedback I have is that the initial indexing of my large history was rough. A lot of domains kept blocking me for exceeding their rate limits or wouldn't let me index them at all. It could be useful to import a history file, organize it by domain in some sort of temporary database to track and distribute attempts, and get a more detailed report on domains that failed completely.
Regardless though - great work!
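For what it's worth, the domain-grouping idea could look roughly like this. It's a hypothetical sketch, not how Hister's importer actually works: it assumes the history export is a plain list of URLs, and `fetch` is a stand-in for whatever downloads and indexes a page, returning True on success. Pending URLs are interleaved round-robin across domains so consecutive requests don't hammer the same site, a page is marked failed after `max_attempts` tries, and the report singles out domains where nothing succeeded.

```python
import sqlite3
from collections import defaultdict
from typing import Callable, Iterable
from urllib.parse import urlsplit


def load_history(conn: sqlite3.Connection, urls: Iterable[str]) -> None:
    """Store each URL once, tagged with its domain (hostname)."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS pages (
               url      TEXT PRIMARY KEY,
               domain   TEXT NOT NULL,
               attempts INTEGER NOT NULL DEFAULT 0,
               status   TEXT NOT NULL DEFAULT 'pending'
           )"""
    )
    cleaned = (u.strip() for u in urls)
    rows = [(u, urlsplit(u).hostname or "") for u in cleaned if u]
    # Duplicate URLs from the history file are silently skipped.
    conn.executemany("INSERT OR IGNORE INTO pages (url, domain) VALUES (?, ?)", rows)
    conn.commit()


def round_robin(conn: sqlite3.Connection) -> list[str]:
    """Order pending URLs so consecutive fetches hit different domains."""
    by_domain: dict[str, list[str]] = defaultdict(list)
    for url, domain in conn.execute(
        "SELECT url, domain FROM pages WHERE status = 'pending' ORDER BY domain, url"
    ):
        by_domain[domain].append(url)
    order: list[str] = []
    queues = list(by_domain.values())
    while queues:
        for q in queues:  # take one URL from every domain per round
            order.append(q.pop(0))
        queues = [q for q in queues if q]
    return order


def run_pass(conn: sqlite3.Connection, fetch: Callable[[str], bool], max_attempts: int = 3) -> None:
    """Try every pending URL once; give up on a URL after max_attempts failures."""
    for url in round_robin(conn):
        ok = fetch(url)  # hypothetical downloader/indexer
        # SET expressions see the row's old values, so attempts + 1 is the new count.
        conn.execute(
            """UPDATE pages
               SET attempts = attempts + 1,
                   status = CASE WHEN ? THEN 'done'
                                 WHEN attempts + 1 >= ? THEN 'failed'
                                 ELSE 'pending' END
               WHERE url = ?""",
            (ok, max_attempts, url),
        )
    conn.commit()


def domain_report(conn: sqlite3.Connection) -> dict[str, tuple[int, int, int]]:
    """Per-domain counts of (done, failed, pending) pages."""
    return {
        domain: (done, failed, pending)
        for domain, done, failed, pending in conn.execute(
            """SELECT domain, SUM(status = 'done'), SUM(status = 'failed'),
                      SUM(status = 'pending')
               FROM pages GROUP BY domain ORDER BY domain"""
        )
    }


def failed_domains(conn: sqlite3.Connection) -> list[str]:
    """Domains where every page ended up failed."""
    return [
        d for d, (done, failed, pending) in domain_report(conn).items()
        if failed and not done and not pending
    ]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_history(conn, [
        "https://a.example/1", "https://a.example/2",
        "https://blocked.example/x", "https://b.example/1",
    ])
    for _ in range(3):
        run_pass(conn, lambda url: "blocked" not in url)
    print(failed_domains(conn))  # ['blocked.example']
```

A real importer would probably also add a per-domain delay or back off on HTTP 429 responses; round-robin ordering only spreads requests out, it doesn't enforce a rate.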
- Could I use my own domain list to bootstrap Hister, i.e. have it download pages from my preconfigured / liked domains?
- Can I make some pages rank higher?
- Can I assign tags to pages (so I could filter by them later)?
My domain index
A VPS without a blacklisted IP works well. It runs as a simple rootless container, and updating is easy.
Configuration doesn't take much time.
I still hate that I have to double the bang to use the same shortcut as on DDG.
Example: "!!wde Ente" to go to the German Wikipedia page about ducks, instead of "!wde Ente" as on DDG.