Hacker News
Ohi, I'm the author of the open source Searx metasearch engine.

I'm working on a self-hosted search service called Hister with the same goal I had when I started Searx development: reduce dependence on online search engines.

Hister is a full text indexer for websites and local files which automatically saves all the visited pages rendered by your browser. It provides a flexible web (and terminal) search interface & query language to explore saved content with ease or quickly fall back to traditional search engines. This is a fundamentally different approach from the one Searx follows and solves most of the weaknesses of metasearch engines. Of course it has its own weaknesses as well, but most of these are not conceptual and can be resolved by improving the software (and datasets).

I've been using it for a few months and as my local index is growing I can avoid relying on external search engines - and even websites listed in results - more and more frequently.

The initial reception has been overwhelmingly positive, with more than 30 contributors and hundreds of contributions already. Currently it can help mainly with "recall" type searches, but I'm planning to provide pre-indexed thematic datasets and I'm drafting a peer-to-peer index sharing concept. Maybe you can find it useful as well (or at least have some constructive criticism =]).

Links:

- https://hister.org/

- https://github.com/asciimoo/hister

- Background/motivation/beginnings: https://hister.org/posts/how-i-cut-my-google-search-dependen...

- Small read-only demo: https://demo.hister.org/

This looks very promising. Thank you for investing time in this.

Assuming it indexes everything locally and falls back to traditional search engines if nothing is found, how do you feel about adding a shared middle layer? A layer that simply indexes all the canonical data that doesn't contain any personal info. This way, contributors can automatically contribute the pages they index, building a shared search engine over time! The whole thing can work without a crawler of its own (under an appropriate license so people can trust it).

This is great news!

Hister sounds like an idea I had years ago but gave up on after running into issues with the index taking up way too much storage.

Long ago I used Searx and really liked it, but at some point I stopped seeing the benefit over using Google directly. Lately, though, in the back of my mind I've been thinking about checking in on it again.

Have been using hister for a while now and have found it super useful! There are so many times I find myself trying to remember a website I looked at a couple months ago and can't find it again via a regular search. Hister has saved me there already multiple times.

The only feedback I have is that the initial indexing of my large history was rough. A lot of domains kept blocking me for exceeding rate limits or wouldn't let me index at all. I could see it being useful to import a history file and organize it by domain inside some sort of temporary database to track/distribute attempts and get a more detailed report on complete domain failures.

Regardless though - great work!
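
For what it's worth, the per-domain tracking idea above could look roughly like the sketch below. This is not how Hister works; the table name `queue`, the helper names, and the use of an in-memory SQLite database are all assumptions made for illustration. URLs are stored with their domain, handed out round-robin so no single domain gets hit repeatedly in a row, and domains where every attempt failed can be listed afterwards.

```python
import sqlite3
from collections import defaultdict, deque
from urllib.parse import urlsplit


def load_history(conn, urls):
    """Store each URL with its domain so attempts can be tracked per domain."""
    # Hypothetical schema, chosen only for this sketch.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS queue ("
        "url TEXT PRIMARY KEY, domain TEXT, "
        "attempts INTEGER DEFAULT 0, status TEXT DEFAULT 'pending')"
    )
    rows = [(u, urlsplit(u).hostname or "") for u in urls]
    conn.executemany(
        "INSERT OR IGNORE INTO queue (url, domain) VALUES (?, ?)", rows
    )


def interleaved_pending(conn):
    """Return pending URLs in round-robin order across domains."""
    by_domain = defaultdict(deque)
    query = "SELECT url, domain FROM queue WHERE status = 'pending' ORDER BY rowid"
    for url, domain in conn.execute(query):
        by_domain[domain].append(url)
    order = []
    queues = deque(by_domain.values())
    while queues:
        current = queues.popleft()
        order.append(current.popleft())
        if current:  # domain still has URLs left, so it goes to the back
            queues.append(current)
    return order


def record_result(conn, url, ok):
    """Count the attempt and mark the URL as done or failed."""
    conn.execute(
        "UPDATE queue SET attempts = attempts + 1, status = ? WHERE url = ?",
        ("done" if ok else "failed", url),
    )


def failed_domains(conn):
    """List domains where every stored URL ended up failed."""
    rows = conn.execute(
        "SELECT domain FROM queue GROUP BY domain "
        "HAVING SUM(status != 'failed') = 0 ORDER BY domain"
    )
    return [row[0] for row in rows]
```

A real importer would also want per-domain delays and retries with backoff; the sketch only covers the bookkeeping and ordering part.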

I use my own domain index to navigate the web.

- Could I use my domain list to seed Hister, so that it downloads my preconfigured / liked domains?

- Can I make some pages rank higher in it?

- Can I assign tags to pages (which I could later filter by)?

My domain index

- https://github.com/rumca-js/Internet-Places-Database

It's great seeing some more varied takes on search engines like this. That's essentially the same reason I use Inoreader's RSS search to find articles when I want to revisit them, and it has been super handy. I know there have been some projects focused on RSS search engines, like OpenOrb, that have some similarities to Hister. Makes me wonder if Hister could seed its history using RSS.
Thank you for making Searx. I use it as the web search MCP tool for local models and it works very well, so big companies are no longer the only ones with the power to show or hide results.
I'm on a self-hosted SearXNG. It's great. I just need a computer that's up 24/7.

A VPS without a blacklisted IP works well. With a simple rootless container, updating is easy.

Configuration takes little time.

I still hate that I have to double the bang to use the same bang as DDG.

Example: "!!wde Ente" to go to the German Wikipedia page about ducks, instead of "!wde Ente" with DDG.

Happy hister user here. Thank you!