The Postgres-only approach is a really smart call for this scale. I've run pgvector alongside BM25 (via ParadeDB) for internal search at work and it handles mid-size corpora surprisingly well. The operational simplicity of one database vs. managing Elasticsearch + a vector DB + Postgres is a huge win for small teams.
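
For anyone wondering how the keyword and vector result lists get merged into one ranking: one common option is reciprocal rank fusion (RRF). Below is a minimal sketch of that idea, not necessarily what this project or ParadeDB does. The function name, the example doc ids, and k=60 (a commonly used default) are my own assumptions for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc ids into one list, best first.

    Each input list is ordered best-first (e.g. one from BM25, one from
    a vector similarity query). k dampens the influence of top ranks;
    60 is an assumed default, not something from the original post.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # A doc earns more credit the higher it appears in each list.
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical result lists, best match first.
bm25_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
print(fused)
```

The nice property is that it only uses ranks, so you never have to normalize BM25 scores against cosine distances, which live on completely different scales.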

One thing I'd watch out for: HNSW index rebuild times can get painful once you cross ~5M vectors. We ended up doing incremental inserts with a background reindex job. Not a dealbreaker, just something to plan for early.

Also curious how you handle permission syncing. That's usually where self-hosted workplace search gets tricky. Google Drive permissions in particular are a nightmare to mirror accurately.