Is Grep All You Need? How Agent Harnesses Reshape Agentic Search
https://arxiv.org/abs/2605.15184
Combining regex filtering with semantic ranking over multi-vector embeddings has worked well for me. I use ColGREP from the LightOn team as a daily driver: https://github.com/lightonai/next-plaid/blob/main/colgrep/RE...
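For anyone wondering what "regex filter, then semantic rank" looks like in the small, here is a rough sketch. It is not how ColGREP works internally; the per-token "embeddings" are just character-trigram counts standing in for a learned multi-vector model, and `trigram_vector`, `maxsim`, and `filter_then_rank` are hypothetical names made up for this example. The scoring follows the late-interaction idea: each query token takes its best-matching token in the candidate line, and those maxima are summed.

```python
import re
from collections import Counter
from math import sqrt

def trigram_vector(token):
    """Toy stand-in for a learned token embedding: character trigram counts."""
    padded = f"  {token.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def maxsim(query, text):
    """Late-interaction score: each query token takes its best match in the text."""
    q_vecs = [trigram_vector(t) for t in re.findall(r"\w+", query)]
    d_vecs = [trigram_vector(t) for t in re.findall(r"\w+", text)]
    if not d_vecs:
        return 0.0
    return sum(max(cosine(q, d) for d in d_vecs) for q in q_vecs)

def filter_then_rank(lines, pattern, query, top_k=3):
    """Keep lines matching the regex, then order the survivors by the toy score."""
    rx = re.compile(pattern)
    hits = [line for line in lines if rx.search(line)]
    return sorted(hits, key=lambda line: maxsim(query, line), reverse=True)[:top_k]

corpus = [
    "def parse_config(path): load yaml settings",
    "def parse_args(argv): command line flags",
    "class ConfigLoader: reads settings from disk",
    "def render(template): html output",
]
results = filter_then_rank(corpus, r"^def parse_", "load configuration settings")
```

The regex cheaply narrows the candidates to the two `parse_` functions, and the ranking step then puts the config parser ahead of the argument parser for this query. With a real embedding model you would swap out `trigram_vector` and keep the rest of the shape.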
I recently watched the new Palantir + Kirkland & Ellis fund formation platform demo, and I was surprised to see how effective the union of structured data was in an agent harness. We're used to dealing with flat files, and the comparison here is between basic ways of searching what are essentially long strings, but with Palantir's "Ontology" graph framework, I think Kirkland is going to be able to achieve some exceptional and differentiating outcomes in legal tech. The whole idea assumes that they've got great structured data already, and perhaps that's the real valuable unknown, but giving an agent those tools is super powerful.
I wrote about it[1] and came away with a different view on both Palantir and the future of agentic workflows personally.
[1] sorry, LinkedIn: https://www.linkedin.com/pulse/fund-managements-killer-app-d...
This is a surprising result. With structured inputs like source code, I’d expect grep to outperform semantic search, but natural language’s errors and inconsistencies seem to leave so many cracks for information to fall through.
Tangential: I have a hook that rewrites grep to rg, but lately I wonder if this is actually wasteful since the model is so biased toward grep. Is there a way to shim/alias it instead?
If you are truly bitter-lesson pilled - give the agent all the tools and let it decide which to use.
- regex (grep)
- hybrid search (bm25+vector)
this X vs Y is uninteresting when the answer can be both.
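A minimal sketch of what "give it both" could look like, assuming a harness that exposes tools as named callables. Everything here is illustrative: the `DOCS` corpus, the tool names, and the `TOOLS` registry are made up, the "vector" side is a character-trigram cosine rather than a real embedding model, and the two rankings are merged with reciprocal rank fusion, which is one common choice and not something the paper prescribes.

```python
import math
import re
from collections import Counter

# Hypothetical three-file corpus, purely for illustration.
DOCS = {
    "auth.py": "def login(user, password): check the password hash and start a session",
    "billing.py": "def charge(card, amount): create an invoice and bill the customer",
    "README.md": "How users sign in and how sessions expire",
}

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def grep_tool(pattern):
    """Exact tool: names of documents whose text matches the regex."""
    rx = re.compile(pattern)
    return [name for name, text in DOCS.items() if rx.search(text)]

def bm25_ranking(query, k1=1.5, b=0.75):
    """Lexical ranking using a common BM25 variant."""
    toks = {name: tokenize(text) for name, text in DOCS.items()}
    avg_len = sum(len(t) for t in toks.values()) / len(toks)
    scores = {}
    for name, doc in toks.items():
        tf = Counter(doc)
        score = 0.0
        for term in set(tokenize(query)):
            df = sum(1 for t in toks.values() if term in t)
            idf = math.log(1 + (len(toks) - df + 0.5) / (df + 0.5))
            denom = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores[name] = score
    return sorted(scores, key=scores.get, reverse=True)

def vector_ranking(query):
    """Toy 'semantic' ranking: cosine over character-trigram counts (an assumption)."""
    def vec(text):
        s = " ".join(tokenize(text))
        return Counter(s[i:i + 3] for i in range(len(s) - 2))

    def cos(a, b):
        dot = sum(a[k] * b[k] for k in a if k in b)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    q = vec(query)
    return sorted(DOCS, key=lambda name: cos(q, vec(DOCS[name])), reverse=True)

def hybrid_search_tool(query, k=60):
    """Merge the lexical and vector rankings with reciprocal rank fusion."""
    fused = Counter()
    for ranking in (bm25_ranking(query), vector_ranking(query)):
        for rank, name in enumerate(ranking, start=1):
            fused[name] += 1.0 / (k + rank)
    return [name for name, _ in fused.most_common()]

# The agent sees both tools and picks per query.
TOOLS = {"grep": grep_tool, "hybrid_search": hybrid_search_tool}

exact_hits = TOOLS["grep"](r"def login\(")
fuzzy_hits = TOOLS["hybrid_search"]("password session")
```

The point is only that the two tools are cheap to offer side by side: grep answers "where is this exact symbol", while the fused ranking handles looser queries where the wording in the corpus doesn't line up with the query.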
I'm curious to see what patterns it's grepping.
Feels important, but I wish they also had compared against something like MeiliSearch or Algolia.
Surely 'strings' would be even better?
This has been posted before, but a dead-simple pattern that helps enormously with steering the model to the right code area is a DESIGN.md that it creates, updates, and references periodically.