Document poisoning in RAG systems: How attackers corrupt AI's sources
https://aminrj.com/posts/rag-document-poisoning/The PoisonedRAG paper showing 90% success at millions-of-documents scale is the scary part. The vocabulary engineering approach here is basically the embedding equivalent of SEO — you're just optimizing for cosine similarity instead of PageRank. And unlike SEO, there's no ecosystem of detection tools yet.
I'd love to see someone test whether document-level provenance tracking (signing chunks with source metadata and surfacing that to the user) actually helps in practice, or if people just ignore it like they ignore certificate warnings.
this is the entire premise that bothers me here. it requires a bad actor with critical access, it also requires that the final rag output doesn't provide a reference to the referenced result. Seems just like a flawed product at that point.