> Our findings reveal a phenomenon of constraint decay: as structural requirements accumulate, agent performance exhibits a substantial decline.
I have exactly the inverse findings on my end. The bigger and more legacy the codebase, the more accurate the patches become.
The harness itself seems to be the most important part. I use a recursive loop that primes the root context based on the user prompt each time. My agent will often make over 100 tool calls to sql and git before it finally decides to apply a patch. If I was greenfield, there would be nothing to query or constrain against.
I find the same. We have abstractions with multiple concrete implementations, examples of patterns and examples of anti patterns.
I usually find I can achieve 90% of the outcome I'm trying to achieve. I use sonnet for planning, qwen for coding, sonnet for review.
loading story #48258595