You can't make attention more specialized without making it less general, which makes LLMs worse as universal approximators.