Voices aren't strong.
There just aren't that many unique characteristic parameters behind a voice - it's largely dictated by an evolutionary shared shared larynx and vocal tract. They aren't fingerprints.
The fact that human voice impersonation is not only widely possible but popular should give you an indication of this. Prosody, intonation, range, etc. - it's all flexible and can be learned and duplicated.
The signals are simple too, because we have to encode and decode them quickly. You may or may not be able to picture and rotate an apple tree in your head, but you can easily read this sentence in the voice of David Attenborough.
Moreover, you can easily fine tune a voice model to fit any other speaker. You can store the unique speaker embeddings in a very thin layer. Zero and few shot unseen sampling can even come close to full reproduction. You can measure this all quantitatively.
Voices are not, and never have been, fingerprints. They're just not that unique.