After getting it in my hands, it's the same: at least 4 times slower for similar basic Siri responses. My guess is that, to start, they're doing less generation locally and more server-side, since the on-device models might not be good enough yet.
15 years ago they had the balls to run Siri live on stage: https://youtu.be/6rL9EL2LlrA?is=5yMQxs0C2VAC5Lwz
The responses came in very fast, though, so I’m sceptical that the latency is representative, and I suspect they may have cherry-picked results (though the responses did look LLM-generated). We shall see.
I’m writing AI apps these days, and even calling Gemini 3.5 Flash on Google Cloud takes longer than that to return a multi-step response.
Obviously the video is not representative, and there are fast models running on fast hardware. But if this takes 2 minutes, it’s not very compelling to users.