Very cool idea. But the time savings won't hold for every tool call, and it's not clear to me yet whether this is batchable. Also, intuitively, for most models that run on GPU, you'd still want to offload the tool execution part to the CPU, since it's much cheaper...