Native code can already be multi-threaded, so if you are using Python to drive parallelized native code, there's no win there. If your Python code is the bottleneck, then you could use subinterpreters with shared buffers and locks. If you really need shared objects, do you actually need to mutate them from multiple interpreters? If not, what about exploring language support for frozen objects or proxies?
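The "frozen objects or proxies" idea can be sketched in pure Python. This is purely illustrative: `FrozenProxy` is not a real CPython API, just a minimal read-only wrapper showing what sharing-without-mutation could look like.

```python
class FrozenProxy:
    """Read-only view of an object (sketch of a 'frozen proxy').

    Attribute reads pass through to the wrapped object; any write
    raises. Illustrative only -- CPython has no such built-in.
    """
    __slots__ = ("_obj",)

    def __init__(self, obj):
        # Bypass our own __setattr__ to store the wrapped object.
        object.__setattr__(self, "_obj", obj)

    def __getattr__(self, name):
        # Reads delegate to the wrapped object.
        return getattr(object.__getattribute__(self, "_obj"), name)

    def __setattr__(self, name, value):
        raise AttributeError("object is frozen; mutation not allowed")


class Config:
    def __init__(self):
        self.threads = 4

shared = FrozenProxy(Config())

mutation_error = None
try:
    shared.threads = 8  # attempted mutation through the proxy
except AttributeError as e:
    mutation_error = e
```

A real design would need to freeze transitively (reads returning mutable sub-objects would also need wrapping), which is where the language-support question comes in.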
The only thing that free threading gives you is concurrent mutations to Python objects, which is like, whatever. In all my years of writing Python I have never once found myself thinking "I wish I could mutate the same object from two different threads".
When using something like boost::python or pybind11 to expose your native API in Python, it is not uncommon for the native API to be extensible via inheritance or callbacks (which are easy to represent in these binding tools). Today, with the GIL, you are effectively forced to choose between exposing the native API's parallelism and exposing its extensibility; e.g. you can expose a method that performs parallel evaluation of some inputs, OR you can expose a user-provided callback to be run on the output of each evaluation, but you cannot evaluate those inputs and run a user-provided callback in parallel.
The "dumbest" form of this is logging: people want to redirect whatever logging the native code performs through whatever they use for logging in Python, and that essentially turns every native logging call into a Python callback, which currently requires a GIL acquire/release.
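The shape of that redirect looks roughly like this. Everything here is a sketch: the native side is simulated with a plain Python function, since in real bindings the sink would be registered with the native library (e.g. via pybind11) and each native log line would re-acquire the GIL just to run it.

```python
import logging

records = []

class ListHandler(logging.Handler):
    # Collects messages so the redirect is observable in this demo.
    def emit(self, record):
        records.append(record.getMessage())

py_logger = logging.getLogger("native")
py_logger.setLevel(logging.INFO)
py_logger.addHandler(ListHandler())

def py_log_sink(level, message):
    # In real bindings, native code calls back into this function;
    # under the GIL that is an acquire/release per log line.
    py_logger.log(level, message)

def simulated_native_work(sink):
    # Stand-in for native code that emits log lines while computing.
    for i in range(3):
        sink(logging.INFO, f"step {i} done")

simulated_native_work(py_log_sink)
```

The cost is small per call but sits on every native log statement, which is exactly why it becomes a serialization point when the native work is otherwise parallel.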
Could some of this be addressed with various Python-specific workarounds/tools? Probably. But doing so would also tie the native code much more tightly to problematic/weird Pythonisms (in many cases, the native library in question is an entirely standalone project).
> The only thing that free threading gives you is concurrent mutations to Python objects, which is like, whatever.
The big benefit is that you get concurrency without the overhead of multiple processes. Shared memory is always going to be faster than serializing for inter-process communication (and not all Python objects are easily serializable in the first place).
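Both halves of that claim are easy to demonstrate: threads can touch an object directly, while shipping the same object to another process requires pickling, which fails outright for things like lambdas. (The thread here just shows direct sharing; no timing claims are made.)

```python
import pickle
import threading

make_double = lambda x: x * 2  # lambdas are not picklable by reference

# Threads share the object directly: no serialization step at all.
thread_results = []
t = threading.Thread(target=lambda: thread_results.append(make_double(21)))
t.start()
t.join()

# Sending the same object to another process (multiprocessing, pipes,
# queues) would require pickling it first -- which simply fails here.
pickle_error = None
try:
    pickle.dumps(make_double)
except Exception as e:
    pickle_error = e
```

Even for objects that do pickle cleanly, the serialize/copy/deserialize round trip is pure overhead that shared-memory threading avoids.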