Author here.
Type systems restrict which programs can be expressed and increasing expressiveness often requires increasing type-system complexity (which, speaking from experience, both humans and agents will struggle with). Plus they are not the only mechanism to assert correctness (they only validate a subset of your program correctness and do not replace tests) and you are still on your own when it comes to actually recovering from unexpected errors (something Erlang/Elixir were designed for).
I'd say there are two flip sides to your question:
1. Given types do not replace tests, if you can use AI to automate full test coverage, are there actual benefits in static typing for coding agents? The downside of tests for humans is that we suck at writing them (but guided agents can do better) and they can take time to run (which agents do not care)
2. Do we actually have any data or evaluations that show which typing discipline is better for agents? The only benchmark I am aware of [AutoCodeBenchmark] has Elixir come first (dynamic) and C# as second (static), so it doesn't answer the question. There are other benchmarks that show dynamic languages require fewer tokens to solve problems (but that's not a metric I particularly care about)
My gut feeling is that local structure, documentation, quality and quantity in the training data, etc are likely to play a more important role than typing for coding agents. I'd also love to measure how agents perform on specific domains. If you are writing concurrent software, how does Elixir/Java/Rust/Go compare? But without data, it's hard to say.
[AutoCodeBenchmark]: https://github.com/Tencent-Hunyuan/AutoCodeBenchmark
Devs have very strong opinions about dynamically typed programming languages. But reasons such as "exploratory programming", "expressiveness", "taste" that makes them feel good to program in for humans does not matter for agents. Agents don't care that the language "limits them" and prevents them from expressing the code in a succint way because it would not type check.
On expressiveness, people often frame it as a dynamic-language goal, but a large portion of type system research is precisely about making type systems more expressive so they can describe a wider range of programs and invariants. This is clearly something both camps value. I suppose another interesting benchmark could be: how do coding agents perform across languages with different degrees of type-system expressiveness?
We may directionally agree, but it is hard to draw conclusions without measurements. Overall, I'd say this is much more of an open question than people give it credit for.
I am actually writing a paper on this right now so nothing I can point you to yet but yes. LLMs are better (produce working code in fewer attempts controlling for the relative size of training corpus) when using type systems with inference and global unification. It is largely about the quality of the error feedback channel so languages with very good compiler errors (accurate, localized, include the correction with the failure) can close a lot of ground.
But inference + sound type system gives you a constraint propagation that genuinely restricts the ability of the LLM to get into trouble. Type systems that require annotation give up most of the benefit, since the annotations are themselves surface area for LLM mistakes. Unification also puts heavy limits on the expressiveness of the language which is a confounder and may actually be a big part of the benefit too.
Everyone has been on the "the training data is better" thing but I actually don't think so. All of the languages that people report as being better because of good training data actually have fairly restrictive type systems. Elixir is an exception, but it has exceptionally good error messages! And also, along with erlang, pretty unique runtime semantics that may contribute but that's outside my domain I'm on type systems. Debunking the training quality thing is not what I'm working on but I have deep suspicions about that common wisdom.