Story Detail of id 48395152 | Liveview Hacker News

josevalim14 hours ago | on: Elixir v1.20: Now a gradually typed language

> Honest question, in the era of vibe and AI assisted coding is there any advantages of using untyped programming languages, apart from the fact that non-typed languages has more traning data for the LLM?

Author here.

Type systems restrict which programs can be expressed and increasing expressiveness often requires increasing type-system complexity (which, speaking from experience, both humans and agents will struggle with). Plus they are not the only mechanism to assert correctness (they only validate a subset of your program correctness and do not replace tests) and you are still on your own when it comes to actually recovering from unexpected errors (something Erlang/Elixir were designed for).

I'd say there are two flip sides to your question:

1. Given types do not replace tests, if you can use AI to automate full test coverage, are there actual benefits in static typing for coding agents? The downside of tests for humans is that we suck at writing them (but guided agents can do better) and they can take time to run (which agents do not care)

2. Do we actually have any data or evaluations that show which typing discipline is better for agents? The only benchmark I am aware of [AutoCodeBenchmark] has Elixir come first (dynamic) and C# as second (static), so it doesn't answer the question. There are other benchmarks that show dynamic languages require fewer tokens to solve problems (but that's not a metric I particularly care about)

My gut feeling is that local structure, documentation, quality and quantity in the training data, etc are likely to play a more important role than typing for coding agents. I'd also love to measure how agents perform on specific domains. If you are writing concurrent software, how does Elixir/Java/Rust/Go compare? But without data, it's hard to say.

[AutoCodeBenchmark]: https://github.com/Tencent-Hunyuan/AutoCodeBenchmark

loading story #48401903

osener13 hours ago | parent | next

In my experience restricting programs that can be expressed is a good thing, even more so with agentic engineering. The more guardrails there are, strong typing/TDD/computer use/..., the solution space shrinks and chance of a robust solution increases. Sure maybe this burns more tokens going in circles but it feels less like a slot machine more like a robot searching for a solution for a well-defined problem.

Devs have very strong opinions about dynamically typed programming languages. But reasons such as "exploratory programming", "expressiveness", "taste" that makes them feel good to program in for humans does not matter for agents. Agents don't care that the language "limits them" and prevents them from expressing the code in a succint way because it would not type check.

josevalim12 hours ago | root | parent

Agreed on the guardrails bit. My point is that we still don't have much evidence that static types are an effective way to constrain the search space for coding agents, or how much value they add on top of other mechanisms. Redundancy can certainly be beneficial, but how much and at what cost?

On expressiveness, people often frame it as a dynamic-language goal, but a large portion of type system research is precisely about making type systems more expressive so they can describe a wider range of programs and invariants. This is clearly something both camps value. I suppose another interesting benchmark could be: how do coding agents perform across languages with different degrees of type-system expressiveness?

We may directionally agree, but it is hard to draw conclusions without measurements. Overall, I'd say this is much more of an open question than people give it credit for.

loading story #48399341

loading story #48399616

giraffe_lady8 hours ago | parent | next

> 2. Do we actually have any data or evaluations that show which typing discipline is better for agents? The only benchmark I am aware of [AutoCodeBenchmark] has Elixir come first (dynamic) and C# as second (static), so it doesn't answer the question. There are other benchmarks that show dynamic languages require fewer tokens to solve problems (but that's not a metric I particularly care about)

I am actually writing a paper on this right now so nothing I can point you to yet but yes. LLMs are better (produce working code in fewer attempts controlling for the relative size of training corpus) when using type systems with inference and global unification. It is largely about the quality of the error feedback channel so languages with very good compiler errors (accurate, localized, include the correction with the failure) can close a lot of ground.

But inference + sound type system gives you a constraint propagation that genuinely restricts the ability of the LLM to get into trouble. Type systems that require annotation give up most of the benefit, since the annotations are themselves surface area for LLM mistakes. Unification also puts heavy limits on the expressiveness of the language which is a confounder and may actually be a big part of the benefit too.

Everyone has been on the "the training data is better" thing but I actually don't think so. All of the languages that people report as being better because of good training data actually have fairly restrictive type systems. Elixir is an exception, but it has exceptionally good error messages! And also, along with erlang, pretty unique runtime semantics that may contribute but that's outside my domain I'm on type systems. Debunking the training quality thing is not what I'm working on but I have deep suspicions about that common wisdom.

josevalim7 hours ago | root | parent

That’s very exciting! Is there anywhere I could follow you for updates? If you don’t want to share it publicly, and is ok with sharing it privately, my email is my username on gmail. Thank you!!

loading story #48399604

loading story #48400963

#visit	13,567,170
#session	74,665
#live-session	0