Wild – A fast linker for Linux

https://github.com/davidlattimore/wild
> Mold is already very fast, however it doesn't do incremental linking and the author has stated that they don't intend to. Wild doesn't do incremental linking yet, but that is the end-goal. By writing Wild in Rust, it's hoped that the complexity of incremental linking will be achievable.

Can someone explain what is so special about Rust for this?

I assume that he is referring to "fearless concurrency", the idea that Rust makes it possible to write more complex concurrent programs than other languages because of the safety guarantees:

https://doc.rust-lang.org/book/ch16-00-concurrency.html

So the logic would go:

1. mold doesn't do incremental linking because it is too complex to do it while still being fast (concurrent).

2. Rust makes it possible to write very complex fast (concurrent) programs.

3. A new linker written in Rust can do incremental linking while still being fast (concurrent).

EDIT: I meant this originally, but comments were posted before I added it so I want to be clear that this part is new: (Any of those three could be false; I take no strong position on that. But I believe that this is the motivating logic.)

Actually, a lot of the hacks that mold uses to be the fastest linker would, ironically, be harder to reproduce in Rust because they're antithetical to its approach. E.g. mold intentionally skips freeing the resources it has used in order to speed up execution (everything gets cleaned up by the OS when the process exits), while Rust has a strong RAII approach here that would introduce slowdowns.
You can absolutely introduce free-less allocators and the like, as well as use `ManuallyDrop` or `Box::leak`. Rust just asks that you're explicit about it.
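For instance, a minimal sketch of what that explicitness looks like (illustrative names, nothing from either linker):

```rust
use std::mem::ManuallyDrop;

fn main() {
    // ManuallyDrop suppresses the destructor: the Vec's heap memory is never
    // freed by us and is simply reclaimed by the OS when the process exits.
    let symbols = ManuallyDrop::new(vec!["_start".to_string(), "main".to_string()]);
    println!("{} symbols", symbols.len());

    // Box::leak achieves the same effect and hands back a &'static mut,
    // which is handy when the data has to outlive every scope anyway.
    let table: &'static mut Vec<u64> = Box::leak(Box::new(Vec::with_capacity(1024)));
    table.push(0x400000);
    println!("{} entries", table.len());
}
```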
Depends on how things are approached.

You could, for example, take advantage of a bump arena allocator in Rust, which would allow the linker to do just one alloc/dealloc. Mold is still using more traditional allocators under the covers, which won't be as fast as a bump allocator. (Nothing would stop mold from doing the same.)
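A sketch of the idea using the third-party bumpalo crate (one arena allocator among several; not necessarily what Wild itself uses):

```rust
use bumpalo::Bump;

fn main() {
    let arena = Bump::new();

    // Each allocation is just a pointer bump inside the arena's current chunk.
    let name = arena.alloc_str("object_file.o");
    let offsets = arena.alloc([0u64; 4]);
    offsets[0] = 0x1000;

    println!("{name}: first offset {:#x}", offsets[0]);
} // everything above is released here, in one shot, when `arena` is dropped
```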

Traditional allocators are fast if you never introduce much fragmentation with free, though you may still get some gaps and some other overhead, so they won't be quite as fast. But why couldn't you just LD_PRELOAD a malloc for mold that worked as a bump/stack/arena allocator and just ignored free, if third-party stuff isn't making that many allocations?
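A rough, untested sketch of what such a shim could look like in Rust (a hypothetical illustration built as a cdylib; a real one would also need calloc, realloc, and error handling for the mmap call):

```rust
// Run e.g. `LD_PRELOAD=./libbumpmalloc.so mold ...`.
use std::sync::atomic::{AtomicUsize, Ordering};

const ARENA_SIZE: usize = 1 << 32; // reserve 4 GiB of address space up front
static BASE: AtomicUsize = AtomicUsize::new(0);
static NEXT: AtomicUsize = AtomicUsize::new(0);

#[no_mangle]
pub extern "C" fn malloc(size: usize) -> *mut u8 {
    // Lazily reserve the arena on first use (if two threads race, one mapping leaks).
    if BASE.load(Ordering::Acquire) == 0 {
        let p = unsafe {
            libc::mmap(
                std::ptr::null_mut(),
                ARENA_SIZE,
                libc::PROT_READ | libc::PROT_WRITE,
                libc::MAP_PRIVATE | libc::MAP_ANONYMOUS,
                -1,
                0,
            )
        };
        let _ = BASE.compare_exchange(0, p as usize, Ordering::AcqRel, Ordering::Acquire);
    }
    // Round up to 16 bytes to keep the usual malloc alignment guarantee.
    let rounded = (size + 15) & !15;
    let offset = NEXT.fetch_add(rounded, Ordering::Relaxed);
    assert!(offset + rounded <= ARENA_SIZE, "bump arena exhausted");
    (BASE.load(Ordering::Acquire) + offset) as *mut u8
}

#[no_mangle]
pub extern "C" fn free(_ptr: *mut u8) {
    // Deliberately a no-op: the OS reclaims everything at process exit.
}
```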
> Traditional allocators are fast

Really it's allocators in general. Allocations are perceived as expensive only because they are mostly dependent on the amortized cost of prior deallocations. As an extreme example, even GCs can be fast if you avoid deallocation because most typically have a long-lived object heap that rarely gets collected - so if you keep things around that can be long-lived (pooling) their cost mostly goes away.
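The same trick works outside GC land too; a trivial (hypothetical) Rust illustration of pooling:

```rust
// Reuse one scratch buffer across iterations instead of allocating and
// freeing a fresh Vec each time, so the allocator is barely exercised.
fn count_duplicates(inputs: &[&[u8]]) -> usize {
    let mut scratch: Vec<u8> = Vec::with_capacity(4096); // allocated once
    let mut dups = 0;
    for input in inputs {
        scratch.clear();                  // keeps the capacity; frees nothing
        scratch.extend_from_slice(input); // may grow, but only ever upward
        scratch.sort_unstable();
        dups += scratch.windows(2).filter(|w| w[0] == w[1]).count();
    }
    dups
}
```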

Slight disagreement here.

Allocation is perceived as slow because it is. Getting memory from the OS is somewhat expensive because a page of memory needs to be allocated and tracked. Getting memory from traditional allocators is expensive because free space needs to be tracked. When you say "I need 5 bytes", the allocator needs to find 5 free bytes to give back to you.

Bump allocators are fast because the operation of "I need 5 bytes" is incrementing the allocation pointer forward by 5 bytes and maybe doing a new page allocation if that's exhausted.

GC allocators are fast because they are generally bump allocators! The only difference is that when exhaustion happens the GC says "I need to run a GC".

Traditional allocators are a bit slower because they are typically something like an arena with skiplists used to find free space. When you free up memory, that skiplist needs to be updated.

But further, unlike bump and GC allocators another fault of traditional allocators is they have a tendency to scatter memory which has a negative impact on CPU cache performance. With the assumption that related memory tends to be allocated at the same time, GCs and bump allocators will colocate memory. But, because of the skiplist, traditional allocators will scattershot allocations to avoid free memory fragmentation.

All this said, for most apps this doesn't matter a whole lot. However, if you are doing a CPU/memory intense operation then this is stuff to know.

> Getting memory from traditional allocators is expensive because free space needs to be tracked.

If you've never deallocated then there is no free space to track, hence the cost of allocation is _mostly_ (per my above comment) affected by the amortized cost of deallocations. The cost becomes extremely apparent with GCs because a failed allocation is usually what triggers a collection (and subsequently any deallocations that need to happen).

Still, your comment goes into detail that I probably should have.

Nothing about Rust requires the use of the heap or RAII.

Also, if wild is indeed faster than mold even without incrementalism, as the benchmarks show, then it seems quite silly to go around making the argument that it's harder to write a fast linker in Rust. It's apparently not that hard.

I mean, that's pretty easy to do in Rust: https://doc.rust-lang.org/std/mem/struct.ManuallyDrop.html

Also see various arena allocator crates, etc.

Not really. You would have to either wrap standard library types in newtypes with ManuallyDrop or (for some) use a custom allocator. And if you want to free some things in one go but not others, that gets much harder, especially when you look at how easy a language like Zig makes it.

And if you intentionally leak everything, it's onerous to get the borrow checker to recognize that unless you use a leaked Box for all declarations/allocations, which introduces both friction and performance regressions (due to memory access patterns), because the use of custom allocators doesn't factor into lifetime analysis.

(Spoken as a die-hard Rust dev who still thinks it's a better language than Zig for most everything.)

> You would have to either wrap any standard library types in newtypes with ManuallyDrop implemented

ManuallyDrop would presumably be implemented on large data structures where it matters, not on every single type involved in the program.

Both mold and lld are already very heavily concurrent. There is no fear at all there.
That’s puzzling to me too. Rust is a great language, and probably makes developing Wild faster. But the complexity of incremental linking doesn’t stem from the linker’s implementation language. It stems from all the tracking, reserved spacing, and other issues required to link a previously linked binary (or at least parts of it) a second time.
There are two main factors:

1. Rust's well-designed type system and borrow checker make writing code that works just easier. It has the "if it compiles it works" property (not unique to Rust; people say this about e.g. Haskell too).

2. Rust's type system, especially its trait system, can be used to enforce safety constraints statically. The obvious one is the Send and Sync traits for thread safety, but there are others; e.g. the Fuchsia network code statically guarantees deadlocks are impossible.
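A tiny illustration of that second point (nothing specific to Wild): the snippet below compiles only because the shared slice is Send/Sync; share an Rc or a raw pointer the same way and it becomes a compile error instead of a data race.

```rust
use std::thread;

// Summing halves of a slice on two threads. This compiles because &[u64]
// is Send (u64 is Sync); the thread-safety rule is checked by the compiler.
fn parallel_sum(data: &[u64]) -> u64 {
    let (a, b) = data.split_at(data.len() / 2);
    thread::scope(|s| {
        let h1 = s.spawn(|| a.iter().sum::<u64>());
        let h2 = s.spawn(|| b.iter().sum::<u64>());
        h1.join().unwrap() + h2.join().unwrap()
    })
}
```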

Mold is written in C++ which is extremely error prone in comparison.

I assume they're referring to thread-safety and the ability to more aggressively parallelize.
I went looking for some writing by the author about how he made wild fast, but couldn't find much: https://davidlattimore.github.io/
Rust has a pretty good incremental caching compiler that makes debug builds relatively fast.

Linking is often a very notable bottleneck for debug binaries and mold can make a big difference.

So interest in speeding up linking for Rust is expected.

Apart from what others said, maybe he plans to use Salsa or something like that. Rust has a few popular libraries for doing this.
It's feasible to write complex correct programs with optimal performance in Rust, unlike any other programming language (complex+correct is not feasible in C/C++/assembly/Zig/etc., optimal performance not possible in any other language).
{"deleted":true,"id":42815091,"parent":42814864,"time":1737738129,"type":"comment"}
[flagged]
loading story #42814943
That is baffling. Maybe the author assumes that a language with many safeguards will lead to keeping complexity under control for a difficult task.

By the way, I had to look up what incremental linking is. In practice I think it means that code from libraries and modules that haven't changed won't need to be re-packed each time, which will save time for frequent development builds. It's actually ingenious.
