Hacker News new | past | comments | ask | show | jobs | submit
I assume that he is referring to "fearless concurrency", the idea that Rust makes it possible to write more complex concurrent programs than other languages because of the safety guarantees:

https://doc.rust-lang.org/book/ch16-00-concurrency.html

So the logic would go:

1. mold doesn't do incremental linking because it is too complex to do it while still being fast (concurrent).

2. Rust makes it possible to write very complex fast (concurrent) programs.

3. A new linker written in Rust can do incremental linking while still being fast (concurrent).

EDIT: I meant this originally, but comments were posted before I added it so I want to be clear that this part is new: (Any of those three could be false; I take no strong position on that. But I believe that this is the motivating logic.)

Actually a lot of the hacks that mold uses to be the fastest linker would be, ironically, harder to reproduce with rust because they’re antithetical to its approach. Eg Mold intentionally eschews used resource collection to speed up execution (it’ll be cleaned up by the os when the process exits) while rust has a strong RAII approach here that would introduce slowdowns.
You can absolutely introduce free-less allocators and the like, as well as use `ManuallyDrop` or `Box::leak`. Rust just asks that you're explicit about it.
Depends on how things are approached.

You could, for example, take advantage of bump arena allocator in rust which would allow the linker to have just 1 alloc/dealloc. Mold is still using more traditional allocators under the covers which won't be as fast as a bump allocator. (Nothing would stop mold from doing the same).

Traditional allocators are fast if you never introduce much fragmentation with free, though you may still get some gaps and have some other overhead and not be quite as fast. But why couldn't you just LD_PRELOAD a malloc for mold that worked as a bump/stack/arena allocator and just ignored free if anything party stuff isn't making that many allocations?
> Traditional allocators are fast

Really it's allocators in general. Allocations are perceived as expensive only because they are mostly dependent on the amortized cost of prior deallocations. As an extreme example, even GCs can be fast if you avoid deallocation because most typically have a long-lived object heap that rarely gets collected - so if you keep things around that can be long-lived (pooling) their cost mostly goes away.

Slight disagreement here.

Allocation is perceived as slow because it is. Getting memory from the OS is somewhat expensive because a page of memory needs to be allocated and stored off. Getting memory from traditional allocators is expensive because freespace needs to be tracked. When you say "I need 5 bytes" the allocator needs to find 5 free bytes to give back to you.

Bump allocators are fast because the operation of "I need 5 bytes" is incrementing the allocation pointer forward by 5 bytes and maybe doing a new page allocation if that's exhausted.

GC allocators are fast because they are generally bump allocators! The only difference is that when exhaustion happens the GC says "I need to run a GC".

Traditional allocators are a bit slower because they are typically something like an arena with skiplists used to find free space. When you free up memory, that skiplist needs to be updated.

But further, unlike bump and GC allocators another fault of traditional allocators is they have a tendency to scatter memory which has a negative impact on CPU cache performance. With the assumption that related memory tends to be allocated at the same time, GCs and bump allocators will colocate memory. But, because of the skiplist, traditional allocators will scattershot allocations to avoid free memory fragmentation.

All this said, for most apps this doesn't matter a whole lot. However, if you are doing a CPU/memory intense operation then this is stuff to know.

> Getting memory from traditional allocators is expensive because freespace needs to be tracked.

If you've never deallocated then there is no free space to track, hence the cost of allocation is _mostly_ (per my above comment) affected by the amortized cost of deallocations. The cost becomes extremely apparent with GCs because a failed allocation is usually what triggers a collection (and subsequently any deallocations that need to happen).

Still, your comment goes into detail that I probably should have.

Nothing about Rust requires the use of the heap or RAII.

Also, if wild is indeed faster than mold even without incrementalism, as the benchmarks show, then it seems quite silly to go around making the argument that it's harder to write a fast linker in Rust. It's apparently not that hard.

I mean, that's pretty easy to do in Rust: https://doc.rust-lang.org/std/mem/struct.ManuallyDrop.html

Also see various arena allocator crates, etc.

loading story #42815101
loading story #42814984