jemalloc 3 performance vs. mozjemalloc
Mike Hommey
mh at glandium.org
Mon Feb 9 22:00:42 PST 2015
On Tue, Feb 10, 2015 at 12:53:57AM -0500, Bradley C. Kuszmaul wrote:
> Lock instructions on modern x86 processors aren't really that expensive.
> What is expensive is lock contention. When I've measured code
> that does this in a bunch of concurrent threads:
> 1. acquire_lock()
> 2. do_something_really_small_on_thread_local_data()
> 3. release_lock()
>
> It costs about 1ns to do step 2 with no locks.
> It costs about 5ns to acquire the lock if the lock is thread-local, and
> thus not actually contended.
> It costs about 100ns-200ns if the lock is actually contended.
>
> I've found that these measurements have changed the way I write lock-based
> code. For example, I like per-core data structures that need a lock,
> because the per-core lock is almost always uncontended. (The difference
> between per-core and per-thread shows up only when a thread is preempted.)
... except I'm talking about ARM, and ARM has very different performance
properties.
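
For anyone who wants to compare the three cases quoted above on their own
hardware, here is a minimal sketch of that kind of measurement. It is not
the code behind the quoted numbers. It assumes POSIX threads, a plain
pthread mutex as the lock, and a 64-byte cache line. The names ns_per_op,
lock_mode, slot_t and job_t are made up for this example. The timing
includes thread startup, so it only means something with a large iteration
count.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdint.h>
#include <time.h>

enum { MAX_THREADS = 64 };

/* Hypothetical names: the three cases from the quoted message. */
typedef enum { NO_LOCK, PRIVATE_LOCK, SHARED_LOCK } lock_mode;

/* One slot per thread, aligned to an assumed 64-byte cache line so that
 * false sharing between slots does not distort the measurement. */
typedef struct {
    _Alignas(64) pthread_mutex_t lock;
    volatile uint64_t counter;
} slot_t;

typedef struct {
    pthread_mutex_t *lock;        /* NULL means "take no lock" */
    volatile uint64_t *counter;   /* this thread's own data */
    long iters;
} job_t;

static void *worker(void *arg)
{
    job_t *job = arg;
    for (long i = 0; i < job->iters; i++) {
        if (job->lock) pthread_mutex_lock(job->lock);    /* step 1 */
        (*job->counter)++;                               /* step 2 */
        if (job->lock) pthread_mutex_unlock(job->lock);  /* step 3 */
    }
    return NULL;
}

/* Runs nthreads workers and returns wall-clock nanoseconds per loop
 * iteration, or -1.0 on a bad thread count. The sum of all counters is
 * stored in *total_out when it is non-NULL. Not reentrant. */
double ns_per_op(lock_mode mode, int nthreads, long iters, uint64_t *total_out)
{
    static slot_t slots[MAX_THREADS];
    pthread_t tids[MAX_THREADS];
    job_t jobs[MAX_THREADS];
    struct timespec t0, t1;
    int started = 0;

    if (nthreads < 1 || nthreads > MAX_THREADS || iters < 1)
        return -1.0;

    for (int i = 0; i < nthreads; i++) {
        pthread_mutex_init(&slots[i].lock, NULL);
        slots[i].counter = 0;
        jobs[i].counter = &slots[i].counter;
        jobs[i].iters = iters;
        if (mode == NO_LOCK)
            jobs[i].lock = NULL;
        else if (mode == PRIVATE_LOCK)
            jobs[i].lock = &slots[i].lock;   /* never contended */
        else
            jobs[i].lock = &slots[0].lock;   /* every thread fights for it */
    }

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&tids[i], NULL, worker, &jobs[i]) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    uint64_t total = 0;
    for (int i = 0; i < nthreads; i++) {
        total += slots[i].counter;
        pthread_mutex_destroy(&slots[i].lock);
    }
    if (total_out)
        *total_out = total;

    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / (double)iters;
}
```

Built with something like "cc -O2 -pthread" and called from a small main
once per mode with the same thread and iteration counts, this gives three
figures to compare. How far apart they are depends on the CPU and the
mutex implementation.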
Mike