Tuesday, October 21, 2008

Maximize Locality, Minimize Contention

In this article published on Dr. Dobbs's Journal, Herb Sutter reminded us that spacial locality could inhibit software scalebility if we don't code our program carefully.

As depicted in the graph, memory is not accessed in bytes but in chunks. In the cache, data are accessed in terms of cache lines by the hardware. In the RAM, data are accessed in terms of pages by the OS. In the disk, data are accessed in terms of clusters. So any contension for the memory would definitely impact software performance.

For example, take a look at the following sample code.

// Thread 1
for(int i = 0; i < MAX; ++i ) {
++x;
}

// Thread 2
for(int i = 0; i < MAX; ++i ) {
++y;
}

If x and y are defined close together to fit into a cache line, due to the cache coherency protocol, only one thread can update the cache line at a time and the resulting behivor would like the code below. Originally, one would expect single-thread code to run twice as fast but only to find that the actual performance is not that much, because the cache line containing variable x and y becomes the hot spot.

// Thread 1
for(int i = 0; i < MAX; ++i ) {
lightweightMutexForXandY.lock();
++x;
lightweightMutexForXandY.unlock();
}

// Thread 2
for(int i = 0; i < MAX; ++i ) {
lightweightMutexForXandY.lock();
++y;
lightweightMutexForXandY.unlock();
}

So to avoid the convoy phenomenon, the author suggested several guidelines to follow.

  1. Keep data that are not used together apart in memory to avoid convoy phenomenon.
  2. Keep data that is frequently used together close together in memory to take advantage of locality.
  3. Keep "hot"(frequently accessed) and "cold"(infrrequently accessed) data apart.

Further readings:

  1. Windows with C++: Exploring High-Performance Algorithms
  2. .NET Matters: False Sharing

No comments: