Friday, October 31, 2008

Fixing the Java Memory Model

In this article (part 1, part 2) published on developerWorks, Brian Goetz talked about the problems with the original Java Memory Model and the solutions proposed in JSR 133. The article helped me better understand an MSDN article I blogged about before. In the 'Thin Event' section of that MSDN article, Joe Duffy used the Thread.MemoryBarrier method as follows.

private int m_state; // 0 means unset, 1 means set.
private EventWaitHandle m_eventObj;
private const int s_spinCount = 4000;

public void Set() {
    m_state = 1;
    Thread.MemoryBarrier(); // required: full fence, so the store to m_state and the following load of m_eventObj are not reordered
    if (m_eventObj != null) m_eventObj.Set();
}
Before I came across Goetz's article, I didn't quite understand why it's required, though Duffy mentioned that "a legal transformation" in the CLR 2.0 memory model necessitates the call to the Thread.MemoryBarrier method. Now I know why Duffy didn't explain the requirement: it takes another article to explain it clearly. Even though Goetz's article is about the JVM, the concepts apply to the CLR. To understand why the barrier is required, read the following summary.

As you may know, the sequence of compiled code that gets executed on the CPU may not be the same as the sequence of the source code, because the compiler, runtime, processor, or cache may move compiled code around for performance reasons. These optimizations are quite prevalent even on a uniprocessor system, but weird things can happen on a multiprocessor system. Therefore, rules are needed to specify how a program accesses variables in memory to avoid such problems. A memory model is a collection of such rules. The Java Memory Model (JMM) is defined in Chapter 17 of the Java Language Specification. It defines the semantics of synchronized, final and volatile.

The synchronized keyword ensures that only one thread at a time can enter a synchronized block protected by a given monitor. The JMM also specifies memory visibility rules for code in a synchronized block: caches are flushed when exiting a synchronized block and invalidated when entering one, and the compiler does not move instructions from inside a synchronized block to outside.
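
For CLR readers, the lock statement (built on Monitor.Enter/Exit) gives a comparable guarantee. Here is a minimal C# sketch of the visibility rule described above; the class and member names are made up for illustration only.

// Minimal C# analog: writes made inside a lock are visible to the next
// thread that acquires the same lock (names here are illustrative only).
class SharedCounter {
    private readonly object gate = new object();
    private int value;

    public void Increment() {
        lock (gate) {       // entering: conceptually invalidates stale cached values
            value++;
        }                   // exiting: the write is published before the lock is released
    }

    public int Read() {
        lock (gate) {       // a reader taking the same lock sees the latest value
            return value;
        }
    }
}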

However, the original JMM had two problems. The first problem was that immutable objects, as declared with the final keyword, might not actually be immutable. For example, the String class in Sun's 1.4 JDK has three important final fields: a reference to a character array, a length, and an offset into that array. Take a look at the following snippet, where two string objects are constructed outside any synchronization block. Under the original JMM, due to the way object initialization works in Java, code using s2 might see the string as "/usr" at one moment and then as "/tmp". That is, a final object was not final at all.
String s1 = "/usr/tmp";
String s2 = s1.substring(4);   // should always be "/tmp"
The other problem was associated with volatile fields. The original JMM required that 1) volatile reads and writes go directly to main memory, prohibiting caching values in registers and bypassing processor-specific caches, and 2) the compiler or cache cannot reorder volatile reads and writes with each other. The problem came from what was not required: the original JMM did allow ordinary variable reads and writes to be reordered with respect to volatile reads and writes. In the following code, thread A and thread B are coordinated by the volatile variable initialized, without using a synchronized block. Under the original JMM, the write to the initialized variable in thread A was allowed to be reordered ahead of the assignments to the configOptions variable, which makes the result of using configOptions in thread B unpredictable.
Map configOptions;
char[] configText;
volatile boolean initialized = false;
...

//In thread A
configOptions = new HashMap();
configText = readConfigFile(fileName);
processConfigOptions(configText, configOptions);
initialized = true;

//In thread B
while(!initialized) sleep();
//use configOptions
The JSR 133 Expert Group decided that it makes sense for volatile reads and writes not to be reorderable with any other memory operations. The new JMM defines an ordering called happens-before, which is a partial ordering of all actions within a program. Under the new JMM, when thread A writes to a volatile variable V and thread B then reads from V, any variable values that were visible to A at the time V was written are guaranteed to be visible to B. Although this guarantee imposes a higher performance penalty for accessing volatile fields, it solves the problem mentioned above.
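
Since this post maps the concepts onto the CLR, here is a minimal C# sketch of the same publication pattern using a volatile flag; the type and member names are my own, not from either article.

using System;
using System.Collections.Generic;
using System.Threading;

class ConfigHolder {
    private static Dictionary<string, string> configOptions;
    private static volatile bool initialized = false;

    // Thread A
    public static void Load() {
        configOptions = new Dictionary<string, string>();
        configOptions["mode"] = "fast";   // ordinary writes...
        initialized = true;               // ...cannot be reordered after this volatile write
    }

    // Thread B
    public static void Use() {
        while (!initialized) Thread.Sleep(1);        // volatile read acquires
        Console.WriteLine(configOptions["mode"]);    // configOptions is fully visible here
    }
}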

As for the final field problem, the new JMM provides a guarantee of initialization safety: as long as a reference to an object is not published before its constructor has completed, all threads will see the values for its final fields that were set in the constructor, regardless of whether or not synchronization is used to pass the reference from one thread to another. Furthermore, the writes that initialize final fields in the constructor will not be reordered with the subsequent publication of the object reference.

Further reading: The Java Memory Model

Wednesday, October 29, 2008

NLS_LANG in Oracle

Everyone working with Oracle in a non-English (American) environment should definitely take a look at the NLS_LANG FAQ. It contains many fundamental concepts one should grasp to work effectively with Oracle.

So what is NLS_LANG? According to the FAQ, "It sets the language and territory used by the client application and the database server. It also indicates the client's character set, which corresponds to the character set for data to be entered or displayed by a client program." The language component "specifies conventions such as the language used for Oracle messages, sorting, day names, and month names". The territory component "specifies conventions such as the default date, monetary, and numeric formats". The charset component "specifies the character set used by the client application".

The NLS_LANG setting has the format language_territory.charset and can be set at the client in the Windows Registry (under HKEY_LOCAL_MACHINE\SOFTWARE\ORACLE\HOMEx\ for Oracle Database versions 8, 8i and 9i) or as a System or User environment variable. The setting I use on my client machine is TRADITIONAL CHINESE_TAIWAN.ZHT16MSWIN950. One can also issue the @.[%NLS_LANG%]. command in SQL*Plus to display the setting. If NLS_LANG is not set, Oracle assumes the client's NLS_LANG is AMERICAN_AMERICA.US7ASCII and does locale-specific conversion accordingly. So if you can't read the text selected from the database, it's very likely that the character set at the client is different from that at the Oracle server, or that the Oracle installer didn't populate NLS_LANG and the default US7ASCII is being used.

On the server, NLS parameters can be set as session parameters, instance parameters, or database parameters. The former overrides the latter if set, and inherits from the latter if not. To display the settings on the server, execute the following commands:

  1. SELECT * from NLS_SESSION_PARAMETERS;
  2. SELECT * from NLS_INSTANCE_PARAMETERS;
  3. SELECT * from NLS_DATABASE_PARAMETERS;

The settings on the server are more fine-grained than those on the client. To change session or instance parameters, use the ALTER SESSION or ALTER SYSTEM command. Database parameters are set via the init.ora file during database creation and can't be changed after that. There is no NLS_LANG in init.ora, but rather NLS_LANGUAGE and NLS_TERRITORY. Also, the database character set is defined by the CREATE DATABASE command and can't be changed afterwards.

If the character set is the same at the client and the server, Oracle directly stores whatever is submitted by the client; no conversion is involved. If the character set defined at the client differs from that at the server, a conversion is done, usually at the client. However, the conversion may fail. For example, a database created with the WE8MSWIN1252 character set can't store Traditional Chinese because WE8MSWIN1252 doesn't cover Chinese characters, whereas a database created with UTF8 can store Traditional Chinese if the client submits text encoded in ZHT16MSWIN950 or UTF8. So if the database character set can't represent the characters submitted by the client, the database has to be recreated with a suitable character set.

To troubleshoot character set conversion problems, there are several places to check.

  1. Database character set
  2. NLS_LANG setting at the server machine
  3. NLS_LANG setting at the client machine

To see the encoding Oracle uses to store text, use the DUMP function. The following is the result from the Oracle instance I tested against.

SQL> SELECT DUMP('abc', 1016) FROM DUAL;

DUMP('ABC',1016)
------------------------------------------------------------------

Typ=96 Len=3 CharacterSet=ZHT16BIG5: 61,62,63

SQL>

Sunday, October 26, 2008

High Availability: Keep Your Code Running with the Reliability Features of the .NET Framework

SQL Server 2005 allows programmers to write stored procedures in C#, which means the CLR is hosted inside SQL Server 2005. However, if those stored procedures aren't written correctly and jeopardize their hosting environment, this feature could become a nightmare for the SQL Server product team. In this MSDN article, Stephen Toub talked about many features introduced in .NET Framework 2.0 to meet the high-availability requirements of a hosting environment.

What 'villainies' are there that would make our programs unreliable? According to the author, they come in the form of OutOfMemoryException, StackOverflowException and ThreadAbortException.

Many operations in .NET can result in memory allocation. The obvious one is object construction, but there are other, less obvious ones. For example, boxing requires a heap allocation to store a value type. Invoking a method of an assembly for the first time can result in the assembly being delay-loaded into memory. The first time a method is just-in-time compiled requires memory allocations to store the generated code and associated runtime data structures. If any of these operations goes wrong and causes an OutOfMemoryException to be thrown, in .NET Framework 1.x it's possible that the code in the catch and/or finally block is not executed.
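
For instance, boxing is one of those easy-to-miss allocations; a tiny illustrative snippet (the names are mine, not from the article):

using System.Collections;

class BoxingDemo {
    static void Main() {
        int n = 42;
        object boxed = n;          // boxing: allocates a small object on the managed heap
        ArrayList list = new ArrayList();
        list.Add(n);               // Add(object) boxes the int again
        System.Console.WriteLine(boxed + " " + list.Count);
    }
}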

A StackOverflowException can be thrown in managed code or within the runtime. For an exception thrown within the runtime, the process is torn down. For an exception thrown in managed code, it's good to catch and handle the exception, but if one is not careful enough, another StackOverflowException could be thrown in the catch block, which causes the OS to kill the process.

A ThreadAbortException can occur when the Thread.Abort or AppDomain.Unload method is called. In .NET Framework 1.x, resource leaks are still possible with or without catch/finally blocks.

The .NET Framework 2.0 introduces Constrained Execution Regions (CERs) to deal with the asynchronous exceptions mentioned above. The runtime delays thread aborts for code that is executing in a CER. Also, the runtime prepares CERs as early as possible to avoid out-of-memory conditions; that is, it allocates memory up front so that OutOfMemoryException and StackOverflowException are not triggered inside the CER itself.

To mark code as a CER in .NET Framework 2.0, call the RuntimeHelpers.PrepareConstrainedRegions() method immediately before entering a try {...} finally {...} block. Also, the methods called in the block have to adhere to the constraints required for execution within a CER. One expresses that a method meets the constraints through the ReliabilityContractAttribute.
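
A minimal sketch of the pattern, with a hypothetical cleanup method, might look like this:

using System;
using System.Runtime.CompilerServices;
using System.Runtime.ConstrainedExecution;

static class CerExample {
    public static void DoWork() {
        RuntimeHelpers.PrepareConstrainedRegions();   // prepare the CER that follows
        try {
            // ordinary code: may throw OutOfMemoryException, be aborted, etc.
        }
        finally {
            ReleaseResource();                        // runs inside the constrained region
        }
    }

    // Declares that this method neither corrupts state nor fails inside a CER.
    [ReliabilityContract(Consistency.WillNotCorruptState, Cer.Success)]
    static void ReleaseResource() {
        // hypothetical cleanup that only performs constrained operations
    }
}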

A reliability contract expresses two things: what kind of state corruption could result from asynchronous exceptions being thrown during the method's execution, and, given valid input, what kind of completion guarantee the method can make if it is invoked in a CER and asynchronous exceptions are thrown. Because ThreadAbortException is delayed over a CER, one needs to consider the other two failures: OutOfMemoryException and StackOverflowException. Only the following three reliability contracts are valid for methods called in a CER.

[ReliabilityContract(Consistency.MayCorruptInstance, Cer.MayFail)]
[ReliabilityContract(Consistency.WillNotCorruptState, Cer.MayFail)]
[ReliabilityContract(Consistency.WillNotCorruptState, Cer.Success)]

If the code in a CER calls an interface method, a virtual method, a delegate, or a generic method, the CLR needs further information to allocate memory up front. Developers can help by calling methods defined on the RuntimeHelpers class, such as PrepareMethod and PrepareDelegate.
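
For example, a delegate that will be invoked inside a CER can be prepared ahead of time; a hedged sketch (the cleanup method is hypothetical):

using System;
using System.Runtime.CompilerServices;

static class CerPreparation {
    static void ReleaseResource() { /* hypothetical cleanup */ }

    public static void DoWork() {
        Action cleanup = ReleaseResource;         // delegate invoked inside the CER below
        RuntimeHelpers.PrepareDelegate(cleanup);  // JIT-compile the delegate's target up front

        RuntimeHelpers.PrepareConstrainedRegions();
        try {
            // work that may fail
        }
        finally {
            cleanup();                            // no lazy JIT allocation can occur here
        }
    }
}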

When a StackOverflowException is thrown in the try block, the CLR doesn't guarantee that the back-out code is executed. To make sure back-out code absolutely executes even under a StackOverflowException, the RuntimeHelpers class provides the ExecuteCodeWithGuaranteedCleanup method.
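
A hedged sketch of how it might be called; the try body and the back-out code here are just placeholders:

using System;
using System.Runtime.CompilerServices;

static class GuaranteedCleanupExample {
    public static void Run() {
        RuntimeHelpers.ExecuteCodeWithGuaranteedCleanup(
            delegate(object state) {
                // the "try" body: may overflow the stack or run out of memory
            },
            delegate(object state, bool exceptionThrown) {
                // back-out code: guaranteed to run, even on StackOverflowException
            },
            null);   // userData passed to both delegates
    }
}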

The description of the Thread.Abort method on MSDN reads as follows: "Raises a ThreadAbortException in the thread on which it is invoked, to begin the process of terminating the thread. Calling this method usually terminates the thread." What actions are involved in the termination process? When can a thread not be terminated by the Abort method, and what can we do about it?

When a thread executing in a try-catch-finally block gets aborted, the CLR executes the finally block before terminating the thread. If the code catches the ThreadAbortException and calls Thread.ResetAbort, or enters an infinite loop in the finally block, that thread can't be terminated. In this situation, a CLR host in .NET Framework 2.0 can abort the thread abruptly, a so-called rude thread abort. For 'graceful' thread aborts, the CLR by default delays the abort over CERs, finally blocks, catch blocks, static constructors and unmanaged code. For rude thread aborts, the CLR delays the abort only over CERs and unmanaged code.

Since rude thread aborts skip over finally and catch blocks, resource leaks may result. To address this problem, .NET Framework 2.0 introduces a new kind of finalizer via the CriticalFinalizerObject base class. If one is concerned that important resources might leak because of a rude thread abort, one can wrap the resource in a class derived from CriticalFinalizerObject; its finalizer is guaranteed to run, so the resource is released. SafeHandle is one example that makes use of CriticalFinalizerObject: it is simply a managed wrapper around an IntPtr, with a finalizer that knows how to release the resource referenced by that IntPtr.
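
For illustration only, a minimal SafeHandle-style wrapper around a Win32 handle might look like this sketch (what the handle actually refers to is left open):

using System;
using System.Runtime.InteropServices;
using Microsoft.Win32.SafeHandles;

// Wraps a Win32 handle; the critical finalizer inherited from SafeHandle
// guarantees ReleaseHandle runs even after a rude thread abort.
sealed class MyNativeHandle : SafeHandleZeroOrMinusOneIsInvalid {
    private MyNativeHandle() : base(true) { }   // true: this wrapper owns the handle

    [DllImport("kernel32.dll", SetLastError = true)]
    private static extern bool CloseHandle(IntPtr handle);

    protected override bool ReleaseHandle() {
        return CloseHandle(handle);             // 'handle' is the protected IntPtr field
    }
}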

The article also mentioned several other features introduced in .NET Framework 2.0 to make our code more reliable, such as:

  1. Thread.BeginCriticalRegion/Thread.EndCriticalRegion
  2. Environment.FailFast
  3. System.Runtime.MemoryFailPoint

It seems to me that however reliable you want your application to be, the .NET Framework can provide the tools you need to make it that reliable. It's up to you to define the reliability your application requires.

Tuesday, October 21, 2008

Object Role Stereotypes

In this article, published in the August issue of MSDN Magazine, Jeremy Miller talked about Responsibility-Driven Design (RDD) and used some examples to demonstrate how we may apply RDD when designing objects.

RDD is closely related to CRC (Class/Responsibility/Collaborator) cards, a modeling tool for designing software. An RDD design starts by defining a class's role in a program, then lists the responsibilities it must fulfill for that role and the collaborations with other classes needed to accomplish those responsibilities.

RDD identifies six object role stereotypes:

  1. Information Holder: Knows things and provides information. May make calculations from the data that it holds.
  2. Structurer: Knows the relationships between other objects.
  3. Controller: Controls and directs the actions of other objects. Decides what other objects should do.
  4. Coordinator: Reacts to events and relays the events to other objects.
  5. Service Provider: Does a service for other objects upon request.
  6. Interfacer: Objects that provide a means to communicate with other parts of the system, external systems or infrastructure, or end users.

Though I didn't know RDD before, it seems to me that these stereotypes cover most of the objects in my programs. That is, I have been applying RDD tacitly already. For example, a data access object in my code is an Information Holder or a Structurer. A facade in my code is an Interfacer. In my programs, Controllers delegate requests from clients to Service Providers. An event handler in C# is equivalent to a Coordinator in RDD.

Good naming can be of great help when designing and maintaining software, and RDD gives me guidelines for grouping and naming my objects. If a team has RDD in mind, it's easy to communicate software design among team members. If coders and maintainers share the same mindset, source code is pretty much self-documenting. It's good to have such a tool in a programmer's toolbox.

Maximize Locality, Minimize Contention

In this article, published in Dr. Dobb's Journal, Herb Sutter reminded us that spatial locality can actually inhibit software scalability if we don't lay out our data carefully.

As depicted in the article's diagram, memory is not accessed in bytes but in chunks. In the cache, data is accessed by the hardware in terms of cache lines. In RAM, data is accessed by the OS in terms of pages. On disk, data is accessed in terms of clusters. So any contention for those chunks of memory will definitely impact software performance.

For example, take a look at the following sample code.

// Thread 1
for (int i = 0; i < MAX; ++i) {
    ++x;
}

// Thread 2
for (int i = 0; i < MAX; ++i) {
    ++y;
}

If x and y are defined close enough together to fall into the same cache line then, due to the cache coherency protocol, only one thread can update the cache line at a time, and the resulting behavior would be like the code below. One would expect the two threads to run twice as fast as a single thread, only to find that the actual speedup is much smaller, because the cache line containing x and y becomes a hot spot.

// Thread 1
for (int i = 0; i < MAX; ++i) {
    lightweightMutexForXandY.lock();
    ++x;
    lightweightMutexForXandY.unlock();
}

// Thread 2
for (int i = 0; i < MAX; ++i) {
    lightweightMutexForXandY.lock();
    ++y;
    lightweightMutexForXandY.unlock();
}

To avoid this convoy phenomenon, the author suggested several guidelines:

  1. Keep data that are not used together apart in memory, to avoid false sharing of a cache line (see the sketch after this list).
  2. Keep data that are frequently used together close together in memory, to take advantage of locality.
  3. Keep "hot" (frequently accessed) and "cold" (infrequently accessed) data apart.

Further readings:

  1. Windows with C++: Exploring High-Performance Algorithms
  2. .NET Matters: False Sharing

Tuesday, September 30, 2008

Unhandled Exception Processing In The CLR

In this article, published in the September issue of MSDN Magazine, the author categorized three situations in which an exception can occur in the CLR and described what happens if the exception is not caught.

  1. Exceptions in managed code
  2. Exceptions in unmanaged C++ code invoked via P/Invoke from managed code
  3. Exceptions in managed code called from native code via the CLR Hosting API or COM Interop

For situations 1 and 2, when a managed exception is not caught, the CLR swallows the exception if it happens on a thread created using the System.Threading.Thread class, on the finalizer thread, or on a CLR thread pool thread. Otherwise, the UnhandledException event is raised as part of the CLR's unhandled exception processing, which terminates the process. We can register an event handler to log information about the failure for later diagnosis.
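
Registering such a handler is straightforward; a minimal sketch (the logging is only a placeholder):

using System;

static class CrashLogger {
    static void Main() {
        AppDomain.CurrentDomain.UnhandledException +=
            delegate(object sender, UnhandledExceptionEventArgs e) {
                // log the failure for later diagnosis; the process will still terminate
                Console.Error.WriteLine("Unhandled: " + e.ExceptionObject);
            };

        throw new InvalidOperationException("boom");  // triggers the handler above
    }
}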

For situation 3, if the exception is thrown on a thread created inside the CLR, then in .NET Framework 1.0 and 1.1 the CLR swallows the exception, while in .NET Framework 2.0 and later the CLR lets it go unhandled after triggering its unhandled exception processing. On the other hand, if the exception is thrown on a thread created outside the CLR, the exception is wrapped as an SEH exception and propagated back to the native code.

If a managed exception is rethrown, or wrapped as an inner exception as in the following snippet, the CLR resets the starting point of the exception. The implication is that the point of failure recorded in your log may change to that line.

try
{
     ...
}
catch(FileNotFoundException e)
{
     ...
     throw e;    // or throw new ApplicationException("wrapped", e);
}

Finally, if an exception occurs in a different AppDomain, the exception is marshaled by value back to the calling AppDomain and rethrown there.


Monday, September 15, 2008

9 Reusable Parallel Data Structures and Algorithms

As I mentioned before, CPU development has moved into the multi-core era. To fully utilize the computing power of the chip, concurrent programming is the key. This MSDN article introduced nine parallel data structures and algorithms that let one design a multi-threaded program in a very intuitive way. The best part of the article, I found, is that the author explained why the source code executes correctly and which parts of the code, if changed, could result in a deadlock.

  1. Countdown Latch - This class allows a program to wait for a counter to reach zero. It uses Interlocked.Decrement to subtract from the counter and an EventWaitHandle to coordinate the waiting and signaling (a minimal sketch appears after this list).
  2. Reusable Spin Wait - Sometimes putting a thread to sleep can be expensive because of the context switch involved. For a wait of only a few cycles, spin waiting is a good choice. This structure decides how to wait based on the number of processors in the computer: if there is only one CPU, it puts the thread to sleep; if there is more than one, it spins.
  3. Barriers - Say task1 and task2 can be executed in parallel, but task3 cannot proceed until both task1 and task2 are completed. One way to code this is to run task2 on thread2 and task1 on thread1; after task1 completes, thread1 calls Join on thread2 to wait for the completion of task2. Alternatively, we can use a barrier to wait for everything to complete before proceeding to the next step.
  4. Blocking Queue - Very straightforward and easy to understand. It's also a good example of how to use Monitor.Wait / Monitor.Pulse correctly.
  5. Bounded Buffer - This is a data structure that solves the producer/consumer problem. The class is implemented similarly to the Blocking Queue.
  6. Thin Event - Win32 events can be expensive to allocate. This class uses the SpinWait structure to wait first and, if the event is still not signaled, it allocates a Win32 event to wait on. Lazy evaluation is the design philosophy behind this class.
  7. Lock-Free LIFO Stack - You don't have to lock an entire stack to push or pop an item. The author showed that an Interlocked.CompareExchange operation is enough to make push and pop operations on a stack thread-safe.
  8. Loop Tiling - If you know what a parallel query is, you know what loop tiling does. It's easier to show the code than to describe it in words.

    List<T> list = ...;
    foreach(T e in list) { S; }

    The C# foreach loop above can be run in parallel using the function below.

    Parallel.ForAll(list, delegate(T e) { S; }, Environment.ProcessorCount);
  9. Parallel Reductions - Some operations, such as max, min, count, and sum, can be performed in parallel. The author provided a Reduce function to simplify the task. The function reminded me of the MapReduce approach.
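
To give a flavor of item 1, here is a minimal countdown-latch sketch along the lines described above; it is my own simplified version, not the author's code.

using System;
using System.Threading;

// Simplified countdown latch: Signal decrements the count with Interlocked,
// and the last signaler sets the event that Wait blocks on.
public class CountdownLatch {
    private int m_remain;
    private EventWaitHandle m_event;

    public CountdownLatch(int count) {
        m_remain = count;
        m_event = new ManualResetEvent(false);
    }

    public void Signal() {
        // The last thread to signal also sets the event.
        if (Interlocked.Decrement(ref m_remain) == 0)
            m_event.Set();
    }

    public void Wait() {
        m_event.WaitOne();
    }
}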