Friday, October 31, 2008

Fixing the Java Memory Model

In this article (part 1, part 2) published on developerWorks, Brian Goetz talked about the problems with the Java Memory Model and the solutions proposed in JSR 133. The article helped me better understand an MSDN article I blogged about before. In the 'Thin Event' section of that MSDN article, Joe Duffy used the Thread.MemoryBarrier method as follows.

private int m_state; // 0 means unset, 1 means set.
private EventWaitHandle m_eventObj;
private const int s_spinCount = 4000;

public void Set() {
    m_state = 1;
    Thread.MemoryBarrier(); // required.
    if (m_eventObj != null) m_eventObj.Set();
}
Before I came across Goetz's article, I didn't quite understand why the barrier is required, though Duffy mentioned that "a legal transformation" in the CLR 2.0 memory model necessitates the call to Thread.MemoryBarrier. Now I know why Duffy didn't explain the requirement: it takes another article to explain it clearly. Even though Goetz's article is about the JVM, the concepts apply to the CLR. To understand why the barrier is required, read the following summary.
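
To make the danger easier to see, here is a minimal sketch of the matching Wait side of such a thin event. It is not Duffy's code: the fields m_state, m_eventObj and s_spinCount come from the snippet above, but the spinning, the lazy event allocation and the second barrier are my own reconstruction of how the pattern is usually completed (assume using System.Threading).

public void Wait() {
    // Spin briefly in case Set() arrives almost immediately.
    for (int i = 0; i < s_spinCount; i++) {
        if (m_state == 1) return;
        Thread.SpinWait(1);
    }

    // Lazily allocate the Win32 event only when we really have to block.
    EventWaitHandle newEvent = new ManualResetEvent(false);
    if (Interlocked.CompareExchange(ref m_eventObj, newEvent, null) != null) {
        newEvent.Close(); // another waiter already published an event; reuse it
    }

    // Re-check m_state AFTER publishing m_eventObj. Without the barriers, the
    // store to m_state in Set() and the store to m_eventObj here could each be
    // reordered past the corresponding reads, and this thread could block
    // forever on an event that Set() never saw and therefore never signals.
    Thread.MemoryBarrier();
    if (m_state != 1) {
        m_eventObj.WaitOne();
    }
}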

As you may know, the sequence of compiled code that gets executed on the CPU may not be the same as the sequence of the source code, because the compiler, the runtime, the processor or the cache may move compiled code around for performance reasons. These optimizations are harmless on a uniprocessor system, but weird things can happen on a multiprocessor system. Therefore, rules are needed to specify how a program accesses variables in memory to avoid such problems. A memory model is a collection of such rules. The Java Memory Model (JMM) is defined in Chapter 17 of the Java Language Specification. It defines the semantics of synchronized, final and volatile.

The synchronized keyword ensures that only one thread can enter a block protected by a given monitor. The JMM also specifies memory visibility rules for code in the synchronized block: caches are flushed when exiting a synchronized block and invalidated when entering one, and the compiler does not move instructions from inside a synchronized block to outside it.
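
For C# readers, the closest analog is the lock statement. The tiny sketch below is my own illustration (not from either article) of the same visibility rule: writes made inside the lock are visible to the next thread that acquires the same lock.

private readonly object gate = new object();
private int counter;

public void Increment() {
    lock (gate) {      // acquire: this thread sees values written by the previous owner
        counter++;     // protected read-modify-write
    }                  // release: this thread's writes are published before the lock is freed
}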

However, the original JMM exposed two problems. The first problem was that immutable objects, as declared with the final keyword, might not actually be immutable. For example, in Sun's 1.4 JDK the String class has three important final fields: a reference to a character array, a length, and an offset into the array. Take a look at the following snippet, where two String objects are constructed outside any synchronized block. Under the original JMM, due to the way object initialization works in Java, code using s2 might see "/usr" at one moment and "/tmp" at another. That is, a supposedly immutable object is not immutable at all.
String s1 = "/usr/tmp";
String s2 = s1.substring(4);
Another problem was associated with volatile fields. The original JMM required that: 1) volatile reads and writes go directly to main memory, prohibiting caching of values in registers and bypassing of processor-specific caches; and 2) the compiler or cache cannot reorder volatile reads and writes with each other. The problem came from what the original JMM did not require: ordinary variable reads and writes were allowed to be reordered with respect to volatile reads and writes. In the following code, thread A and thread B are coordinated by the volatile variable initialized, without using a synchronized block. Under the original JMM, the write to initialized in thread A was allowed to be reordered above the assignments to configOptions, which makes the result of using configOptions in thread B undefined.
Map configOptions;
char[] configText;
volatile boolean initialized = false;
...

//In thread A
configOptions = new HashMap();
configText = readConfigFile(fileName);
processConfigOptions(configText, configOptions);
initialized = true;

//In thread B
while(!initialized) sleep();
//use configOptions
The JSR 133 expert group decided that it makes sense for volatile reads and writes not to be reorderable with any other memory operations. The new JMM defines an ordering called happens-before, which is a partial ordering of all actions within a program. Under the new JMM, when thread A writes to a volatile variable V and thread B reads from V, any variable values that were visible to A at the time V was written are guaranteed to be visible to B. Although the guarantee imposes a higher performance penalty on accessing volatile fields, it solves the problem mentioned above.
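
The same coordination pattern, written in C# with a volatile field, looks like the sketch below. This is my own illustration, not code from either article; under the CLR memory model a volatile write has release semantics and a volatile read has acquire semantics, so thread B is guaranteed to see the fully populated table once it observes initialized == true.

using System.Collections;
using System.Threading;

class Config {
    private Hashtable configOptions;
    private volatile bool initialized = false;

    // Runs on thread A.
    public void Load(string fileName) {
        Hashtable options = new Hashtable();
        // ... read the file and populate options ...
        configOptions = options;   // ordinary write
        initialized = true;        // volatile write: everything above becomes visible first
    }

    // Runs on thread B.
    public void Use() {
        while (!initialized) Thread.Sleep(1);  // volatile read: acquire
        // configOptions is now safe to read
    }
}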

As for the final-field problem, the new JMM provides a guarantee of initialization safety: as long as a reference to the object is not published before its constructor has completed, all threads will see the values for its final fields that were set in the constructor, regardless of whether synchronization is used to pass the reference from one thread to another. Further, writes that initialize final fields will not be reordered past the publication of the object reference.

Further reading: The Java Memory Model

Wednesday, October 29, 2008

NLS_LANG in Oracle

Everyone working with Oracle in a non-English (American) environment should definitely take a look at the NLS_LANG FAQ. It contains many fundamental concepts one should grasp to work effectively with Oracle.

So what is NLS_LANG? According to the FAQ, "It sets the language and territory used by the client application and the database server. It also indicates the client's character set, which corresponds to the character set for data to be entered or displayed by a client program." The language component "specifies conventions such as the language used for Oracle messages, sorting, day names, and month names". The territory component "specifies conventions such as the default date, monetary, and numeric formats". The charset component "specifies the character set used by the client application".

The NLS_LANG setting has the format language_territory.charset and can be set at the client in the Windows registry (HKEY_LOCAL_MACHINE\SOFTWARE\ORACLE\HOMEx\ for Oracle Database versions 8, 8i and 9i) or as a system or user environment variable. The setting I use on my client machine is TRADITIONAL CHINESE_TAIWAN.ZHT16MSWIN950. One can also use the @.[%NLS_LANG%]. command to display the setting in SQL*Plus. If NLS_LANG is not set, Oracle assumes that the NLS_LANG at the client is AMERICAN_AMERICA.US7ASCII and does locale-specific conversion accordingly. So if you can't read the text selected from the database, it's very likely that the character set at the client is different from that at the Oracle server, or that the Oracle installer didn't populate NLS_LANG and the US7ASCII default is being used.

On the server, the NLS settings exist as session parameters, instance parameters, and database parameters. The former overrides the latter if set, and inherits from the latter if not. To display the settings on the server, one can execute the following commands:

  1. SELECT * from NLS_SESSION_PARAMETERS;
  2. SELECT * from NLS_INSTANCE_PARAMETERS;
  3. SELECT * from NLS_DATABASE_PARAMETERS;

The settings on the server are more fine-grained than those on the client. To change session or instance parameters, use the ALTER SESSION or ALTER SYSTEM command. Database parameters are set via the init.ora file during database creation and can't be changed after that. Note that there is no NLS_LANG in init.ora, only NLS_LANGUAGE and NLS_TERRITORY. Also, the database character set is defined by the CREATE DATABASE command and can't be changed afterwards.
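
The same server-side settings can also be inspected from a client program. The sketch below is my own illustration using System.Data.OracleClient; the connection string is a placeholder you would adapt to your environment, while NLS_DATABASE_PARAMETERS is the standard view queried above.

using System;
using System.Data.OracleClient;

class NlsCheck {
    static void Main() {
        string connStr = "Data Source=ORCL;User Id=scott;Password=tiger"; // placeholder
        using (OracleConnection conn = new OracleConnection(connStr)) {
            conn.Open();
            OracleCommand cmd = conn.CreateCommand();
            cmd.CommandText = "SELECT parameter, value FROM nls_database_parameters";
            using (OracleDataReader reader = cmd.ExecuteReader()) {
                while (reader.Read()) {
                    // e.g. NLS_CHARACTERSET = ZHT16BIG5
                    Console.WriteLine("{0} = {1}", reader[0], reader[1]);
                }
            }
        }
    }
}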

If the character set is the same at the client and the server, Oracle stores whatever is submitted by the client directly; no conversion is involved. If the character set defined at the client is different from that at the server, a conversion is performed, usually at the client, and the conversion may fail. For example, a database created with the WE8MSWIN1252 character set can't store Traditional Chinese because WE8MSWIN1252 doesn't support Chinese, but a database created with the UTF8 character set can store Traditional Chinese, provided the input text is correctly declared as ZHT16MSWIN950 or UTF8 at the client. So if the database character set can't represent the characters submitted by the client, the database has to be recreated.

To troubleshoot character set conversion problems, there are several places to look at:

  1. Database character set
  2. NLS_LANG setting at the server machine
  3. NLS_LANG setting at the client machine

To see the encoding used by Oracle to store text, use the DUMP function. The following is the result from the Oracle instance I tested against.

SQL> SELECT DUMP('abc', 1016) FROM DUAL;

DUMP('ABC',1016)
------------------------------------------------------------------

Typ=96 Len=3 CharacterSet=ZHT16BIG5: 61,62,63

SQL>

Sunday, October 26, 2008

High Availability: Keep Your Code Running with the Reliability Features of the .NET Framework

SQL Server 2005 allows programmers to write stored procedures in C#, which means a CLR runtime is hosted inside SQL Server 2005. However, if stored procedures aren't written carefully they can jeopardize their hosting environment, and this feature could become a nightmare for the SQL Server product team. In this MSDN article, Stephen Toub talked about the many features introduced in .NET Framework 2.0 to meet the high-availability requirements of a hosting environment.

What 'villainies' are out there that would make our programs unreliable? According to the author, they come in the form of OutOfMemoryException, StackOverflowException and ThreadAbortException.

Many operations in .NET result in memory allocation. The obvious one is object construction, but there are other less obvious ones. For example, boxing requires a heap allocation to store a value type. Invoking a method of an assembly for the first time results in the assembly being delay-loaded into memory. Just-in-time compiling a method for the first time requires memory allocations to store the generated code and associated runtime data structures. If any of these operations goes wrong and causes an OutOfMemoryException to be thrown, then in .NET Framework 1.x it's possible that the code in the catch block and/or finally block is not executed.

A StackOverflowException can be thrown in managed code or within the runtime. For an exception thrown within the runtime, the process is torn down. For an exception thrown in managed code, it's good to catch and handle the exception, but if one is not careful enough, another StackOverflowException could be thrown in the catch block, which triggers the OS to kill the process.

A ThreadAbortException can occur when the Thread.Abort or AppDomain.Unload method is called. In .NET Framework 1.x, resource leaks are still possible with or without catch/finally blocks.

The .NET Framework 2.0 introduces Constrained Execution Regions (CERs) to deal with the asynchronous exceptions mentioned above. For code marked as a CER, the runtime delays thread aborts while the code inside the region is executing. Also, the runtime prepares CERs ahead of time to avoid out-of-memory conditions; that is, it does the allocation and JIT work up front so that OutOfMemory and StackOverflow exceptions do not strike inside the CER itself.

To mark code as a CER in .NET Framework 2.0, call the RuntimeHelpers.PrepareConstrainedRegions method immediately before a try {...} finally {...} block. Also, the methods called inside the block have to adhere to the constraints required for execution within a CER. One declares that a method meets the constraints through the ReliabilityContractAttribute.

A reliability contract expresses two things: what kind of state corruption could result from asynchronous exceptions being thrown during the method's execution, and, given valid input, what kind of completion guarantee the method can make if it is invoked in a CER and an asynchronous exception is thrown. Because ThreadAbortException is delayed inside a CER, one only needs to consider the other two failures, OutOfMemoryException and StackOverflowException. Only the following three reliability contracts are valid for methods in a CER (a usage sketch follows the list).

[ReliabilityContract(Consistency.MayCorruptInstance, Cer.MayFail)]
[ReliabilityContract(Consistency.WillNotCorruptState, Cer.MayFail)]
[ReliabilityContract(Consistency.WillNotCorruptState, Cer.Success)]
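
Putting the pieces together, the sketch below shows how a CER might be declared. The PrepareConstrainedRegions call, the try/finally shape and the ReliabilityContract attribute are the real framework pieces; the class and the work it does are placeholders of my own, not code from the article.

using System;
using System.Runtime.CompilerServices;
using System.Runtime.ConstrainedExecution;

class StateHolder {
    private int committed;

    public void Update(int newValue) {
        RuntimeHelpers.PrepareConstrainedRegions(); // eagerly prepares the CER below
        try {
            // Code here is NOT part of the CER; failures before the finally are tolerated.
        }
        finally {
            // The finally block is the CER: thread aborts are delayed here and the
            // JIT/stack/memory preparation was done up front by the call above.
            Commit(newValue);
        }
    }

    [ReliabilityContract(Consistency.WillNotCorruptState, Cer.Success)]
    private void Commit(int newValue) {
        committed = newValue; // a simple field write: no allocation, cannot fail
    }
}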

If the code in a CER calls an interface method, a virtual method, a delegate, or a generic method, the CLR needs further information to prepare everything up front. Developers can help by calling methods defined in the RuntimeHelpers class, such as PrepareMethod and PrepareDelegate.

When a StackOverflowException is thrown in a try block, the CLR doesn't guarantee that the back-out code is executed. To make sure back-out code absolutely executes even under a StackOverflowException, the RuntimeHelpers class provides the ExecuteCodeWithGuaranteedCleanup method.
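
A hedged sketch of how that method might be called (the delegate bodies are placeholders of my own; the TryCode/CleanupCode shape is the framework's):

using System;
using System.Runtime.CompilerServices;

class GuaranteedCleanupDemo {
    static void Main() {
        RuntimeHelpers.ExecuteCodeWithGuaranteedCleanup(
            delegate(object state) {
                // Potentially deep, allocation-heavy work goes here.
                Console.WriteLine("working with {0}", state);
            },
            delegate(object state, bool exceptionThrown) {
                // Back-out code: prepared up front, runs even after a stack overflow.
                Console.WriteLine("cleanup, exceptionThrown = {0}", exceptionThrown);
            },
            "some state");
    }
}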

The description of the Thread.Abort method on MSDN reads as follows: "Raises a ThreadAbortException in the thread on which it is invoked, to begin the process of terminating the thread. Calling this method usually terminates the thread." What actions are involved in the termination process? When can a thread not be terminated by the Abort method, and what can we do about it?

When a thread is executing in a try-catch-finally block and gets aborted, the CLR executes the finally block before terminating the thread. If the code catches ThreadAbortException or enters an infinite loop in the finally block, that thread can't be terminated. In this situation, a CLR host in .NET Framework 2.0 can abort the thread abruptly, i.e., perform a rude thread abort. For 'graceful' thread aborts, the CLR by default delays the abort over CERs, finally blocks, catch blocks, static constructors and unmanaged code. For rude thread aborts, the CLR delays it only over CERs and unmanaged code.

Since rude thread aborts skip over finally and catch blocks, resource leaks may result. To address this problem, .NET Framework 2.0 introduces a new kind of finalizer base class, CriticalFinalizerObject. If one is concerned that important resources might leak because of a rude thread abort, one can wrap the resource in a class derived from CriticalFinalizerObject, and the resource is guaranteed to be released. SafeHandle is an example that makes use of CriticalFinalizerObject: it is simply a managed wrapper around an IntPtr with a finalizer that knows how to release the resource referenced by that IntPtr.
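
For illustration, here is a minimal SafeHandle-style wrapper. The CloseHandle import and the SafeHandleZeroOrMinusOneIsInvalid base class are real framework/Win32 pieces; the wrapper class itself is my own example, not code from the article.

using System;
using System.Runtime.InteropServices;
using Microsoft.Win32.SafeHandles;

sealed class MyNativeHandle : SafeHandleZeroOrMinusOneIsInvalid {
    [DllImport("kernel32.dll", SetLastError = true)]
    private static extern bool CloseHandle(IntPtr handle);

    public MyNativeHandle(IntPtr existingHandle) : base(true) { // true: we own the handle
        SetHandle(existingHandle);
    }

    // Called from the critical finalizer (or from Dispose); must not fail.
    protected override bool ReleaseHandle() {
        return CloseHandle(handle);
    }
}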

This article also mentioned several other features introduced in .NET Framework 2.0 to make our code more reliable, such as:

  1. Thread.BeginCriticalRegion/Thread.EndCriticalRegion
  2. Environment.FailFast
  3. System.Runtime.MemoryFailPoint

It seems to me that however reliable you want your application to be, the .NET Framework can provide the tools you need to make it that reliable. It's up to you to define the reliability of your application.

Tuesday, October 21, 2008

Object Role Stereotypes

In this article, published in the August issue of MSDN Magazine, Jeremy Miller talked about Responsibility-Driven Design (RDD) and used some examples to demonstrate how we may apply RDD when designing objects.

RDD is closely related to CRC (Class/Responsibility/Collaborator) cards, a modeling tool for designing software. The RDD approach starts with defining a class's role in a program, then lists its responsibilities for fulfilling that role and its interactions with other classes for accomplishing those responsibilities.

RDD identifies six object role stereotypes in a program.

  1. Information Holder: Knows things and provides information. May make calculations from the data that it holds.
  2. Structurer: Knows the relationships between other objects.
  3. Controller: Controls and directs the actions of other objects. Decides what other objects should do.
  4. Coordinator: Reacts to events and relays the events to other objects.
  5. Service Provider: Does a service for other objects upon request.
  6. Interfacer: Objects that provide a means to communicate with other parts of the system, external systems or infrastructure, or end users.

Though I didn't know RDD before, it seems to me that these stereotypes cover most of the objects in my programs. That is, I have been applying RDD tacitly already. For example, a data access object in my code is an Information Holder or a Structurer. A facade in my code is an Interfacer. In my programs, Controllers delegate requests from clients to Service Providers. An event handler in C# is equivalent to a Coordinator in RDD.

Good naming can be of great help when designing and maintaining software. RDD provides guidelines for me to group my objects. If a team has RDD in mind, it's easy to communicate software design among team members. If coders and maintainers share the same mindset, source code is pretty much self-documented. It's good to have such a tool in a programmer's toolbox.

Maximize Locality, Minimize Contention

In this article published in Dr. Dobb's Journal, Herb Sutter reminded us that ignoring spatial locality can inhibit software scalability if we don't code our programs carefully.

As depicted in the graph in the article, memory is not accessed in bytes but in chunks. In the cache, data are accessed by the hardware in units of cache lines. In RAM, data are accessed by the OS in units of pages. On disk, data are accessed in units of clusters. So any contention for one of these chunks will definitely impact software performance.

For example, take a look at the following sample code.

// Thread 1
for (int i = 0; i < MAX; ++i) {
    ++x;
}

// Thread 2
for (int i = 0; i < MAX; ++i) {
    ++y;
}

If x and y are defined close enough together to share a cache line then, due to the cache coherency protocol, only one processor can update that cache line at a time, and the resulting behavior is as if the code were written as below. One would expect the two-thread version to run about twice as fast as the single-threaded code, only to find that the actual speedup is much smaller, because the cache line containing x and y becomes a hot spot.

// Thread 1
for (int i = 0; i < MAX; ++i) {
    lightweightMutexForXandY.lock();
    ++x;
    lightweightMutexForXandY.unlock();
}

// Thread 2
for (int i = 0; i < MAX; ++i) {
    lightweightMutexForXandY.lock();
    ++y;
    lightweightMutexForXandY.unlock();
}

To avoid this false-sharing convoy, the author suggested several guidelines to follow (a sketch of one fix appears after the list).

  1. Keep data that are not used together apart in memory to avoid false sharing.
  2. Keep data that is frequently used together close together in memory to take advantage of locality.
  3. Keep "hot"(frequently accessed) and "cold"(infrrequently accessed) data apart.

Further readings:

  1. Windows with C++: Exploring High-Performance Algorithms
  2. .NET Matters: False Sharing

Tuesday, September 30, 2008

Unhandled Exception Processing In The CLR

In this article, published in the September issue of MSDN Magazine, the author described three situations in which an exception can occur in the CLR and what happens if it is not caught.

  1. Exceptions in managed code
  2. Exceptions in unmanaged C++ code that is invoked via P/Invoke from managed code
  3. Exceptions in managed code which is called via CLR Hosting API or COM Interop in native code

For situations 1 and 2, when a managed exception is not caught, the exception is swallowed by the CLR if it happens in a thread created using the System.Threading.Thread class, in the finalizer thread, or in a CLR thread pool thread. Otherwise, the UnhandledException event is raised as part of the CLR's unhandled exception processing, which terminates the process. We can register an event handler to log the failure for later diagnosis.

For situation 3, if an exception is thrown on a thread created inside the CLR, then in .NET Framework 1.0 and 1.1 the CLR swallows the exception, while in .NET Framework 2.0 and later the CLR lets it go unhandled after triggering its unhandled exception processing. On the other hand, if the exception is thrown on a thread created outside the CLR, the exception is wrapped as an SEH exception and propagated back to the native code.

If a managed exception is rethrown, or packaged as an inner exception as in the following snippet, the CLR resets the starting point of the exception. The implication is that the point of failure recorded in your log changes to the line of the rethrow.

try
{
     ...
}
catch(FileNotFoundException e)
{
     ...
     throw e;    //or throw new ApplicationException(e);
}
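
To keep the original failure point, rethrow with the parameterless throw statement instead. The runnable sketch below is my own illustration of the difference, not code from the article.

using System;
using System.IO;

class RethrowDemo {
    static void Main() {
        try {
            ReadConfig();
        }
        catch (Exception outer) {
            // The trace still points into ReadConfig at the original throw site.
            Console.WriteLine(outer.StackTrace);
        }
    }

    static void ReadConfig() {
        try {
            File.OpenRead("missing.config"); // throws FileNotFoundException
        }
        catch (FileNotFoundException) {
            // "throw;" rethrows the same exception object without resetting its
            // starting point, unlike "throw e;" in the snippet above.
            throw;
        }
    }
}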

Finally, if an exception occurs in a different AppDomain, the exception is marshaled by value back to, and rethrown in, the calling AppDomain.


Monday, September 15, 2008

9 Reusable Parallel Data Structures and Algorithms

As I mentioned before, CPU development has moved into the multi-core era. To fully utilize the computing power of the chip, concurrent programming is the key. This MSDN article introduced nine parallel data structures and algorithms that allow one to design a multi-threaded program in a very intuitive way. The best part of the article, I found, is that the author explains why the source code executes correctly and which parts of the code, if changed, could result in a deadlock.

  1. Countdown Latch - This class allows a program to wait for a counter to reach zero. It uses Interlocked.Decrement to decrement the counter and an EventWaitHandle to coordinate waiting and signaling (a sketch of this class appears after the list).
  2. Reusable Spin Wait - Sometimes putting a thread to sleep can be expensive because of the context switch involved. For very short waits, on the order of cycles, spinning is a better choice. This structure decides how to wait based on the number of processors in the machine: if there is only one CPU, it puts the thread to sleep; if there is more than one, it spins.
  3. Barriers - Say task1 and task2 can be executed in parallel, but task3 cannot proceed until both task1 and task2 are completed. One way to code this is to run task2 on thread2 and execute task1 on thread1; after task1 completes, call Join on thread2 to wait for the completion of task2. Alternatively, we can use a barrier to wait for everything to complete before proceeding to the next step.
  4. Blocking Queue - Very straightforward and easy to understand. It's also a good example of how to use Monitor.Pulse / Monitor.Wait in a correct way.
  5. Bounded Buffer - This is a data structure that solves the producer/consumer problem. The class is implemented similarly to the blocking queue.
  6. Thin Event - Win32 events can be expensive to allocate. This class uses the SpinWait structure to wait first and, if the event is still not signaled, it allocates a Win32 event to wait on. Lazy allocation is the design philosophy behind this class.
  7. Lock-Free LIFO Stack - You don't have to lock an entire stack to push or pop an item. The author showed that an Interlocked.CompareExchange operation is enough to make push and pop operations on a stack thread-safe.
  8. Loop Tiling - If you know what a parallel query is, you know what loop tiling does. It's easier to show the code than to describe it in words.

    List<T> list = ...;
    foreach(T e in list) { S; }

    The C# foreach loop above can be run in parallel using the function below.

    Parallel.ForAll(list, delegate(T e) { S; }, Environment.ProcessorCount);
  9. Parallel Reductions - Some operations, such as max, min, count, and sum, can be performed in parallel. The author provided a Reduce function to simplify the task. The function reminded me of MapReduce.
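
Here is a minimal sketch of what the countdown latch in item 1 might look like. It is my own reconstruction from the description (Interlocked.Decrement plus an EventWaitHandle), not the author's code.

using System.Threading;

// Lets one thread wait until 'count' other threads have each signaled once.
public class CountdownLatch {
    private int remaining;
    private readonly EventWaitHandle done = new ManualResetEvent(false);

    public CountdownLatch(int count) {
        remaining = count;
    }

    public void Signal() {
        // Interlocked.Decrement makes the countdown safe without a lock.
        if (Interlocked.Decrement(ref remaining) == 0)
            done.Set();      // the last signaler wakes the waiter
    }

    public void Wait() {
        done.WaitOne();
    }
}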

Monday, July 21, 2008

Delete Table vs Truncate Table in Oracle

Say you create a table t. From time to time, you delete all the data in t and insert fresh records again. If that table grows to 1GB before you delete all the data, where is that 1GB after you delete all the rows in t? What is the difference between the two SQL statements, DELETE FROM t and TRUNCATE TABLE t, in terms of space usage? Yes, the 1GB is still there, available for subsequent use, but in Oracle there are some subtleties you must be aware of.

In Oracle, we store data in tables, while Oracle stores objects, including tables, in tablespaces. Space is allocated on demand: if the space already allocated to a table is not enough, Oracle requests data blocks from the tablespace to accommodate the new data, and when the current extents are exhausted the table grows by allocating its next extent.

Oracle maintains a counter, the High Water Mark (HWM), for each table indicating how many blocks have been allocated to it. When the size of a table increases, the counter increases. When records are deleted and some data blocks become empty, the space is returned to the table and the counter remains unchanged. Under manual segment space management, which is an attribute of a tablespace, pointers to the free data blocks are appended to a FREELIST structure in the table; under automatic segment space management, a bitmap in the header of the data block is reset. Either way, space allocated to a table remains with the table, whether it's in use or empty.

However, if you delete all the data in a table by issuing the command TRUNCATE TABLE t, Oracle returns the space to the tablespace, where it can be used by other objects, not just t, and the HWM of the table is reset. There are other uses of the HWM, but here I only care about its relationship with space. Since Oracle 10g, you can also issue the ALTER TABLE t SHRINK SPACE command to release unused space back to the tablespace after DELETE FROM t.

Back to our question: after running the DELETE FROM t command without shrinking space, the 1GB is available to t only, while after running the TRUNCATE TABLE t command, the 1GB is returned to the tablespace and available to all objects. So it's possible for your tablespace to keep growing even if you have been deleting data from tables.

Further Reading:Using TRUNCATE

Wednesday, July 9, 2008

Data Types in Oracle

I thought the topic was trivial, but if you read chapter 12 of Tom Kyte's Oracle book, you might think differently. There are so many nuances to each data type that one can get bitten if one is not careful (or lucky). I summarize below the ones that are most important to me.

VARCHAR2: A VARCHAR2(10) may contain 0 to 10 bytes of data using the default NLS settings. A VARCHAR2 may contain up to 4000 bytes of information. If the NLS settings of the client and the server are different, an implicit character set conversion takes place behind the scenes. The NLS setting of the Oracle client can be found here, while the default character set of the Oracle database can be determined with

SELECT value FROM nls_database_parameters
WHERE parameter = 'NLS_CHARACTERSET'

NVARCHAR2: An NVARCHAR2(10) may contain 0 to 10 characters of Unicode-formatted data. Like VARCHAR2, NVARCHAR2 may contain up to 4000 bytes of information. Text in an NVARCHAR2 is stored and managed in the database's national character set (UTF8 or AL16UTF16 since Oracle9i), not the default character set. The national character set of the Oracle database can be determined with

SELECT value FROM nls_database_parameters
WHERE parameter = 'NLS_NCHAR_CHARACTERSET'

LOB: When a table with an internal LOB (e.g., CLOB, NCLOB, BLOB) column is created, two additional segments are created as well: a lobindex and a lobsegment. What is stored in the table itself is a pointer, which is used against the lobindex to find the LOB data stored in the lobsegment. So retrieving a LOB causes extra disk access due to the lobindex lookup. However, LOB data smaller than 4000 bytes can be stored with the row in the table if the ENABLE STORAGE IN ROW attribute is set (the default).

CHUNK is the smallest unit of allocation for LOBs; the default is 8KB. A LOB occupies at least one chunk, and two LOBs never share the same chunk. Also, a small chunk size means large LOB data needs many chunks, which increases the size of the lobindex.

PCTVERSION controls the percentage of allocated lobsegment space that is used for undo purposes. The default is 10%. If you get an ORA-22924 error while updating a LOB, try increasing the amount of lobsegment space reserved for versioning of data.

ROWID: the address of a row in the database. It can change for the following reasons.

  1. Updating the partition key of a row in a partitioned table such that the row must move from one partition to another
  2. Using the FLASHBACK TABLE command to restore a database table to a prior point in time
  3. Performing MOVE operations and many partition operations, such as splitting or merging partitions
  4. Using the ALTER TABLE SHRINK SPACE command to perform a segment shrink

Sunday, July 6, 2008

Patterns in Practice - The Open Closed Principle

When I started working as a programmer, I was fascinated with design patterns. Coding with patterns made my code look elegant. When documenting, you just needed to point out which patterns you used and things would be very self-explanatory. But later I found that few recruiters here care what design patterns I have applied in my software. Rather, they are more interested in how broadly and how deeply I know certain products, like databases, servers, operating systems, etc. To me, that implies product knowledge, not design knowledge, is more valuable to them. Given such a job market, and that I only have 24 hours a day, how much time should I invest in product knowledge versus design knowledge?

So when I find an article on design patterns, I keep wondering if I should spend time on it. It's true that applying design patterns would make my programs more flexible, but I haven't been rewarded financially for knowing them. On the contrary, based on what I have observed, knowing more about databases or network infrastructure is appreciated and possibly rewarded with a higher salary, even though I am a programmer, not a DBA or network administrator. Bosses and users don't care how I design software as long as it works as requested, and no one reads documents. Having complained about that, I am still going to summarize two MSDN articles on design patterns that I read recently. Why? For me, the temptation to write elegant code is difficult to resist.

Patterns in Practice - The Open Closed Principle

If you want your code to be easy to extend and robust to change, consider applying the Open Closed Principle in your program. According to the author, it means that an application is structured in such a way that it is easy to enhance with minimal changes to existing code. Sounds appealing, but how? There are several patterns and principles you can apply.

First, the author suggested that we follow the Single Responsibility Principle. That is, each class should be designed with a specialized responsibility, for example, dividing business logic and data access responsibilities into separate classes. Then a change in data access does not affect the business logic classes.

Second, if there is a lot of conditional logic in your program, consider using the Chain of Responsibility pattern. Then, when more conditions need to be handled, you only add the new logic in separate classes, with minimal changes to existing code.
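
A minimal sketch of what that looks like in C# (a simplified, list-based variant of the pattern; the rule names and the discount domain are placeholders of my own, not from the article). Each new condition becomes a new handler class, and the existing handlers and calculator stay untouched.

using System.Collections.Generic;

public interface IDiscountRule {
    // Returns true if this rule applies, and outputs the discount it grants.
    bool TryGetDiscount(decimal orderTotal, out decimal discount);
}

public class LargeOrderRule : IDiscountRule {
    public bool TryGetDiscount(decimal orderTotal, out decimal discount) {
        discount = orderTotal >= 1000m ? orderTotal * 0.05m : 0m;
        return discount > 0m;
    }
}

public class DiscountCalculator {
    private readonly List<IDiscountRule> rules = new List<IDiscountRule>();

    public void Add(IDiscountRule rule) { rules.Add(rule); } // new conditions plug in here

    public decimal GetDiscount(decimal orderTotal) {
        foreach (IDiscountRule rule in rules) {
            decimal discount;
            if (rule.TryGetDiscount(orderTotal, out discount))
                return discount;   // the first matching handler wins
        }
        return 0m;
    }
}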

Third, if you keep changing the interface of some class because you need to access or manipulate its state, consider the Double Dispatch pattern.

Fourth, if you access a class through an interface or abstract class, then you follow the Liskov Substitution Principle. The benefit is that you can easily swap the class out for another implementation, with minimal changes to existing code.

Design Patterns - Model View Presenter

According to the article, the MVP architecture looks like the diagram in the article: the presentation layer separates the UI from the service layer. The UI depends on the presentation layer and knows nothing about the service. The presentation layer depends on the service and knows nothing about how data are retrieved and persisted. The pattern makes your service and presentation layers reusable.

This is also a good article on how to write an ASP.NET application in a test-driven way. The author showed that by applying the MVP pattern and the NMock2 framework, your web applications become testable.

Recently, I started using CAB to develop WinForms applications. This article really helped brush up my knowledge of the MVP pattern, as the pattern is used heavily in CAB.

Further reading: Everything You Wanted To Know About MVC and MVP But Were Afraid To Ask

Saturday, July 5, 2008

Large Object Heap Uncovered

After reading chapter 20 of CLR via C#, on automatic memory management, if you want to learn more about the topic, the MSDN article Large Object Heap Uncovered picks up where the book left off.

According to the author, objects of 85,000 bytes or larger are considered large objects by the CLR and are allocated in the large object heap (LOH) rather than the usual small object heap. Since the memory the CLR hands out is zeroed, it takes longer to clear the memory for a large object at allocation time. Also, large objects are treated as generation 2 objects, so collecting them implies a generation 2 collection, in which all objects, both large and small, are collected.
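
A quick way to observe this (my own demo, not from the article): on the CLR versions I have tried, GC.GetGeneration reports a freshly allocated large array as generation 2, while a small one starts in generation 0.

using System;

class LohDemo {
    static void Main() {
        byte[] small = new byte[1000];    // ordinary small object heap
        byte[] large = new byte[100000];  // over the ~85,000-byte threshold: goes to the LOH

        Console.WriteLine(GC.GetGeneration(small)); // typically 0
        Console.WriteLine(GC.GetGeneration(large)); // 2 - large objects are collected with gen 2
    }
}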

To speed up garbage collection in the LOH, dead objects in the LOH are simply marked and their space is put on a free list for later allocation; there is no compaction. However, because of this 'feature', frequent allocation and release of temporary large objects can cause virtual memory fragmentation.

One way to avoid frequent memory allocation and release, and the resulting VM fragmentation, is to build an object pool and reuse the objects in the pool instead of creating temporary ones. One can find an implementation here.

Another approach is VM hoarding, supported in CLR 2.0. When enabled via the hosting API, unused memory segments are not released to the OS but are put on a standby list. When new memory segments are requested, the standby list is searched first. However, VM hoarding should be used with caution: if you use this feature, you should test your application to make sure its memory usage is stable.

Further Readings:
Maoni's WebLog - CLR Garbage Collector
CLR Inside Out: Large Object Heap Uncovered

Monday, June 30, 2008

Avoid Writing Finalize Method in C#

Say a C# class, SomeClass, is defined as follows. If you think a destructor is defined in the class, and that it's good practice to have a destructor in a class to clean up resources, then, according to Jeffrey Richter's CLR via C#, you are wrong.

public class SomeClass
{
    ~SomeClass() { ... }
}

First, there is no destructor in C#, only the Finalize method. The behavior of ~SomeClass() in C# is different from that in C++: the C# method executes when the CLR performs garbage collection, whereas in C++ it is deterministic, executing when one deletes an instance of SomeClass. Although the goal in both languages is the same, releasing resources, the execution timing is different: same syntax, different semantics.

Second, for classes referencing managed resources, the book suggests one avoid writing a Finalize method for the reasons I quote below.

  • Finalizable objects take longer to allocate because pointers to them must be placed on the finalization list.
  • Finalizable objects get promoted to older generations, which increases memory pressure and prevents the object's memory from being collected at the time the garbage collector determines that the object is garbage. In addition, all objects referred to directly or indirectly by this object get promoted as well.
  • Finalizable objects cause your application to run slower since extra processing must occur for each object when collected.

Writing a Finalize method to release managed resources only adds overhead to the CLR at object allocation and collection time. Plus, you can't control when the Finalize method will execute. If you stop writing Finalize methods for such classes, your code will be simpler and perform better.

For types referencing unmanaged resources, one should implement the IDisposable interface to dispose of resources deterministically. That is, in the Dispose method, one calls the Win32 CloseHandle function to release the native resource and then invokes the GC.SuppressFinalize method, since there is no longer any need for the object's Finalize method to execute.
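
A minimal sketch of that pattern (my own illustration of the book's advice; the CloseHandle import is the real Win32 signature, the wrapper class is a placeholder):

using System;
using System.Runtime.InteropServices;

public sealed class NativeResource : IDisposable {
    [DllImport("kernel32.dll", SetLastError = true)]
    private static extern bool CloseHandle(IntPtr handle);

    private IntPtr handle;   // assume this was obtained from some Win32 API

    public NativeResource(IntPtr handle) {
        this.handle = handle;
    }

    // Finalize method: a safety net only, runs if Dispose was never called.
    ~NativeResource() {
        if (handle != IntPtr.Zero) CloseHandle(handle);
    }

    // Deterministic cleanup: release the handle now and skip finalization.
    public void Dispose() {
        if (handle != IntPtr.Zero) {
            CloseHandle(handle);
            handle = IntPtr.Zero;
        }
        GC.SuppressFinalize(this);
    }
}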


Friday, June 27, 2008

Covariance and Contravariance in C#2

I was reading Jon Skeet's C# in Depth the other day. Two terms, covariance and contravariance, were discussed in the book. I thought it would be helpful to summarize them here.

So what is covariance? Say classB is derived from classA. Then the following statement is valid in C#: the variable array can accept a more specific object. That is, arrays of reference types in C# are covariant.

classA[] array = new classB[0];

However, there are limits in C# 2. Covariance in generics is not supported: generic types are invariant, and the following statement is not valid in C# 2.

List<classA> list = new List<classB>();

Then, what is contravariance? Say, a function, func, is defined in classB as follows.

public class classB : classA {
    public int func(classB arg) { ... }
}

If contravariance were supported in C#, the following statements should be valid. The function, func, can accept a more general parameter and the parameter of func is said to be contravariant.

classA a = new classA();
classB b = new classB();

b.func(a); //not supported in C#

However, C# does not support contravariance in this way. What C# 2 does support is contravariance in delegates, and the following code snippet is valid.

void ProcessEvent(object sender, System.EventArgs e) { ... }

this.textBox1.KeyDown += this.ProcessEvent;   // valid in C# 2
this.button1.MouseClick += this.ProcessEvent; // valid in C# 2

One should note that the KeyDown event takes a KeyEventArgs, which is derived from EventArgs, while the MouseClick event takes a MouseEventArgs, which is also derived from EventArgs. So an event in C# can accept a delegate whose signature contains more general parameter types.


Thursday, June 26, 2008

Some Oracle Basics

I am currently reading Chapters 10 and 11 of Tom Kyte's Oracle book to brush up my database knowledge, and I would like to summarize the terms about tables and indexes for my future reference.

Row Migration: Suppose the size of row X is 50 bytes and it is stored in block A. After an update, the row grows to 100 bytes and can no longer fit into block A. Oracle will move the updated row to another block where it fits, say block D, and replace the original row X in block A with a pointer to block D. After the update completes, row X has migrated from block A to block D.

Index Organized Table (IOT): In an IOT, data are stored in an index structure. Data are automatically sorted according to the keys defined by the index structure. Querying an IOT usually takes one scan because the data themselves, not rowids, are stored in the leaf nodes.

Secondary Index: It's an index built on an IOT; when one creates an index for an IOT, that index is a secondary index. Its leaf nodes contain logical rowids pointing to the data in the IOT. Since leaf nodes in an IOT can move as the IOT changes shape and size, the logical rowids can become stale. When a query hits such a stale entry, it takes two scans to locate the data, one on the secondary index and one on the IOT, which is slower than querying an index on a regular table, where one scan locates the rowid and one read retrieves the data.

Index Clustered Table (ICT): In an ICT, data are clustered together according to cluster keys. An ICT differs from an IOT in several ways. First, data in an ICT are not sorted by cluster keys as in an IOT; the data are stored in a heap. Second, data from different tables can be clustered together by the same columns in an ICT. Third, a cluster index takes a cluster key value and returns a block address in the cluster, not a rowid.

Bitmap index: An entry in a bitmap index points to many rows in a table, while an entry in a B*Tree index points to a single row. A bitmap index is suitable for indexing data of low cardinality, like gender or blood type, and is especially suitable for aggregation queries, like counting the number of women with blood type A. However, since a bitmap entry points to many rows, an update to the entry causes Oracle to lock all the rows pointed to by the entry. So bitmap indexes are ill-suited for a write-intensive environment.

Function-based indexes: Suppose I have a query like the following

SELECT *
FROM USER
WHERE UPPER(USERNAME)  = :USERNAME

Usually, Oracle performs a full-table scan to retrieve data from the USER table. By running the following statement, Oracle will build a function-based index for us.

CREATE INDEX USER_UPPER_IDX on USER(UPPER(USERNAME));

For the above query, Oracle scans the index first and then reads the rows pointed to by the index. Essentially, Oracle creates a case-insensitive index for us.

Further readings: Understanding Indexes and Clusters

Friday, May 30, 2008

Links for 2008-05-29

American Scientist Article:

  • Accidental Algorithms
    • Polynomial-time and exponential-time algorithms are introduced and distinguished. Then the author moves on to the categories of problems, i.e., P, NP (decision problems), and #P (counting problems), as well as subgroups within the categories, i.e., NP-complete and #P-complete.
    • Although #P problems are at least as hard as NP problems, for some counting problems, by transforming them into matrices and computing their determinants, some terms cancel out, and the determinants can be computed in polynomial time. That means those counting problems, supposedly harder than decision problems, can be solved in polynomial time.
    • Non-holographic algorithms typically rely on reductions. That is, if we can reduce problem A to problem B and there is a solution to problem B, then we know there is a solution to problem A.
    • Holographic algorithms reduce problems in a different way. The author cited an example, using matchgrids and matchgates, to demonstrate the holographic process.

MSDN Article:

  • Alphabet Soup: A Survey of .NET Languages And Paradigms
    • Object-oriented programming
      • Languages - C#, Visual Basic .NET
      • Features
        1. Type contracts
        2. Polymorphism
        3. Encapsulation
    • Functional programming
      • Languages - F#
      • Features
        1. Higher-order functions
        2. Type inference
        3. Pattern matching
        4. Lazy evaluation
    • Dynamic programming
      • Languages - IronPython, IronRuby, Phalanger
      • Features
        1. Just-in-time (JIT) compilation
        2. Late bound - Read Eval Print Loop (REPL)
        3. Duck typing - "..., if a type looks like a duck and quacks like a duck, the compiler assumes it must be a duck."
        4. Hosting APIs
    • Misc.
      • LINQ
      • Inline XML in Visual Basic 9.0
      • Declarative programming - Windows Presentation Foundation and Windows Workflow Foundation
      • Logic programming - Prolog

Friday, May 23, 2008

Pivot Query in Oracle

I recently revamped some stored procedures I wrote before and found a case where I could apply a pivot query to simplify the code. Though I have seen the technique demonstrated in Tom Kyte's Expert One-on-One Oracle (ch. 12), this is just more evidence that to know something is one thing, and to know where to apply it is another.

The code I maintained reads buy and sell trades from the trade table and sums them into the position table for each symbol. Originally, I summed the buy orders first, then the sell orders, and stored the results into the position table. By using the DECODE function, I can finish the task in fewer lines of code. Here is how.

Assuming my trade table and pos table are as follows:
[screenshot: contents of the trade and pos tables]
I want to sum the records in the trade table and store the summations in the pos table. Using a pivot query like the SQL highlighted in the screenshot, I can get it done this way:
[screenshot: the DECODE-based pivot query and its execution]

This is what I want:
[screenshot: the summed rows in the pos table]

Saturday, May 10, 2008

Book - Release It!: Design and Deploy Production-Ready Software

I came across this book via the announcement of the 2008 Jolt Productivity Awards. The book talks about what could go wrong (antipatterns) once a system goes into production and what we can do about it (patterns). The focus of the book is not what's in the spec but what's not in the spec. There are four parts in the book: stability, capacity, general design issues, and operations. Below are several points that impressed me most.

  • Timeout - "Any resource pool that blocks threads must have a
    timeout to ensure threads are eventually unblocked whether resources become available or not
    ".
  • Conway's law - "Organizations which design systems are constrained to produce designs which are copies of the communication structures of these organizations." It's communication structure that matters, not hierarchical structure.
  • Multiplier effect - For a web page containing only 40KB of useless data, if that page is hit 1,000 times a day, 40MB of bandwidth is wasted per day.
  • Multihomed Servers - Server sockets of an application should bind to the networks they listen to. For a server socket handling administrative requests, it should bind to the administration network, not backup network or production network.
  • Protocol Versioning - No matter what protocol systems use to interact, it will change. So making the version part of the message exchange will help ease the change of protocols.

Monday, April 21, 2008

Links for 2008-04-21

Articles:
  1. Simplify Ajax development with jQuery - jQuery allows us to do DOM scripting, event handling, HTML animation, Ajax and more.
  2. Don't burn your Bridges - "It is not good to leave in anger. I advise that you don't burn your bridges." I think the point is: don't express your anger when you leave.

Sample Chapter

  1. jQuery in Action

Saturday, April 12, 2008

Comet - Pushing Data over Http

As described in this wiki article, Comet is a term coined by Alex Russell to describe technologies for pushing data over HTTP. So what is Comet? Comet is a web architecture that allows a web server to push data to a web browser over the HTTP protocol. It is realized by taking advantage of HTTP persistent connections. There are many solutions available, both open and proprietary. The HTML 5 specification even attempts to standardize the Comet transport by adding a new HTML element, event-source, and a new data format called the DOM event stream. However, several issues have to be considered when one tries to adopt the technology.

First, most web servers assign a thread to handle each connection request. Since Comet keeps its HTTP connection to the web server open, a web server can only serve a limited number of Comet connections and will run out of threads pretty soon. A dedicated Comet server should be deployed for pushing data.

Second, the traditional approach of scaling web applications by adding web servers might not be applicable to a Comet server. If a user connects to multiple event sources, it's difficult to distribute users among multiple Comet servers.

Third, proxy servers and firewalls can drop connections that have been open for too long. Though some Comet frameworks tear down and re-create connections periodically, some proxies can send back buffered data and deceive clients into believing a connection is still established.

Monday, April 7, 2008

First Taste of SubSonic

Having heard about SubSonic for almost a year, I finally decided to give it a try. I used SubSonic 2.1 Beta 2 and connected to an Oracle9i database. Here are the two errors I encountered in the process, both related to stored procedures, and how I worked around them. For the first error, I could keep using the precompiled SubSonic.dll directly; to fix the second, I had to modify the SubSonic source code.

The first error was related to the parameters I passed to a stored procedure. I got the following error after executing the stored procedure. Checking the source code, I found that SubSonic adds ':' as a prefix to each parameter name of a stored procedure. Removing the colon before calling the Execute method solved the problem.
System.Data.OracleClient.OracleException: ORA-06550: line 1, column 44:
PLS-00103: Encountered the symbol ":" when expecting one of the following:

( - + case mod new not null <an identifier>
<a double-quoted delimited-identifier> <a bind variable> avg
count current exists max min prior sql stddev sum variance
execute forall merge time timestamp interval date
<a string literal with character set specification>
<a number> <a single-quoted SQL string> pipe
The symbol "( was inserted before ":" to continue.
The second error was related to the output parameter of a stored procedure. The output parameter I expected was a DateTime but I kept getting the following error.
System.Data.OracleClient.OracleException: ORA-06502: PL/SQL: numeric or value error
Checking the source code, I found the AddParams method in the OracleDataProvider class didn't set the parameter direction. I checked MySqlDataProvider.cs, made a modification to OracleDataProvider.cs in the same manner, and the error was gone. Also, I found the CheckoutOutputParams method in MySqlDataProvider.cs didn't exist in OracleDataProvider, so I added the same logic to OracleDataProvider as well.

private static void AddParams(OracleCommand cmd, QueryCommand qry)
{
    if(qry.Parameters != null)
    {
        foreach(QueryParameter param in qry.Parameters)
        {
            OracleParameter sqlParam = new OracleParameter();
            sqlParam.DbType = param.DataType;
            sqlParam.OracleType = GetOracleType(param.DataType);
            sqlParam.ParameterName = param.ParameterName;
            sqlParam.Value = param.ParameterValue;
            if (qry.CommandType == CommandType.StoredProcedure)
            {
                switch (param.Mode)
                {
                    case ParameterDirection.InputOutput:
                        sqlParam.Direction = ParameterDirection.InputOutput;
                        break;
                    case ParameterDirection.Output:
                        sqlParam.Direction = ParameterDirection.Output;
                        break;
                    case ParameterDirection.ReturnValue:
                        sqlParam.Direction = ParameterDirection.ReturnValue;
                        break;
                    case ParameterDirection.Input:
                        sqlParam.Direction = ParameterDirection.Input;
                        break;
                }
            }
            cmd.Parameters.Add(sqlParam);
        }
    }
}

Sunday, March 16, 2008

Open Source Rule Engines

  1. Implement business logic with the Drools rules engine - The author used the Drools rule engine to demonstrate how business logic can be externalized from an application and possibly maintained by end users. Rules are stored in XML format and loaded into the rule engine at runtime. A rule is composed of three parts: parameters, a condition and a consequence. Parameters are passed in by the engine; if the condition is met, the instructions in the consequence are executed. The instructions are written in the Java language.
  2. Getting Started With the Java Rule Engine API (JSR 94): Toward Rule-Based Applications - The Java Rule Engine API is composed of a rule administration API and a runtime client API. Users call the administration API to register and unregister a RuleExecutionSet and the client API to apply the RuleExecutionSet. Several rule engines were mentioned in the article as well, including Jess, Drools, Fair Isaac Blaze Advisor, ILOG JRules, etc.

Wednesday, March 12, 2008

An Introduction to Enterprise Service Bus

The first chapter of Open-Source ESBs in Action has a good introduction to the Enterprise Service Bus. The following topics are discussed in the chapter.
  • EAI vs. ESB
    1. EAI products are based on the hub and spoke model. All data exchange is centralized in the hub.
    2. ESB products are based on the bus model. Data are distributed to the destinations through the bus. In the distribution process, data/messages can be transformed or enhanced.
    3. The data exchange in ESB products is based on open standards, such as, JCA, XML, JMS, and web services standards.
  • Reasons to start thinking of an ESB
    1. Necessity to integrate applications
    2. Heterogeneous environment
    3. Reduction of total cost of ownership
  • Core functionalities of an ESB
    1. Location transparency
    2. Transport protocol conversion
    3. Message transformation
    4. Message routing
    5. Message enhancement
    6. Security
    7. Monitoring and management
  • Current open source ESB projects
    1. Mule
    2. Apache ServiceMix
    3. Open ESB
    4. Apache Synapse
    5. JBoss ESB
    6. Apache Tuscany
    7. Fuse ESB
    8. WSO2 ESB
    9. PEtALS
    10. OpenAdapter
The authors also mentioned that Service Component Architecture (SCA) seems to be the next big thing in the ESB market. SCA is a specification based on the principles of service-oriented architecture. Vendors are investigating the possibility of transforming their ESB products to conform to the specification.

Monday, March 10, 2008

Free Icons

I found the websites via here and here. Great stuff!
  1. 35 (Really) Incredible Free Icon Sets
  2. 90 Free High Quality Icon Sets
  3. Free Icons Download

Managed ThreadId

Managed Stack Explorer allows one to trace stacks of .NET 2.0 applications at run time, which is handy for tracking down deadlock problems. But the project I worked on was based on .NET 1.1. What a pity! On second thought, I realized that I just needed to keep track of the resources locked by threads. When a deadlock happened, that is, when a thread issued a lock request for some resource but failed after a predefined period of time, I dumped the stack frames of the threads holding the resource. The problem then boiled down to how to identify the resource-locking threads. Originally, I used System.AppDomain.GetCurrentThreadId to identify a thread, but according to the documentation, this id is not stable.
Note An operating-system ThreadId has no fixed relationship to a managed thread, because an unmanaged host can control the relationship between managed and unmanaged threads. Specifically, a sophisticated host can use the Fiber API to schedule many managed threads against the same operating system thread, or to move a managed thread among different operating system threads.
Instead, to identify a thread in .NET 1.1, use Thread.GetHashCode. For applications in .NET 2.0 and later, use Thread.ManagedThreadId.
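
A minimal sketch of the bookkeeping I mean (my own illustration; the resource names, the Hashtable and the timeout policy are placeholders): record the owning thread's id when a lock is taken, so it can be reported when another thread times out.

using System;
using System.Collections;
using System.Threading;

class ResourceRegistry {
    // resource name -> hash code of the owning thread (works on .NET 1.1)
    private static readonly Hashtable owners = Hashtable.Synchronized(new Hashtable());

    public static bool Acquire(string resource, object resourceLock, int timeoutMs) {
        if (!Monitor.TryEnter(resourceLock, timeoutMs)) {
            // Probably deadlocked: report who currently holds the resource.
            Console.WriteLine("Timed out; '{0}' is held by thread {1}",
                              resource, owners[resource]);
            return false;
        }
        owners[resource] = Thread.CurrentThread.GetHashCode();
        return true;
    }

    public static void Release(string resource, object resourceLock) {
        owners.Remove(resource);
        Monitor.Exit(resourceLock);
    }
}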

Tuesday, March 4, 2008

Oracle AnyData Data Type

I had been looking for a generic data type in Oracle for some time. Today I just found out that Oracle has offered this feature, AnyData, since 9i. Also, there are several examples on the Internet (here, and here) demonstrating its functionality.

Friday, February 29, 2008

Effective Interview for MBA Programs

Since I volunteer to interview MBA applicants for my alma mater, I have interviewed more than 30 people so far. In these interviews, I have observed several key elements of an effective interview. If an applicant shows these key elements in the interview, it is easy for me to write a favorable critique form for the applicant. The point is, if an applicant can convince me, then it's easy for me to convince the admissions committee. Below are the key points I look for in an interview.
  1. Differentiate yourself - Why are you different from other MBA applicants?
  2. Be specific - When talk about yourself, please give examples to support your statements.
  3. Think of Win-Win - Why should the admissions committee give you, instead of other applicants, the admission?

Wednesday, February 27, 2008

ORA-01460 in SOCI

I mentioned the SOCI project before. Recently, I was planning to adopt the library (version 2.2.0) in my project. However, while using the library to connect to Oracle 9i, I encountered a weird exception: if I used soci::use with a std::string more than once in a SQL query, I got an ORA-01460 error. For example, I used the following code to reproduce the error.
std::string prodid1("2330"), prodid2("2885");
SOCI::Row r;
SOCI::Statement st = (sql.prepare <<
"select * from product where prodid in (:prodid1, :prodid2)",
SOCI::use(prodid1), SOCI::use(prodid2),
SOCI::into(r));
It seemed related to the binding process in the function soci::StandardUseType::bind. After tweaking the source code for a while, I found that modifying the function definition as follows made the error go away.
void OracleStandardUseTypeBackEnd::prepareForBind(
    void *&data, sb4 &size, ub2 &oracleType)
{
    ...
    case eXStdString:
        oracleType = SQLT_STR;
        // 4000 is Oracle max VARCHAR2 size; 32768 is max LONG size
        //size = 32769;
        size = sizeof(static_cast<std::string *>(data)->c_str());
    ...
}

Monday, February 25, 2008

Networking in Virtual PC 2007

It seems pretty straightforward to use Virtual PC 2007. The only problem people might encounter while setting up a guest Windows XP is configuring its network environment. Following the suggestions in this post, I selected "Shared Networking (NAT)" and configured the guest OS to use DHCP. Then everything works, and the virtual gateway in the guest OS is always 192.168.131.254.

On the other hand, if I want to connect to the guest OS from the host, I need to select a network adapter for the guest OS and assign an IP address to that adapter in the guest OS. Also, be sure to turn off the firewall in the guest OS. Then outside machines can access server applications running in the guest OS.

Finally, the "Undo Disks" option in the settings is useful, too. If it is enabled, I can roll back all changes made to the guest OS since it started.

Update: Here is a post that compiles a list of useful information about Virtual PC.

Tuesday, February 19, 2008

WinUnit: A Unit Testing Tool for Native C++ Applications

In this MSDN article, the author introduced WinUnit, a unit testing tool for native C++ applications on the Windows platform.

To use this tool, users write test programs with a set of macros the author provides, compile the test programs into DLLs, and run them with the WinUnit console program. Because there is no reflection capability built into the C++ language, the author took advantage of DLL exports to let WinUnit discover and invoke the test functions.

Several references were mentioned in the article.

Sunday, February 10, 2008

SOCI (Cont'd)

This article explained how SOCI can be extended to support user-defined types. The author applied the traits technique of generic programming in C++ and showed that, by making use of template specialization via TypeConversion<T>, SOCI provides a noninvasive extension mechanism.

Since std::tm is supported natively by SOCI, to work with boost::gregorian::date a user only needs to provide the following template specialization, and the conversion is done for us automatically.
#include <ctime>     // for std::tm and std::mktime
#include <iostream>
#include <soci.h>
#include <boost/date_time/gregorian/gregorian.hpp>

using boost::gregorian::months_of_year;
using boost::gregorian::date;

namespace SOCI
{
template<> struct TypeConversion<date>
{
typedef std::tm base_type;
static date from(std::tm& t)
{
date d( t.tm_year + 1900,
static_cast<months_of_year>(t.tm_mon + 1),
t.tm_mday );
return d;
}

static std::tm to(date& d)
{
std::tm t;
t.tm_isdst = -1;
t.tm_year = d.year() - 1900;
t.tm_mon = d.month() - 1;
t.tm_mday = d.day();
t.tm_hour = 0;
t.tm_min = 0;
t.tm_sec = 0;
std::mktime(&t);
return t;
}
};
} // namespace SOCI

Friday, February 8, 2008

SOCI: Simple Oracle Call Interface

This article introduced a simple C++ database library, SOCI. According to the author, users only need to know two classes, Session and Statement, to handle most interactions between client programs and database servers.

The following sample, excerpted from the article, demonstrated the simplicity of the library.
  • Session and Statement classes hide OCI calls from the users.
  • Free function use() binds variables to placeholders in the SQL statements.
  • Free function into() populates the variables with the values returned by select statements.
  • The shift operator on the Session stores the query and creates a temporary object that handles the rest of the expression.
  • The comma operator stores the parameter information returned by the use() and into() functions in the temporary object for later use.
  • The temporary object executes the SQL statement in its destructor.
#include "soci.h"
#include <iostream>

using namespace std;
using namespace SOCI;

int main()
{
try
{
Session sql("DBNAME", "user", "password");
// example 1. - basic query with one variable used
int count;
sql << "select count(*) from some_table", into(count);
// example 2. - basic query with parameter
int id = 7;
string name;
sql << "select name from person where id = " << id, into(name);
// example 3. - the same, but with input variable
sql << "select name from person where id = :id", into(name), use(id);
// example 4. - statement with no output
id = 8;
name = "John";
sql << "insert into person(id, name) values(:id, :name)", use(id), use(name);
// example 5. - statement used multiple (three) times
Statement st1 = (sql.prepare <<
"insert into country(id, name) values(:id, :name)",
use(id), use(name));
id = 1; name = "France"; st1.execute(1);
id = 2; name = "Germany"; st1.execute(1);
id = 3; name = "Poland"; st1.execute(1);
// example 6. - statement used for fetching many rows
Statement st2 = (sql.prepare << "select name from country", into(name));
st2.execute();
while (st2.fetch())
{
cout << name << '\n';
}
}
catch (exception const &e)
{
cerr << "Error: " << e.what() << '\n';
}
}

Esper: Event Stream Processing and Correlation

I mentioned Esper before (here). Esper allows users to submit continuous queries to its engine, and it sends out alerts whenever incoming events match the user-defined queries. In this article, the authors showed code snippets to demonstrate Esper's usage.
  1. Configure the engine using API or an XML file.
  2. Register continuous queries.
  3. Attach listeners to the queries.
According to the documentation, the engine taps into event streams by having the application send events to it via the runtime interface.

Esper seems easy to use and, I think, is likely to gain popularity.

Monday, January 28, 2008

Using REFCURSOR Bind Variables In Oracle SQL*Plus

I came across this tip here. Here is the output from trying it on my machine. Another tutorial is also available in the SQL*Plus User's Guide.

SQL>
SQL> column owner Format a10;
SQL> column object_id Format 9999;
SQL> column object_type Format a10;
SQL>
SQL> CREATE OR REPLACE PROCEDURE sp_qry_all_object(cur_ds IN OUT SYS_REFCURSOR) AS
2 BEGIN
3 OPEN cur_ds FOR select owner, object_id, object_type from all_objects where object_id<500 and owner='PUBLIC';
4 END;
5 /

Procedure created.

SQL>
SQL> variable cur1 REFCURSOR;
SQL> exec sp_qry_all_object(:cur1);

PL/SQL procedure successfully completed.

SQL> print cur1;

OWNER OBJECT_ID OBJECT_TYP
---------- --------- ----------
PUBLIC 223 SYNONYM
PUBLIC 278 SYNONYM
PUBLIC 272 SYNONYM
PUBLIC 275 SYNONYM

SQL>
SQL> CREATE OR REPLACE FUNCTION sf_qry_all_object RETURN SYS_REFCURSOR
2 AS
3 cur_ds SYS_REFCURSOR;
4 BEGIN
5 OPEN cur_ds FOR select owner, object_id, object_type from all_objects where object_id<500 and owner='PUBLIC';
6 RETURN (cur_ds);
7 END;
8 /

Function created.

SQL>
SQL> variable cur2 REFCURSOR;
SQL> exec :cur2 := sf_qry_all_object;

PL/SQL procedure successfully completed.

SQL> print cur2;

OWNER OBJECT_ID OBJECT_TYP
---------- --------- ----------
PUBLIC 223 SYNONYM
PUBLIC 278 SYNONYM
PUBLIC 272 SYNONYM
PUBLIC 275 SYNONYM

Tips For Boosting .NET WinForm Performance

In this MSDN article, the author suggested tips for boosting .NET WinForms performance, which I summarize as follows.
  1. Load fewer modules at startup
  2. Precompile assemblies using NGen
  3. Place strong-named assemblies in the GAC
  4. Avoid base address collisions
    • dumpbin can check the preferred base address of Dlls.
  5. Avoid blocking on the UI thread
  6. Perform lazy processing
    • Some operations can be implemented in the Idle event and will be processed when applications are idle.
  7. Populate controls more quickly
    • The following code snippet shows how to avoid constant repainting.
      listView1.BeginUpdate();
      for(int i = 0; i < 10000; i++)
      {
      ListViewItem listItem = new ListViewItem("Item"+i.ToString() );
      listView1.Items.Add(listItem);
      }
      listView1.EndUpdate();
  8. Exercise more control over data binding
    • BindingSource can help improve performance.
      this.bindingSource1.SuspendBinding();
      for (int i = 0; i < 1000; i++)
      {
      tbl.Rows[0][0] = "suspend row " + i.ToString();
      }
      this.bindingSource1.ResumeBinding();
  9. Reduce repainting
    • Use the SuspendLayout method whenever possible to minimize the number of Layout events.
    • Call the Invalidate method and pass only the area that needs to be repainted as an argument.
  10. Use double buffering (see the sketch after this list)
  11. Manage memory usage
    • If controls are added to and removed from a WinForm dynamically, call Dispose on them when they are removed. Otherwise, unwanted handles accumulate in the process.
  12. Use reflection wisely
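Tips 9 and 10 come without snippets in my summary above, so here is a minimal sketch of both, assuming a custom panel on .NET 2.0 (the FlickerFreePanel and AddButtons names are mine, not from the article):

using System.Windows.Forms;

public class FlickerFreePanel : Panel
{
    public FlickerFreePanel()
    {
        // Tip 10: have the control paint into an off-screen buffer first and
        // copy it to the screen in one step, which reduces flicker.
        SetStyle(ControlStyles.OptimizedDoubleBuffer
               | ControlStyles.AllPaintingInWmPaint, true);
    }

    public void AddButtons(int count)
    {
        // Tip 9: batch the layout changes so only one layout pass occurs
        // instead of one per added control.
        SuspendLayout();
        for (int i = 0; i < count; i++)
        {
            Button b = new Button();
            b.Text = "Button " + i;
            b.Top = i * 30;
            Controls.Add(b);
        }
        ResumeLayout();
    }
}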

Thursday, January 24, 2008

Links for 2008-01-24

  • Beautiful Code
    • Chapter 26 - Labor-Saving Architecture: An Object-Oriented Framework for Networked Software :
      • Commonality was achieved by adopting the Template Method pattern: the common steps of a logging server are defined in the Logging_server base class's run() method, deferring the specialization of individual steps to hook methods in derived classes.
      • Variability was achieved by adopting the Wrapper Facade pattern: syntactic and semantic differences, such as synchronization and IPC mechanisms, are hidden inside classes such as Acceptor and Mutex. Then, using C++ templates, Logging_server can choose the appropriate Acceptor and Mutex types in a parameterized way (a rough analogue is sketched after this list).
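The chapter's code is C++ and relies on class templates; purely as a rough analogue in C# generics (LoggingServer, IAcceptor, and ILock below are my own names, not the book's), the combination of the two patterns looks something like this: Run() fixes the invariant steps, writing a record is a hook for subclasses, and the acceptor and locking strategies are selected as type parameters.

public interface IAcceptor
{
    void Open(int port);
    string WaitForLogRecord();
}

public interface ILock
{
    void Acquire();
    void Release();
}

public abstract class LoggingServer<TAcceptor, TLock>
    where TAcceptor : IAcceptor, new()
    where TLock : ILock, new()
{
    private readonly TAcceptor acceptor = new TAcceptor(); // wrapper facade for IPC
    private readonly TLock mutex = new TLock();            // wrapper facade for locking

    // Template Method: the steps every logging server has in common.
    public void Run(int port)
    {
        acceptor.Open(port);
        while (true)
        {
            string record = acceptor.WaitForLogRecord();
            mutex.Acquire();
            try { WriteRecord(record); }   // hook implemented by derived classes
            finally { mutex.Release(); }
        }
    }

    // Hook method: each concrete server decides how records are persisted.
    protected abstract void WriteRecord(string record);
}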

Saturday, January 5, 2008

ManagedSpy - A Testing Tool For Managed WinForm Applications

ManagedSpy is a tool for accessing the properties and events of managed Windows applications at run time. It is good for debugging or testing Windows Forms controls and is the equivalent of Spy++ for unmanaged Windows applications.

ManagedSpy is actually an example demonstrating the usage of ManagedSpyLib. ManagedSpyLib introduces a class, ControlProxy, with which we can get or set properties or subscribe to events of a System.Windows.Forms.Control in another process.

An introduction to ManagedSpy and the internals of ManagedSpyLib can be found in this MSDN article. The following is a test program listed in the article for testing a WinForm application that has two textboxes accepting two integers and one button that multiplies the two integers and shows the product in a third textbox. The key lines are the ones calling methods of ControlProxy: as you can see, the test program programmatically accesses the three textbox controls and the button control in another WinForm process.
private void button1_Click(object sender, EventArgs e)
{
Process[] procs = Process.GetProcessesByName("Multiply");
if (procs.Length != 1) return;
ControlProxy proxy =
ControlProxy.FromHandle(procs[0].MainWindowHandle);
if (proxy == null) return;

//find the controls we are interested in...
if (cbutton1 == null)
{
foreach (ControlProxy child in proxy.Children)
{
if (child.GetComponentName() == "textBox1") {
textBox1 = child;
}
else if (child.GetComponentName() == "textBox2") {
textBox2 = child;
}
else if (child.GetComponentName() == "textBox3") {
textBox3 = child;
}
else if (child.GetComponentName() == "button1") {
cbutton1 = child;
}
}

//sync testchanged on textbox3 so we can tell if it has changed.
textBox1.SetValue("Text", "5");
textBox2.SetValue("Text", "7");
textBox3.SetValue("Text", "");
textBox3.EventFired +=
new ControlProxyEventHandler(textBox3_EventFired);
textBox3.SubscribeEvent("TextChanged");
}
else textBox3.SetValue("Text", "");

//now click on the button to start the test...
if (cbutton1 != null)
{
cbutton1.SendMessage(WM_LBUTTONDOWN, IntPtr.Zero, IntPtr.Zero);
cbutton1.SendMessage(WM_LBUTTONUP, IntPtr.Zero, IntPtr.Zero);
Application.DoEvents();
}

if (result == 35) MessageBox.Show("Passed!");
else MessageBox.Show("Fail!");
}

void textBox3_EventFired(object sender, ProxyEventArgs ed)
{
int val;
if (int.TryParse((string)textBox3.GetValue("Text"), out val))
{
result = val;
}
}