Monday, July 21, 2008

Delete Table vs Truncate Table in Oracle

Say you create a table t. From time to time, you delete all the data in t and insert fresh records again. If that table grows to 1GB before you delete all the data, where is the 1GB after you delete all the rows in t? What's the difference between the two SQL statements, DELETE FROM t and TRUNCATE TABLE t, in terms of space usage? Yes, the 1GB is still there, available for subsequent use, but in Oracle there is some subtlety you must be aware of.

In Oracle, we store data in tables, while Oracle stores objects, including tables, in a tablespace. Space is allocated on demand. If the space already allocated to a table is not enough, Oracle requests data blocks from the tablespace to accommodate the new data; if no free blocks remain, the table's segment grows by allocating the next extent from the tablespace.

Oracle maintains a marker for each table, the High Water Mark (HWM), indicating the boundary of the blocks that have ever held data. As the table grows, the HWM rises. When records are deleted and some data blocks become empty, the space is kept within the table and the HWM remains unchanged. Under manual segment space management, which is an attribute of a tablespace, pointers to the free data blocks are appended to a FREELIST structure in the table. Under automatic segment space management, a bitmap in the data block's header is reset instead. Either way, space allocated to a table stays with the table, whether it's in use or empty.

However, if you delete all the data in a table by issuing TRUNCATE TABLE t, Oracle returns the space to the tablespace, where it can be used by other objects, not just t, and the HWM of the table is reset to the beginning of the segment. There are other uses of the HWM, but here I only care about its relationship with space. Since Oracle 10g, you can also issue ALTER TABLE t SHRINK SPACE to release unused space back to the tablespace after DELETE FROM t (the table must have row movement enabled).

Back to our question: after DELETE FROM t without shrinking space, the 1GB is available to t only, while after TRUNCATE TABLE t, the 1GB is returned to the tablespace, available to all objects. So it's possible for your tablespace to keep growing even though you have been deleting data from tables.
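As a quick sketch, assuming a scratch table t, the three options discussed above look like this:

```sql
-- Frees the rows but keeps the allocated space (and the HWM) for t's own reuse
DELETE FROM t;

-- Deallocates the space back to the tablespace and resets the HWM
TRUNCATE TABLE t;

-- Since Oracle 10g: release unused space back to the tablespace after a DELETE
-- (row movement must be enabled first)
ALTER TABLE t ENABLE ROW MOVEMENT;
ALTER TABLE t SHRINK SPACE;
```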

Further Reading: Using TRUNCATE

Wednesday, July 9, 2008

Data Types in Oracle

I thought the topic was trivial, but if you read chapter 12 of Tom Kyte's Oracle book, you might think differently. There are so many nuances to each data type that one can get bitten if one is not careful (or lucky). I summarize below some of the ones that I think are important.

VARCHAR2: A VARCHAR2(10) may contain 0~10 bytes of data using the default NLS settings. A VARCHAR2 may contain up to 4000 bytes of information. If the NLS settings of the client and server differ, an implicit character set conversion takes place behind the scenes. The NLS settings of the Oracle client can be found here, while the default character set of the Oracle database can be determined with

SELECT value FROM nls_database_parameters
WHERE parameter = 'NLS_CHARACTERSET'

NVARCHAR2: An NVARCHAR2(10) may contain 0~10 characters of Unicode-formatted data. Like VARCHAR2, NVARCHAR2 may contain up to 4000 bytes of information. Text in an NVARCHAR2 is stored and managed in the database's national character set (a Unicode character set since Oracle9i), not the default character set. The national character set of the Oracle database can be determined with

SELECT value FROM nls_database_parameters
WHERE parameter = 'NLS_NCHAR_CHARACTERSET'

LOB: When a table with an internal LOB (e.g., CLOB, NCLOB, BLOB) column is created, two additional segments are created as well: a lobindex and a lobsegment. What is stored in the table itself is a pointer, which is looked up in the lobindex to find the LOB data stored in the lobsegment. So retrieving a LOB incurs extra disk access due to the lobindex lookup. However, LOB data smaller than 4000 bytes can be stored inline with the row in the table if the ENABLE STORAGE IN ROW attribute is set (the default).

CHUNK is the smallest unit of allocation for LOBs; the default is 8K bytes. So one LOB occupies at least one chunk, and two LOBs never share the same chunk. Also, a small chunk size means large LOB data requires many chunks, which increases the size of the lobindex.

PCTVERSION controls the percentage of allocated lobsegment space used for undo purposes; the default is 10%. If you get an ORA-22924 (snapshot too old) error while working with a LOB, try increasing it so that more lobsegment space is reserved for versioning of data.
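A hedged sketch of how these storage attributes appear in DDL (the table and column names here are made up for illustration):

```sql
CREATE TABLE docs (
  id   NUMBER PRIMARY KEY,
  body CLOB
)
LOB (body) STORE AS (
  ENABLE STORAGE IN ROW   -- inline storage for LOBs under ~4000 bytes (the default)
  CHUNK 8192              -- smallest unit of LOB allocation, in bytes
  PCTVERSION 10           -- % of lobsegment space kept for versioning/read consistency
);
```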

ROWID: the address of a row in the database. It is mutable for the following reasons:

  1. Updating the partition key of a row in a partitioned table such that the row must move from one partition to another
  2. Using the FLASHBACK TABLE command to restore a database table to a prior point in time
  3. Performing MOVE operations and many partition operations, such as splitting or merging partitions
  4. Using the ALTER TABLE SHRINK SPACE command to perform a segment shrink
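For example, a MOVE rebuilds the table's segment, so every row gets a new address (a sketch; the table t and the index name t_pk are assumed for illustration):

```sql
SELECT ROWID FROM t WHERE id = 1;   -- note the row's address

ALTER TABLE t MOVE;                 -- rebuilds the segment; ROWIDs change

-- indexes on t are left unusable after a MOVE and must be rebuilt
ALTER INDEX t_pk REBUILD;

SELECT ROWID FROM t WHERE id = 1;   -- a different address
```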

Sunday, July 6, 2008

Patterns in Practice - The Open Closed Principle

When I started working as a programmer, I was fascinated with design patterns. Coding with patterns made my code look elegant. When documenting, you just needed to point out which patterns you used and things would be self-explanatory. But later I found that few recruiters here cared what design patterns I had applied in my software. Rather, they were more interested in how broadly and deeply I knew certain software, like databases, servers, operating systems, etc. To me, that implies product knowledge, not design knowledge, is more valuable to them. Given such a job market, and given that I only have 24 hours a day, how much time should I invest in product knowledge versus design knowledge?

So when I find an article on design patterns, I keep wondering whether I should spend time on it. It's true that applying design patterns would make my programs more flexible, but I haven't been rewarded financially for knowing them. On the contrary, from what I have observed, knowing more about databases or network infrastructure is appreciated and possibly rewarded with a higher salary, even for a programmer who is not a DBA or network administrator. Bosses and users don't care how I design software as long as it works as requested, and no one reads documents. Having complained about that, I am still going to summarize two MSDN articles on design patterns I read recently. Why? For me, the temptation of writing elegant code is difficult to resist.

Patterns in Practice - The Open Closed Principle

If you want your code to be easy to extend and robust to changes, consider applying the Open Closed Principle. According to the author, it means that an application is structured in such a way that it's easy to enhance with minimal changes to existing code. Sounds appealing, but how? There are several patterns you can apply.

First, the author suggests that we follow the Single Responsibility Principle: each class should be designed with a single, specialized responsibility, for example, dividing business logic and data access into separate classes. Then a change in data access does not affect the business logic classes.

Second, if there is a lot of conditional logic in your program, consider using the Chain of Responsibility pattern. Then, when more conditions need to be handled, you only add the new logic in separate classes, with minimal changes to existing code.

Third, if you keep changing the interface of some class because you need to access or manipulate its state, consider the Double Dispatch pattern.

Fourth, if you access a class through an interface or abstract class, you are following the Liskov Substitution Principle. The benefit is that you can easily swap in another implementation, with minimal changes to existing code.

Design Patterns - Model View Presenter

According to the article, the MVP architecture looks like the diagram in the article: the Presentation separates the UI from the Service. The UI depends on the Presentation and knows nothing about the Service. The Presentation depends on the Service and knows nothing about how data is retrieved and persisted. The pattern makes both your service and your presentation reusable.

The article is also a good read on how to write an ASP.NET application in a test-driven way. The author shows that by applying the MVP pattern and the NMock2 framework, your web applications become testable.

Recently, I started using CAB to develop WinForms applications. The article really helped brush up my knowledge of the MVP pattern, as the pattern is used heavily in CAB.

Further reading: Everything You Wanted To Know About MVC and MVP But Were Afraid To Ask

Saturday, July 5, 2008

Large Object Heap Uncovered

After reading chapter 20 of CLR via C#, on automatic memory management, if you want to learn more about the topic, the MSDN article Large Object Heap Uncovered picks up where the book left off.

According to the author, objects larger than 85KB are considered large objects in the CLR and are allocated in the large object heap (LOH), rather than the usual small object heap. Since the CLR clears the memory it hands out, allocating a large object takes longer. In addition, large objects are treated as generation 2 objects, so a generation 2 collection causes all objects, both large and small, to be collected.

To speed up garbage collection in the LOH, dead objects are simply marked and their space is put on a free list for later allocation; there is no compaction. However, because of this 'feature', frequent allocation and release of temporary large objects can fragment the virtual address space.

One way to avoid frequent memory allocation and release, and the resulting VM fragmentation, is to build an object pool yourself and reuse the objects in the pool instead of creating temporary ones. One can find an implementation here.

Another approach is VM hoarding, supported in CLR 2.0. When enabled via the hosting API, unused memory segments are not released to the OS but are put on a standby list, which is searched first when new segments are requested. However, VM hoarding should be used with caution: when using this feature, you are advised to test your application to make sure its memory usage stays stable.

Further Readings:
Maoni's WebLog - CLR Garbage Collector
CLR Inside Out: Large Object Heap Uncovered