Archive for November, 2008

Thanksgiving resolution: Ruby & Python

Friday, November 28th, 2008

Finally, thanksgiving is over. And I made a resolution. I’m going to learn Ruby, the language (probably Rails too, but let’s keep to Ruby first).

I have enough people now telling me that Ruby is a pretty language. Sure there are tonnes of implementations out there that don’t meet industry standard, but it surely is improving. And you know, Java didn’t start as fast as it is now. I am convinced now.

While I’m at it, I have to learn Python too. (This one is a “have to”.) I’ll very likely need this language next year since I’m hoping to do quite a bit of text processing next year. While text processing (as in things like log processing) can be boring, they are really just a means to an end for me. I have this really interesting goal right in front of me. But I suspect it’ll involve a lot of experimentation and log crawling and processing. Therefore: Python.

Okay, Ruby & Python, here I come!

Seinen manga

Friday, November 28th, 2008

Recently, I’ve started to read a manga (Japanese comics) genre called seinen. It is supposed to be a genre for university students and older males in general, as opposed to shounen for younger males, or, well, boys. (If you’re interested, the female equivalent is josei and shoujo respectively.) Seinen manga is suprisingly… interesting. It’s very refreshing and different from the hack and slash shounen and the insanely lovey-dovey shoujo.

Seinen ranges from utterly comedic to deeply psychological. Some of them still scars me to this day when I recalled the story. Indeed, it is not a genre to be taken lightly. So today, instead of discussing the usual programming stuffs, I shall be introducing some of the seinen manga I’ve read recently.

1. The Lucifer and Biscuit Hammer (Wakusei no Samidare) by Mizukami Satoshi

This manga shocked me. I was laughing uncontrollably after only the first two pages! It was an awesome introduction (read it here: page 1 and 2). The story revolves around a college student who woke up to find an… err… talking lizard. Hilarious! Oh, not to forget that he later on swore allegiance to a princess who wants to save the earth from being destroyed by a… biscuit hammer. Did I say that the princess does that just so that she could destroy the earth herself? Gosh!

Read it here.

2. Angel Heart by Hojo Tsukasa

The first thing I noticed about this manga is that it has a unique drawing style among manga. It looks more like an older American graphic novels than a Japanese manga. After several chapters, it matters no longer. The story revolves around an assassin who committed suicide only to receive a stolen heart transplant and ending up becoming the “daughter” of the guy who is the fiance of the dead girl from whom the heart was stolen. It’s a heartwarming story of redemption of an ex-assassin doing good deeds as a city hunter.

Read it here.

3. Binbou Shimai Monogatari by Kazuto Izumi

The story revolved around two sisters who are incredibly poor and living alone in a meagre apartment rented from a fierce landlord. They constantly had to skimp on food. It tells the tale of the two sisters coping with their lives with a smile everyday. A truly slice of life story, it amazingly sticks to the premise up to the end. It’s as if you’re looking at two real sisters in real life and not in a manga. There is no happy ending and I didn’t expect it to have one. After all, the two sisters are already happy from the beginning just by having each other’s company.

Read it here.

4. Real by Inoue Takehiko

This is a deeply psychological manga. It provides a looking glass into a world of handicapped basketball through the eye of a school drop-out (who used to be a darn good basketball player), an ex-sprinter who had his leg amputated, and a popular basketball captain who lost all his friends after being involved in an accident and was paralyzed from waist down. It is a perfect twist to other basketball manga like Slam Dunk or Cross Over.

Read it here.

5. Skyhigh by Takahashi Tsutomu

Do you believe in afterlife? This story brings you into the gate of heaven, where a lone girl becomes its guardian, offering each spirit three options: one, to go through the gate and be reborn; two, to stay as wandering spirit on earth; or three, to curse (read: kill) one person on earth and go to hell afterwards. Each chapter portrays the decision each spirit made and the consequences. I was captivated. I read the continuation: Skyhigh Karma, but I still think the original Skyhigh is far more amazing than the second.

Read it here. (Read Skyhigh Karma here.)


There were several other seinen I’ve read recently, notably Hot Milk, Bitter Virgin and Lucu Lucu; but I forced myself to limit the list to five. I was really torn when I was at #4. I decided to put up the two that are most different from the rest. But these three are pretty awesome as well.

Seinen is a refreshing genre because it’s so diverse. There is no common plots among many of the popular seinen manga. I’m glad I read them. Though I think they contribute to my headache. Imagine working 10 hours in front of an LCD screen and reading manga on another LCD screen at home. Surely, there is a limit for my eyes… :(

Dynamic allocation: dynamic shadow copies

Thursday, November 27th, 2008

I was reading a book on advanced data structures over the past 2 days (and no, I have only managed to cover the first one and a half chapter). It has this neat idea bout shadow copies for dynamically allocated structures.

Let’s take an example of an array-based stack (based on vector or ArrayList, or even plain array). Normal dynamic allocation of a vector may be performed this way:

  1. Sstarts with a dynamically allocated array of certain size (might be 0, 1, a predefined number, or number passed in a constructor) and keep the size in the array_size variable;
  2. Keep in index variable that stores the position of the top of the element in the array;
  3. When array is filled (index equals array_size), reallocate the array to twice the size: now reallocate may take one of the two forms, if there is enough space to extend the array, it will simply extend it, otherwise it will reallocate a new array and copied the entire original arrays

Now this is a pretty typical method of dynamic allocation. If you took a look at linebuffer.c from GNU readline, it uses exactly the same simple algorithms.

Shadow copying

What is the drawback of the method described above? Of course, that very costly copying of the array when the array needs to be reallocated (instead of extended). So the book advised of using the technique called shadow copy. It basically a technique where you keep two arrays. One of the size array_size, and another of size m times array_size. For the purpose of this post, let’s assume that m = 2.

With m = 2, as you’re filling the smaller array, you also copies 2 elements from the smaller array to the larger array. As the small array fills up, it will just in time copied every elements from the smaller array to the larger array. Rinse and repeat.

The reason why the copying will catch up is that it will always start copying from index 0 when the smaller array is half-filled: if you think about it, if you copies one element from the first half of the array and write the element you’re writing to both the smaller and larger array, it will finish copying the first-half of the array by the time the second-half is filled, while at the same time, the second-half is already copied by virtue of writing new element to both array.

Dynamic shadow copying

We return to this question again: what is the drawback of shadow copying? While the time taken for all operation will increase by at most a constant factor, the memory allocated to the array structure at any given time is always O((1+m)n) where n is array_size, a huge overhead. I kept asking myself, what should we do with this overhead? It’s huge, there must be something we can do about it. Moments later, I thought of one. I’m not claiming that this is the best use of such overhead, but surely it sounds interesting.

For ease of explanation, let’s start with an example. We start with just 1 array of size 16. We will fill the array halfway (8 elements) and attempt to extends the array to size 32 (note: not reallocate). If it succeeds, we do nothing and end up with array of size 32. Let’s continue from here, we’ll start filling the size 32 array with another 8 elements, reaching half capacity again. We then try to extend the array to size 64. Let’s assume it fails to extend to size 64. This is when we allocate a shadow copy array of size 64 and perform shadow copying as we fill the size 32 array.

What this meant is that under best circumstance you only need O(mn) space overhead, with worst case of O((1+m)n). While this may sound like a small difference, when we’re dealing with array of considerable size, such difference can account for 33% savings (for m = 2)! (And very slight savings in the amortized time of a given operation.)

Caveats

The idea is interesting. You may argue that for large array where this method may be most beneficial, array extension may be virtually impossible. That is a really valid point with the current memory structure. However, this method does not induce much more complexity than shadow copying and it is certainly worthwhile to reclaim as much memory as possible. Furthermore, with evolving virtual memory model, there might be a time where we can allocate arrays over non-contiguous memory space (a virtual machine model), making such extension a way better option (and shadow copies worthless).

Now, as a reminder, it’s worthy to point out that even in the naive reallocation (the first method mentioned above), the worst case allocation is still O((1+m)n) when the reallocation occurs. With certain optimization, shadow copies is no worse than naive reallocation in terms of avoiding out-of-memory problem. One such optimization would be to not allocate a shadow copy array when it is impossible (out of memory) and simply reallocate when the array is full (failover mechanism).

Shadow copying will pose a problem for other data structures/processes, which may run out of memory due to the presence of shadow copies. This will be a problem in a system where memory allocation may vary wildly.

Rant #2: Backward-compatibility and API-breaking changes

Wednesday, November 26th, 2008

So, recently, I’ve switched some of the codes I maintained to use this great new version of a common utility (shall not bore you with the details). The new version has tonnes of great, new stuffs. It was exciting. I managed to clean up my config files a lot just by using the new version.

Of course, the new version specifically promised that while it is breaking the API, all of its functionality that already exist in the previous version already exists in the new one and that user (i.e. me) can use an adapter classes to convert between the two versions.

All is well, until today. Today, I had to extend part of my code to work together with an older code that uses the previous API. Guess what? While the old build rules has been updated so that it can depend on the new build rule for the new version, the reverse is not true! Hold on! Why!? Yea, that surprised me a lot! So I had two options, extracting the part I need and update it to use the new API and make the two systems depend on this or update the older code to use the new version.

I chose the latter because it wasn’t a huge amount of work and the design is cleaner if I kept with the old design (breaking up that part of the code seems rather hackish to me). After half an hour of updating the old code, I got both system working together well (I wrote some script to automate some of the task, if you’re wondering why it takes only half an hour). The code links well, though since there are many other codes that depend on the other code that I just modified, I had to run the API in a compatibility mode, but that’s no big deal.

Only minutes later, when I started writing unit tests (before I even started writing my production code), I realized that, umm, once the new API works on compatibility mode, you have to use the adapter class to get the new functionality I need. There is no automatic detection, everything is manual! No convenient methods to convert one to another easily (I mean it’s C++, they could easily write operator() to convert from one to the other pretty darn easily!). Crap! I realized that I had to modify the codes in far too many places to make this worth it. So I just decided to ditch the feature I need.

There are so many lessons I learned from just today’s experience. If I were to write the new API, I would have provide automatic conversion between one to another, I would have backported some of the most important new features (mind you, these new features are very doable with the old version), I would try my best not to make any API-breaking changes. You know, having the old and new data structures extend the same base class would have been much better.

Lastly, seriously, I know there were some serious problems with the old code, but I’d rather have those codes deprecated slowly over time while the new code is being phased in. During this transition, you better don’t just introduce new features on just the new code (or worse, just the old code). You need to support both of them, or otherwise, help people who use the old version migrate. This is no open source code, it’s a privately-owned code. While the code base is arguably huge, there have been evidence of successful deprecation in the past, why can’t this one do that?

Child sending naked picture of herself persecuted…

Sunday, November 23rd, 2008

… Really now? Seriously?

This has got to be the stupidest news I’ve seen in awhile. Not to mention the stupidest public persecution in quite awhile too.

Oh and it doesn’t stop there…

A prosecutor says Licking County authorities also considering charges for students who received the photos.

Okay… (Probably no wonder that the county and the school has, well, you know, the word licking in it.) I’m still half-hoping that this is an “onion”-news[1].


On unrelated note, Stoyan Stefanov from YUI blog has just posted a pretty neat article on pngcrush (among other things, like jpeg metadata stripper). This should really be made a common practice nowadays. A friend of mine has even written a short script to automate CSS spriting followed by pngcrush recently. He told me it helps reduce the file size of the sprited PNG by twenty something percent. Not to mention a lot less HTTP requests.

C++’s scoped_ptr and unique_ptr smart pointers

Sunday, November 23rd, 2008

I got bitten again just the other day when I was modifying old code. It crashed. Yes, I added an extra delete when one is not required resulting in a double deletion. It reinforced my belief that most pointers in C++ should be smart pointers. A new C++ programmers will be caught often for memory leaks for forgetting to delete a pointer. As you get more experienced, while forgetting to delete becomes far less of a problem, double deletion becomes more rampant (Hey! It’s hard to keep track of object ownership you know? Especially when you’re rushing to modify other people’s code before deadline…)

scoped_ptr

To the rescue is Boost scoped_ptr smart pointer. There are two main reasons, in my opinion, to use scoped pointers.

Scoped pointers ease manual memory management. It holds a pointer to the object that it manage and it performs automatic deletion of the object it holds when it is deleted. The scoped_ptr object itself is a templated auto pointer, which means that it will be deleted automatically when it goes out of scope. Here is a simple, incomplete implementation of scoped_ptr:

template <typename T>
class scoped_ptr : noncopyable {
 public:
  explicit scoped_ptr(T* p = NULL) { p_ = p; }
  ~scoped_ptr() {
    if (p_ != NULL) delete p_;
  }
  void reset(T* p = NULL) {
    if (p_) delete p_;
    p_ = p;
  }

  // Some implementation may choose
  // to crash if p_ is NULL for the
  // following 3 operators.
  T& operator*() const { return *p_; }
  T* operator->() const { return p_; }
  T* get() const { return p_; }
 private:
  T* p_;
};

Let’s analyze the class. It extends Boost’s noncopyable interface (some would prefer to use macro for this), which implies that a scoped_ptr object may not be copied or assigned to another scoped_ptr. It induces a strict ownership of the owned pointer. As you see above, the destructor of scoped_ptr simply delete the held pointer, similarly with reset(T*) method.

This brings us to a second, more important point. scoped_ptr enforces a strict ownership of an object. In another word, it forces us programmers to think and then rethink about the ownership of an object. Many times, the problem with C++ developers are not forgetting to delete. It is not knowing who exactly owns an object. For example, let’s check out the following really simple class definition:

class ConfusedClass {
 public:
  ConfusedClass() {}
  ~ConfusedClass() {}
  void DoSomething() {
     a_ = b_.PerformSomething();
  }
  AnotherClass* GetA() { return a_; }
 private:
  AnotherClass* a_;
  YetAnotherClass b_;
};

In a world without scoped_ptr, this class can be really confusing. Who owns the object held by a_? Is it b_? Is it ConfusedClass? Or is it the class who calls GetA? The last option looks unlikely here. But it’s pretty hard to differentiate between the first two cases! A subsequent reader of the class definition would probably need to dig YetAnotherClass to determines that information. (Note also that the destructor is empty, it can be that b_ holds the object held by a_, or… it can be a bug—forgetting to delete a_!)

With scoped_ptr, when we write ConfusedClass, we should be thinking about the ownership of the object held by a_. And if we think this class should owns it, we should use scoped_ptr<AnotherClass> a_ instead! That way, subsequent reader of the class definition knows for sure that the object is owned by ConfusedClass (or shall we call it SmartClass now).

As a bonus, code with multiple exit path will be easily managed with scoped_ptr (instead of hunting each exit path and making sure all the pointers that should be deleted are deleted). Imagine how troublesome it is to manage a method with a throw for example. (I remembered writing a Java code where I’ve had to always have 3 if blocks in the finally part of a try-catch-finally, to actually check for null and close a ResultSet, a PreparedStatement, and an SqlConnection. In C++, I’ll simply write a wrapper similar to scoped_ptr to perform the closing.)

unique_ptr

C++0x expands the smart pointer repertoire even more with unique_ptr (Committee Draft §20.8.12). This smart pointer has a strict ownership as with scoped pointer. However, it is MoveConstructible and MoveAssignable (as described by the CD). What those jargons mean is that a unique_ptr s can be constructed with parameter of another unique_ptr u with a corresponding ownership transfer of the held object from u to s (MoveConstructible) and a unique_ptr u can be assigned to another unique_ptr s with ownership of the owned object transferred from u to s (MoveAssignable).

This pointer adds a little extra value to scoped pointer version. That is, you can transfer ownership (there is no release method in scoped pointer, while unique_ptr has not just a move constructor and move assignment, but also an explicit release method). This is basically a better version of std::auto_ptr (I’ve heard talk of making auto pointer deprecated).

To effectively used smart pointer, use the correct smart pointer for each of your needs. If you need a strict ownership semantics without any trasnfer, use scoped_ptr. If you need ability to transfer ownership in addition to that, use unique_ptr (or std::auto_ptr). Even better, make such rules part of your software/company’s style guide. Future maintainers will thank you when he can easily see an orderly semantics in the chaos.

shared_ptr

There is another smart pointer introduced in C++0x. The name is shared_ptr (CD § 2.8.13.2). This pointer is basically a referenced-counting smart pointers that implements shared ownership of the held object. When the last shared_ptr holding the particular object is destructed, the object is deleted too. Now, I won’t delve too much into this smart pointer because I believe in strict ownership as opposed to shared one. There should be a very, very rare situation where it demands shared ownership of an object. In a good software design, only one object should owned a particular object.

Now there is one place where shared_ptr is very, very useful: the STL containers. When you insert or retrieved a member into an STL containers, a copy of the object is made. For performance reason, keeping a pointer in the containers make lots of sense (especially when copying is expensive). As with any pointer usage, it becomes very hard to keep track of these objects. Hence the usage of shared_ptr. Copying a shared pointer is cheap. Additionally, there’s no risk of forgetting to delete a pointer within an STL containers. (Keep in mind that shared_ptr is usually twice as big as a normal pointer as it needs to keep another pointer to the reference counter.)

Example usage:

vector<shared_ptr<ClassA> > vector_of_a;
hash_map<int, shared_ptr<ClassA> > map_of_int_to_a;