I was reading a book on advanced data structures over the past 2 days (and no, I have only managed to cover the first one and a half chapter). It has this neat idea bout shadow copies for dynamically allocated structures.
Let’s take an example of an array-based stack (based on vector or ArrayList, or even plain array). Normal dynamic allocation of a vector may be performed this way:
- Sstarts with a dynamically allocated array of certain size (might be 0, 1, a predefined number, or number passed in a constructor) and keep the size in the array_size variable;
- Keep in index variable that stores the position of the top of the element in the array;
- When array is filled (index equals array_size), reallocate the array to twice the size: now reallocate may take one of the two forms, if there is enough space to extend the array, it will simply extend it, otherwise it will reallocate a new array and copied the entire original arrays
Now this is a pretty typical method of dynamic allocation. If you took a look at linebuffer.c from GNU readline, it uses exactly the same simple algorithms.
What is the drawback of the method described above? Of course, that very costly copying of the array when the array needs to be reallocated (instead of extended). So the book advised of using the technique called shadow copy. It basically a technique where you keep two arrays. One of the size array_size, and another of size m times array_size. For the purpose of this post, let’s assume that m = 2.
With m = 2, as you’re filling the smaller array, you also copies 2 elements from the smaller array to the larger array. As the small array fills up, it will just in time copied every elements from the smaller array to the larger array. Rinse and repeat.
The reason why the copying will catch up is that it will always start copying from index 0 when the smaller array is half-filled: if you think about it, if you copies one element from the first half of the array and write the element you’re writing to both the smaller and larger array, it will finish copying the first-half of the array by the time the second-half is filled, while at the same time, the second-half is already copied by virtue of writing new element to both array.
Dynamic shadow copying
We return to this question again: what is the drawback of shadow copying? While the time taken for all operation will increase by at most a constant factor, the memory allocated to the array structure at any given time is always O((1+m)n) where n is array_size, a huge overhead. I kept asking myself, what should we do with this overhead? It’s huge, there must be something we can do about it. Moments later, I thought of one. I’m not claiming that this is the best use of such overhead, but surely it sounds interesting.
For ease of explanation, let’s start with an example. We start with just 1 array of size 16. We will fill the array halfway (8 elements) and attempt to extends the array to size 32 (note: not reallocate). If it succeeds, we do nothing and end up with array of size 32. Let’s continue from here, we’ll start filling the size 32 array with another 8 elements, reaching half capacity again. We then try to extend the array to size 64. Let’s assume it fails to extend to size 64. This is when we allocate a shadow copy array of size 64 and perform shadow copying as we fill the size 32 array.
What this meant is that under best circumstance you only need O(mn) space overhead, with worst case of O((1+m)n). While this may sound like a small difference, when we’re dealing with array of considerable size, such difference can account for 33% savings (for m = 2)! (And very slight savings in the amortized time of a given operation.)
The idea is interesting. You may argue that for large array where this method may be most beneficial, array extension may be virtually impossible. That is a really valid point with the current memory structure. However, this method does not induce much more complexity than shadow copying and it is certainly worthwhile to reclaim as much memory as possible. Furthermore, with evolving virtual memory model, there might be a time where we can allocate arrays over non-contiguous memory space (a virtual machine model), making such extension a way better option (and shadow copies worthless).
Now, as a reminder, it’s worthy to point out that even in the naive reallocation (the first method mentioned above), the worst case allocation is still O((1+m)n) when the reallocation occurs. With certain optimization, shadow copies is no worse than naive reallocation in terms of avoiding out-of-memory problem. One such optimization would be to not allocate a shadow copy array when it is impossible (out of memory) and simply reallocate when the array is full (failover mechanism).
Shadow copying will pose a problem for other data structures/processes, which may run out of memory due to the presence of shadow copies. This will be a problem in a system where memory allocation may vary wildly.