Posts Tagged ‘optimizing’

Optimize for the most common use case

Sunday, November 2nd, 2008

This is the best advice I’ve ever stick myself with when writing codes. Before delving deep into this, let’s take a look of one of my Scheme peeve (I was reminded of this when reading eric’s post in LispCast): the unavailability of length=.

Let’s write a simple Scheme code:

(define (two-len-list list)
  (= 2 (length list)))

What’s so wrong about this code? Well, a hint, a Scheme list is basically a linked-list. Got it? Yes! length will recurse through the entire list. The order of growth is O(n), where n is the length of the list for the above example. As suggested by eric in LispCast (his focus in the article above is that LISP should automatically optimized the above code to look like the code below, but that’s another matter), this should be easily optimized to:

(define (two-len-list list)
  (length= 2 list))

Apparently, that neat procedure is defined in LISP. In Scheme? AFAICT (Scheme is just a small hobby, I might have missed the procedure that does this), it’s not defined. Well, no matter, we can always write it ourselves.

(define (length= n lis)
  (cond ((= n 0) (empty? lis))
        ((empty? lis) false)
        (else (length= (- n 1) (cdr lis)))))

Neat! And it will only recurse n times at most. Used in two-len-list, it will make the procedure O(1). (Another thing that a lot of new Schemers do, including me during my first day learning about list, is to write this: (= 0 (length lis)), where it could be written as (empty? lis)).

Now, we come at the best part: optimize for the most common use case. Let’s use the same example, shall we? What if you know that in your code, the length of the list matters! Let’s say that in 70% of the call, the length of the list is required. Then make sure you keep the length of the list at all time, instead of recomputing it every time you need it. It used to be hard to do this when list was mutable, but with the advent of R6RS, list is now immutable, this becomes relatively straightforward. You can even abstract it into a new list-based data-structure.

(define (make-sclist lis)
  (list 'sclist (length lis) lis))
(define (is-sclist? sclist)
  (eq? 'sclist (car sclist)))
(define (sclist-length sclist)
  (if (not (is-sclist? sclist))
      (error "Not a self-counting list")
      (cadr sclist)))
(define (sclist-car sclist)
  (if (not (is-sclist? sclist))
      (error "Not a self-counting list")
      (car (get-plainlist sclist))))
(define (sclist-cdr sclist)
  (if (not (is-sclist? sclist))
      (error "Not a self-counting list")
      (list 'sclist
            (- (sclist-length sclist) 1)
            (cdr (get-plainlist sclist)))))
(define (get-plainlist sclist)
  (if (not (is-sclist? sclist))
      (error "Not a self-counting list")
      (caddr sclist)))
(define (sclist-cons new-member sclist)
  (if (not (is-sclist? sclist))
      (error "Not a self-counting list")
      (list 'sclist
            (+ 1 (sclist-length sclist))
            (cons new-member
                  (get-plainlist sclist)))))

When writing the above code, I realize how much I have coded in object-oriented paradigms. I kept wanting to write the above code in a message-passing style and encapsulated the length and the list in a let (similar to a private environment). At the end, I went for tag-based approach to increase the readability. You’ll also see the increase level of abstraction in the code.

Optimize for the most common use case is a very powerful advice. It allows you to focus your thoughts and willpower at that algorithms that will get accessed the most number of time, thus whatever small improvements you make there, it will be multiplied by the number of calls!

This does not only apply to programming. Take a look at database too. If you’re writing a blog, what’s the most common use case? Getting the last n posts, getting a posts between this date and that date, getting posts tagged with some tags, and searching. So optimize for them! Index the posts by date and tags. Try pre-scoring each posts to keywords (possibly, a keyword can be any word that is being mentioned in the post) and store them to speed up searching, at the expensive of higher storage.

On writing a webapps, make sure the most commonly accessed pages are the fastest. What is the point of optimizing those 10 pages that got accessed 3% of the time while another 5 pages were accessed 97% of the time (there is a point, but only after you optimize the 5 pages)? (Sometime it’s not easy to determine which pages will get accessed the most, but once you launched the webapps, you’d better be tracking the usage and allocate resources to those highly accessed pages.)

There is one place where it’s a little contradictory to follow this advice. Say in a webapps, you have the above scenario where 5 pages are being accessed 97% of the time. You would want links to these 5 pages in obvious places since users are very likely to want to go to these pages. At the same time, you may also want to publicize this really cool feature on the 6th page. So you also want to place a link to this page in obvious place. Well, all right, it’s not that contradictory, but in this case, you should optimize your screen real estate for both. If you opt for sidebar approach, it’s not too hard, a top navigation may be slightly more mischievous to deal with. Good luck!