Archive for April, 2009

Chrome Shorts

Thursday, April 30th, 2009

OH MY GOD! The Chrome Shorts are really cute! It’s as if I’m turning into a small kids awestruck by some of the videos! Good timing! Have been pretty stressed up getting things right in my mind (while getting ready for exams).

Anyway, this one is the cutest! Shit! The cat and the video is just so cute!

(Watch the other 10 videos from YouTube’s Chrome channel.

Introduction to Probability (book)

Wednesday, April 29th, 2009
Introduction to Probability, 2e

I have finally decided to get this book. Have been struggling quite a bit with some more complex probability for mathematical and systems modeling and for my research. So I thought getting more rigorous grounding would help. Seems that a lot of people are recommending this book, so I thought I’ll give it a try. Will update after I completed at least several chapters of the book. (Well, the book itself won’t arrive for another week or so.)

Closure in Java

Monday, April 20th, 2009

I believe Closure should be part of any modern language, so I was delighted to read about Closure for Java (currently a JSR draft I believe) from here. Currently, Closure is not implemented in Java and is impossible to implement in its natural form. We can attempt to implement closures using anonymous objects, but that comes with two main issue: (1) it can not access non-final variables defined in its enclosing code; (2) you have to define closure interface for every possible scenario. We really can’t do anything with the first problem; but, as we see later, we can simplify the syntax considerably. Here is an example:

public interface ClosureIntReturnInt {
  public int invoke(int arg1);
}

@Test
public void testNaiveClosure() {
  final int a = 1; // This must be final!
  ClosureIntReturnInt addA =
      new ClosureIntReturnInt() {
    public int invoke(int v) {
      return v + a;
    }
  };

  assertEquals(3, addA.invoke(2));
}

This is obviously not a very good solution. So my first instinct would be to genericize the closure interface. Sure enough, it simplifies the problem by quite a bit. Here is my genericize Closure interface:

public interface Closure0Arg<R> {
  public R invoke();
}
public interface Closure1Arg<R, A1> {
  public R invoke(A1 arg1);
}
public interface Closure2Args<R, A1, A2> {
  public R invoke(A1 arg1, A2 arg2);
}
.
.
.

Sure enough, this works fine and it becomes much easier to use. It is of course still inconvenient for people who came from functional programming world. The problem is: (1) there is no universal Closure, that means you have to keep track of the Closure type as type annotation, many, including me would love to ditch the cumbersome type annotation; (2) you have to write the interface for each number of arguments (much better than writing for each argument type of course, but still…).

Of course, OTOH, this is the best you can do while maintaining static checking ability. Here is an example usage:

@Test
public void testClosure1Arg() {
  Closure1Arg intToString =
      new Closure1Arg<String, Integer>() {
    public String invoke(Integer v) {
      return v.toString();
    }
  };

  assertEquals("5", intToString.invoke(5));
}

A sharp reader would have noticed several (not-very-correct?) things happening here. First, we eliminate the generic from intToString type annotation. This does not change the semantics of the code and make the code looks cleaner; however, it does create an “unchecked” warning with the Java compiler. Secondly, we utilize autoboxing freely in this implementation. Since we are utilizing generics, R, A1, …, An must all be objects. This is obviously undesirable and ugly. However, due to autoboxing, this is no longer an issue (except performance-wise).

So far, this is really the best we can do while preserving static type checking. However, if we ignore additional security provided by static type checking, we can go a step further: we can introduce a Closure base class; the rest of the closures interface (Closure1Arg, etc.) will become an abstract class that extends the Closure class. This is where all the interesting stuffs are happening. Here is the source code for the Closure class.

public class Closure<R> {
  @SuppressWarnings("unchecked")
  public R invoke(Object... args) {
    Class[] classes = new Class[args.length];
    // This is okay since generic type annotation
    // is stripped after compilation. We only
    // need to find invoke method with
    // the right number of params.
    Arrays.fill(classes, Object.class);

    try {
      // Search for correct invoke method
      // and execute it.
      Method method =
          this.getClass().getMethod(
              "invoke", classes);
      return (R) method.invoke(this, args);
    } catch (Exception e) {
      // TODO: handle exceptions correctly.
      // ClosureException is a runtime exception.
      throw new ClosureException("Error", e);
    }
  }
}

// e.g. of ClosureNArg abstract class.
public abstract class Closure1Arg<R, A1>
    extends Closure<R> {
  public abstract R invoke(A1 arg);
}

This code completely sacrifices static checking. Now we can simply have each closures to be stored in variables of Closure type. When invoke is called, the method from the base class (above) is called. It will find the correct (i.e. more specialized) invoke method (declared in the concrete anonymous object derived from the subclass) to call based on the type of the arguments to invoke. Here is how we can use it:

@Test
public void testClosure1() {
  Closure<Integer> addOne =
      new Closure1Arg<Integer, Integer>() {
    public Integer invoke(Integer arg) {
      return arg + 1;
    }
  };
  assertEquals(3, (int) addOne.invoke(2));
}

@Test
public void testClosure2() {
  Closure<Double> add =
      new Closure2Args<Double, Double, Double>() {
    public Double invoke(Double a, Double b) {
      return a + b;
    }
  };
  assertEquals(7.5, add.invoke(4.2, 3.3));
}

Sweet! Now this is neat! Of course, we have ignored much of the static checking and replace them with dynamic, runtime checking. I have removed much of the exception checking code in Closure<R> base class to save space and focus on the meat. Again, I’d draw readers’ attention to several important code fragments:

  1. We catch most of the checked exception and replaced them with unchecked runtime exception to make the usage much cleaner, this follows more closely to dynamic, interpreted language rather than static language (as with Java);
  2. We readily suppress warning with @SuppressWarnings (heh!);
  3. The code, arguably, takes a bit of performance hit, so users who are more concerned about performance should probably not use this (squeezing performance usually means messier code; closure is for elegance, not mess);
  4. You may want to write simple script to autogenerate ClosureNArgs (N from 0 to however many arguments you think is reasonable; I personally generate up to 7 arguments);
  5. You probably want to generate closures for void return value as well; that would make things messier (much messier) though. It is probably easier to return null instead.
  6. Here, I opt to go for clean ClosureNArgs subclasses. Others may opt to add some ugliness to the subclasses and make the base class cleaner (and likely result in faster closure code). I will likely experiment with this further, and try to implement variable binding as well.

I found it very interesting to implement Closure in Java; hope you do too. :)

EDIT: This is interesting, wonder how I missed this earlier: Closures for the Java Programming Language (v0.5).

EDIT 2: I simplified Closure&ltR> a little bit more. The simplification is a result of stripping of the generic type annotation after compilation. So the base class need only hunt for invoke method with the right number of parameters.

The teaching of computer science

Tuesday, April 7th, 2009
Kids & Computers

Kids & Computers

Today’s ACM TechNews includes an article on CMU trying to buck trends of having few women in Computer Science. I agree wholeheartedly.

However, before we even get there, computer science is seriously in need of polished publicity. I have talked with several undergraduates and high school students recently and encouraged them to take up computer science. In general, most of them were impressed when they heard that I majored in Computer Science. They thought that it is pretty cool as a major. However, when I asked them whether they are considering (especially the high school students) to major in computer science, the response I got was roughly this: no. “Why not?” They replied back that computer science involve lots of programming and they think that programming is hard, really hard. They also strongly equate computer science with programming and only programming. I do believe that programming is not easy, especially at the beginning, but it is not harder than, say, solving difficult physics questions, or designing chemistry experiment to confirm a hypothesis. (I was lucky to be introduced to programming, with LOGO and Basic—yes, Basic, unfortunately—in late elementary school; but most of these people have never touched any programming language before.) Few of them also think that programming is geeky, something that I would never refute; I think being geeky is cool.

Furthermore, even if programming were taught in schools before university, I have a strong feeling that they are being taught wrongly. Instead of being taught to appreciate the beauty of programming (recursion, abstraction, closure, memoization, etc.), teachers were more focused on getting the syntax right. That, in itself, is not too bad. However, the choice of language really do reveal the weakness of the method. Teaching C++ or Java as a first programming language is definitely not a good idea. These languages are very thick with syntax. Example to the point, I have seen loops being taught as a syntax; as for its concept, teachers keep reiterating that it is used to make a particular code loops over and over until the condition turns to become false. The explanation is correct. On the other hand, such explanation rarely extract the awe and understanding of the power of loop construct.

(Granted, at high school level, I am only familiar with Cambridge ‘A’ Level syllabus for Computing, which uses C/C++, I might have too narrow a viewpoint, or even outdated.)

So the first thing we should straighten out is to clean up the way programming should be taught to students. I would start with Javascript (hey, something that can be applied right away on their own websites is cool and fun!), Python, or Scheme. (Yes, you’re right, I think type system only make teaching programming more convoluted without offering much direct benefits.) Picking good teachers is no longer optional. The aim should be to make students appreciate computer science as a diverse, exciting, and fun field. It should definitely not be aimed to teach students to be proficient in one (or, worse, two) programming language. Probably introducing them at younger age would make it even more effective, as students get higher exposure to it and the curriculum gets more refined as it is harder to teach the younger students. (In fact, it is probably more rational to teach visual programming technique. About half a year ago, I watched a Google Tech Talk on tangible functional programming, where the concept of value, pair, and function/lambda is translated into intuitive visual system that allows even stronger form of composability.)

Certainly, putting some into good publicity will not hurt. Though I do think that we should put more effort into integrating computer science into early education (just like other sciences) to let the younger generation figures out how suitable is the major for themselves, before the stigma attached to CS becomes too deeply rooted in them.

Acknowledgment: Photo was published by shapeshift under Creative Commons (Attribution & Share-Alike); link to flickr page.

Steve Souders’ tech talk on writing fast code

Saturday, April 4th, 2009

This guy is my hero on web performance technique, so I was real happy when this video was published recently in googletechtalks YouTube channel (check out the channel, it has many, many other interesting videos).

In this tech talk, he built up upon the previous tech talk (which I happened to watch live, but could not find it online) and talk about how you can take advantage of parallel Javascript downloads while maintaining any coupling you have between external scripts and inline scripts that you have. One of the techniques involve a modification of John Resig’s degrading script tags. He also mentioned problems with iframe preventing onload events to be fired earlier. Finally, he made clear how you could flush your data early to ensure that you utilize chunked transfer. The last bit was particularly interesting. It has been in my mind recently and I have experimented quite a bit with PHP to perform this. Soon I’m going to try to figure out whether a Python (non-Apache) web server can perform similarly (I believe so). I already wrote a testing web server (based on HttpServer class).

In PHP/Apache case, there were quite a few difficulties in making sure that chunked transfer is performed. One is to make sure that you have enough data (and specifically flush PHP output buffering otherwise). Another involves figuring out Apache’s limitation on sending a single chunk (you have to have enough data before Apache agree to push these data; worse still, if you activate gzipping, the data must exceed the minimum gzip window). Of course, the most interesting problems would be to figure out how to structure the page to ensure that you get the most out of chunked transfer. In another word, you would want to transfer as much dumb data (that can be generated real fast) as possible to utilize the time taken for the server to generate the difficult content as efficiently as possible (at that time, the browser can start downloading CSS, some images, and maybe some Javascript). Also note that the browser will start partial rendering of the page; however, the rest of the page will only be rendered when the rest of the data comes (which can be very long when you’re unlucky). So you need to make sure that the page renders fine with partial data.

Souders’ talk revealed several more considerations that escape my own experiments in this. One is that some proxy server will buffer chunked transfer; the effect of which is that the user will get the entire page in a single monolithic transfer. Also, some browsers (Chrome and Safari) will not start partial render unless there’s enough data (2KB for Chrome, and 1KB for Safari).

Deriving natural laws from raw data

Saturday, April 4th, 2009

So I read this article today. Cornell researchers have successfully implemented a parallel computer system that is able to basically derive physics laws from observation data. This is extremely interesting. Before deciding to take computer science, I was split between majoring in physics or computer science (and, to much lesser extent, mathematics). To see an inter-disciplinary research like this one made me very happy and impressed. In fact, I do believe that many of the more interesting research (current and future) will involve multi-disciplinary twists. Just recently I read a paper describing translation of polynomial (or restricted-polynomial), completely-partitionable, first order differential equations into distributed protocols (that satisfy the same mathematical properties of the derivatives)1.

So, in this deduction system, it basically figures out derivatives (changes in values w.r.t. to some other changes in the system, e.g. time, position, etc.) that describe the observables, and uses this derivatives to derive system invariants. Technically, invariants are not the actual physical equations. However, it encodes all the information needed to derive physical equations for the system. With this technique, Cornell researchers were able to derive conservation of energy and momentum, and also Newton’s second laws from observations taken from simple experiments (including spring-loaded acceleration, simple and double pendulums).

Read the article for more details. Also, I’m definitely waiting to read the Science article (Vol. 323, No. 5924) that was supposed to come out on April 3 (the digital version seems to not have come out though).

  1. Gupta, I. On the Design of Distributed Protocols from Differential Equations. PODC’04. []