Archive for the ‘Java’ Category

Largest value sub-array [part 1]

Wednesday, November 11th, 2009

I was teaching a bunch of freshmen the other day. One of the questions that came up was to determine the largest-value sub-array of a given array of integers. Note that the values in the array can be negative. There are two variants of the question: the first is to find the maximum value obtained when the members of the sub-array are added together, and the second is to find the actual start and end positions of that sub-array.

We will first consider the first variant and work our way to the optimal answer.

A simple answer would be to perform a brute-force search over the array, considering each position as a possible starting and ending point. Time complexity: O(n^2). Not very good, eh? Can we improve on this? Yes, we can, if we use dynamic programming. For many of us the algorithm seems obvious, but I realize that it is really not that simple for students new to programming to come up with a correct algorithm. To answer this, I tried to explain the concept behind the algorithm: for each element of the array, we find the maximum-value sub-array that ends at that element. This value is either the value of the element itself (meaning the sub-array both starts and ends at that element) or the maximum value of a sub-array ending at the previous element plus the value of the element. (The latter applies only if the maximum value at the previous position is not negative.)

There you go, we have a recurrence relation: given an array A, let v(k) be the maximum value of any sub-array ending at position k, where k ranges from 0 inclusive to N exclusive, and N is the length of the array. Hence we have:

v(0) = A[0]
v(k) = max{v(k-1) + A[k], A[k]} if k > 0
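To make the recurrence concrete, here is a short trace on a hypothetical input (the array below is just an example I made up, not from the original question):

```java
import java.util.Arrays;

public class RecurrenceTrace {
  public static void main(String[] args) {
    int[] A = {-2, 1, -3, 4, -1, 2, 1, -5, 4};
    int[] v = new int[A.length];
    v[0] = A[0];
    for (int k = 1; k < A.length; ++k) {
      // v(k) = max{v(k-1) + A[k], A[k]}
      v[k] = Math.max(v[k - 1] + A[k], A[k]);
    }
    System.out.println(Arrays.toString(v));
    // prints [-2, 1, -2, 4, 3, 5, 6, 1, 5]
    // The answer is max over all v(k) = v(6) = 6,
    // attained by the sub-array [4, -1, 2, 1].
  }
}
```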

The maximum value is then given by max{v(k)} for k from 0 to N-1. To implement this in Java, we make several optimizations. First, we can opt for either a recursive method or dynamic programming. Both take O(n) time, but since we're using Java, we would rather use dynamic programming, as there is no tail-call optimization in Java. Then, rather than computing max{v(k)} at the end, we keep a running maximum throughout the algorithm and return it. We also note that v(k-1) + A[k] is greater than A[k] if and only if v(k-1) is greater than 0. Here is how the code goes (this is a rather faithful implementation; it uses the same variable names as the recurrence relation above to make the correspondence easier to see):

public static int calculateMaxValue(int[] A) {
  int[] v = new int[A.length];
  int max = v[0] = A[0];
  for (int i = 1; i < A.length; ++i) {
    v[i] = A[i];
    if (v[i-1] > 0) {
      v[i] += v[i-1];
    }

    if (v[i] > max) {
      max = v[i];
    }
  }
  return max;
}

This algorithm has O(n) time complexity.

What else can we do to optimize this code? Well, we don't really need to keep an array of maximum values. We notice that at any point in time, to calculate v(k), we only need the values of A[k] and v(k-1). This turns the O(n) space requirement into O(1).

public static int calculateMaxValue(int[] A) {
  int max = A[0], current = A[0];
  for (int i = 1; i < A.length; ++i) {
    if (current < 0) current = 0;
    current += A[i];

    if (current > max) max = current;
  }
  return max;
}
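As a quick sanity check of the O(1)-space version, here is a hypothetical usage (the wrapper class name and the test arrays are mine, not from the post):

```java
public class MaxSubArray {
  public static int calculateMaxValue(int[] A) {
    int max = A[0], current = A[0];
    for (int i = 1; i < A.length; ++i) {
      if (current < 0) current = 0;
      current += A[i];
      if (current > max) max = current;
    }
    return max;
  }

  public static void main(String[] args) {
    // [4, -1, 2, 1] sums to 6, the maximum among all sub-arrays.
    System.out.println(calculateMaxValue(
        new int[] {-2, 1, -3, 4, -1, 2, 1, -5, 4})); // prints 6
    // All-negative input: the best sub-array is the single largest element.
    System.out.println(calculateMaxValue(new int[] {-3, -1, -2})); // prints -1
  }
}
```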

Voila! We shall continue with the second variant next time. (:

Random sample from a long linked list

Thursday, November 5th, 2009

Sorry for the long period with no updates. I have quite a few half-written posts sitting in my queue but haven't had the time to finish them, since I have been very busy with work and research. Plus, I'm currently learning basic German, and that really eats up a lot of my free time. (Though German is really, really fun; I really hope that one day I can actually work, maybe temporarily, maybe permanently, in Germany.)

Anyway, a friend of mine is currently applying for jobs at several “big” companies (you know, the Google-Microsoft-Apple kind?). He was preparing for his interviews by hunting down questions online. One of the questions he asked reminded me of a question I used to use to exemplify the kind of problems these companies pose in their technical interviews:

Given a head pointer to a (possibly very long) linked list of unknown length, devise a method to sample m members at random. Ensure that the method yields equal probability for each element in the linked list to be picked.

Obviously, we can traverse the list once to find its length, generate m distinct random numbers from 1 to the length of the list, and traverse the list a second time to pick up those items. A common follow-up forbids you from going through the list more than once: say the list is very long; the disk I/O needed to read the items from the hard disk (even after arranging the items so as to minimize cache misses) makes traversal a very expensive operation, and hence it should be minimized.
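For concreteness, that two-pass baseline might be sketched as follows (this is my own sketch, not from the post; the class name is made up, and I use a HashSet to draw m distinct indices):

```java
import java.util.*;

public class TwoPassSampler {
  // Naive baseline: one traversal to count, one traversal to collect.
  public static <T> List<T> sample(LinkedList<T> list, int m, Random rand) {
    int n = 0;
    for (T ignored : list) ++n;          // first traversal: find the length

    // Draw min(m, n) distinct indices in [0, n).
    Set<Integer> indices = new HashSet<Integer>();
    while (indices.size() < Math.min(m, n)) {
      indices.add(rand.nextInt(n));
    }

    List<T> samples = new ArrayList<T>();
    int i = 0;
    for (T element : list) {             // second traversal: pick the items
      if (indices.contains(i++)) samples.add(element);
    }
    return samples;
  }
}
```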

The solution I have is inspired by the observation that if the linked list contains m or fewer elements, I have no choice but to pick all of them. Hence, the main idea of the solution is to pick the first m elements while keeping track of k, the number of elements seen so far. When we encounter a new element, it has an m/k chance of being picked; if it is picked, we drop one of the previously selected m elements at random. We can easily prove by mathematical induction that this indeed yields the correct probability.

Now of course it won’t be fun to implement this in Java or C++, so I decided to use Scheme to code this, just for the fun of it. The solution is surprisingly short and pretty. Here goes:

(define (sample-m lst m)
  (define (helper curr k m-vec)
    (if (empty? curr)
        ;; k counts from 1, so k <= m here means we saw
        ;; fewer than m elements: return the whole list.
        (if (<= k m)
            lst
            (vector->list m-vec))
        (let ((pos (random k)))
          (cond ((<= k m)
                 (vector-set! m-vec
                              (- k 1)
                              (car curr)))
                ((< pos m)
                 (vector-set! m-vec
                              pos
                              (car curr))))
          (helper (cdr curr) (+ k 1) m-vec))))
  (helper lst 1 (make-vector m)))

;;; Let's test this!
(sample-m (build-list 1000 values) 7)

EDIT: I decided to try writing a version for Java as well. Here is the Java code:

public class LinkedListSampler {
  private static Random rand = new Random();

  public static <T> List<T> sample(
      LinkedList<T> list, int m) {
    ArrayList<T> samples = new ArrayList<T>(m);
    int k = 0; // number of elements seen so far
    for (T element : list) {
      int pos = k++;
      if (pos < m) {
        // Fill the reservoir with the first m elements.
        samples.add(element);
      } else if ((pos = rand.nextInt(k)) < m) {
        // Keep the k-th element with probability m/k,
        // replacing a random previous pick.
        samples.set(pos, element);
      }
    }
    return samples;
  }
}

EDIT 2: Btw, seriously, the Java code is just for fun. Just imagine that LinkedList does not have a size method, will ya? ;)
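For what it's worth, here is a quick way to exercise the Java sampler (the demo class and test values are mine; I inline a copy of the sampler from above so the snippet compiles standalone):

```java
import java.util.*;

public class SamplerDemo {
  private static final Random rand = new Random();

  // Inlined copy of the sampler from the post, for self-containment.
  static <T> List<T> sample(LinkedList<T> list, int m) {
    List<T> samples = new ArrayList<T>(m);
    int k = 0;
    for (T element : list) {
      int pos = k++;
      if (pos < m) {
        samples.add(element);
      } else if ((pos = rand.nextInt(k)) < m) {
        samples.set(pos, element);
      }
    }
    return samples;
  }

  public static void main(String[] args) {
    LinkedList<Integer> list = new LinkedList<Integer>();
    for (int i = 0; i < 1000; ++i) list.add(i);

    List<Integer> samples = sample(list, 7);
    System.out.println(samples.size());            // prints 7
    System.out.println(list.containsAll(samples)); // prints true
    // Reservoir positions are distinct, so for distinct inputs
    // the 7 samples are distinct as well.
    System.out.println(new HashSet<Integer>(samples).size()); // prints 7
  }
}
```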

Closure in Java

Monday, April 20th, 2009

I believe closures should be part of any modern language, so I was delighted to read about Closures for Java (currently a JSR draft, I believe) from here. Closures are not currently part of Java and are impossible to implement in their natural form. We can attempt to emulate closures using anonymous objects, but that comes with two main issues: (1) an anonymous object cannot access non-final variables defined in its enclosing code; (2) you have to define a closure interface for every possible scenario. We really can't do anything about the first problem, but, as we will see later, we can simplify the syntax considerably. Here is an example:

public interface ClosureIntReturnInt {
  public int invoke(int arg1);
}

@Test
public void testNaiveClosure() {
  final int a = 1; // This must be final!
  ClosureIntReturnInt addA =
      new ClosureIntReturnInt() {
    public int invoke(int v) {
      return v + a;
    }
  };

  assertEquals(3, addA.invoke(2));
}

This is obviously not a very good solution, so my first instinct was to genericize the closure interface. Sure enough, that simplifies the problem quite a bit. Here are my genericized Closure interfaces:

public interface Closure0Arg<R> {
  public R invoke();
}
public interface Closure1Arg<R, A1> {
  public R invoke(A1 arg1);
}
public interface Closure2Args<R, A1, A2> {
  public R invoke(A1 arg1, A2 arg2);
}
.
.
.

Sure enough, this works fine and becomes much easier to use. It is, of course, still inconvenient for people who come from the functional programming world. The problems are: (1) there is no universal Closure type, which means you have to keep track of each closure's type in a type annotation, and many of us, me included, would love to ditch the cumbersome type annotation; (2) you have to write an interface for each number of arguments (much better than writing one for each combination of argument types, of course, but still…).

Of course, OTOH, this is the best you can do while maintaining static type checking. Here is an example usage:

@Test
public void testClosure1Arg() {
  Closure1Arg intToString =
      new Closure1Arg<String, Integer>() {
    public String invoke(Integer v) {
      return v.toString();
    }
  };

  assertEquals("5", intToString.invoke(5));
}

A sharp reader will have noticed several (not-quite-correct?) things happening here. First, we omit the generic parameters from intToString's type annotation. This does not change the semantics of the code and makes it look cleaner; however, it does produce an “unchecked” warning from the Java compiler. Second, we use autoboxing freely in this implementation. Since we are using generics, R, A1, …, An must all be objects, which is obviously undesirable and ugly; however, thanks to autoboxing, this is no longer an issue (except performance-wise).

So far, this is really the best we can do while preserving static type checking. However, if we give up the additional safety provided by static type checking, we can go a step further: we can introduce a Closure base class, and the rest of the closure interfaces (Closure1Arg, etc.) become abstract classes that extend it. This is where all the interesting stuff happens. Here is the source code for the Closure class.

public class Closure<R> {
  @SuppressWarnings("unchecked")
  public R invoke(Object... args) {
    Class<?>[] classes = new Class<?>[args.length];
    // This is okay since generic type annotation
    // is stripped after compilation. We only
    // need to find invoke method with
    // the right number of params.
    Arrays.fill(classes, Object.class);

    try {
      // Search for correct invoke method
      // and execute it.
      Method method =
          this.getClass().getMethod(
              "invoke", classes);
      return (R) method.invoke(this, args);
    } catch (Exception e) {
      // TODO: handle exceptions correctly.
      // ClosureException is a runtime exception.
      throw new ClosureException("Error", e);
    }
  }
}

// e.g. of ClosureNArg abstract class.
public abstract class Closure1Arg<R, A1>
    extends Closure<R> {
  public abstract R invoke(A1 arg);
}

This code completely sacrifices static checking. Now each closure can simply be stored in a variable of type Closure. When invoke is called, the method from the base class (above) runs; it finds the correct (i.e., more specialized) invoke method, declared in the concrete anonymous object derived from the subclass, and calls it with the given arguments. Here is how we can use it:

@Test
public void testClosure1() {
  Closure<Integer> addOne =
      new Closure1Arg<Integer, Integer>() {
    public Integer invoke(Integer arg) {
      return arg + 1;
    }
  };
  assertEquals(3, (int) addOne.invoke(2));
}

@Test
public void testClosure2() {
  Closure<Double> add =
      new Closure2Args<Double, Double, Double>() {
    public Double invoke(Double a, Double b) {
      return a + b;
    }
  };
  assertEquals(7.5, add.invoke(4.2, 3.3), 1e-9);
}

Sweet! Now this is neat! Of course, we have given up much of the static checking and replaced it with dynamic, runtime checking. I have removed much of the exception-handling code in the Closure<R> base class to save space and focus on the meat. Again, I'd draw readers' attention to several important points:

  1. We catch most of the checked exceptions and replace them with an unchecked runtime exception to make usage much cleaner; this follows the conventions of dynamic, interpreted languages rather than a static language like Java;
  2. We readily suppress warnings with @SuppressWarnings (heh!);
  3. The code, arguably, takes a bit of a performance hit, so users who are more concerned about performance should probably not use this (squeezing out performance usually means messier code; closures are for elegance, not mess);
  4. You may want to write a simple script to autogenerate the ClosureNArgs classes (N from 0 to however many arguments you think is reasonable; I personally generate up to 7 arguments);
  5. You probably want to generate closures for void return values as well; that would make things messier (much messier), though. It is probably easier to return null instead;
  6. Here, I opt for clean ClosureNArgs subclasses. Others may opt to add some ugliness to the subclasses and make the base class cleaner (likely resulting in faster closure code). I will likely experiment with this further, and try to implement variable binding as well.
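Incidentally, the reason the base class's getMethod call (with all-Object parameter types) finds a callable method at all is the synthetic bridge method that erasure makes the compiler generate in each concrete closure. A standalone illustration (the Fn class is my own stand-in for the Closure hierarchy, just for this demo):

```java
import java.lang.reflect.Method;

public class BridgeDemo {
  // A tiny stand-in for the Closure hierarchy.
  static abstract class Fn<R, A> {
    public abstract R apply(A arg);
  }

  static final Fn<Integer, Integer> INC = new Fn<Integer, Integer>() {
    public Integer apply(Integer arg) { return arg + 1; }
  };

  public static void main(String[] args) throws Exception {
    // Erasure turns the abstract apply(A) into apply(Object), so the
    // compiler emits a synthetic bridge apply(Object) in the anonymous
    // subclass that casts its argument and delegates to apply(Integer).
    Method m = INC.getClass().getMethod("apply", Object.class);
    System.out.println(m.isBridge());      // prints true
    System.out.println(m.invoke(INC, 41)); // prints 42
  }
}
```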

I found it very interesting to implement Closure in Java; hope you do too. :)

EDIT: This is interesting, wonder how I missed this earlier: Closures for the Java Programming Language (v0.5).

EDIT 2: I simplified Closure<R> a little bit more. The simplification exploits the fact that generic type annotations are stripped after compilation, so the base class need only hunt for an invoke method with the right number of parameters.