General musings on programming languages, and Java.

Saturday, April 26, 2008

So You Like For Loops? Zip It Up!

I've written this little for loop many a time, it finds all duplicates in a sorted list of integers:

List<Integer> nums=Arrays.asList(1,2,2,3,4,6,6,7,8,8);

List<Integer> dups=new ArrayList<Integer>();
int prev=nums.get(0);
for (Integer i: nums.subList(1))
{
    if (prev==i)
        dups.add(i);
    prev=i;
}
Now go and read your email or something, and come back to read just the for loop (and the line above it). You have to trace through time in your head to work out what it does. The code doesn't say what it means, but that's ok, right, because it works.. hmm.

Let's make up a type called Pair<X,Y>, and quickly imagine that List now has this nice method, zip. Here's the above list, zipped with a sublist of it starting from 1.

System.out.println(nums.zip(nums.subList(1)));
output: ArrayList((1,2),(2,2),(2,3),(3,4),(4,6),(6,6),(6,7),(7,8),(8,8));
Can you see how that relates to the original list? I hope so. Given this data, how would you find duplicates? Well, you'd look for any Pairs where the X and Y value are the same, easy. We'll call that zipped list 'zipped', of type List<Pair<Integer, Integer>>:
List<Integer> dups=new ArrayList<Integer>();
for (Pair<Integer, Integer> pair: zipped)
    if (pair.x().equals(pair.y()))
        dups.add(pair.x());
This for loop is a common pattern now, it's a 'collector'. At each iteration we have a/the list of dups, and the current pair. If the current pair has equal elements we modify the list of dups to make it have one more. Really, it seems like it should be a general operation, filter:
List<Integer> dups=zipped.filter(new Predicate<Pair<Integer, Integer>>()
{
    public boolean invoke(Pair<Integer, Integer> pair)
    {
        return pair.x().equals(pair.y());
    }
}).map(new Function<Pair<Integer,Integer>,Integer>()
{
    public Integer invoke(Pair<Integer, Integer> pair)
    {
        return pair.x();
    }
});
It's kind of better but the syntax is getting in the way. Hold up your closure glasses to the screen, or if you don't have any, I've kindly repeated the code but using the proposed Java 7 closures syntax:
List<Integer> dups=zipped.filter( { Pair<Integer, Integer> pair => pair.x().equals(pair.y()) } ).map( { Pair<Integer, Integer> pair => pair.x() } );
I hope that helped you read the previous code. Let's go one step further with these glasses, they are now magically type inference glasses. We don't need to specify the type of 'pair' because it's blindingly obvious from the type of zipped. We don't need to specify the type of dups because it's blindingly obvious from the return type of filter:
val dups=zipped.filter( { pair => pair.x().equals(pair.y()) } ).map( { pair => pair.x() } );
It looks to me like the braces in there are a bit redundant, the () after x and y are just annoying, and .equals is a Java design error that our glasses can correct:
val dups=nums.zip(nums.tail).filter(pair => pair.x==pair.y).map(pair => pair.x)
Und viz zis Scala zee transfurmaschun vill be complete! (parody of a parody)

Update: Thanks to David MacIver, who spotted a mistake, I had to add 'map' to each of the non-for-loop examples, and in updating it, I've stopped short of what might be the Scala norm - in a closure such as (x => x+2), where the closure parameter is only used once, you can write (_+2). So above you'd write blahblahblah.map(_.x) instead of blahblahblah.map(pair => pair.x).

Blog Archive

About Me

A salsa dancing, DJing programmer from Manchester, England.