Tuesday, July 10, 2007

Noogler, Day 11: Work and Play at Google

I know, it's earlier than I normally send this out, but I really hope to get to bed early. When I got up this morning, I was exhausted. I ended up getting in too late to make it to my first class, and the other two classes today depend on it. It's not something vital to me at the moment, so I'm okay with making it up next week.

That also meant that I had all day to work on some of my other studies. Today, I was studying Sawzall, a log analysis tool we have around Google. After what I said yesterday about how I don't talk about work, I figured I should talk about work a little, in case you're curious about what I'm looking at. (And lest anybody wonder about all the company secrets I'm divulging, I'll note that the following is all in a published paper.)

Sawzall is a programming language developed by Google specifically for analyzing logs and extracting relevant data. There's a LOT of log data, and so they developed a system to distribute the analysis over hundreds or even thousands of computers. Although this could be done using a general-purpose language like Python, using a purpose-built language has a few important advantages. Sawzall is designed around the idea of processing logs. It has a lot of functionality that is useful in this context, such as date-handling functions, easy-to-use regexes, etc.

Now, Perl is also built around the idea of processing text files, but Sawzall has one big advantage over Perl: it's designed for distributed computing. When a Sawzall program is run, it is automatically distributed to many, many computers. Each computer processes a portion of the logs, and then they work together to combine their results appropriately. All of this coordination is handled by Sawzall itself; it doesn't require any special code from the programmer.
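To make that model a little more concrete, here is a minimal sketch in Python (not Sawzall) of the general shape: a per-record function that only ever sees one log line and adds to a table, run independently over several shards, with the partial tables merged at the end. It runs on one machine using threads as stand-in "workers". The log format, the function names, and the choice of a simple counting table are all assumptions made for illustration; this is not Google's implementation.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def process_record(record: str, table: Counter) -> None:
    """Hypothetical per-record logic, playing the role of the Sawzall program.

    Assumed record format: "<country> <query text>". The function sees one
    record at a time and only adds to a table, roughly like emitting 1 into
    a sum table indexed by country.
    """
    country, _, _query = record.partition(" ")
    table[country] += 1


def process_shard(shard: list[str]) -> Counter:
    """One worker's job: run the per-record logic over its portion of the logs."""
    table = Counter()
    for record in shard:
        process_record(record, table)
    return table


def run_distributed(records: list[str], num_workers: int) -> Counter:
    """Split the input into shards, process them in parallel, and merge the results."""
    # Round-robin split into num_workers roughly equal shards.
    shards = [records[i::num_workers] for i in range(num_workers)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(process_shard, shards))
    # Combining step: counting tables merge by adding their partial counts.
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


if __name__ == "__main__":
    logs = ["us weather", "de wetter", "us news", "fr meteo", "us maps"]
    print(run_distributed(logs, num_workers=3)["us"])  # 3
```

The point of the design is that `process_record` knows nothing about shards or workers. Because adding counts gives the same answer no matter how the records were split up, the result doesn't depend on how many workers there are, and that is what lets the system handle the distribution without any help from the programmer.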

These traits allow you to write efficient programs for log analysis quite easily. For example, a program to compute a frequency distribution of the number of queries vs. minute within the week takes only 7 lines. In a few simple benchmarks (Mandelbrot and Fibonacci), run in a non-distributed manner, Sawzall is quite fast: 3.5 times faster than Python, 5 times faster than Ruby, and 4 times faster than Perl.
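I won't reproduce the paper's 7-line Sawzall program here. As a rough illustration only, here is the same computation in plain Python, running on a single machine. It assumes a hypothetical log format in which each line carries a Unix timestamp as `ts=<seconds>`, and it buckets times by UTC minute, counting from Monday 00:00. Neither assumption comes from the paper.

```python
import re
from collections import Counter
from collections.abc import Iterable
from datetime import datetime, timezone

MINUTES_PER_WEEK = 7 * 24 * 60  # 10080 possible buckets

# Hypothetical log format: each line contains a Unix timestamp as "ts=<seconds>".
TIMESTAMP = re.compile(r"\bts=(\d+)\b")


def minute_of_week(ts: int) -> int:
    """Minutes elapsed since Monday 00:00 UTC of the week containing ts."""
    t = datetime.fromtimestamp(ts, tz=timezone.utc)
    return t.weekday() * 24 * 60 + t.hour * 60 + t.minute


def queries_by_minute_of_week(lines: Iterable[str]) -> Counter:
    """Frequency distribution: number of queries per minute-of-week bucket."""
    histogram = Counter()
    for line in lines:
        match = TIMESTAMP.search(line)
        if match:  # skip lines with no timestamp
            histogram[minute_of_week(int(match.group(1)))] += 1
    return histogram


if __name__ == "__main__":
    lines = ["ts=0 q=weather", "ts=60 q=news", "ts=604800 q=maps", "malformed line"]
    hist = queries_by_minute_of_week(lines)
    # The Unix epoch fell on a Thursday at 00:00 UTC, which is minute 3 * 1440 = 4320.
    print(hist[4320])  # 2
```

Even this small version relies on exactly the things mentioned above: a regex to pull the field out of each line and a date library to turn it into a day and time. In Sawzall you would write only the per-line part, and the system would combine each machine's histogram for you.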

It also scales incredibly well. Ideally, putting five times as many machines on a job should make the job five times faster. In practice, this is never achieved, since there's always some overhead. In Sawzall, the overhead is only 2%: putting five times as many machines on a job makes it 4.9 times faster.
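Here is where the 2% figure comes from. Write $T_N$ for the job's running time on $N$ machines (notation introduced here just for this example). The speedup from quintupling the machines, the resulting efficiency, and the overhead are:

```latex
S = \frac{T_N}{T_{5N}} = 4.9, \qquad
E = \frac{S}{5} = \frac{4.9}{5} = 0.98, \qquad
1 - E = 0.02 = 2\%
```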

Now, most languages are built around an execution model similar to Algol's, and they don't work too well when a different execution style is needed. (This is why I figure the current state of the art in multithreaded programming is so lousy.) Generally, languages designed around an unusual evaluation model are fairly different from most traditional programming languages. For example, programs in Haskell, ML, or Prolog are quite different from those in C, Perl, or Python, and require a different way of thinking about programs.

Now, I've worked with unusual evaluation models; for example, Params, my magnum opus at Juniper, is based around backward-chaining evaluation. I haven't done much work on distributed computing, though. I did a small research project at SWT, and I designed Params v1 and v2 so that each could later be migrated to a distributed model (e.g., Beowulf) without completely redesigning the core algorithm.

Sawzall is, as alternative model languages go, pretty straightforward. But it's still a bit more work to think about. As I've been working to understand how the language works, I've had to think carefully about what the implications of the distributed model are. This is the area where I had the questions I mentioned on day 9, when I got referred from one person to the next. I'm a little bit surprised that the others I spoke with hadn't looked into these issues themselves. On the other hand, I'm a bit more of a programming language geek than most programmers. I enjoy thinking about the ins and outs of programming languages.

Now that I've gone through all that, you might understand why I don't talk about what I'm working on much. So let's talk about some more fun stuff, and more of my narratives.

As I said, I was pretty tired today. After just a couple of hours of working, I was having trouble concentrating on anything of significant complexity. I decided to take a little walk to see if that would help wake me up.

After a little bit of walking around, I decided it wasn't helping, and a nap was in order. But where? I walked into the nearest building and wandered until I found a couch. I put Pachelbel's Canon on the iPod, and relaxed.

It wasn't much of a rest. The couch had a pretty hard frame; it would be fine to sit on and talk, but it wasn't much of a bed. Fortunately, a friend I'd seen around campus happened by and gave me another suggestion. Just upstairs, there were some small conference rooms, little more than phone booths. (My understanding is that they're usually used either for phone calls or interviews.) Two of them had the huge 6' beanbags I mentioned on day 2. Both of those were occupied, but there was another such beanbag (this one 8') nearby. We moved it into an empty room (have you ever tried to get an 8' beanbag into a phone-booth-sized room?), where I was able to rest to the strains of Schubert's Piano Sonata No. 20 in A, D. 959: IV. Rondo (Allegretto). Despite the overly complex name, this simple piece is possibly the most relaxing music I know. (A portion of it was used as the credits music for the '90s sitcom Wings.) (The enclosed photo is of a 6' bag, taken later when the room was unoccupied.)

After that, I went on to lunch, and enjoyed some orange chicken with jasmine rice and a small green onion cake. I will say, despite the wonderful food at Google, the chopsticks are pretty lousy. If I find myself eating Asian food often, I might keep my own set at work. While I ate, I considered that although I'm a pretty finicky eater, I never have problems finding food that I like at Google. I'm a bit surprised by this.

Back to the desk, back to work for a little while. Still sleepy, and my knee hurt. After some time at this, I went to the pharmacy nearby. Now, if you have time, you might look at the route Google Maps gives to get from Google to the Walgreens. I ask you, how could I have taken a wrong turn?

http://maps.google.com/maps?f=d&hl=en&geocode=9430998064533961443,37.403936,-122.096970&saddr=1600+Amphitheatre+Parkway,+Mountain+View,+CA+94043+(Google)&daddr=112+N+Rengstorff+Ave,+Mountain+View,+CA+94043+(Walgreens)&sll=37.41393,-122.090945&sspn=0.03395,0.058365&ie=UTF8&z=14&om=1

Nevertheless, I did, but it was easily rectified. I spent a little bit of time talking with the pharmacist about my sleeping problems, and got what I think is pretty good advice. (I'll still need to order a PSG, but this should help deal with the issue temporarily.) I also picked up another reusable icepack with a knee strap, so I can have one at work and another at home. Finally, I got a couple of chemical (instant) icepacks and an Ace bandage, so I can apply an icepack if I need to when I don't have one frozen.

As I headed back to Google (about 16:00), I saw that the sky had suddenly gotten overcast. When I got back, Karl pointed it out, saying, "Is this still California?"

The overcast skies didn't keep people from enjoying themselves outdoors, though. Volleyballers played volleyball, jugglers juggled, dancers danced. The dancers were what originally caught my eye; or rather, the music caught my ear. Bongos and guitar can do that.

2 comments:

Aaron Denney said...

ML has a perfectly standard imperative execution model.

Piquan said...

@Aaron: You're right, of course. I guess I was unclear in my phrasing, which is unfortunate, since this really does reflect the entire point of the article.

I didn't mean to imply that ML plays unusual execution model tricks like Haskell's lazy evaluation or Prolog's backtracking when I listed ML alongside those languages. I simply meant that ML (by which I was mostly thinking of SML, not things like OCaml) falls closer to the "functional language" style of thought (even though it does allow side effects) than the more traditional languages like C and Pascal.

To put it differently, a programmer who only knows C would learn a lot about programming simply by learning ML. I mean this in the same spirit as Perlis when he said, "A language that doesn't affect the way you think about programming, is not worth knowing," or the many programmers who have said that it is possible to write Fortran in any language. In other words, I'm not speaking so much about the model, but rather about the mindset.

At least, that's what I meant to say. However, I don't have much experience at all with ML, so I could be completely off base here! I hope my point wasn't lost by my mistake.