Wednesday, December 31, 2008

What I'm working on at Juniper

I haven't had much time to update my blog for a while.  It's mostly been work, so I think I'll tell you all about it.

I'm loving my job at Juniper, although I must say it's keeping me very busy lately.  The project I'm on is pretty close to the wire.  In the weeks approaching the Christmas holidays, I was spending every waking minute working on my code to get it ready.  That left me able to go into Christmas and enjoy it without worrying about work (more about my vacation in another post, but the basic summary is: it was terrific!).  Even though my part is working great, there are some other parts of the project that are a bit tense, so I'm helping where I can.  Since I got back from vacation, though, things have been a bit more relaxed for me.

Anyway, I can't go into all the background on the project, but I can tell you about my part.  (This gets a bit technical from here on in, so you can skip it if you're not a computer geek.)  The goal is to make it possible to compile with source code on NFS servers and object code on local disk.  The sources can then be edited remotely on workstations, and easily backed up; the objects don't take up expensive NetApp space, and the compile proceeds at local-disk speeds.  This mechanism speeds up compiles by a factor of 2-3.

The core of this is a new filesystem layer that I wrote, lcachefs.  This is a type of loopback filesystem, similar to nullfs or unionfs.  lcachefs does some magic mirroring involving three directories.  As with any filesystem, one of these is the mountpoint: where lcachefs shows its results.  Unlike most filesystems, though, it involves two sources: one is known as the source directory (which holds the NFS-mounted sources), and the other is a storage directory (local disk).

Ok, let's pretend that you first copy your sources from a source directory to the storage directory.  Then you loopback-mount the storage directory over your original source directory.  Then, you compile there.  (You mount it over your original source directory so that the filenames in the debugging information point to the original sources, instead of to a temporary storage directory.)  This looks, to the compiler, just like a regular compile.  However, instead of doing the compile on NFS, you're doing it on local disk.

This has two advantages.  The one that's easy to explain is that you're storing your objects and products on local disk (which is cheap), instead of on NetApps (which, despite all their greatness, are expensive).  Objects can easily be recreated, and take up most of the space in a build tree.

The other part is that you're working entirely on local disk.  This is much, MUCH faster than using NFS.  It's not because of the bulk data transfer (i.e., reads and writes), so higher bandwidth doesn't help.  The problem is with the per-request overhead.  There are a LOT of requests going across the wire in a compile.

Let's consider what happens when you build a file.  Specifically, suppose you're just building hello.o.  Make will look for Makefile, makefile, BSDmakefile, and .depend.  Then it will look at hello.c and hello.o.  (If it needs to, it may look for hello.s, hello.S, hello.f, and whatever other potential sources you may have.)  It'll also look up hello.c.gch (if you have precompiled header support) and hello.gcda (if you have coverage support).  Then it has to look for stdio.h.  Well, let's suppose that you have two directories in your -I path; let's call them foo and bar.  Now it needs to check for foo/stdio.h.gch, foo/stdio.h, bar/stdio.h.gch, and bar/stdio.h -- all lookups into your source repository (hence usually on NFS) -- before it can go on to look at /usr/include/stdio.h.  Well, the first thing that stdio.h does (on FreeBSD; your OS may vary) is include sys/cdefs.h.  You got it... another four lookups in the source repository to find sys/cdefs.h!  Repeat for sys/_null.h, sys/_types.h, and machine/_types.h.  In the end, just to compile hello.o, you've done 39 file lookups in your source directory.  Of those lookups, 21 are for .h files... and hello.c only includes one file, stdio.h!  The average .c file (using the FreeBSD source base as a sample) includes 9 files directly.
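If you like, you can sketch that lookup arithmetic in a few lines of Python.  This is a toy model, not a real trace: the probe lists below are assumptions reconstructed from my description (and abbreviated -- I've left out the hello.s/hello.S/hello.f probes and a couple of others, so the total comes out lower than the 39 I counted by hand):

```python
# Toy model of the file lookups done while building hello.o.
# All probe lists here are illustrative assumptions, not a real trace.

MAKE_PROBES = ["BSDmakefile", "makefile", "Makefile", ".depend",
               "hello.c", "hello.o", "hello.c.gch", "hello.gcda"]

# Headers pulled in by `#include <stdio.h>` on FreeBSD, per the text above.
HEADERS = ["stdio.h", "sys/cdefs.h", "sys/_null.h", "sys/_types.h",
           "machine/_types.h"]

INCLUDE_PATH = ["foo", "bar"]   # the two -I directories in the source repo

def source_lookups():
    """Count lookups that land in the (NFS-mounted) source repository."""
    count = len(MAKE_PROBES)
    for _hdr in HEADERS:
        for _d in INCLUDE_PATH:
            count += 2          # d/hdr.gch and d/hdr, both checked in the repo
    return count

print(source_lookups())   # 8 make probes + 5 headers * 2 dirs * 2 names = 28
```

Even this abbreviated model shows the shape of the problem: the header searches dominate, and every extra -I directory multiplies them.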

That's a lot of work.  These lookups are all blocking I/O, so it's one round trip each; from watching network traces, I'm actually very impressed with how fast NetApps can respond, but it's still a non-zero time.  Moreover, though, you've got to look at the CPU usage.  NFS uses RPC, and the overhead isn't cheap.  In my testing, using NFS to compile something requires four times as much kernel processing -- which works out to twice as much CPU overall -- as using local disk!  (That's all in the kernel, too, and that means that preemption isn't as easy, so scheduling takes a little hit... and don't forget the new network interrupts!)
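To see how those blocking round trips add up over a whole build, here's a back-of-envelope calculation.  The 0.2 ms round-trip time and the 10,000-file tree are illustrative assumptions I'm plugging in, not measurements:

```python
# Back-of-envelope: wire time spent on lookups alone during a full build.
# The inputs are illustrative assumptions, not measurements.
lookups_per_file = 39      # per-object lookup count from the hello.o example
files_in_tree = 10_000     # size of a hypothetical build tree
rtt_seconds = 0.0002       # 0.2 ms per NFS round trip (assumed)

wire_time = lookups_per_file * files_in_tree * rtt_seconds
print(f"{wire_time:.0f} seconds of blocking lookup round trips")
```

That's over a minute of wall-clock time spent doing nothing but waiting on lookups -- and since each one blocks, extra bandwidth doesn't buy it back.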

Finally, NFS is designed around concurrent use.  You can't assume that the contents of an NFS-mounted directory will be the same now as they were 90 seconds ago.  While local disk can use metadata caches very, very effectively (and FreeBSD's filesystems do), when you're using NFS, you have to expire caches a lot.  (There are also some things that FreeBSD could do better when it comes to its NFS client, but I fixed those and only got a 5% boost.)

Ok, now you should be convinced that the speed hit for NFS is bad, and that building on local disk is much, much faster.  Let's go back to talking about lcachefs.

Earlier, I asked you to think about copying your sources to local disk, and doing the compile there.  That's exactly what lcachefs does, but with a twist: it does this lazily, by which I mean it won't copy files until they're needed.

When you first mount an lcachefs directory, it scans just the top-level of your source directory.  It then creates stub files for each entry in the local storage.  This is just an empty file with a special uid/gid pair that marks it as a stub.  If you ls -l that file, then the filesystem will report the correct uid/gid, size, link count, etc, but in reality the locally-stored file is empty.

When you actually try to read (or otherwise use) a file or directory that's a stub, then it will copy it over to local storage before the system call returns to userland.  In the case of a file, this means it copies the contents and sets the uid/gid to its real, non-stub values.  In the case of a directory, it scans the directory and populates the stubs just like I described in the previous paragraph.
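The stub-and-materialize idea from the last two paragraphs can be sketched in user-space Python.  This is a toy model, not how lcachefs is actually implemented: lcachefs does this in the kernel at the VFS layer, and a sidecar dict stands in here for the special stub uid/gid marker.

```python
import os
import shutil
import tempfile

class LazyCache:
    """Toy model of lcachefs: stubs in `storage` mirror `source`, and a
    file's contents are copied over only on first read.  (A sidecar dict
    stands in for the special uid/gid pair that marks a stub.)"""

    def __init__(self, source, storage):
        self.source, self.storage = source, storage
        self.stubs = {}        # relative path -> real metadata (or "dir")
        self._scan_dir("")     # populate top-level stubs at "mount" time

    def _scan_dir(self, rel):
        for name in os.listdir(os.path.join(self.source, rel)):
            relpath = os.path.join(rel, name) if rel else name
            src = os.path.join(self.source, relpath)
            dst = os.path.join(self.storage, relpath)
            if os.path.isdir(src):
                os.makedirs(dst, exist_ok=True)   # empty stub directory
                self.stubs[relpath] = "dir"
            else:
                open(dst, "w").close()            # empty stub file
                self.stubs[relpath] = os.stat(src)  # remember real metadata

    def list_dir(self, rel):
        """Scanning a stub directory populates its entries' stubs."""
        if self.stubs.get(rel) == "dir":
            self._scan_dir(rel)
            del self.stubs[rel]
        return sorted(os.listdir(os.path.join(self.storage, rel)))

    def read(self, relpath):
        """First read materializes the stub; later reads hit local disk."""
        dst = os.path.join(self.storage, relpath)
        if relpath in self.stubs:
            shutil.copyfile(os.path.join(self.source, relpath), dst)
            del self.stubs[relpath]               # no longer a stub
        with open(dst) as f:
            return f.read()

# Usage: the stub exists immediately, but is empty until first read.
src, sto = tempfile.mkdtemp(), tempfile.mkdtemp()
with open(os.path.join(src, "hello.c"), "w") as f:
    f.write("#include <stdio.h>\n")

fs = LazyCache(src, sto)
assert os.path.getsize(os.path.join(sto, "hello.c")) == 0  # still a stub
assert fs.read("hello.c") == "#include <stdio.h>\n"        # now materialized
```

The point of the model is the ordering: metadata appears eagerly (so lookups and `ls` work), but data moves only when something actually reads it.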

That's the basic overview.  The reality is quite complicated because of hard links, the need to watch out for vnode lock order reversal, and other icky stuff like that.  You can see all the gory details, though, once we open-source lcachefs.  I'm currently awaiting clearance from legal, but I hope to either contribute it to FreeBSD, or release it as an independent open-source project.

Sunday, December 07, 2008

Good Friends

I was thinking that I should make a post today, and was wondering what about.  I was trying to think of cheery topics, and looked at my blog to get ideas.  Then I noticed the Google ads: one for losing fat, and one for help with alcohol & depression.  Boy, how cheery can you get?

Actually, quite so.  In the last several weeks, I've gotten in touch with a lot of friends who are very dear to me.  People I haven't heard from in years and years.  Some of them contacted me; I contacted some of them.  Most of this has been over Facebook or MySpace, one was on this blog which led to some emailing (and there'll be more of that, I promise!), one I got to visit in person while I was in Texas (rock on!), others I've only spoken with on the phone.  But it really seems like the stars have aligned for reopening old friendships!

Really, think of everybody who's ever been in your life who has their own big mansion in your heart.  Then imagine getting to visit with each of them, over the span of a couple of weeks.  It's been an incredibly special time for me lately.

I'm gonna be grinning like a fool for a long time now.

Thursday, December 04, 2008

Not what I'm doing at Juniper

I don't often make technical blog posts (ok, lately I don't often make posts), because two of the three people who read my blog don't really care.  But sometimes it's nice to be able to brag about what I'm doing at work.  Besides, last post I promised to talk about what I'm doing.

Since then, I haven't made any posts, because I didn't have the time to say what I was doing at work.  Which is stupid, really.  You three people don't read my blog because it's got juicy technical details; that's for the rest of the masses, and they don't care what I might have said I was going to post about until I post it.  I've had other things to write about, but since I said I'd write about work, I didn't want to write about anything else.  Several times I thought, "Ooh, I should blog about this!" and then thought, "No, I said my next post would be about my work at Juniper."

Well, I am hereby renouncing that commitment.  What's more, I am advising you three (tonight's loyal fans) to take anything I might tease about as merely speculation on future topics, and not be sure I won't write about something completely different first.

So there.  I've just freed myself to write about other stuff.  I don't really know why I felt it necessary to maintain an obligation that nobody cares about; why anybody would care if I put some posts in between the previous post (when I said I'd talk about my Juniper work) and the future post when I actually do is totally beyond me.  But I did feel it was necessary, so now I can go on with more regular posts.

Except it's late and I'm going to bed instead of subjecting you to more of my incredibly convoluted late-night grammar.