Sunday, July 25, 2010

Dealing With Dashcode, Part 1: Architecture and Basic Configuration

I've been writing a web app for the iPhone recently.  The main tool for this type of web app is Dashcode, which is part of Xcode.  There are a couple of problems with my setup: one of them I've solved, and the other I haven't.  In the hopes that somebody else on the Internets might find this information useful, here ya go.

The solved problem is how to publish this on my own server.  Most of the work revolved around my Apache configuration, and most of it should be pretty obvious to anybody who's dealt with WebDAV.  One problem with authentication wasn't obvious, though; that's at the end, but first I'll talk about my setup.

My app involves a combination of three major elements: static HTML documents, dynamically generated data, and the web app that's generated by Dashcode.  (The web app is served as a static page by Apache, of course – it's the client that does all the computation there – but it has its own considerations in the Apache configuration.)  My web site, therefore, has three sections.  (All the project names, domain names, etc. in this post are replaced with more generic versions.  These are not real URLs.)
  • http://www.piquan.org/myapp/gen/   : Dynamic content, generated by Python scripts.  This is being read by Ajax, so there are some files in here that generate XML, some that generate JSON, and for my own convenience it's nice to have a few static files in here as well.
  • http://www.piquan.org/myapp/htdocs/   : Static content, just a bunch of HTML files.  The web app will sometimes send the user to these files (and out of the app entirely) by setting window.location.
  • http://www.piquan.org/myapp/serv/   : This holds the app generated by Dashcode.  I want Dashcode to be able to blow everything under here away, and replace it.
All of the files are currently in ~/src/myapp, and while I'm just working on the initial version, I want it to be served straight out of there.  While I'm doing development, all of this is under http://www.piquan.org/myapp-devel/ , and I'll move to /myapp/ once I've got a releasable version.

Getting the static files served is pretty easy.  I just put this in my Apache config (within the VirtualHost section for www.piquan.org):
Alias /myapp-devel/ /home/piquan/src/myapp/
<Directory /home/piquan/src/myapp>
AllowOverride All
Order allow,deny
Allow from 192.168.42.0/24
</Directory>

(The "AllowOverride All" is to make it easier for me to experiment using .htaccess instead of needing to change my Apache config and restart the server. Of course, .htaccess is somewhat limited in some ways, as we'll discuss later.)

Ok, so what next? Well, I need to serve the dynamic content.  Remember that gen/ holds both static and dynamic files.  Ideally, it should be transparent to the client whether the content is coming from a static file or being dynamically generated.

To generate the dynamic content, I chose to use Python with raw WSGI.  This is a very lightweight way to write simple dynamic content.  For complex projects, a web application framework like Django would be more appropriate, but here I'm looking at about 300 lines of code, so Django would be overkill.  Also, remember that here I'm only sending either simple XML or JSON.  Now, WSGI is actually an interface standard (like CGI), not an implementation; the Apache module that implements it is mod_wsgi.  I installed this (using FreeBSD's Ports mechanism).
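For readers who haven't seen raw WSGI, here is a minimal sketch of what one of the JSON-producing scripts in gen/ might look like.  The file name dynamicdata.js.wsgi and the payload are hypothetical placeholders, and the sketch is written for Python 3, so a 2010-era Python 2 script would differ slightly in its handling of str and bytes.  By default, mod_wsgi looks for a callable named application in the script file.

```python
# Hypothetical gen/dynamicdata.js.wsgi: a minimal raw-WSGI handler returning JSON.
import json


def application(environ, start_response):
    # Placeholder payload; a real script would compute or look this up.
    payload = {"status": "ok", "items": [1, 2, 3]}
    body = json.dumps(payload).encode("utf-8")
    # WSGI scripts send their own Content-Type header.
    start_response("200 OK", [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ])
    # The response body is an iterable of bytestrings.
    return [body]
```

To try it without Apache, wsgiref.simple_server.make_server("localhost", 8000, application).serve_forever() from the standard library will serve it locally.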

The default for mod_wsgi is to serve data from within the Apache process.  On one hand, this is very efficient.  On the other hand, it's very inconvenient for development, because if you change your code, you have to restart Apache.  There's an easy solution: put the WSGI handlers in a separate group of processes that Apache automatically manages.  When you change a WSGI script file, mod_wsgi will automatically kill and restart those processes.  If you don't understand that, then don't worry: the config file additions are quite simple.  I just added to the VirtualHost section:
WSGIDaemonProcess piquan.org display-name=%{GROUP}
WSGIProcessGroup piquan.org
This sets up a simple set of processes to manage all WSGI requests.  In this configuration, the same set manages everything across my entire domain.  (That's not because of "piquan.org" there; that's just an identifier for the process group.  It applies to my entire domain because it's within the VirtualHost section, not within a Directory section.)  Once it's time to release, I'll change the name from "piquan.org" to "myapp" or something, tune the WSGIDaemonProcess line to use the appropriate amount of resources (in terms of processes, timeouts, etc), and move the WSGIProcessGroup to within a Directory section so that it only applies to my program.
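One way to confirm that requests really are going to the daemon processes, and that editing a script restarts them, is a tiny diagnostic script.  This is a sketch rather than part of my setup, and whoami.wsgi is a hypothetical name.  It relies on the mod_wsgi.process_group key that mod_wsgi puts in the WSGI environ, which holds the process group name in daemon mode and an empty string in embedded mode.

```python
# Hypothetical gen/whoami.wsgi: reports which process served the request.
import os


def application(environ, start_response):
    # mod_wsgi sets this to the daemon process group name ("piquan.org" here),
    # or to "" in embedded mode; it is absent when not running under mod_wsgi.
    group = environ.get("mod_wsgi.process_group", "")
    if group:
        mode = "daemon (%s)" % group
    else:
        mode = "embedded or not under mod_wsgi"
    body = ("pid=%d mode=%s\n" % (os.getpid(), mode)).encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [body]
```

Load it, touch the script file, and load it again: in daemon mode the pid should change, because mod_wsgi restarted the process.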

The mod_wsgi Quick Configuration Guide says to use WSGIScriptAlias to tell mod_wsgi which directory holds WSGI programs.  However, remember that I have a mix of dynamic and static files, so I took a different tack.

To deal with this, I gave the static files names like staticdata.js and staticdata.xml.  The dynamic files are named things like dynamicdata.js.wsgi and dynamicdata.xml.wsgi.  Then a bit of .htaccess magic, along with the miracle of MultiViews (I don't know WHY it isn't more widely used), lets me tell Apache to send things to the right place.
Options All MultiViews
MultiviewsMatch Handlers Filters
AddType application/json .js
AddType text/xml .xml
AddHandler wsgi-script .wsgi

This is actually a bit of overkill.  I specified a very broad Options line.  The only option I really needed for this was MultiViews.  [Edit: I also need ExecCGI.  Thanks, Graham!]  It's just more convenient during development to have things like Indexes and FollowSymLinks on.  Also, I set the MultiviewsMatch to include Filters, when really I only need it to deal with Handlers.  (I'll probably turn on the mod_deflate filter later, since this is an iPhone web app that will sometimes be sent over 3G and even EDGE networks, but that's not something that MultiviewsMatch needs to be involved in.)  I generally tend to enable pretty broad web capabilities during dev, and then tighten them when I deploy.
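To make the MultiViews behavior a little more concrete, here is a deliberately simplified Python model of what happens when a request for gen/dynamicdata.js arrives and no file by that exact name exists.  It is an illustration under stated assumptions, not Apache's actual algorithm: real mod_negotiation also considers languages, encodings, and charsets, and chooses among candidates using the request's Accept headers.  The point it models is that, under the default MultiviewsMatch setting, a handler-only extension such as .wsgi keeps a file out of the running, and MultiviewsMatch Handlers lets it back in.

```python
# Simplified model of MultiViews; NOT Apache's real negotiation algorithm.
TYPE_EXTS = {"js", "xml"}  # extensions mapped by the AddType lines
HANDLER_EXTS = {"wsgi"}    # extensions mapped by AddHandler


def resolve(requested, files, match_handlers=True):
    """Return the file name that would serve `requested`, or None."""
    if requested in files:
        return requested  # an existing file is served as-is, no negotiation
    # With the default MultiviewsMatch, every extra extension must be a
    # recognized type (etc.); "MultiviewsMatch Handlers" also admits handlers.
    allowed = TYPE_EXTS | (HANDLER_EXTS if match_handlers else set())
    prefix = requested + "."
    candidates = [
        name for name in sorted(files)
        if name.startswith(prefix)
        and all(ext in allowed for ext in name[len(prefix):].split("."))
    ]
    # Apache would negotiate among candidates; this model takes the first.
    return candidates[0] if candidates else None
```

With staticdata.js and dynamicdata.js.wsgi side by side, resolve("staticdata.js", ...) returns the static file directly, resolve("dynamicdata.js", ...) finds dynamicdata.js.wsgi, and the same lookup returns None when match_handlers is False.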

The two AddType directives are because I thought one of the libraries I was using was being a bit picky about the Content-Type it gets back.  (I could have named my JSON files with .json instead of .js, since Apache already associates application/json with the .json extension, but the .json extension irks me a bit.)  As it turns out, the library wasn't as picky as I thought (for instance, it would be happy with Apache's default of application/xml, which is arguably more appropriate for this purpose; note that both are valid MIME types for XML), but I left them in anyway. Note that the AddType directives only apply to the static data; they don't apply to WSGI scripts, since those send their own Content-Type header.
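Since the WSGI scripts set their own Content-Type, the XML-producing ones have to do so explicitly.  Here is a minimal sketch of a hypothetical dynamicdata.xml.wsgi, using the standard library's ElementTree and application/xml; the element names and data are placeholders.

```python
# Hypothetical gen/dynamicdata.xml.wsgi: builds a small XML document.
import xml.etree.ElementTree as ET


def application(environ, start_response):
    root = ET.Element("items")
    for n in (1, 2, 3):
        # Placeholder data: <item id="1">item 1</item>, ...
        ET.SubElement(root, "item", id=str(n)).text = "item %d" % n
    body = ET.tostring(root, encoding="utf-8")  # bytes
    # AddType doesn't apply to WSGI output, so set the type here.
    start_response("200 OK", [
        ("Content-Type", "application/xml; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```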

(By the way: when you're writing apps like this, you can handle REST-style URLs pretty easily.  For example, http://www.piquan.org/myapp/gen/person/piquan could be handled by a script named gen/person.wsgi, which can look at environ['PATH_INFO'] to read the "/piquan" bit.  Note that this is the environ passed to application, not os.environ.  If you really wanted to get fancy, you could probably add some MultiViews magic along with some per-method configuration to separate this into person.GET.wsgi, person.PUT.wsgi, person.POST.wsgi, etc.  Hmmm... maybe I'll write a filter module to let that happen easily.)
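Here is a rough sketch of that PATH_INFO approach for a hypothetical gen/person.wsgi.  Instead of the MultiViews-based person.GET.wsgi split, it dispatches on REQUEST_METHOD inside one script through a dictionary of handler functions.  The PEOPLE dictionary is a stand-in for real storage: with several mod_wsgi daemon processes, each would have its own copy, so a real version would need a database or files on disk.

```python
# Hypothetical gen/person.wsgi: REST-style dispatch on PATH_INFO and method.
import json

PEOPLE = {"piquan": {"name": "piquan"}}  # placeholder in-memory store


def get_person(name, environ):
    if name in PEOPLE:
        return "200 OK", PEOPLE[name]
    return "404 Not Found", {"error": "no such person"}


def put_person(name, environ):
    # CONTENT_LENGTH may be missing or empty; treat that as no body.
    length = int(environ.get("CONTENT_LENGTH") or 0)
    data = json.loads(environ["wsgi.input"].read(length) or b"{}")
    PEOPLE[name] = data
    return "200 OK", data


HANDLERS = {"GET": get_person, "PUT": put_person}


def application(environ, start_response):
    # For /myapp/gen/person/piquan served by person.wsgi, PATH_INFO is "/piquan".
    name = environ.get("PATH_INFO", "").strip("/")
    handler = HANDLERS.get(environ.get("REQUEST_METHOD", "GET"))
    headers = [("Content-Type", "application/json")]
    if handler is None:
        status, payload = "405 Method Not Allowed", {"error": "method not allowed"}
        headers.append(("Allow", ", ".join(sorted(HANDLERS))))
    else:
        status, payload = handler(name, environ)
    body = json.dumps(payload).encode("utf-8")
    headers.append(("Content-Length", str(len(body))))
    start_response(status, headers)
    return [body]
```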

Finally, we have the Dashcode-generated content.  This is created on my Mac, and I need to send it to my web server.  Hello, WebDAV!  WebDAV is pretty nice to have on your web server if you use a Mac as your desktop: it lets you keep an iCal calendar shared on a website (without paying for MobileMe!), gives you convenient, WAN-accessible file storage from the Finder, lets you publish from iWeb, and so on.  (Ok, blatant advertising done.)

WARNING: Don't configure WebDAV until you have secured your web server!  WebDAV lets people write to your disk.  That's its point.  If you don't have a secure server, then you may find people you don't want writing to your disk.

In fact, don't trust my configs on this.  Read over the docs for mod_dav, and make sure you have a decent understanding of web server security.  At a bare minimum, read Apache's docs on Security Tips, and also the docs on Authentication, Authorization and Access Control.  You should also know why the Basic authentication described in the latter is insecure.  (Hint: Basic auth sends your password essentially in the clear; use Digest instead!)

Ok, now that I've made it clear that I don't want you to open your disk to the whole world, here's a bit of my configuration to allow Dashcode to publish using WebDAV.  I actually opened up a lot more than I needed to: instead of just the area where Dashcode puts its static files, I have it set up to allow WebDAV access to my entire program.  This lets me fiddle with stuff in the Finder if I need to.

Now, here's why that's a really bad idea from a security perspective.  WebDAV lets people write files.  WSGI lets people run files.  That means that with the two together, an attacker could write to, and then execute, a file on MY computer.  Bad news.  Don't do this unless you're satisfied with your security; in particular, at least put a reasonable Allow clause in place.

Here's the configuration I used.  Again, this is back in the Apache config file, in the VirtualHost section.
Alias /myapp-devel/dav/ /home/piquan/src/myapp/
<Location /myapp-devel/dav/>
Dav On
Options Indexes
Require user piquan
</Location>
And then in .htaccess:
AuthType Digest
AuthName "myapp DAV area"
AuthUserFile /home/piquan/.htdigest
(By the way: everything I put in my .htaccess, you can put in the regular Apache config.  I just prefer to use .htaccess for everything I can, since it lets me change options without restarting Apache.)

Finally, I set up my password file by using the htdigest shell command on my web server:
$ htdigest ~/.htdigest "myapp DAV area" piquan
(I already had a ~/.htdigest, and was just adding a new realm+username+password tuple.  If I didn't have a ~/.htdigest yet, I would need to specify the -c option to create it.)
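htdigest stores one line per user and realm, in the form user:realm:hash, where the hash is the hex MD5 of "user:realm:password".  Because the realm is part of the hash, the realm given to htdigest has to match the AuthName exactly.  The sketch below reproduces that line format in Python purely to show what the file contains, assuming plain-ASCII credentials; htdigest itself remains the tool to use.

```python
# Illustrative only: builds a line in the format htdigest writes.
import hashlib


def htdigest_line(user, realm, password):
    # ':' separates fields in the file, so it can't appear in user or realm.
    if ":" in user or ":" in realm:
        raise ValueError("user and realm must not contain ':'")
    ha1 = hashlib.md5(("%s:%s:%s" % (user, realm, password)).encode("utf-8"))
    return "%s:%s:%s" % (user, realm, ha1.hexdigest())
```

For example, htdigest_line("piquan", "myapp DAV area", password) produces a line starting with piquan:myapp DAV area: followed by a 32-character hex digest.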

Now, the mod_dav docs recommend using completely different URLs for the WebDAV URL and the "regular" serving URL.  That's probably a little bit easier to configure.  As for me, I kinda wanted to keep my entire project under the same top-level directory (myapp-devel), so I made the DAV-enabled section a subtree.
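One thing to watch with a subtree like this: the mod_alias documentation says that when several Alias directives in the same context (for example, one VirtualHost) could match a URL, they are tried in the order they appear and the first match wins, so the more specific path should be listed first.  The Python sketch below models just that first-match rule with plain string prefixes, using the two Alias lines from this post; it ignores the rest of Apache's URL handling.

```python
# Simplified model of mod_alias's first-match rule for Alias directives.
GENERAL = ("/myapp-devel/", "/home/piquan/src/myapp/")
DAV = ("/myapp-devel/dav/", "/home/piquan/src/myapp/")


def map_url(path, aliases):
    """Map a URL path to a filesystem path using the first matching alias."""
    for url_prefix, fs_prefix in aliases:
        if path.startswith(url_prefix):
            return fs_prefix + path[len(url_prefix):]
    return None  # no alias matched
```

With DAV listed first, /myapp-devel/dav/serv/index.html maps to /home/piquan/src/myapp/serv/index.html as intended; with GENERAL first, the same URL maps to /home/piquan/src/myapp/dav/serv/index.html, a directory that doesn't exist.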

Ok, there's one element left: permissions.  I needed the web server to be able to write to the serv directory (where Dashcode was to write its files), but I also wanted to be able to – as my normal user on the web server – edit those files (to edit the manifest, or for whatever other post-deploy changes I wanted to make).  Since WebDAV accesses files as user www, and I access them as user piquan, I needed to make serv writable by both.  But I didn't want to make it globally writable, either.

For this, I turned to FreeBSD's ACL support.  I'm not going to go into details on how to configure a filesystem or kernel for ACL support; there are guides for that online already.  (If you don't like the Handbook's dry style, try the O'Reilly ONLamp article instead.)

I needed the serv directory itself to be writable by www and piquan, and also I needed any files within that either I or WebDAV create to be the same (unless explicitly changed).  This means configuring both an access ACL and a default ACL.
$ mkdir serv
# Access ACL: who may use the serv directory itself.
$ setfacl -m u:piquan:rwx,u:www:rwx serv
# Default ACL: inherited by files and directories created inside serv.
$ setfacl -d -m u:piquan:rwx,u:www:rwx,u::rwx,g::rx,o::rx,mask::rwx serv
After all this was done, I went back to Dashcode's "Run & Share" section, and added my web server as a new destination. Finally, it was time to publish!
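If Dashcode ever has trouble publishing, it can help to take it out of the loop and talk to the DAV area directly.  The sketch below uses Python's urllib, which has built-in Digest support, to send a PROPFIND request, the WebDAV method for listing a collection.  The URL is the placeholder one from this post, the password is yours to fill in, and the realm must match the AuthName.

```python
# Sketch: check the DAV area with Digest auth, independently of Dashcode.
import urllib.request

DAV_URL = "http://www.piquan.org/myapp-devel/dav/serv/"  # placeholder URL


def make_dav_opener(url, user, password, realm):
    # Register credentials for this realm and URL, then enable Digest auth.
    mgr = urllib.request.HTTPPasswordMgr()
    mgr.add_password(realm, url, user, password)
    return urllib.request.build_opener(urllib.request.HTTPDigestAuthHandler(mgr))


def propfind_request(url, depth="1"):
    # Depth: 1 asks for the collection and its immediate members.
    body = (b'<?xml version="1.0" encoding="utf-8"?>'
            b'<propfind xmlns="DAV:"><allprop/></propfind>')
    return urllib.request.Request(
        url, data=body, method="PROPFIND",
        headers={"Depth": depth, "Content-Type": "application/xml"})
```

Opening propfind_request(DAV_URL) with make_dav_opener(DAV_URL, "piquan", password, "myapp DAV area") should yield a 207 Multi-Status response; an HTTPError with code 401 suggests the realm or password file doesn't match.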

At this point, everything I've talked about so far worked. But soon after that, I started having problems.  I'll tackle each of these in a separate post.

2 comments:

Graham Dumpleton said...

You are also inheriting ExecCGI from the fact that All was set for Options. Thus not just MultiViews. For reference by others, the style of configuration you are using with mod_wsgi is described here.

Piquan said...

Oh, right! Yeah, I forgot that ExecCGI is necessary for mod_wsgi. I'll update the post; thanks!