December, 2008

Cron, Network Users, and Launchd

When our users try to set up cron jobs (using crontab -e), they seem to work fine, until the system reboots, at which point the cron jobs stop running. If the user re-edits their cronfile, the cron job will start working again.

The problem is that cron is starting up before NIS is running. When cron starts, it looks at all the files in /usr/lib/cron/tabs (which is actually /var/at/tabs), and looks up each filename to see if it is the name of a user. If it is, then cron remembers it as a job, but if it is not, cron just skips over it.
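To make the failure concrete, here is a rough Python sketch of the startup scan described above. It is not cron's actual source (cron is written in C). The function name load_crontabs and its lookup parameter are made up for illustration. The only real call is pwd.getpwnam(), which raises KeyError when the system can't resolve a user name, which is exactly what happens for NIS users if NIS isn't up yet.

```python
import pwd


def load_crontabs(tab_names, lookup=pwd.getpwnam):
    """Split crontab file names into (loaded, skipped).

    Hypothetical model of the behavior described in the text: a file
    is kept only if its name resolves to a user at the moment of the scan.
    """
    loaded, skipped = [], []
    for name in tab_names:
        try:
            lookup(name)  # fails for network users while NIS is down
        except KeyError:
            skipped.append(name)  # silently dropped until the next rescan
        else:
            loaded.append(name)
    return loaded, skipped
```

Calling load_crontabs(os.listdir("/usr/lib/cron/tabs")) before and after NIS comes up would, under these assumptions, show network users moving from the skipped list to the loaded list.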

Cron will rescan the directory when it changes (e.g. when a user edits their crontab), and at this point if NIS is running, then cron will find all of the jobs it might have lost.

As far as I can tell, there is no longer any way in launchd to tell cron that it shouldn't start running until after directory services are running. Because of this, I suspect there also may be a problem here for LDAP/Open Directory users (and not just NIS), though I haven't tested myself.

Some might be thinking at this point, well, why don't you just use launchd? The whole point after all was for launchd to replace cron. Well, if you take the view that cron's only purpose is system automation, then maybe that's ok. But in fact, cron is also an end-user program, which is now broken.

At any rate, I don't even think this will work. User LaunchAgents live in their home directories. And we have network-mounted home directories. Does this mean that my jobs would only run if I was logged in? This is the same problem all over again. You could always "fix" launchd to mount all network homes, and look for jobs, but this would then mean my job would run on every machine where my home directory could be mounted, never mind the ridiculous waste of network resources searching for jobs. And launchd jobs are much harder for an end-user to set up than cron jobs too. Even if launchd could be fixed to somehow properly handle user jobs in a resource friendly way, my users would still prefer cron for most things.


I don't have any fix. For a workaround, I'm working on a script that will run at boot time, and will update the crontab directory once it sees that NIS is up. Ugly stuff.
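A minimal sketch of what such a workaround might look like, in Python. This is not the script I'm actually working on, just an illustration. The names wait_for_user and nudge_cron are made up, and so are the retry counts. It also assumes that bumping the modification time of the tabs directory is enough for cron to notice a "change" and rescan, which I have not verified.

```python
import os
import pwd
import time


def wait_for_user(username, lookup=pwd.getpwnam, attempts=60, delay=5.0,
                  sleep=time.sleep):
    """Poll until a known network user resolves, i.e. NIS is answering.

    Returns True once the lookup succeeds, False if every attempt fails.
    """
    for _ in range(attempts):
        try:
            lookup(username)
            return True
        except KeyError:
            sleep(delay)  # directory services not up yet; try again later
    return False


def nudge_cron(tabs_dir):
    """Set the directory's access and modification times to now.

    Assumption: cron treats a newer mtime on the directory as a change.
    """
    os.utime(tabs_dir, None)
```

At boot, something like this would call wait_for_user() with the name of a user who exists only in NIS, and if that returns True, call nudge_cron("/usr/lib/cron/tabs"). It would need to run as root to be allowed to touch that directory.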

All Bow to Launchd

The real problem here is launchd. The designers of launchd are very proud of how strictly pedantic they have been in writing launchd. They seem to be very proud that there is in fact no way of setting up the needed dependency in launchd. They would have you believe that it's the application (cron) that has the problem, and not launchd.

What a load of crap. Our job as software designers is not to program our hearts' desires, and convince people that our new toy is more beautiful. Our job is to provide the functionality that the customers require and even desire. Launchd lacks an obvious and important feature - explicit dependencies. Cron doesn't work properly under launchd. And ultimately, cron was there first. Don't believe them when they try to tell you this is a good thing.

But that's just being pragmatic. Ultimately, the "purist" view that launchd proponents are offering is flawed even in the theoretical. According to launchd proponents, cron should "negotiate" its own dependencies through IPC. As far as I can tell, this means that cron should start up, and then it should sit around watching for directory services to become active. But if the benefit of launchd is that services only run when they need to run, and launchd handles the rest, then why would they want cron to sit there running and watching for an event where NIS comes online? This makes no sense.

It also makes no sense in terms of clean layering and application design. Cron should not have to care about where its users are coming from. It should use the getpwnam() call, and trust that the system has already done the work behind getpwnam(). Cron has NO BUSINESS watching for directory services. It's absolutely ridiculous to expect cron or any other computer service to actively participate in the system startup process, because this means that services suddenly have to know a whole lot of extra stuff that's really not their business.

Imagine if Apple said that every application should go out and negotiate its own printing paradigm - find printers, discover the models and features, and communications protocols. No one would actually fall for this, because it's the operating system's job to provide printing to the applications. But this is exactly what they're doing for the resource of orderly booting.

Launchd has some good stuff. Delayed startup of network applications is a really great idea. But that's no excuse for throwing away basic features needed for booting. Launchd needs explicit dependencies.


Tom Fine's Home