Another Legislative Success

With inotify in Linus's tree, let us take a break from the economic rants and sordid stories of Bostonian fun and look at how to use the beast.

This is going to get technical.

First step is to initialize an inotify instance and associated queue:

    int fd;

    fd = inotify_init ();
    if (fd < 0)
        perror ("inotify_init");

Second step is to watch some objects. Events are generated against registered watches (and, for directories, their children). Each watch consists of a (path, mask) pair and is represented by a watch descriptor (wd). The watch mask is a bitmask of one or more events, defined in inotify.h.

Let's add a watch for closes (but only if the file was opened for writing!) and any attribute changes (permissions, timestamps, extended attributes, and so on) to the file coworker_blackmail.txt:

    int wd;

    wd = inotify_add_watch (fd, "coworker_blackmail.txt",
                            IN_CLOSE_WRITE | IN_ATTRIB);
    if (wd < 0)
        perror ("inotify_add_watch");

Thousands of watches can be added in a similar manner. The return value--the watch descriptor--is used to tie the event back to the initial object.
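Since an event carries only the watch descriptor, an application usually keeps its own wd-to-path map. Here is a minimal sketch of one, assuming a small fixed-size table. The names watch_table_add and watch_table_lookup, and the two size limits, are made up for illustration and are not part of inotify:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_WATCHES  64     /* arbitrary capacity chosen for this sketch */
#define MAX_PATH_LEN 256    /* arbitrary path limit chosen for this sketch */

/* One slot per watch: the descriptor we got back and the path we asked for. */
struct watch_entry {
    int  wd;
    char path[MAX_PATH_LEN];
};

static struct watch_entry watch_table[MAX_WATCHES];
static size_t watch_count;

/* Remember which path a watch descriptor refers to.
 * Returns 0 on success, -1 if the table is full or the path is too long. */
int watch_table_add (int wd, const char *path)
{
    assert (path != NULL);
    if (watch_count >= MAX_WATCHES || strlen (path) >= MAX_PATH_LEN)
        return -1;
    watch_table[watch_count].wd = wd;
    strcpy (watch_table[watch_count].path, path);
    watch_count++;
    return 0;
}

/* Map event->wd back to the watched path, or NULL if we never stored it. */
const char *watch_table_lookup (int wd)
{
    size_t i;

    for (i = 0; i < watch_count; i++)
        if (watch_table[i].wd == wd)
            return watch_table[i].path;
    return NULL;
}
```

Call watch_table_add() right after a successful inotify_add_watch(), and watch_table_lookup() with event->wd when an event arrives. With thousands of watches, a hash table would be the better fit than this linear scan.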

Other possible events include read, write, close (of a file not opened for writing), open, move, create, and delete. The inotify design is extensible, and we can add new events if needed.
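When debugging, it helps to print a mask by name rather than as a number. The sketch below maps the events mentioned so far onto their constants from the inotify header. The helper name mask_to_string and the table layout are my own invention, and the header is assumed to be available as sys/inotify.h:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/inotify.h>

/* Event bits paired with printable names. */
static const struct {
    uint32_t    bit;
    const char *name;
} event_names[] = {
    { IN_ACCESS,        "IN_ACCESS" },          /* read */
    { IN_MODIFY,        "IN_MODIFY" },          /* write */
    { IN_ATTRIB,        "IN_ATTRIB" },          /* attribute change */
    { IN_CLOSE_WRITE,   "IN_CLOSE_WRITE" },     /* close, was writable */
    { IN_CLOSE_NOWRITE, "IN_CLOSE_NOWRITE" },   /* close, was not writable */
    { IN_OPEN,          "IN_OPEN" },
    { IN_MOVED_FROM,    "IN_MOVED_FROM" },      /* move: old name */
    { IN_MOVED_TO,      "IN_MOVED_TO" },        /* move: new name */
    { IN_CREATE,        "IN_CREATE" },
    { IN_DELETE,        "IN_DELETE" },
};

/* Write the names of all known bits set in mask into out, joined by '|'.
 * If out is too small, names that do not fit are left off. Returns out. */
char *mask_to_string (uint32_t mask, char *out, size_t out_len)
{
    size_t i, used = 0;
    int n;

    assert (out != NULL && out_len > 0);
    out[0] = '\0';
    for (i = 0; i < sizeof (event_names) / sizeof (event_names[0]); i++) {
        if (!(mask & event_names[i].bit))
            continue;
        n = snprintf (out + used, out_len - used, "%s%s",
                      used ? "|" : "", event_names[i].name);
        if (n < 0 || (size_t) n >= out_len - used) {
            out[used] = '\0';   /* drop the partial name and stop */
            break;
        }
        used += (size_t) n;
    }
    return out;
}
```

Names come out in table order, not in the order the bits were OR'd together.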

For the Samba folks, inotify supports one-shot watches. If IN_ONESHOT is set on the mask during watch addition, the watch is atomically removed after the generation of the first event.

The file descriptor returned by inotify_init() is select()- and poll()-able. When ready, events may be read via read(2):

    char buf[BUF_LEN];
    int len, i = 0;

    len = read (fd, buf, BUF_LEN);
    if (len < 0)
        perror ("read");

    while (i < len) {
        struct inotify_event *event = (struct inotify_event *) &buf[i];

        printf ("wd=%d mask=%u cookie=%u len=%u\n",
                event->wd, event->mask, event->cookie, event->len);
        if (event->len)
            printf ("name=%s\n", event->name);

        /* Advance past the fixed header plus the variable-length name. */
        i += sizeof (struct inotify_event) + event->len;
    }

We read events in this fashion for two reasons. One, inotify allows us to slurp down all available events with a single read request. Two, the size of the structure is dynamic and based on the length of the filename, if any, and associated padding.
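Putting poll() and the read loop together might look like the sketch below. It waits with a timeout, reads whatever is queued, and walks the variable-length records. The function names, the 4096-byte buffer, and the choice to copy each header out with memcpy (to sidestep any alignment worries about casting into a char buffer) are my assumptions, not anything inotify requires:

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <string.h>
#include <sys/inotify.h>
#include <unistd.h>

/* Walk a buffer filled by read(2) and return how many events it holds.
 * Each record is a fixed header followed by ev.len bytes of name. */
size_t count_events (const char *buf, size_t len)
{
    size_t i = 0, n = 0;

    assert (buf != NULL);
    while (i + sizeof (struct inotify_event) <= len) {
        struct inotify_event ev;

        /* Copy the header so we never dereference a misaligned pointer. */
        memcpy (&ev, buf + i, sizeof (ev));
        i += sizeof (ev) + ev.len;
        n++;
    }
    return n;
}

/* Wait up to timeout_ms for events, then read and count them.
 * Returns the event count, 0 on timeout, or -1 on error. */
long wait_for_events (int fd, int timeout_ms)
{
    char buf[4096];
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    ssize_t len;
    int ready;

    ready = poll (&pfd, 1, timeout_ms);
    if (ready <= 0)
        return ready;       /* 0: timed out, -1: poll failed */

    len = read (fd, buf, sizeof (buf));
    if (len < 0)
        return -1;
    return (long) count_events (buf, (size_t) len);
}
```

A real program would do something with each event inside the loop rather than just counting, but the stride arithmetic is the same.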

For every event, the flag IN_ISDIR is set in the mask if the object in question is a directory.

If the object-in-question's backing filesystem is unmounted, the event IN_UNMOUNT is sent and the watch is automatically removed. Inotify, unlike dnotify, will not pin the mount.
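Both IN_ISDIR and IN_UNMOUNT arrive as extra bits in event->mask, so checking for them is a pair of bit tests. A small sketch, with struct event_flags and inspect_flags being hypothetical names:

```c
#include <stdint.h>
#include <sys/inotify.h>

/* What the flag bits tell us, beyond which event fired. */
struct event_flags {
    int is_dir;       /* the object in question is a directory */
    int unmounted;    /* backing filesystem went away; the watch is gone */
};

struct event_flags inspect_flags (uint32_t mask)
{
    struct event_flags f;

    f.is_dir    = (mask & IN_ISDIR)   != 0;
    f.unmounted = (mask & IN_UNMOUNT) != 0;
    return f;
}
```

When unmounted is set, the application should forget the corresponding wd, since the kernel has already dropped the watch.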

A watch is removed via inotify_rm_watch():

    int ret;

    ret = inotify_rm_watch (fd, wd);
    if (ret < 0)
        perror ("inotify_rm_watch");

The size of the queue is obtainable via ioctl(2):

    unsigned int queue_len;
    int ret;

    ret = ioctl (fd, FIONREAD, &queue_len);
    if (ret < 0)
        perror ("ioctl");
    else
        printf ("%u bytes pending in queue\n", queue_len);

The inotify instance is destroyed and cleaned up on close(2):

    int ret;

    ret = close (fd);
    if (ret < 0)
        perror ("close");

Inotify is configurable via sysctl(8) and procfs:

/proc/sys/fs/inotify/max_queued_events is the maximum number of events that can be queued at once. If the queue reaches this size, new events are dropped, but the IN_Q_OVERFLOW event is always sent. With a sufficiently large queue, overflows are rare even when watching many objects, despite what Nat might say.

/proc/sys/fs/inotify/max_user_instances is the maximum number of inotify instances that a given user can instantiate.

/proc/sys/fs/inotify/max_user_watches is the maximum number of watches that a given user can create.

These knobs exist because kernel memory is a precious resource. Only the system administrator can change them.
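Anyone can read the knobs, though, which is handy for a sanity check at startup. Here is a sketch that reads one, assuming the knobs live under /proc/sys/fs/inotify as on mainline kernels. The helper names read_long_file and inotify_limit are hypothetical:

```c
#include <assert.h>
#include <stdio.h>

/* Read a single integer from a file such as a procfs knob.
 * Returns the value, or -1 if the file cannot be opened or parsed. */
long read_long_file (const char *path)
{
    FILE *fp;
    long value = -1;

    assert (path != NULL);
    fp = fopen (path, "r");
    if (!fp)
        return -1;
    if (fscanf (fp, "%ld", &value) != 1)
        value = -1;
    fclose (fp);
    return value;
}

/* Look up one inotify limit by knob name, e.g. "max_user_watches".
 * The directory is an assumption of this sketch. */
long inotify_limit (const char *knob)
{
    char path[128];
    int n;

    assert (knob != NULL);
    n = snprintf (path, sizeof (path), "/proc/sys/fs/inotify/%s", knob);
    if (n < 0 || (size_t) n >= sizeof (path))
        return -1;
    return read_long_file (path);
}
```

An application about to add a large number of watches could compare its count against inotify_limit ("max_user_watches") and warn the user up front instead of failing halfway through.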

Inotify is easy-to-use yet versatile; powerful yet lightweight. I am interested in seeing what applications can do with it. Already, we have Beagle, Gamin, Muine, and--hopefully soon--Samba. What's Next?

SUSE 9.3 users can grab a kernel package with inotify locked and loaded. And both Fedora and our development kernel now come with inotify! Rough riders can grab 2.6.13-rc3-git1.

Pleased to see that Dave Miller has a blog. But you need an RSS feed, duder!