Eric Radman : a Journal

Handling TCP Connections with Kqueue Event Notification

Protocol-Independent Binding

More than ten years after IPv6 was standardized, very few corporate systems have adopted it, but the programming interfaces that emerged to support it have already benefited system programmers because the interfaces themselves are protocol agnostic. getaddrinfo(3) provides a mechanism for binding sockets to addresses specified in their native, numeric format, or by a hostname that is resolved according to the order specified in /etc/resolv.conf.

To start, establish an addrinfo structure with some data about the kind of connection you're trying to make and a pointer to the list of results that the OS is going to give us. *ai0 will refer to a linked list because a hostname can refer to more than one protocol family (IPv6 is sorted first). If we actually wanted to listen on more than one address we would have to loop through the results by following the ai_next pointer.

#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <err.h>
#include <string.h>

int error;
struct addrinfo *ai0;
struct addrinfo hints;

memset(&hints, 0, sizeof hints); /* zero out structure */
hints.ai_family = PF_UNSPEC;     /* we don't care what protocol is used */
hints.ai_flags = AI_PASSIVE;     /* to be used by bind(2), not connect(2) */
hints.ai_socktype = SOCK_STREAM; /* TCP connection */

/* Listen on any address */
error = getaddrinfo(NULL, listen_port, &hints, &ai0);
if (error)
    errx(1, "getaddrinfo listen failed: %s", gai_strerror(error));

If no error is reported, the structure ai can then be used to set up socket(2), bind(2), and listen(2).

struct addrinfo *ai = ai0;
int listenfd;

listenfd =
    socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol); // -1 on error
bind(listenfd, ai->ai_addr, ai->ai_addrlen);                 // -1 on error
listen(listenfd, LISTEN_BACKLOG);                            // < 0 on error
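
The three calls above can be combined with the error checks that the fragment only hints at in its comments. The following is a minimal sketch rather than the original program: open_listener() is a hypothetical helper name, and the backlog of 16 is an assumed stand-in for LISTEN_BACKLOG, whose value is not shown here. It tries each result in turn and returns the first socket that can be bound.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <string.h>
#include <unistd.h>

#define BACKLOG 16 /* assumed value, stands in for LISTEN_BACKLOG */

/* Hypothetical helper: returns a listening descriptor, or -1 on failure. */
int
open_listener(const char *port)
{
    struct addrinfo hints, *ai0, *ai;
    int fd = -1;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = PF_UNSPEC;
    hints.ai_flags = AI_PASSIVE;
    hints.ai_socktype = SOCK_STREAM;

    if (getaddrinfo(NULL, port, &hints, &ai0) != 0)
        return -1;

    for (ai = ai0; ai != NULL; ai = ai->ai_next) {
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd == -1)
            continue;   /* family not usable here, try the next result */
        if (bind(fd, ai->ai_addr, ai->ai_addrlen) == 0 &&
            listen(fd, BACKLOG) == 0)
            break;      /* success, keep this descriptor */
        close(fd);      /* bind or listen failed, don't leak the socket */
        fd = -1;
    }
    freeaddrinfo(ai0);
    return fd;
}
```

A caller would use it as listenfd = open_listener(listen_port) and treat -1 as a fatal error.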

Reacting to Events

The primary goal of kqueue(2) was to create a scalable way to monitor a large number of file descriptors, but kqueue is also a generic, deterministic event notification mechanism. First initialize a queue, then add filters that describe the conditions under which an event should be reported.

#include <sys/event.h>

struct kevent chlist, evlist[LISTEN_QUEUE];
int kq, nev, i;

struct sockaddr_storage addr;
socklen_t len = sizeof(addr);
int socketfd;

kq = kqueue(); // -1 on error

EV_SET(&chlist, listenfd, EVFILT_READ, EV_ADD, 0, 0, 0);
(void) kevent(kq, &chlist, 1, (void *)0, 0, (struct timespec*)0);

EV_SET is a macro that simply fills in the kevent structure. The call to kevent(2) indicates that there is one change to kq: the addition of an EVFILT_READ filter for listenfd. The following loop waits for events and then uses any combination of meaningful conditions to determine what the event is, and what should be done about it.

int nconn = 0;

for (;;) {
    nev = kevent(kq, (void *)0, 0, evlist, LISTEN_QUEUE, (void *)0);

    for (i = 0; i < nev; i++) {
        if (evlist[i].ident == listenfd) {
            // It's the FD we ran listen() on, so it must be an incoming connection
            len = sizeof(addr); // accept() overwrites len, so reset it each time
            socketfd =
                accept(evlist[i].ident, (struct sockaddr *)&addr, &len);
            EV_SET(&chlist, socketfd, EVFILT_READ, EV_ADD, 0, 0, 0);
            (void) kevent(kq, &chlist, 1, (void *)0, 0, (struct timespec*)0);
            nconn++;
        }
        else if (evlist[i].flags & EV_EOF) {
            // connection closed...call close(2)
            nconn--;
        }
        else if (evlist[i].filter == EVFILT_READ) {
            // Data waiting...call read(2)
            // (EVFILT_READ is a filter type, so compare it with .filter, not .flags)
        }
        else {
            // Unhandled kqueue() event
        }
    }
}

All of the error-checking code has been removed for this example, but you may want to add some that recognizes a signal by testing errno, so that an interrupted call is not treated as a kevent() failure:

#include <errno.h>

nev = kevent(kq, (void *)0, 0, evlist, LISTEN_QUEUE, (void *)0);
if ((nev == -1) && (errno != EINTR)) {
   logmsg(LOG_CRIT, "kevent() returned negative status");
   // clean up
}

Binding and Reacting on Multiple Address Families

Since getaddrinfo() resolves hostnames as well as numeric addresses, it can return a linked list of results. This is handy, because you can use name resolution to determine which addresses a service listens on. Start off by looping through the results:

struct addrinfo *ai0, *ai;
struct addrinfo hints;
int s[MAXSOCK];
int nsock;

nsock = 0;
memset(s, 0, sizeof s); /* size in bytes, not number of elements */
for (ai = ai0; ai && nsock < MAXSOCK; ai = ai->ai_next) {
    if((s[nsock] = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol)) < 0)
        continue;
    if(bind(s[nsock], ai->ai_addr, ai->ai_addrlen) < 0) {
        close(s[nsock]); /* don't leak the descriptor */
        continue;
    }
    (void) listen(s[nsock], LISTEN_BACKLOG);
    nsock++;
}
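
One portability caveat applies when an IPv4 and an IPv6 wildcard socket are bound to the same port: on systems where an IPv6 socket also accepts IPv4 traffic, as is the default on Linux, the second bind() can fail with EADDRINUSE. A hedged sketch of one common remedy is to set IPV6_V6ONLY before calling bind(). set_v6only() is a hypothetical helper name and is not part of the original program.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical helper: restrict an AF_INET6 socket to IPv6 traffic only.
 * Does nothing for other families. Returns 0 on success, -1 on error. */
int
set_v6only(int fd, int family)
{
    int on = 1;

    if (family != AF_INET6)
        return 0;
    return setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, &on, sizeof on);
}
```

In the loop above it would be called between socket() and bind(), as set_v6only(s[nsock], ai->ai_family).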

The kevent structure is very helpful here because the last field is a user-defined, untyped pointer. When we call kevent() once for each listening descriptor, this field can be used to mark the event as belonging to a listening socket by storing whatever value ai0 has.

for (i=0; i<nsock; i++) {
    EV_SET(&chlist, s[i], EVFILT_READ, EV_ADD, 0, 0, (void *)ai0);
    (void) kevent(kq, &chlist, 1, (void *)0, 0, (struct timespec*)0);
}

It doesn't matter what the pointer refers to in this case; I'm just using the 32- or 64-bit address as a unique identifier. The event loop doesn't look much different; it just uses a different equality test.

for (;;) {
    nev = kevent(kq, (void *)0, 0, evlist, LISTEN_QUEUE, (void *)0);
    for (i = 0; i < nev; i++) {
        if (evlist[i].udata == ai0) {
            // Connection on FD created by listen() ... call accept()
        }
    }
}

Now calling getaddrinfo() with eradman.com as its first parameter causes the program to listen on both families!

$ netstat -an | grep 8080
tcp        0      0  127.0.0.1.8080         *.*                    LISTEN
tcp6       0      0  ::1.8080               *.*                    LISTEN

Cleanup

According to the man page, it's not necessary to explicitly delete kqueue filters, because "calling close() on a file descriptor will remove any kevents that reference the descriptor." It is proper to free the linked list created by getaddrinfo(3):

freeaddrinfo(ai0);

For a listening socket you may, but are not required to, close every descriptor created by socket(2). Every UNIX-like operating system will close sockets when a program exits.

References

Kqueue: A generic and scalable event notification facility by Jonathan Lemon

Questions regarding both Clients and Servers (TCP/SOCK_STREAM)

Scalable Network Programming by Felix von Leitner

$ Tue Jan 27 09:25:15 -0500 2009 $