Wednesday, April 25, 2012

Blocking versus non-blocking sockets

In an operating system, sockets are treated like file-handles that you read from and write to. You can read in blocking mode, which is the default, or in non-blocking mode. In the code described below I assume Linux is the operating system, but the principles are the same for Windows and BSD, although the function names and constants may be different.

Blocking means that if data is not available for reading or if the device is not ready for writing then the operating system will wait on a request to read from or write to a socket until it either gets or sends the data or times out. In other words the program may halt at that point for quite some time if it can't proceed.

Non-blocking means that the request to read or write on a socket returns immediately whether or not it was successful; in other words, it behaves asynchronously. It is then the task of the programmer to decide what to do next: to try again or consider the read/write operation complete. Non-blocking is usually much faster overall, because the program can get on with other work instead of waiting, but it is a bit more complex to set up and manage.
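
To make the difference concrete, here is a minimal sketch of a read that returns immediately. It is not part of the test program: the names read_now, WOULD_BLOCK and demo_read_now are my own, and the demo uses a local socketpair instead of a network connection so that it can run anywhere.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

#define WOULD_BLOCK (-2)   /* hypothetical return code: nothing to read yet */

/* Hypothetical helper: switch fd to non-blocking mode and attempt one read.
   Returns the bytes read (0 = peer closed), WOULD_BLOCK if no data is
   available yet, or -1 on a real error. */
ssize_t read_now( int fd, void *buf, size_t len )
{
    assert( buf != NULL && len > 0 );
    int flags = fcntl( fd, F_GETFL, 0 );
    if ( flags == -1 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 )
        return -1;
    ssize_t n = read( fd, buf, len );
    if ( n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK) )
        return WOULD_BLOCK;
    return n;
}

/* Self-check on a local socket pair: returns 1 if all went as expected */
int demo_read_now( void )
{
    int p[2], ok;
    char buf[8];
    if ( socketpair(AF_UNIX, SOCK_STREAM, 0, p) == -1 )
        return -1;
    /* nothing sent yet: the read comes straight back empty-handed */
    ok = read_now( p[0], buf, sizeof(buf) ) == WOULD_BLOCK;
    /* once the peer has written two bytes they can be read at once */
    ok = ok && write( p[1], "hi", 2 ) == 2;
    ok = ok && read_now( p[0], buf, sizeof(buf) ) == 2;
    close( p[0] );
    close( p[1] );
    return ok;
}
```

In blocking mode the first read in the demo would simply hang, because nothing has been written to the other end yet.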

The process of sending and receiving data over a socket is the same in both the blocking and non-blocking cases. There are five steps:

  1. Creating the socket
  2. Binding it to a local IP-address and port
  3. Connecting it to a remote IP-address and port
  4. Writing data over the connection
  5. Reading the response

Steps 3, 4 and 5 may involve sending or receiving packets of data, and so may be performed in either blocking or non-blocking mode.

Creating a socket

In order to send data over an IP connection you have to decide whether the transmission will use IPv4 or IPv6. This will determine the template of the packets that will be sent. A socket is an endpoint of communication. The remote machine to which we will connect also sets up a socket and reads and writes to its remote socket as it listens in for requests from our socket. But we only need one socket to both read and write.

We also have to declare what sort of IP communication we will be carrying out: TCP or UDP. The former needs the 3-way TCP handshake first to establish a 'connection'. UDP does not, so a request to connect on a UDP socket doesn't send any packets. Let's say we want a standard IPv4, TCP socket. Our code will look like this:

#include <stdio.h>
#include <sys/socket.h>
int sock = socket( AF_INET, SOCK_STREAM, 0 );
if ( sock != -1 )
{...}
else
    printf("couldn't create a socket\n");

The constant AF_INET means that we want an IPv4 socket and SOCK_STREAM declares that it should be a TCP connection. The last argument is normally 0 for the 'protocol', which means that the operating system should choose the default. The return value is -1 if it fails, otherwise it will be an integer - usually a small one - which is the identifier of the socket.
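
The same call covers the other combinations mentioned above. The following is only a sketch: open_socket and socket_type are hypothetical helper names, not part of the test program. It shows how the family and type arguments select IPv4 or IPv6 and TCP or UDP, and how to ask the kernel afterwards which type of socket it created.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical helper: family is AF_INET (IPv4) or AF_INET6 (IPv6),
   type is SOCK_STREAM (TCP) or SOCK_DGRAM (UDP).
   Returns the socket descriptor, or -1 on failure. */
int open_socket( int family, int type )
{
    assert( family == AF_INET || family == AF_INET6 );
    assert( type == SOCK_STREAM || type == SOCK_DGRAM );
    int sock = socket( family, type, 0 );
    if ( sock == -1 )
        perror( "socket" );
    return sock;
}

/* Ask the kernel what type of socket it actually created */
int socket_type( int sock )
{
    int type = -1;
    socklen_t len = sizeof(type);
    if ( getsockopt(sock, SOL_SOCKET, SO_TYPE, &type, &len) == -1 )
        return -1;
    return type;
}
```

So open_socket( AF_INET, SOCK_DGRAM ) would give an IPv4 UDP socket, for which a later connect sends no packets at all.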

Binding to a local IP-address and port

Before it can be used to send IP-packets the socket has to be 'bound' to a local IP-address and local port. The local port usually doesn't matter, but it is put into the TCP header of every packet because it is the port to which the remote application will send its replies. Usually we just specify 0 and the operating system will choose a free port for us. More importantly we must choose a valid local IP-address. This can be the default IP-address of some interface such as localhost (127.0.0.1) or that of any other interface, or even an alias of an interface's main address. So a bind call on localhost looks like this:

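struct sockaddr_in addr;
/* clear the structure so that unused fields are zero */
memset( &addr, 0, sizeof(addr) );
addr.sin_family = AF_INET;
/* port 0: let the operating system choose a free source port */
addr.sin_port = 0;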
/* load the address of localhost as the socket's source address */
int res = inet_pton( AF_INET, "127.0.0.1", &addr.sin_addr );
if ( res != 1 )
    printf("inet_pton error %s\n",strerror(errno));
else
{
    res = bind( sock, (const struct sockaddr *)&addr,sizeof(addr));
    if ( res != -1 )
    {
        printf("bound socket %d to 127.0.0.1 \n",sock);
        ...
    }
    else
    {
        printf("failed to bind to 127.0.0.1\n");
    }
}

The sockaddr_in structure is for IPv4 connections. Note that bind expects a generic struct sockaddr pointer, which could be an IPv6 address. So we have to cast our IPv4 structure to the generic type. We set the IPv4 address in the structure to 127.0.0.1 via a call to inet_pton. This just encodes the four numbers expressed as a string into four bytes in network byte order for us.
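
To see what inet_pton actually produces, here is a small sketch that copies the four address bytes out in the order they are stored in memory. The helper name ipv4_bytes is hypothetical and not part of the test program.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>

/* Hypothetical helper: parse a dotted-quad string and copy the four
   address bytes, in the order they sit in memory, into out[].
   Returns 1 on success, 0 if text is not a valid IPv4 address. */
int ipv4_bytes( const char *text, unsigned char out[4] )
{
    struct in_addr a;
    assert( text != NULL && out != NULL );
    if ( inet_pton(AF_INET, text, &a) != 1 )
        return 0;
    /* s_addr is already in network byte order: most significant byte first */
    memcpy( out, &a.s_addr, 4 );
    return 1;
}
```

For "127.0.0.1" the bytes come out as 127, 0, 0, 1 regardless of the byte order of the host. Note also that for malformed text inet_pton returns 0, not -1, and does not set errno in that case, so strerror(errno) is only meaningful when the return value is -1.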

Connecting

In TCP we first have to establish a connection by sending a SYN packet. The server then replies with SYN-ACK, and the client answers with an ACK. The connect function carries out this whole exchange. If the socket is not already bound explicitly to an IP-address and port (i.e. if we didn't call bind) then connect will bind it for us to the default interface's default IP-address and some random port. Usually we want to control that, though. So the connect call looks like this:

int do_connect( int sock, char *host, char *port )
{
    struct sockaddr_in addr;
    /* clear addr structure first */
    memset( &addr, 0, sizeof(addr) );
    /* the memset zeroed the family, so set it to IPv4 again */
    addr.sin_family = AF_INET;
    /* load the remote host's address */
    int res = inet_pton( AF_INET,host,&addr.sin_addr);
    if ( res == 1 )
    {
        /* port number must be in network byte order */
        addr.sin_port = htons(atoi(port));
        /* establish TCP connection via handshake (SYN,SYN-ACK,ACK) */
        res = connect(sock,(const struct sockaddr *)&addr, sizeof(addr));
        if ( res == 0 )
        {
            printf("connected successfully to %s on port %s\n",host,port);
            return 1;
        }
        else
            printf("couldn't connect to %s on port %s\n",host,port);
    }
    else
        printf("inet_pton failed: %s\n",strerror(errno) );
    return 0;
}

Apart from the socket, the two parameters are host, which is the IP-address of the remote server we want to connect to, and port, which is the port we want to connect on. This time 'port' has to be a real port. A random one won't do. The functions atoi and htons just turn the string representation of port into the correct numerical form: atoi converts the string to an integer and htons puts it into network byte order. So if we wanted to connect to the BBC web-server the value of host would be 212.58.244.66 and the port 80. We fill in a fresh addr structure here, this time with the remote address and port. We return 1 on success and 0 on failure. If successful, our socket is connected and we can start sending and receiving data on it.
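
One weakness of atoi is that it returns 0 for text that isn't a number, so a mistyped port silently becomes port 0. A more careful conversion might look like the sketch below. It uses strtol instead of atoi, and the helper name parse_port is my own invention, not something in the test program.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: convert a decimal port string to network byte
   order. Returns 1 and stores the result in *out on success, 0 if the
   string is not a whole number in the range 1..65535. */
int parse_port( const char *text, uint16_t *out )
{
    char *end;
    long value;
    assert( text != NULL && out != NULL );
    errno = 0;
    value = strtol( text, &end, 10 );
    /* reject overflow, empty input and trailing junk such as "80x" */
    if ( errno != 0 || end == text || *end != '\0' )
        return 0;
    if ( value < 1 || value > 65535 )
        return 0;
    /* htons stores the most significant byte first */
    *out = htons( (uint16_t)value );
    return 1;
}
```

With this in place, addr.sin_port could be filled in via parse_port( port, &addr.sin_port ). For port 80 the two bytes in memory are 0x00 followed by 0x50.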

Sending data

static ssize_t writen( int sock, const void *vptr, size_t n )
{
    size_t nleft;
    ssize_t nwritten;
    const char *ptr;
    ptr = vptr;
    nleft = n;
    while ( nleft > 0 )
    {
        if ((nwritten = write(sock,ptr,nleft)) <= 0 )
        {
            /* interrupted by a signal before anything was written: retry */
            if ( nwritten < 0 && errno == EINTR )
                nwritten = 0;
            else
                return -1;
        }
        nleft -= nwritten;
        ptr += nwritten;
    }
    return n;
}

This function writes an arbitrary amount of data to the socket we connected in the previous step. We may not be able to send all the data in one go, so the writen function keeps looping until it is all sent. The write function also works for files, and blocks by default. So if the buffer to write to isn't ready, because the connection is down or slow, then it will wait. If write returns -1 with errno set to EINTR, the call was merely interrupted by a signal, so we treat that as zero bytes written and try again. Any other error makes writen return -1. Otherwise the function continues until it has written all the data.

Reading the response

If we sent the server a message like an HTTP GET call, we will want to receive the reply on the same local port we encoded into the packets we sent by the call to writen. So we just call the read function and loop until read returns 0, which means the server has closed the connection:

static int read_blocking( int sock )
{
    int n,total = 0;
    for ( ; ; )
    {
        n=read( sock, line, MAXLINE );
        if ( n < 0 )
        {
            total = -1;
            printf( "failed to read. err=%s socket=%d\n",
               strerror(errno),sock);
            break;
        }
        else if ( n == 0 )
        {
            // just finished reading
            break;
        }
        else
            total += n;
    }
    return total;
}

line is just a buffer we fill with the response, of length MAXLINE. Note that in this simple function we just throw away the data, and only read it one MAXLINE chunk at a time.
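
If you want to keep the response instead of throwing it away, one possible approach is to append each chunk to a growing buffer. This is only a sketch under my own assumptions: read_all, demo_read_all and CHUNK are hypothetical names, and the demo uses a local socketpair in place of a real server.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

#define CHUNK 4096   /* hypothetical chunk size, plays the role of MAXLINE */

/* Hypothetical helper: read until the peer closes the connection and
   return everything in one malloc'd buffer (the caller frees it).
   Stores the byte count in *len. Returns NULL on error. */
char *read_all( int sock, size_t *len )
{
    size_t used = 0, cap = CHUNK;
    char *buf = malloc( cap );
    assert( len != NULL );
    if ( buf == NULL )
        return NULL;
    for ( ; ; )
    {
        ssize_t n;
        if ( cap - used < CHUNK )
        {
            /* grow the buffer so that a full chunk always fits */
            char *bigger = realloc( buf, cap * 2 );
            if ( bigger == NULL )
            {
                free( buf );
                return NULL;
            }
            buf = bigger;
            cap *= 2;
        }
        n = read( sock, buf + used, CHUNK );
        if ( n > 0 )
            used += (size_t)n;
        else if ( n == 0 )
            break;              /* peer closed: response complete */
        else if ( errno != EINTR )
        {
            free( buf );
            return NULL;
        }
    }
    *len = used;
    return buf;
}

/* Self-check on a local socket pair: returns the bytes read back, or -1 */
long demo_read_all( void )
{
    int p[2];
    size_t len = 0;
    char *data;
    if ( socketpair(AF_UNIX, SOCK_STREAM, 0, p) == -1 )
        return -1;
    if ( write(p[1], "hello", 5) != 5 )
        len = 0;
    close( p[1] );              /* closing the far end produces the final read of 0 */
    data = read_all( p[0], &len );
    close( p[0] );
    if ( data == NULL )
        return -1;
    free( data );
    return (long)len;
}
```

Like read_blocking, this relies on the server closing the connection to mark the end of the response.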

Non-blocking In/out

Three of those calls send or receive packets: connect (the TCP handshake), write and read. Each may block. So to make the process non-blocking we have to remember which of those three states we are in so we know what to do next. We start in the connect state, and when that has completed we move to writing, and when that has finished we can move to read. But first we have to change the socket so that it returns immediately on a call to connect, write or read:

int make_nonblocking( int sock )
{
    /* get existing socket flags */
    int flags = fcntl( sock, F_GETFL, 0 );
    if ( flags == -1 )
    {
        printf("failed to get flags of socket %d\n",sock);
        return 0;
    }
    /* switch socket to non-blocking mode */
    if ( fcntl(sock, F_SETFL, flags | O_NONBLOCK) == -1 )
    {
        printf("failed to make socket %d non-blocking\n",sock);
        return 0;
    }
    return 1;
}

Here we use the fcntl function (file control) to change the file-handle, aka socket, to non-blocking mode. But first we must get the current state of the socket in case there were other settings. We add the 'non-blocking' flag (O_NONBLOCK) by bitwise ORing it with the current flags (flags) and the socket's behaviour will be changed. Again, we must remember to test for an error.
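
The same technique works in both directions. As a sketch only, with hypothetical helper names (set_nonblocking, is_nonblocking and demo_set_nonblocking are not in the test program), here is how the flag could be switched on and off again and checked, using a pipe so that no network is needed:

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: turn O_NONBLOCK on (enable = 1) or off
   (enable = 0) without disturbing the other flags.
   Returns 1 on success, 0 on failure. */
int set_nonblocking( int fd, int enable )
{
    assert( enable == 0 || enable == 1 );
    int flags = fcntl( fd, F_GETFL, 0 );
    if ( flags == -1 )
        return 0;
    if ( enable )
        flags |= O_NONBLOCK;     /* bitwise OR sets the flag */
    else
        flags &= ~O_NONBLOCK;    /* AND with the complement clears it */
    return fcntl( fd, F_SETFL, flags ) != -1;
}

/* Returns 1 if fd is in non-blocking mode, 0 if blocking, -1 on error */
int is_nonblocking( int fd )
{
    int flags = fcntl( fd, F_GETFL, 0 );
    if ( flags == -1 )
        return -1;
    return (flags & O_NONBLOCK) != 0;
}

/* Self-check on a pipe: returns 1 if the flag toggles as expected */
int demo_set_nonblocking( void )
{
    int p[2], ok;
    if ( pipe(p) == -1 )
        return -1;
    ok = is_nonblocking( p[0] ) == 0;   /* descriptors start out blocking */
    ok = ok && set_nonblocking( p[0], 1 ) && is_nonblocking( p[0] ) == 1;
    ok = ok && set_nonblocking( p[0], 0 ) && is_nonblocking( p[0] ) == 0;
    close( p[0] );
    close( p[1] );
    return ok;
}
```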

Non-blocking connect, write, read

Converting the blocking in/out to non-blocking involves writing a simple finite state machine. For each state we will call a try_something function to try to complete that state. If it succeeds we move to the next state.

int sendnb( char **argv )
{
    int res;
    int sock = tcp_bind( 0, "127.0.0.1" );
    if ( sock != -1 )
    {
        do
        {
            switch ( state )
            {
                case initial:
                    res = do_connect( sock, argv[1], argv[2] );
                    if ( res )
                        state = writing;
                    else if ( errno == EINPROGRESS )
                        state = connecting;
                    else
                        state = error;
                    break;
                case connecting:
                    res = try_connect( sock );
                    if ( res )
                        state = writing;
                    break;
                case writing:
                    res = try_writen( sock );
                    if ( res )
                        state = reading;
                    break;
                case reading:
                    res = try_readn( sock );
                    if ( res )
                        state = done;
                    break;
            }
        } 
        while ( state != done && state != error );
        close( sock );
        if ( state == done )
            return 1;
    }
    return 0;
}
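
The loop relies on a state variable whose declaration isn't shown here. I'd assume something like the following; the enum name conn_state and the helper state_name are hypothetical, and only the six state names come from the code above.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed declaration: the states used by sendnb, in the order visited */
enum conn_state { initial, connecting, writing, reading, done, error };

/* Assumed global: the current state of the (single) connection */
enum conn_state state = initial;

/* Hypothetical helper, handy for log messages while debugging */
const char *state_name( enum conn_state s )
{
    static const char *names[] = {
        "initial", "connecting", "writing", "reading", "done", "error"
    };
    assert( (size_t)s < sizeof(names)/sizeof(names[0]) );
    return names[s];
}
```

Because state is a single global, this design handles one connection at a time. Serving several sockets at once would need one state per socket.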

Let's take them one at a time.

Establishing a connection asynchronously

We call the same do_connect as in the blocking case, except that, since the socket itself has been made non-blocking, connect will probably fail with errno set to EINPROGRESS. That is normal: the handshake is still under way, and we move to the connecting state. On subsequent passes we must test whether the pending connection has been made, and not call do_connect again. To do this we call poll on the socket to see if it is ready for writing:

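int try_connect( int sock )
{
    struct pollfd fds[1];
    fds[0].fd = sock;
    fds[0].events = POLLWRBAND | POLLOUT;
    fds[0].revents = 0;

    int res = poll( fds, 1, POLL_TIMEOUT_MSECS );
    if ( res == 1 )
    {
        /* poll also fires if the connect failed: SO_ERROR tells us which */
        int err = 0;
        socklen_t len = sizeof(err);
        if ( getsockopt(sock,SOL_SOCKET,SO_ERROR,&err,&len) == 0 && err == 0 )
            return 1;
        state = error;
    }
    else if ( res == -1 )
    {
        state = error;
    }
    return 0;
}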

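The timeout parameter to poll can be 0 but we set it to 5 milliseconds just so we don't keep calling it over and over. Poll works by asking it to test for readiness of some state, such as being ready for output (POLLOUT). Note that poll also reports a socket as ready when the connection attempt has failed, so readiness on its own does not prove that the connection succeeded.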
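
Here is a small sketch of how poll reports readiness. The names wait_ready and demo_wait_ready are hypothetical, and the demo again uses a local socketpair so that it doesn't depend on a remote server.

```c
#include <assert.h>
#include <poll.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: wait up to timeout_ms for any of the requested
   events on fd. Returns the revents mask (0 on timeout), -1 on error. */
int wait_ready( int fd, short events, int timeout_ms )
{
    struct pollfd pfd;
    assert( fd >= 0 );
    pfd.fd = fd;
    pfd.events = events;
    pfd.revents = 0;
    int res = poll( &pfd, 1, timeout_ms );
    if ( res == -1 )
        return -1;
    return res == 0 ? 0 : pfd.revents;
}

/* Self-check on a local socket pair: returns 1 if poll behaved as described */
int demo_wait_ready( void )
{
    int p[2], ok;
    if ( socketpair(AF_UNIX, SOCK_STREAM, 0, p) == -1 )
        return -1;
    /* an idle socket is ready for output but has nothing to read */
    ok = (wait_ready( p[0], POLLOUT, 5 ) & POLLOUT) != 0;
    ok = ok && wait_ready( p[0], POLLIN, 5 ) == 0;
    /* once the peer writes, POLLIN is reported */
    ok = ok && write( p[1], "x", 1 ) == 1;
    ok = ok && (wait_ready( p[0], POLLIN, 5 ) & POLLIN) != 0;
    close( p[0] );
    close( p[1] );
    return ok;
}
```

The returned mask can also contain POLLERR or POLLHUP even though they were not requested, which is why it is worth looking at revents and not just at poll's return value.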

Writing asynchronously

When writing we must cover the case that not all the writing can be carried out without blocking. Then we return immediately and call poll next time. Otherwise this routine is the same as the blocking I/O.

int try_writen( int sock )
{
    ssize_t nwritten;
    struct pollfd fds[1];
    /* poll needs to know which socket to watch, and for what */
    fds[0].fd = sock;
    fds[0].events = POLLOUT | POLLWRBAND;
    fds[0].revents = 0;
    int res = poll(fds, 1, POLL_TIMEOUT_MSECS);
    if ( res > 0 )
    {
        int nleft = message_len-message_pos;
        while ( nleft > 0 )
        {
            if ((nwritten = write(sock,&message[message_pos],nleft)) <= 0 )
            {
                /* EINTR and EAGAIN just mean "try again later" */
                if ( errno != EINTR && errno != EAGAIN )
                    state = error;
                return 0;
            }
            nleft -= nwritten;
            if ( nleft > 0 )
                message_pos += nwritten;
            else
            {
                /* whole message sent: reset for the next one */
                message_pos = 0;
                return 1;
            }
        }
    }
    else if ( res < 0 )
    {
        printf( "error: %s\n", strerror(errno) );
        state = error;
    }
    return 0;
}

Reading asynchronously

Reading asynchronously is similar to synchronous read, except that we must cover the case where errno is EAGAIN. Then we return immediately as for write and call poll again next time.

int try_readn( int sock )
{
    int n;
    struct pollfd fds[1];
    /* poll needs to know which socket to watch, and for what */
    fds[0].fd = sock;
    fds[0].events = POLLIN | POLLPRI;
    fds[0].revents = 0;
    int res = poll(fds, 1, POLL_TIMEOUT_MSECS);
    if ( res > 0 )
    {
        for ( ; ; )
        {
            n=read( sock, line, MAXLINE );
            if ( n < 0 )
            {
                /* EINTR and EAGAIN mean "no data yet", not failure */
                if ( errno != EINTR && errno != EAGAIN )
                {
                    state = error;
                    printf("error: %s\n",strerror(errno));
                }
                return 0;
            }
            else if ( n == 0 )
            {
                // peer closed the connection: finished reading
                return 1;
            }
            /* n > 0: as before we discard the data and keep reading */
        }
    }
    else if ( res < 0 )
    {
        printf("error: %s\n",strerror(errno));
        state = error;
    }
    return 0;
}

Now we're done. Here's the complete test code. Enjoy, but no guarantees it works perfectly.
