
Scatter Gather I/O

Most programmers learn the classic read and write system calls quite early on. However, there are lesser-known system calls that give you an extra degree of flexibility when reading from or writing to files. This article discusses how you can reduce the number of system calls you make on Linux and the BSDs with the help of scatter/gather I/O.

pread and pwrite

pread and pwrite allow you to read or write data at an arbitrary offset:

ssize_t pread(int d, void *buf, size_t nbyte, off_t offset);

ssize_t pwrite(int fildes, const void *buf, size_t nbyte, off_t offset);

This means there is no need to lseek to a file offset before reading from or writing to it; you can do everything in one call. There is also no need to save and restore the file offset around the call, because these system calls do not change the descriptor's file offset.
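As a minimal sketch, assuming a file made of fixed-size 16-byte records (the layout, RECORD_SIZE, and the helpers open_scratch_file, put_record and get_record are all hypothetical), pwrite and pread can address any record directly without an lseek:

```c
#define _XOPEN_SOURCE 700
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical layout: every record is exactly RECORD_SIZE bytes. */
#define RECORD_SIZE 16

/* Hypothetical helper: create a scratch file whose name is unlinked at once. */
int open_scratch_file(void)
{
    char path[] = "/tmp/records_XXXXXX";
    int fd = mkstemp(path);
    if (fd >= 0)
        unlink(path);
    return fd;
}

/* Store `text`, truncated or zero-padded to RECORD_SIZE bytes, as record
 * number `idx`. No lseek is needed and the descriptor's file offset is
 * left where it was. Returns bytes written or -1 on error. */
ssize_t put_record(int fd, size_t idx, const char *text)
{
    char rec[RECORD_SIZE] = {0};                   /* zero padding */
    memcpy(rec, text, strnlen(text, RECORD_SIZE)); /* truncate long text */
    return pwrite(fd, rec, RECORD_SIZE, (off_t)idx * RECORD_SIZE);
}

/* Read record number `idx` into `out`. Returns bytes read (0 past the
 * end of the file) or -1 on error. The file offset is not moved. */
ssize_t get_record(int fd, size_t idx, char out[RECORD_SIZE])
{
    return pread(fd, out, RECORD_SIZE, (off_t)idx * RECORD_SIZE);
}
```

Writing record 1 before record 0 simply extends the file, and lseek(fd, 0, SEEK_CUR) still returns 0 afterwards because neither call moves the offset. A production version would also treat a short count as a partial transfer.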

readv and writev

First, we must introduce struct iovec:

struct iovec {
 void *iov_base; /* Base address. */
 size_t iov_len; /* Length. */
};

Each struct iovec describes one contiguous region of memory to read into or write from. readv and writev take an array of these regions: readv fills them in order with data read from a file descriptor, and writev writes their contents to it in order, as if they were one contiguous buffer.

ssize_t readv(int d, const struct iovec *iov, int iovcnt);

ssize_t writev(int fildes, const struct iovec *iov, int iovcnt);
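A minimal readv sketch, assuming a hypothetical format in which a 4-byte tag is followed by a payload (TAG_LEN and read_tagged are illustrative names, not part of any API): one call scatters the incoming bytes into two separate buffers.

```c
#define _XOPEN_SOURCE 700
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical wire format: a fixed 4-byte tag followed by a payload. */
#define TAG_LEN 4

/* Scatter a single read across two unrelated buffers: readv fills iov[0]
 * completely before any byte goes into iov[1]. Like read, it may return
 * fewer bytes than requested. Returns total bytes read, 0 at EOF, or -1. */
ssize_t read_tagged(int fd, char tag[TAG_LEN], char *payload, size_t payload_len)
{
    struct iovec iov[2];
    iov[0].iov_base = tag;
    iov[0].iov_len = TAG_LEN;
    iov[1].iov_base = payload;
    iov[1].iov_len = payload_len;
    return readv(fd, iov, 2);
}
```

Reading from a pipe that holds "HDR1hello", for instance, puts "HDR1" in tag and "hello" in payload and returns 9.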

With readv and writev there is no need to copy scattered data into one contiguous userspace buffer, or to issue one read or write per buffer: you leave each piece of data where it already is and submit the whole transfer in a single system call.
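The gather side can be sketched as follows. writev, like write, may accept only part of the data on a pipe or socket, so this hypothetical writev_all helper re-issues the call and advances the iovec array past whatever was already written; send_message is likewise an illustrative name. The array must stay within the system's IOV_MAX limit.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Write every byte described by iov[0..iovcnt-1], calling writev again if
 * the kernel accepts only part of the data. The caller's iovec array is
 * modified as data is consumed. Returns 0 on success, -1 on error. */
int writev_all(int fd, struct iovec *iov, int iovcnt)
{
    while (iovcnt > 0) {
        ssize_t n = writev(fd, iov, iovcnt);
        if (n < 0) {
            if (errno == EINTR)
                continue;              /* interrupted: retry the same call */
            return -1;
        }
        /* Skip the iovecs that were written completely... */
        while (iovcnt > 0 && (size_t)n >= iov->iov_len) {
            n -= (ssize_t)iov->iov_len;
            iov++;
            iovcnt--;
        }
        /* ...and trim the one that was written only partially. */
        if (iovcnt > 0) {
            iov->iov_base = (char *)iov->iov_base + n;
            iov->iov_len -= (size_t)n;
        }
    }
    return 0;
}

/* Hypothetical example: emit a header, a body and a trailer that live in
 * three unrelated buffers, without first concatenating them. */
int send_message(int fd, const char *header, const char *body, const char *trailer)
{
    struct iovec iov[3] = {
        { .iov_base = (void *)header,  .iov_len = strlen(header)  },
        { .iov_base = (void *)body,    .iov_len = strlen(body)    },
        { .iov_base = (void *)trailer, .iov_len = strlen(trailer) },
    };
    return writev_all(fd, iov, 3);
}
```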

sendfile

Lastly, I would like to mention sendfile. It is very convenient when implementing web servers because, as the name implies, it sends a file straight to a socket, optionally with headers and trailers. Headers and trailers (data we would like to write before and after the file) are described by struct sf_hdtr:

struct sf_hdtr {
 struct iovec *headers; /* pointer to header iovecs */
 int hdr_cnt; /* number of header iovecs */
 struct iovec *trailers; /* pointer to trailer iovecs */
 int trl_cnt; /* number of trailer iovecs */
};

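Then we can write to our socket with sendfile. The prototype below is the macOS one; FreeBSD's version takes the byte count as a separate size_t nbytes argument and reports the bytes sent through an off_t *sbytes argument, and Linux's sendfile(2) has a different signature altogether, with no header/trailer support: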

int sendfile(int fd, int s, off_t offset, off_t *len, struct sf_hdtr *hdtr, int flags);

Not only is this simpler than a read/write loop, it is also more efficient: the file data never leaves the kernel, so it is not copied into a userspace buffer and back out again as it would be with read and write. Quite nice.
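Linux's sendfile(2) takes (out_fd, in_fd, off_t *offset, size_t count) and has no struct sf_hdtr, so the sketch below, written for Linux rather than for the BSD/macOS call shown above, sends the header and trailer with ordinary writes around a sendfile loop. send_file_with_hdtr and write_all are hypothetical names.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Write all of buf, retrying after short writes and EINTR. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: send `header`, then the whole file at `path`, then
 * `trailer` to out_fd. Only the sendfile step avoids the userspace copy;
 * header and trailer are plain writes because Linux sendfile has no
 * sf_hdtr equivalent. Returns 0 on success, -1 on error. */
int send_file_with_hdtr(int out_fd, const char *header, const char *path,
                        const char *trailer)
{
    struct stat st;
    off_t offset = 0;                 /* sendfile advances this for us */
    int rc = -1;

    int file_fd = open(path, O_RDONLY);
    if (file_fd < 0)
        return -1;
    if (fstat(file_fd, &st) < 0)
        goto out;
    if (write_all(out_fd, header, strlen(header)) < 0)
        goto out;

    while (offset < st.st_size) {
        ssize_t n = sendfile(out_fd, file_fd, &offset,
                             (size_t)(st.st_size - offset));
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)                   /* error, or the file shrank */
            goto out;
    }

    rc = write_all(out_fd, trailer, strlen(trailer));
out:
    close(file_fd);
    return rc;
}
```

This costs at least three system calls where the BSD call needs one. Since Linux 2.6.33 out_fd may be any file rather than only a socket, and on a TCP socket the TCP_CORK option is the usual way to keep the header, file data and trailer from going out as separate small segments.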

Conclusion

A wise man once said "RTFM" (read the fucking manual). This advice is still sound today. Take a stroll through the Linux syscall table and you will be surprised at how much functionality is sitting there, waiting to be used.
