Linux IP Stacks Commentary Web Edition

Sockets API Overview

[Ed. Note: the information in this section needs to be updated to the most recent kernel release.]

Table of Contents

Introduction

Major Socket Functions

Socket Option Functions

Socket Options (SOL_SOCKET)

IP Standard Options (IPPROTO_IP)

IP Multicast Options (IPPROTO_IP)

TCP Options (IPPROTO_TCP)



Introduction

In traditional programming (a database query application, for example), the user tells the program what he or she wants done. The application then follows those orders, calling library routines, subroutines, and operating system functions to do the user’s bidding. In this context, a user’s command is like a king’s edict, which gets bounced down the power ladder to the poor serf at the bottom.

In contrast, network programming relies on cooperation. Users may think they’re issuing commands, but the programs running on the networked computers actually cooperate with each other to get the job done. Chat programs, which enable two users to type messages to each other in real time, are a good example of cooperative programming.

To understand the code in the Linux Transmission Control Protocol/Internet Protocol (TCP/IP) stack, you first need to know how applications interact with the Linux kernel. Assume, for example, that you’re using a database program on a network. First, you enter a request through the application running on your computer, and then your computer cooperates with another computer, linked via the network, on which the database actually resides.

The API (application programming interface) for the network portion of an application suite provides the tools that let the “client” system and the “server” system work hand in hand. As with most operating system services, the underlying implementation code for the API and the server does much of the grunt work for the application suite.

The information in this section is a “cheat sheet” for the API. It is not intended as a reference for writing network programs for Linux. Aspiring programmers in this area are strongly encouraged to seek out the book Unix Network Programming: Volume 1, Networking APIs by W. Richard Stevens (Prentice-Hall, 1998; ISBN 0-13-490012-X).

When C function names appear in the text, the parenthesized number indicates the manual section in which the function is described. This is consistent with standard Unix documentation. For example, read(2) indicates that you can use the command
man 2 read
to read the online man page that describes the function.

Each of the function calls in this section is followed by a brief description, a call template, a short explanation of each parameter, and a list of the information returned by the function.

The call templates in this section look slightly different from the call templates in the Linux man pages. In our templates, each parameter of a function call appears on a separate line. This line break, which provides visual separation, is especially useful for parameters that are defined as structure pointers.

Major Socket Functions

In the Linux operating system, all applications-controlled network-related activities are associated with kernel control blocks. This book refers to each kernel control block as a socket object. In turn, socket objects are controlled by functions. The functions described in this section create socket objects, register them for specific purposes, use them to transfer data, and destroy them. These functions provide the basic building blocks for network-based applications.

Each socket object is created by a process and is referenced in the application by the file-descriptor (FD) number returned by the call to socket(2) or accept(2) that creates that socket object. In Linux, an FD number may be any nonnegative integer. Socket FD numbers share the number space used for disk files, pipes, and other input/output objects. Each process has its own set of FD numbers, which may duplicate the set of FD numbers in another process, so the same FD number can have different meanings in different processes. For example, FD “3” in one process may refer to a disk file, while FD “3” in another process may refer to a socket object.

An FD number can be used with the select(2) and poll(2) functions to allow an application to run as a single process that handles multiple file descriptors. The select(2) function in Linux handles FD numbers ranging from 0 through 1,023. In contrast, the FD numbers used with the poll(2) function are not limited in any way. For network programming, the poll(2) function has the advantage of providing more information about the completion of an I/O transaction than the select(2) function provides. (However, the authors have used the select(2) function successfully in network applications they have written, in which other functions provide the information that would otherwise be provided by the poll(2) function.)
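As a minimal sketch of how poll(2) might be used with a single socket FD, the following helper waits for data and then reads it. The helper name read_with_timeout and its -2 "timeout" return value are our own inventions for illustration; they are not part of any library.

```c
#include <poll.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: wait up to timeout_ms milliseconds for data on sockfd,
 * then read it. Returns the byte count from read(2), 0 at end-of-file,
 * -1 on error, or -2 if the timeout expired with nothing to read. */
ssize_t read_with_timeout(int sockfd, void *buf, size_t count, int timeout_ms)
{
    struct pollfd pfd;
    int ready;

    pfd.fd = sockfd;
    pfd.events = POLLIN;        /* interested in "data available to read" */
    pfd.revents = 0;

    ready = poll(&pfd, 1, timeout_ms);
    if (ready < 0)
        return -1;              /* poll itself failed; errno is set */
    if (ready == 0)
        return -2;              /* timeout: no event on the descriptor */

    /* Some event (POLLIN, POLLHUP, or POLLERR) is pending; read(2)
     * reports data, end-of-file, or the error, respectively. */
    return read(sockfd, buf, count);
}
```

An application that handles several sockets would pass an array of pollfd entries instead of a single one and examine the revents member of each entry after the call returns.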

Each socket object has an associated reference count. This count is created, with a value of 1, whenever the socket(2) or accept(2) function creates a new socket object. The reference count is incremented by 1 when the fork(2), dup(2), or dup2(2) function is called, and is decremented by 1 when the close function is called. When the reference count is decremented to 0 (that is, after all references to the socket object have been removed), the socket object is destroyed.

socket

The socket(2) function creates a socket object for a defined communications domain, communications type, and communications protocol. The following code constitutes the template for the call:

#include <sys/types.h>

#include <sys/socket.h>

#include <netinet/in.h> /* only needed for SOCK_RAW */

int socket(int domain, int type, int protocol);


In this book, the domain parameter is always coded as AF_INET. The type parameter is coded SOCK_STREAM for TCP connections, SOCK_DGRAM for User Datagram Protocol (UDP) connections, and SOCK_RAW for raw access to IPv4. The protocol parameter is normally set to 0, except when SOCK_RAW is specified, in which case the protocol parameter is set to IPPROTO_xxx, or to one of the values that can be found in the file /etc/protocols.

Note: The type parameter may also be coded as SOCK_PACKET for applications that run in superuser mode. This coding is used in utilities that monitor packet streams for diagnostic purposes; see tcpdump for an example of such a program.

If the function fails, -1 is returned. If the function succeeds, a nonnegative FD number is returned. This FD number is used by every other function call to specify the socket object created by the socket(2) function.
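The following sketch shows one way an application might create a TCP or UDP socket object and check the result. The helper name open_inet_socket is hypothetical.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create an AF_INET socket object of the given type
 * (SOCK_STREAM for TCP, SOCK_DGRAM for UDP). Protocol 0 selects the
 * default protocol for that type. Returns the FD number, or -1 on failure. */
int open_inet_socket(int type)
{
    int sockfd = socket(AF_INET, type, 0);

    if (sockfd < 0)
        perror("socket");       /* errno describes the failure */
    return sockfd;
}
```

A caller would use open_inet_socket(SOCK_STREAM) for a TCP socket and open_inet_socket(SOCK_DGRAM) for a UDP socket, and would eventually release each FD with close(2).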

connect

The connect(2) function causes the TCP/IP stack to “open” a connection with a peer system. The action actually performed by the connect(2) function depends on the domain and type of socket object referenced by the call. The following code constitutes the template for the call to this function:

#include <sys/types.h>

#include <sys/socket.h>

int connect (int sockfd, const struct sockaddr *serv_addr, int addrlen);


For SOCK_STREAM sockets, the connect(2) call opens a connection to a peer system at the (remote) IP address and port number specified in the structure pointed to by serv_addr. The resulting connection stays open until the socket object is destroyed (through use of the close function).

For SOCK_DGRAM and SOCK_RAW sockets, this call creates an associative link between the local socket object (on one side) and the (remote) IP address and port number (on the other side). The application can use the write(2), writev(2), or send(2) function (instead of the sendto(2) function) to send a packet. The application can also use the read(2), readv(2), or recv(2) function (instead of the recvfrom(2) function) to obtain data from the remote peer system. In practice, this means that data can be exchanged via a connected socket, with no need to specify an IP address and port on each call. Unlike the case with SOCK_STREAM sockets, the connect(2) function can be called multiple times on these sockets, each time with a different address for the peer system. (The peer-system address can be set to the unassigned value by setting the IP address to 0.0.0.0.)

Note: This function is not used with SOCK_PACKET sockets.

If the function fails, -1 is returned and the global variable errno is set accordingly. If the function succeeds, 0 is returned.
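To illustrate the SOCK_DGRAM case, here is a minimal sketch that creates a UDP socket and connects it to a peer address. The helper name udp_connect is hypothetical, and inet_pton(3) is used here only as a convenient way to parse a dotted-quad string; current C libraries declare the address length as socklen_t rather than int.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

/* Hypothetical helper: create a UDP socket and "connect" it to ip:port so
 * that send(2)/recv(2) can be used without naming the peer each time.
 * No packets are exchanged by connect(2) on a SOCK_DGRAM socket.
 * Returns the FD number, or -1 on failure. */
int udp_connect(const char *ip, unsigned short port)
{
    struct sockaddr_in peer;
    int sockfd;

    memset(&peer, 0, sizeof peer);
    peer.sin_family = AF_INET;
    peer.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &peer.sin_addr) != 1)
        return -1;              /* not a valid dotted-quad address */

    sockfd = socket(AF_INET, SOCK_DGRAM, 0);
    if (sockfd < 0)
        return -1;

    if (connect(sockfd, (struct sockaddr *)&peer, sizeof peer) < 0) {
        close(sockfd);
        return -1;
    }
    return sockfd;
}
```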

bind

The bind(2) function creates an association between a local Internet address and port number, on the one hand, and the socket object, on the other hand. This function is commonly described as “assigning a name to a socket,” even though the name is actually an address (instead of a name, such as www.foo.bar.com). The following code constitutes the template for the call to this function:

#include <sys/types.h>

#include <sys/socket.h>

int bind(int sockfd, const struct sockaddr *my_addr, int addrlen);


This function takes the IP address and port number contained in the structure pointed to by my_addr and sets the local address for the socket object specified by the parameter sockfd.

In most server systems, the application fills in the my_addr structure with the well-known port and a wildcard IP address. Thanks to the wildcard IP address, a request for service (as submitted by a client system) can arrive on any interface, and the operating system sorts out the datastreams instead of forcing the application to do this job. However, if a server has multiple IP addresses, the server application may need to accept requests for one specific IP address only. In such a case, the call to bind(2) specifies both the IP address and the port number. This situation often occurs when a public Web server is located at one IP address and a private Web server is at another address, and both addresses refer to the same physical machine.

In most client applications, the application fills in the my_addr structure with a 0 port number and a wildcard IP address. When a connection is established, the 0 port number tells the system to select a port number from the range of ephemeral port numbers, and the wildcard IP address lets the system choose the local IP address. In Linux, ephemeral port numbers are allocated by default from the set ranging from 1,024 through 32,766. A port number cannot be allocated if it is being used by another socket.

If the function fails, -1 is returned and the global variable errno is set accordingly. If the function succeeds, 0 is returned.
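As a sketch of the wildcard case, the helper below binds a socket to the wildcard IP address and then asks the kernel which port was recorded. The helper name bind_any is hypothetical, and getsockname(2), which is not otherwise covered in this section, is used only to read the assigned port back.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

/* Hypothetical helper: bind sockfd to the wildcard address and the given
 * port (0 asks the system to choose an ephemeral port). Returns the port
 * actually assigned, in host byte order, or -1 on failure. */
int bind_any(int sockfd, unsigned short port)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);   /* wildcard IP address */
    addr.sin_port = htons(port);

    if (bind(sockfd, (struct sockaddr *)&addr, sizeof addr) < 0)
        return -1;

    /* Ask the kernel which local address and port it recorded. */
    if (getsockname(sockfd, (struct sockaddr *)&addr, &len) < 0)
        return -1;
    return ntohs(addr.sin_port);
}
```

A server would pass its well-known port instead of 0; the range from which a 0 port is resolved depends on the kernel release and its configuration.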

listen

The listen(2) function converts an active TCP socket (a socket that will be used with the connect(2) function) to a passive TCP socket (a socket that will be used with the accept(2) function). The following code constitutes the template for the call to this function:

#include <sys/socket.h>

int listen(int sockfd, int backlog);


This function allocates memory for an incoming-connection queue that has at least as many elements as are specified in the parameter backlog, and associates that queue with the socket object specified by the parameter sockfd. This function should be used only on sockets of type SOCK_STREAM.

W. R. Stevens claims to have discovered a Linux bug that allows any number of connections (limited only by the amount of memory available) to be made if an application specifies parameter backlog as 0. We will examine this claim during our analysis of the code.

If the function fails, -1 is returned and the global variable errno is set accordingly. If the function succeeds, 0 is returned.

accept

The accept(2) function examines the queue of completed connections for a specified socket object and creates a new socket object that includes not only the remote peer information, but also any parameters that have been negotiated for the new socket object. The following code constitutes the template for the call to this function:

#include <sys/types.h>

#include <sys/socket.h>

int accept(int sockfd, struct sockaddr *addr, int *addrlen);


The accept(2) function removes the next available connection from the incoming-connection queue associated with the socket specified by the parameter sockfd. If no connections are pending, then the accept(2) call blocks the calling application until a connection is present or until a signal is caught.

When the new socket object is created, the sockaddr block pointed to by the parameter addr is filled in with the connection information, and the length of the information is placed in the integer variable pointed to by the parameter addrlen.

If the function fails, -1 is returned. If the function succeeds, a nonnegative descriptor is returned.
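The passive side of a TCP connection combines socket(2), bind(2), listen(2), and accept(2). The sketch below is one possible arrangement, restricted to the loopback address so that it can be tried on a single machine. Both helper names are hypothetical, and getsockname(2) is used only to discover the system-chosen port.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

/* Hypothetical helper: create a passive TCP socket on 127.0.0.1 with a
 * system-chosen port. Stores the port (host byte order) in *port_out.
 * Returns the listening FD, or -1 on failure. */
int listen_loopback(int backlog, unsigned short *port_out)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    int sockfd = socket(AF_INET, SOCK_STREAM, 0);

    if (sockfd < 0)
        return -1;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);

    if (bind(sockfd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(sockfd, backlog) < 0 ||
        getsockname(sockfd, (struct sockaddr *)&addr, &len) < 0) {
        close(sockfd);
        return -1;
    }
    *port_out = ntohs(addr.sin_port);
    return sockfd;
}

/* Hypothetical helper: take one completed connection off the queue and
 * report the peer's port. Blocks until a connection is available.
 * Returns the FD of the new socket object, or -1 on failure. */
int accept_one(int listenfd, unsigned short *peer_port_out)
{
    struct sockaddr_in peer;
    socklen_t len = sizeof peer;    /* in: buffer size, out: address size */
    int connfd = accept(listenfd, (struct sockaddr *)&peer, &len);

    if (connfd >= 0)
        *peer_port_out = ntohs(peer.sin_port);
    return connfd;
}
```

Note that the FD returned by accept_one refers to a new socket object; the listening socket remains open and can accept further connections.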



read, readv

The read(2) and readv(2) functions accept data from TCP socket objects, and also from UDP socket objects that are “connected” (via the connect(2) function) to a remote peer system. The difference between the two functions is that the read(2) function specifies a single buffer, while the readv(2) [read vector] function specifies up to 16 buffers into which data should be placed. The following two pieces of code constitute the templates for the calls to these functions:

#include <unistd.h>

ssize_t read(int sockfd, void *buf, size_t count);


#include <sys/uio.h>

int readv(int sockfd, const struct iovec *vector, size_t count);


Both of these functions read data associated with the socket object specified by the parameter sockfd. The read(2) function reads bytes into the buffer pointed to by the parameter buf. The number of bytes read into the buffer cannot exceed the number of bytes specified by the parameter count. For the readv(2) function, the parameter vector points to an array of address/length pairs, and the count parameter specifies how many address/length pairs the array contains. The function fills the buffers in order, reading at most as many bytes into each buffer as are specified by the length member of its address/length pair; the total number of bytes read therefore cannot exceed the sum of the length members.

If the function fails, -1 is returned and the global variable errno is set accordingly. If the function succeeds, the total number of bytes read is returned.

If the read(2) function is interrupted by a signal after any amount of data has been read, then the return indicates the number of bytes that were read before the interruption occurred. If the read(2) or readv(2) function is interrupted by a signal before any data has been read, the function returns –1 and sets the global variable errno to the value EINTR. A return of 0 indicates that the end-of-file condition has been reached.
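Because a read can return fewer bytes than requested, or can be interrupted before any data arrives, applications commonly wrap read(2) in a loop. The following is a minimal sketch of such a loop; the helper name read_full is hypothetical.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: read exactly count bytes unless end-of-file or an
 * error intervenes. Returns the number of bytes read (less than count
 * only at end-of-file), or -1 on error. */
ssize_t read_full(int sockfd, void *buf, size_t count)
{
    char *p = buf;
    size_t got = 0;

    while (got < count) {
        ssize_t n = read(sockfd, p + got, count - got);

        if (n > 0) {
            got += (size_t)n;   /* short read: keep going */
        } else if (n == 0) {
            break;              /* end-of-file */
        } else if (errno != EINTR) {
            return -1;          /* real error; errno is set */
        }
        /* n == -1 with errno == EINTR: interrupted before any data
         * was read, so simply retry. */
    }
    return (ssize_t)got;
}
```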

write, writev

The write(2) and writev(2) functions present data to be transmitted by TCP socket objects, as well as data to be transmitted by UDP socket objects that are “connected” (via the connect(2) function) to a remote peer system. The difference between the two functions is that the write(2) function specifies a single buffer whereas the writev(2) [write vector] function specifies up to 16 buffers from which data is to be transmitted. The following pieces of code constitute the templates for the calls to these functions:

#include <unistd.h>

ssize_t write (int sockfd, const void *buf, size_t count);


#include <sys/uio.h>

int writev(int sockfd, const struct iovec *vector, size_t count);


Both of these functions write data to buffers associated with the socket object specified by the parameter sockfd. For the write(2) function, the number of bytes specified in the parameter count (which must be non-zero) are written from the buffer pointed to by the parameter buf. For the writev(2) function, the parameter vector points to an array of address/length pairs, and the count parameter specifies how many address/length pairs are in the array. The writev(2) function writes, from each of the buffers, the number of bytes specified by the length member of the address/length pair.

If the function fails, -1 is returned and the global variable errno is set accordingly. If the function succeeds, the total number of bytes written is returned.

If the write(2) or writev(2) function is interrupted by a signal after any amount of data has been written, then the return indicates the number of bytes that were written before the interruption occurred. If the write(2) or writev(2) function is interrupted by a signal before any data has been written, the function returns -1 and the global variable errno is set to EINTR.

When an application transmits data via a TCP socket, it is normal for a write(2) (or writev(2)) function to be incomplete when the amount of data being transmitted exceeds the capacity of the SNDBUF window.
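An application that must deliver an entire buffer therefore usually checks the byte count returned by each call and resubmits whatever remains. The sketch below shows one way to do this; the helper name write_all is hypothetical.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: keep calling write(2) until all count bytes have
 * been handed to the socket. Returns count on success, or -1 on error. */
ssize_t write_all(int sockfd, const void *buf, size_t count)
{
    const char *p = buf;
    size_t sent = 0;

    while (sent < count) {
        ssize_t n = write(sockfd, p + sent, count - sent);

        if (n > 0) {
            sent += (size_t)n;  /* partial write: submit the remainder */
        } else if (n < 0 && errno != EINTR) {
            return -1;          /* real error; errno is set */
        }
        /* EINTR before any data was written: retry the call. */
    }
    return (ssize_t)sent;
}
```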

shutdown

The shutdown(2) function is used with a TCP socket object to inform the system that the transfer of data in one or both directions has been completed. This function provides a way to notify the remote peer system of the end of the datastream (“end of file”), or to warn the remote peer system that data will no longer be accepted on this socket. The following code constitutes the template for the call to this function:

#include <sys/socket.h>

int shutdown(int sockfd, int how);


This function applies only to TCP connections. The socket object described by the parameter sockfd is affected in various ways, depending on the value of the parameter how. If the value of how is 0, then the reception of data is prohibited. If the value of how is 1, then the transmission of data is prohibited, and the remote peer system sees what appears to be an end-of-file condition. If the parameter how is set to 2, then the effect of this value is equivalent to the effect of value 0 and value 1 simultaneously. In other words, if the value of how is 2, then data reception and data transmission are both prohibited, and the remote peer system sees an apparent end-of-file (EOF) condition. (Posix.1g defines three constants that can be used to define the how parameter: SHUT_RD, SHUT_WR, and SHUT_RDWR. However, as of the 2.0.34 release of the Linux kernel, these constants had not been defined for Linux.)

After a shutdown(2) function has been applied to the read half of the connection, all data currently pending in the local system is discarded, along with any data that may subsequently be transmitted from the remote peer system.

A shutdown(2) function applied to the write half of the connection does not impose its will quite so abruptly. Instead, it waits until all the data has been transmitted, and then causes a FIN packet to be transmitted to the remote peer system, to indicate that no more data is forthcoming.

If the function fails, -1 is returned and the global variable errno is set accordingly. If the function succeeds, 0 is returned.
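A common use of shutdown(2) is the “half close”: the application sends its last data, shuts down the write half, and continues to read the peer’s reply. The sketch below assumes a current C library that defines SHUT_WR (on a system that lacks the constant, the value 1 described above is equivalent); the helper name send_and_half_close is hypothetical.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: send a final message, then shut down only the write
 * half so the peer sees end-of-file while we can still read its reply.
 * Returns 0 on success, -1 on failure. */
int send_and_half_close(int sockfd, const void *msg, size_t len)
{
    if (write(sockfd, msg, len) != (ssize_t)len)
        return -1;              /* error or partial write */
    return shutdown(sockfd, SHUT_WR);   /* same as how == 1 */
}
```

After this call the FD is still open; the application must still call close(2) once it has finished reading.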

close [a socket object]

The close(2) function, which is a standard part of the Linux I/O system, is used to dissociate a file-descriptor (FD) number from a socket object. When the last association has been broken, the socket object is destroyed. The following code constitutes the template for the call to this function:

#include <unistd.h>

int close(int sockfd);


The close(2) function releases the file descriptor that specifies a socket object. If the FD is the last remaining one that points to a socket object (that is, when the reference count on the socket object changes from 1 to 0), then the socket object is closed out, the connection is terminated, and all resources are released. For TCP sockets, unless otherwise indicated by the SO_LINGER option, the system makes every effort to send any remaining data to the remote peer system.

If the function fails, -1 is returned and the global variable errno is set accordingly. If the function succeeds, 0 is returned.

recv, recvfrom, recvmsg

These three functions, which are specific to certain socket objects, are used by most applications programs to fetch packets of UDP data received by the Linux system. The following code constitutes the templates for the calls to these functions:

#include <sys/types.h>

#include <sys/socket.h>

int recv(int sockfd, void *buf, int len, unsigned flags);

int recvfrom(int sockfd, void *buf, int len, unsigned flags,

struct sockaddr *from, int *fromlen);

int recvmsg(int sockfd, struct msghdr *msg, unsigned flags);


The recv(2) function accepts data from a connected socket. This function is basically identical to the read(2) function, but has the additional parameter flags, which lets the application specify one or more option flags (described a bit later in this section).

The recvfrom(2) function accepts data via unconnected sockets, returning the address information from the remote peer system in the data block pointed to by the from pointer. The length of the data in the block is returned in the integer pointed to by the fromlen parameter. When its parameter from is set to NULL, the recvfrom(2) function is identical to the function recv(2).

The recvmsg(2) function accepts data using the msghdr structure pointed to by the parameter msg, which receives the message, the address, and any return flags.

The flags parameter in each of these three functions can be set by ORing together one or more of the following constants: MSG_DONTWAIT (return immediately, regardless of whether data is present), MSG_OOB (receive out-of-band data), MSG_PEEK (read the data but don’t dequeue it), and MSG_WAITALL (wait for all the data, as specified in the call).

Note: The MSG_WAITALL option is not implemented in the Linux 2.0.34 release.

For the recvmsg(2) function, the TCP/IP code can return the following flags in the msghdr structure: MSG_BCAST (the message was in a broadcast datagram), MSG_MCAST (the message was in a multicast datagram), MSG_TRUNC (not all datagram data was returned), and MSG_CTRUNC (not all ancillary data was returned).

Note: The returned-flags field in the msghdr structure is not used in the Linux 2.0.34 release.

If the function fails, -1 is returned and the global variable errno is set accordingly. If the function succeeds, the total number of bytes read is returned.

If one of these functions is interrupted by a signal after any amount of data has been read, then the return indicates the number of bytes that were read before the interruption occurred. If one of these functions is interrupted by a signal before any data has been read, -1 is returned and the global variable errno is set to EINTR.

send, sendto, sendmsg

These three functions, which are specific to certain socket objects, are used by most applications programs to send packets of UDP data from the Linux system to a remote peer system. The following code fragments constitute the templates for the calls to these functions:

#include <sys/types.h>

#include <sys/socket.h>

int send(int sockfd, const void *msg, int len, unsigned flags);

int sendto(int sockfd, const void *msg, int len,

unsigned flags, struct sockaddr *to, int tolen);

int sendmsg(int sockfd, struct msghdr *msg, unsigned flags);


The send(2) function transmits data via a connected socket. This function is basically identical to the write(2) function, but has the additional parameter flags, which lets the application specify one or more option flags.

The sendto(2) function transmits data via unconnected sockets, taking the address information for the remote peer from the data block pointed to by the to pointer. The length of the address data in the block is specified by the tolen parameter.

The sendmsg(2) function transmits data using the msghdr structure pointed to by the parameter msg, which specifies the message and the remote peer-system address.

The flags parameter in each of these three functions can be set by ORing together one or more of the following constants: MSG_DONTROUTE (bypass routing), MSG_OOB (send out-of-band data), and MSG_DONTWAIT (return immediately).

For the sendmsg(2) function, the flag field in the msghdr is ignored.

If the function fails, -1 is returned and the global variable errno is set accordingly. If the function succeeds, the total number of bytes written is returned.
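The recvfrom(2) and sendto(2) functions are typically used as a pair on an unconnected UDP socket: the address filled in by one call is passed to the other. The sketch below echoes a single datagram back to its sender. The helper name udp_echo_once and the 1,500-byte buffer size are our own choices for illustration.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

/* Hypothetical helper: receive one datagram on an unconnected UDP socket
 * and send the same bytes back to whoever sent it. Returns the number of
 * bytes echoed, or -1 on failure. */
ssize_t udp_echo_once(int sockfd)
{
    char buf[1500];
    struct sockaddr_in from;
    socklen_t fromlen = sizeof from;   /* in: buffer size, out: address size */
    ssize_t n;

    n = recvfrom(sockfd, buf, sizeof buf, 0,
                 (struct sockaddr *)&from, &fromlen);
    if (n < 0)
        return -1;

    /* The address length is passed to sendto(2) by value. */
    return sendto(sockfd, buf, (size_t)n, 0,
                  (struct sockaddr *)&from, fromlen);
}
```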

Socket Option Functions

Socket objects have many, many options that need to be set for particular applications. Some of the options are related to proper socket operation, while others are specific to a particular link and/or protocol. For the purposes of this book, we discuss only the options that apply to socket objects that have been created in the AF_INET domain.

In addition to the functions and options described in this section, applications programs may also use the ioctl and fcntl system calls. The functions and parameters are not addressed here, but the options are discussed in the commentary for each module. Many of the options that are set using ioctl and fcntl can also be set via the setsockopt function, which is described a bit later in this section.

getsockopt

The getsockopt function, which obtains the current option setting for a given socket object, is meant to be used by applications that inherit socket objects. For example, if the inetd process is listening for connections on behalf of a server, and launches the server when a connection is made, the server then needs information about the socket objects that it has inherited. It gets this information via the getsockopt function. The following code constitutes the template for the call to this function:

#include <sys/types.h>

#include <sys/socket.h>

int getsockopt(int sockfd, int level, int optname, void *optval, int *optlen);


The getsockopt function returns information about the current settings of a particular option for a specified socket object. The settings affect how the TCP/IP stack handles various conditions.

The option to be examined is specified using a combination of the parameter level and the parameter optname. In this book, we examine options whose level is specified as SOL_SOCKET, IPPROTO_IP, or IPPROTO_TCP. The valid optnames are listed, by level, in the following subsections.

The void pointer to the buffer is specified in the parameter optval. The amount of data written to the buffer by the getsockopt function is reported, on return, by the integer value pointed to by the parameter optlen. The specific information that is returned depends on the option specified in the parameter optname, as qualified by the value to which the parameter level is set.

If the function fails, -1 is returned and the global variable errno is set accordingly. If the function succeeds, 0 is returned.

setsockopt

The setsockopt function sets the value of a particular option in the specified socket object. The settings affect how the TCP/IP stack handles various conditions. The following code constitutes the template for the call to this function:

#include <sys/types.h>

#include <sys/socket.h>

int setsockopt(int sockfd, int level, int optname, void *optval, int optlen);


The void pointer to the buffer is specified in the parameter optval, and the length of the data in the buffer is specified in the parameter optlen. The specific information passed to the function depends on the option specified in the parameter optname, as qualified by the value to which the parameter level is set.

The option to be set is specified by a combination of the parameter level and the parameter optname. In this book, we examine options whose level is specified as SOL_SOCKET, IPPROTO_IP, or IPPROTO_TCP.

The valid optnames are listed, by level, in the following subsections.

If the function fails, -1 is returned and the global variable errno is set accordingly. If the function succeeds, 0 is returned.
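The two option functions are often used together. The sketch below sets an integer-valued option at the SOL_SOCKET level and reads it back to confirm the result. The helper name set_and_check_flag is hypothetical, and the length is declared as socklen_t, as current C libraries require.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: set an integer-valued SOL_SOCKET option and read it
 * back. Returns 1 if the option reads back enabled (non-zero), 0 if it
 * reads back disabled, or -1 if either call fails. */
int set_and_check_flag(int sockfd, int optname, int on)
{
    int value = on;
    socklen_t len = sizeof value;

    if (setsockopt(sockfd, SOL_SOCKET, optname, &value, sizeof value) < 0)
        return -1;

    value = 0;
    if (getsockopt(sockfd, SOL_SOCKET, optname, &value, &len) < 0)
        return -1;
    return value != 0;
}
```

For example, set_and_check_flag(sockfd, SO_KEEPALIVE, 1) would enable the keep-alive feature on a TCP socket and confirm that the setting took effect.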

Socket Options (SOL_SOCKET)

These options are used with the getsockopt and setsockopt functions previously described. In these calls, the parameter level should be coded as SOL_SOCKET. This section describes the format of the data that is being returned (via getsockopt) or that is being passed (via setsockopt).

We include options that do not appear in the Linux documentation but that are supported by the standard implementation of the Linux kernel code. When an option is available that uses another interface, that fact is mentioned.

We also include options that are defined in other TCP/IP implementations but that have not been implemented in the Linux 2.0.34 release. Based on our review of the commentary, we believe the list of unimplemented options described in this section provides strong hints as to when — and perhaps where — these options will be added to the Linux implementation.

SO_BINDTODEVICE

This option uses an instance of the structure ifreq. The setsockopt function reads (from the structure buffer) the null-terminated interface name from which all accesses should be serviced for the socket in question. The setsockopt function then saves this name in the socket object.

If an interface name has been specified, the getsockopt function returns (into the structure buffer) the null-terminated interface name that should be used with the socket object. Otherwise, the getsockopt function returns a zero-length string.

The default value is 0 (in other words, no binding to an interface).

Note: The man pages do not document the SO_BINDTODEVICE option.

SO_BROADCAST

This option uses a single integer value. A 0 value blocks transmission of a broadcast packet using this socket object. Conversely, a non-zero value enables transmission of a broadcast packet.

The default value is 0 (in other words, no broadcast packets are allowed).

SO_BSDCOMPAT

This option uses a single integer value to indicate whether API responses are compatible with the Berkeley Software Distribution (BSD) implementation of Unix. A 0 value disables compatibility mode, whereas a non-zero value changes the way certain conditions are reported to the application.

Unless this option is enabled, Linux returns Internet Control Message Protocol (ICMP) Destination Unreachable messages to unconnected UDP sockets.

The default value is 0 (that is, BSD compatibility mode is disabled).

Note: The man pages do not document the SO_BSDCOMPAT option.

SO_DEBUG

This option, which applies only to TCP sockets, uses a single integer value to indicate whether debugging information is captured. A 0 value disables the capture of debugging information, while a non-zero value enables this feature. When the debug feature is enabled, information about the TCP packets that have been sent or received is kept in a circular buffer located in the kernel. This buffer can be read by a utility.

The default value is 0 (that is, no debugging information is captured).

SO_DONTROUTE

This option uses a single integer value to change the routing followed by an outgoing packet. A 0 value disables the use of nonstandard routing, while a non-zero value enables this feature. When the nonstandard routing feature is enabled, a packet being transmitted bypasses the normal routing protocol and (if possible) is sent to the appropriate local interface.

The default value is 0 (that is, all outgoing packets follow the standard routing).

SO_ERROR

This option, which uses a single integer value, is a read-only and read-once parameter. It returns the current value of the socket-object error member and is then set to 0. A return of 0 means that no function call has returned an error code since the last time this parameter was read, whereas a non-zero value indicates that an error has occurred, and the value of the integer corresponds to the error code.
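A minimal sketch of reading this option follows. The helper name take_socket_error is hypothetical; the name is meant as a reminder that reading the value also clears it.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: fetch and clear the pending error on sockfd.
 * Returns 0 if no error is pending, a positive errno-style code if one
 * is, or -1 if getsockopt itself fails. */
int take_socket_error(int sockfd)
{
    int err = 0;
    socklen_t len = sizeof err;

    if (getsockopt(sockfd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
        return -1;
    return err;
}
```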

SO_KEEPALIVE

This option, which uses a single integer value, applies only to TCP sockets. A 0 value disables the keep-alive feature, while a non-zero value enables it. When enabled, TCP exchanges a message with the remote peer system at regular intervals (by default, once every two hours). If the remote peer system does not respond within a given period (usually 12 minutes), the SO_KEEPALIVE option tells the application that the connection has been broken.

The default value is 0 (that is, no keep-alive signal is sent via the socket).

SO_LINGER

This option uses the structure linger, which can be found in /usr/include/linux/socket.h. The structure consists of two members: l_onoff and l_linger.

When the value of l_onoff is 0, the socket behaves normally when the close function is called. When the value of l_onoff is non-zero, the socket uses an alternative action.

When the alternative action is enabled, a non-zero value of the l_linger member indicates how long the close function should wait for all data to be transmitted to the remote peer system before the close function closes the socket. A 0 value of the l_linger member causes the socket to be closed immediately after the close function has been called.

Note: According to Posix, the l_linger member of the linger structure must be interpreted such that the l_linger tick is counted in seconds. However, the man page for setsockopt and the Linux code both indicate that the l_linger tick is counted in hundredths of a second.

The default value of l_onoff is 0 (that is, the socket behaves normally when the close function is called).
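The two members of the linger structure can be filled in as in the following sketch. The helper name set_linger is hypothetical, and the code assumes that a current system declares the structure through <sys/socket.h>. The unit of the wait_time argument is whatever unit the system applies to l_linger, which is the subject of the note above.

```c
#include <sys/socket.h>

/* Hypothetical helper: configure the behavior of close() on a socket.
 *   onoff == 0  : normal close behavior (the default)
 *   onoff != 0  : alternative action, waiting up to wait_time ticks
 * Returns 0 on success, -1 on failure. */
int set_linger(int fd, int onoff, int wait_time)
{
    struct linger lg;

    lg.l_onoff = onoff;       /* enable or disable the alternative action */
    lg.l_linger = wait_time;  /* how long close() may wait for pending data */

    return setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
}
```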

SO_NO_CHECK

This option uses a single integer value to determine whether checksums are calculated. A 0 value disables this feature, thereby causing checksums to be calculated, while a non-zero value enables this feature, instructing the underlying protocol module not to calculate checksums.

The default value is 0 (that is, checksums are calculated, if appropriate for the protocol).

SO_OOBINLINE

This option, which uses a single integer value, is disabled by a 0 value and enabled by a non-zero value. When the option is enabled, out-of-band data is placed in line with normal data, and use of the recv, recvfrom, or recvmsg functions with MSG_OOB in the flags parameter is prohibited.

The default value is 0 (indicating that out-of-band data is segregated).

SO_PRIORITY

This option, which uses a single integer value, sets the priority of a transmission. The three valid values are SOPRI_BACKGROUND, SOPRI_NORMAL, and SOPRI_INTERACTIVE.

Note: The SO_PRIORITY option is not documented in the man pages. The priority field in the socket object is normally set using the IP_TOS (Type Of Service) option (see the “IP Standard Options (IPPROTO_IP)” section, later in this section).

SO_RCVBUF

This option uses a single integer value, which can range from a low of 256 bytes to a high of either 65,535 bytes or twice the value of SK_RMEM_MAX (usually 32,767), whichever is lower. This integer value defines the size, in bytes, of the receive buffer for the socket.

The default value is the value of SK_RMEM_MAX, which can be configured when the kernel is compiled.
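Because the kernel limits the value it accepts, an application may want to read the option back after setting it. The sketch below does this; the helper name request_buffer_size is hypothetical, and the code assumes a current POSIX system, where the figure reported may differ from the figure requested.

```c
#include <sys/socket.h>

/* Hypothetical helper: request a buffer size for optname (SO_RCVBUF or
 * SO_SNDBUF) and return the size the kernel reports afterwards, or -1
 * on failure. The reported size may differ from the request because
 * the kernel clamps or adjusts the value. */
int request_buffer_size(int fd, int optname, int bytes)
{
    int granted = 0;
    socklen_t len = sizeof(granted);

    if (setsockopt(fd, SOL_SOCKET, optname, &bytes, sizeof(bytes)) < 0)
        return -1;
    if (getsockopt(fd, SOL_SOCKET, optname, &granted, &len) < 0)
        return -1;
    return granted;
}
```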

SO_REUSEADDR

This option, which uses a single integer value, is disabled by a 0 value and enabled by a non-zero value. When this option is enabled, the server application can reuse a given port even when connections specifying that port already exist, and even when other server applications exist that use the same port (provided, of course, that these applications have different IP addresses).

The default value is 0 (that is, addresses are not reused).
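A server usually enables this option before it binds its listening socket. The following sketch shows that order of operations; the helper name bind_reusable is hypothetical, and the use of inet_pton assumes a current POSIX system.

```c
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: create a TCP socket, enable SO_REUSEADDR, and
 * bind the socket to the given IPv4 address and port (host byte order).
 * Returns the socket descriptor, or -1 on failure. */
int bind_reusable(const char *ip, unsigned short port)
{
    struct sockaddr_in addr;
    int on = 1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);

    /* The option is set before bind() so that it applies to the bind. */
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1 ||
        setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) < 0 ||
        bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```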



SO_RCVLOWAT

This option, which is not implemented in the Linux 2.0.34 release, represents the low-water mark or lower threshold for received data. In other words, when the amount of data indicated by this option is available, the select(2) and poll(2) functions will indicate data ready.

If the amount of available data is less than the figure indicated by the SO_RCVLOWAT option, the select(2) and poll(2) functions will not indicate data ready. The default value is 1; in other words, if a single byte of data is available, then the select(2) and poll(2) functions will indicate data ready.

SO_RCVTIMEO

This option is not implemented in the Linux 2.0.34 release.

SO_REUSEPORT

This option, which is not implemented in the Linux 2.0.34 release, was introduced into Berkeley implementations for multicast support. In Linux, SO_REUSEADDR is instead overloaded with the ability to bind multiple socket objects to the same port for multicast applications.

SO_SNDBUF

This option uses a single integer value, which can range from a low of 256 bytes to a high of either 65,535 bytes or twice the value of SK_WMEM_MAX (usually 32,767), whichever is lower. This integer value defines the size, in bytes, of the transmit buffer for the socket.

The default value is the value of SK_WMEM_MAX, which can be configured when the kernel is compiled.

SO_SNDLOWAT

This option, which is not implemented in the Linux 2.0.34 release, represents the low-water mark or lower threshold for transmitted data. In other words, when the amount of buffer space indicated by this option is available, the select(2) and poll(2) functions will indicate okay-to-write.

If the amount of available buffer space is less than the figure indicated by the SO_SNDLOWAT option, the select(2) and poll(2) functions will not indicate okay-to-write. The standard default value is 2048; in other words, if 2,048 bytes of buffer space are available, then the select(2) and poll(2) functions will indicate okay-to-write.

SO_SNDTIMEO

This option is not implemented in the Linux 2.0.34 release.

SO_TYPE

This option uses a single integer value. The option is read-only: the getsockopt function returns the socket type, which is one of the values defined by SOCK_xxx.
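For example, a routine that is handed a descriptor it did not create can find out what kind of socket it is. The helper name get_socket_type is hypothetical; this is a minimal sketch for a current POSIX system.

```c
#include <sys/socket.h>

/* Hypothetical helper: return the SOCK_xxx type of a socket
 * (SOCK_STREAM, SOCK_DGRAM, and so on), or -1 if getsockopt fails. */
int get_socket_type(int fd)
{
    int type = 0;
    socklen_t len = sizeof(type);

    if (getsockopt(fd, SOL_SOCKET, SO_TYPE, &type, &len) < 0)
        return -1;
    return type;
}
```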



IP Standard Options (IPPROTO_IP)

These options are used with the getsockopt(2) and setsockopt(2) functions previously described. In these calls, the parameter level should be coded as IPPROTO_IP.

We have included options that do not appear in the Slackware Linux documentation but that are supported by the standard implementation of the Linux kernel code. When an option is available that uses another interface, that fact is mentioned.

The Slackware distribution does not document the IPPROTO_IP socket options on any man page. Other distributions may include documentation for these options.

IP_HDRINCL

This option, which uses a single integer value, applies only to sockets of type SOCK_RAW. The option is disabled by a 0 value and enabled by a non-zero value. When IP_HDRINCL is enabled, the application builds and provides the complete IP header for outgoing packets.

The default value is 0 (that is, the system builds the IP header).

Note: In the commentary, we discuss the various sources of the information that appears in IP headers.

IP_OPTIONS

This option uses a buffer containing up to eleven 32-bit integers (44 bytes). The buffer contains the contents of the IPv4 options field in the packet. (This information is used for source routing, timestamping, route recording, and other optional IP facilities.) The setsockopt(2) function records the buffer contents in the socket object, so that the buffered information can be placed in every packet that is transmitted.

For TCP sockets, the getsockopt(2) function returns the source route that accompanied the SYN packet. For other sockets, the getsockopt(2) function returns the same information that was stored by a prior setsockopt(2) function on the socket object in question.

In the default setting, the options-field buffer is empty.

IP_RECVDSTADDR

This option is not implemented in the Linux 2.0.34 release. For UDP packets, this option lets an application recover the destination address, in the local system, of a packet that has been received. This option serves as an alternative to the binding of multiple instances of a UDP server to specific IP addresses for a multi-address host.

IP_RECVIF

This option is not implemented in the Linux 2.0.34 release. For UDP packets, this option returns the identification of the interface from which a packet was received.

IP_TOS

This option uses a single integer value, which can be one of the following constant values:

The integer value is inserted in the Type Of Service (TOS) field of transmitted IP packets.

The default value is 0 (in other words, the TOS is the normal one).

Note: The IPTOS_RELIABILITY constant is not used in the kernel as of the Linux 2.0.34 release.
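The sketch below sets and reads back the TOS value for a socket. The helper names set_tos and get_tos are hypothetical, and the code assumes that a current system supplies constants such as IPTOS_LOWDELAY through <netinet/ip.h>; the location of these constants in the 2.0.34 headers may differ.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/ip.h>   /* IPTOS_LOWDELAY and related constants */

/* Hypothetical helper: set the TOS value placed in outgoing IP packets.
 * Returns 0 on success, -1 on failure. */
int set_tos(int fd, int tos)
{
    return setsockopt(fd, IPPROTO_IP, IP_TOS, &tos, sizeof(tos));
}

/* Hypothetical helper: return the current TOS value for the socket,
 * or -1 if getsockopt fails. */
int get_tos(int fd)
{
    int tos = 0;
    socklen_t len = sizeof(tos);

    if (getsockopt(fd, IPPROTO_IP, IP_TOS, &tos, &len) < 0)
        return -1;
    return tos;
}
```

An interactive application might call set_tos(fd, IPTOS_LOWDELAY), for instance, to ask for low-delay handling of its packets.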

IP_TTL

This option uses a single integer value. The time-to-live (TTL) value (which may be from 1 to 255, inclusive) represents the number of “hops” that a packet can make before the routers discard it.

The default values are 64 for TCP sockets, 64 for UDP sockets, and 0 for raw sockets (that is, sockets created via the SOCK_RAW option).
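The valid range can be checked in the application before the option is set, as in this sketch. The helper names set_ttl and get_ttl are hypothetical, and the range check simply mirrors the 1-to-255 range given above.

```c
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical helper: set the TTL for outgoing packets.
 * Rejects values outside the range 1 to 255 without calling the kernel.
 * Returns 0 on success, -1 on failure. */
int set_ttl(int fd, int hops)
{
    if (hops < 1 || hops > 255)
        return -1;
    return setsockopt(fd, IPPROTO_IP, IP_TTL, &hops, sizeof(hops));
}

/* Hypothetical helper: return the current TTL for the socket,
 * or -1 if getsockopt fails. */
int get_ttl(int fd)
{
    int hops = 0;
    socklen_t len = sizeof(hops);

    if (getsockopt(fd, IPPROTO_IP, IP_TTL, &hops, &len) < 0)
        return -1;
    return hops;
}
```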



IP Multicast Options (IPPROTO_IP)

These options are used with the getsockopt(2) and setsockopt(2) functions described in the “Socket Option Functions” section, earlier in this section. In these calls, the parameter level should be coded as IPPROTO_IP.

These options are described here in summary form.

Note: Linux includes support for these options only if the kernel configuration is set to enable multicasting.

IP_ADD_MEMBERSHIP

This option passes the structure ip_mreq, to specify the multicast group that an application should join, or to inquire about the multicast group to which a given socket is joined. The members of the structure specify the multicast address and the local IP address.

IP_DROP_MEMBERSHIP

This option passes the structure ip_mreq, to specify the multicast group that an application should drop. The members of the structure specify the multicast address and the local IP address.

IP_MULTICAST_IF

This option passes the structure in_addr, to specify the interface that an application should use for outgoing packets.

IP_MULTICAST_LOOP

This option enables or disables local loopback of multicast messages. The default setting, enabled, allows messages to be looped back.

IP_MULTICAST_TTL

This option specifies the hop count for multicast messages sent via this socket. The default value is 1 hop.
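A sender can adjust the hop count and the loopback setting together, as in this sketch. The helper name set_multicast_scope is hypothetical, and the use of unsigned char arguments follows common Berkeley-derived practice; that choice is an assumption here rather than something taken from the 2.0.34 sources.

```c
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical helper: set the hop count for outgoing multicast packets
 * and enable (non-zero) or disable (0) local loopback.
 * Returns 0 on success, -1 on failure. */
int set_multicast_scope(int fd, unsigned char ttl, unsigned char loop)
{
    if (setsockopt(fd, IPPROTO_IP, IP_MULTICAST_TTL, &ttl, sizeof(ttl)) < 0)
        return -1;
    return setsockopt(fd, IPPROTO_IP, IP_MULTICAST_LOOP, &loop, sizeof(loop));
}
```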



TCP Options (IPPROTO_TCP)

These options are used with the getsockopt(2) and setsockopt(2) functions previously described. In these calls, the parameter level should be coded as IPPROTO_TCP.

We have included options that do not appear in the Slackware Linux documentation but that are supported by the standard implementation of the Linux kernel code. When an option is available that uses another interface, that fact is mentioned.

The Slackware distribution does not document the IPPROTO_TCP socket options in any man page. Other distributions may include documentation for these options.

TCP_KEEPALIVE

This option is not implemented in the Linux 2.0.34 release. The SO_KEEPALIVE option (under SOL_SOCKET) is a useful alternative.

TCP_MAXRT

This option is not implemented in the Linux 2.0.34 release. This new parameter, which is defined in Posix.1g, indicates (in seconds) how long re-transmission should be attempted before the socket is declared dead. A 0 value indicates that the system default will be used, and the value –1 indicates that retries should be performed indefinitely.

TCP_MAXSEG

This option uses an integer buffer. The value is an integer, from 1 to MAX_WINDOW (32,767), that specifies the largest segment size that can be published.

The default size depends on the segment size published by the remote peer system, and on the size that was defined when the kernel was compiled.

TCP_NODELAY

This option uses an integer buffer. A 0 value disables this feature, and a non-zero value enables it. When the feature is enabled, the algorithm used to reduce the number of small packets on the WAN (the Nagle algorithm) is bypassed.

The default value is 0 (that is, the algorithm is enabled).
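An interactive application that sends many small messages might enable this option as sketched below. The helper names set_nodelay and get_nodelay are hypothetical, and the code assumes that a current system defines TCP_NODELAY in <netinet/tcp.h>.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>  /* TCP_NODELAY */

/* Hypothetical helper: enable (non-zero) or disable (0) TCP_NODELAY.
 * Enabling the option bypasses the small-packet algorithm.
 * Returns 0 on success, -1 on failure. */
int set_nodelay(int fd, int enable)
{
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &enable, sizeof(enable));
}

/* Hypothetical helper: returns 1 if TCP_NODELAY is enabled, 0 if it is
 * disabled, or -1 if getsockopt fails. */
int get_nodelay(int fd)
{
    int value = 0;
    socklen_t len = sizeof(value);

    if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &value, &len) < 0)
        return -1;
    return value != 0;
}
```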

TCP_STDURG

This option is not implemented in the Linux 2.0.34 release.





Comments, suggestions, and error reports are welcome.
Send them to: ipstacks (at) satchell (dot) net
Copyright © 2022 Stephen Satchell, Reno NV USA