Tue, 17 Feb 2009
SO_REUSEADDR, SO_LINGER and Microsoft.
A simple close() on a TCP connection sometimes isn't quite that simple. A TCP close is usually done using a three-way handshake, very similar to the connection setup. In some cases the shutdown takes four steps, which is known as a half-close: one side of the connection is done talking but the other is not. The simplified three-step flow:
  A                    B
    ------ FIN ---->
    <-- FIN + ACK --
    ------ ACK ---->
Afterwards 'A' is required to keep the socket in TIME_WAIT, as the final acknowledgement could be lost. In that case 'B' resends its FIN+ACK, which has to be acknowledged again. This state must be maintained for 2 * MSL (twice the maximum time an IP packet can exist on the network). Most implementations use an MSL of 30 seconds up to 2 minutes, resulting in a TIME_WAIT state lasting one to four minutes.
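The half-close mentioned above can be observed from user space. The following is a minimal sketch, assuming a loopback connection and Python's standard socket module; the function names are made up for this illustration. The client sends its FIN with shutdown(SHUT_WR) but keeps reading, and the server only replies after it has seen that FIN.

```python
import socket
import threading


def serve_once(listener):
    """Accept one connection, read until the peer's FIN, then reply."""
    conn, _ = listener.accept()
    with conn:
        received = b""
        while True:
            chunk = conn.recv(1024)
            if not chunk:  # empty read: the peer has sent its FIN
                break
            received += chunk
        # Our own half of the connection is still open, so we can still send.
        conn.sendall(b"got %d bytes" % len(received))


def half_close_demo():
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    worker = threading.Thread(target=serve_once, args=(listener,))
    worker.start()

    client = socket.create_connection(listener.getsockname())
    client.sendall(b"hello")
    client.shutdown(socket.SHUT_WR)  # send FIN, keep the read side open
    reply = b""
    while True:
        chunk = client.recv(1024)
        if not chunk:  # the server has closed its half as well
            break
        reply += chunk
    client.close()
    worker.join()
    listener.close()
    return reply


if __name__ == "__main__":
    print(half_close_demo())
```

The reply arrives after the client has already sent its FIN, which is exactly the "done talking but still listening" situation.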
Developers need to care about this due to another implementation detail:
Most network stacks don't allow a port in TIME_WAIT to be reused. This isn't a problem
for clients as they tend to use random ephemeral ports. The server however might need to be restarted while
a closed connection is still in TIME_WAIT.
There are several workarounds:
- client closes connection
- SO_LINGER
- SO_REUSEADDR
The first solution is to have the client close the connection. Only the side which closes the connection (i.e. sends out the first FIN) has to deal with TIME_WAIT. A simple solution, but usually not sufficient: even if the protocol doesn't require the server to close the connection, it may still want to if a client misbehaves.
A second possible solution is to use SO_LINGER. A quote from man 3 socket:
SO_LINGER Lingers on a close() if data is present. This option controls the action taken when unsent messages queue on a socket and close() is performed. If SO_LINGER is set, the system shall block the process during close() until it can transmit the data or until the time expires. If SO_LINGER is not specified, and close() is issued, the system handles the call in a way that allows the process to continue as quickly as possible. This option takes a linger structure, as defined in the <sys/socket.h> header, to specify the state of the option and linger interval.
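The last sentence is the interesting bit: the linger structure carries a timeout. With the option enabled and the timeout set to zero, close() aborts the connection instead of shutting it down gracefully, so the socket does not stay in TIME_WAIT. The downside is that unsent data is discarded and the peer gets a RST instead of the usual FIN/ACK exchange; any further packets for that connection (i.e. FIN+ACK) will also be answered with a RST.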
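A minimal sketch of the SO_LINGER approach, assuming a loopback connection and a platform where struct linger consists of two C ints (Linux and the BSDs; Windows uses two unsigned shorts, so the pack format would differ). The function name is invented for this illustration. The server side enables lingering with a zero timeout and closes, and the client checks how the connection ended.

```python
import socket
import struct


def abortive_close_demo():
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)

    client = socket.create_connection(listener.getsockname())
    conn, _ = listener.accept()

    # struct linger { int l_onoff; int l_linger; }: enabled, zero seconds.
    # Assumption: two C ints, which holds on Linux and the BSDs.
    linger = struct.pack("ii", 1, 0)
    conn.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, linger)
    conn.close()  # aborts the connection rather than sending a FIN

    client.settimeout(5)
    try:
        data = client.recv(1024)
        outcome = "eof" if data == b"" else "data"
    except ConnectionResetError:
        outcome = "reset"
    finally:
        client.close()
        listener.close()
    return outcome


if __name__ == "__main__":
    print(abortive_close_demo())
```

On the systems assumed here the client sees a connection reset rather than a clean end of stream, which is why this is usually reserved for misbehaving peers.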
The recommended solution is to set SO_REUSEADDR.
SO_REUSEADDR Specifies that the rules used in validating addresses supplied to bind() should allow reuse of local addresses, if this is supported by the protocol. This option takes an int value. This is a Boolean option.
This will simply allow the port to be used again. The only limitation is that any sockets (the combination of source and destination IP and port) which are still in TIME_WAIT can't be reused until they leave that state. This shouldn't pose any problems, as clients will reconnect with different source ports.
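A rough way to try this out is to simulate a server restart while one connection is still in TIME_WAIT. The sketch below assumes a loopback setup; make_listener and restart_succeeds are hypothetical helper names. Note that the option has to be set before bind(), since it changes how bind() validates the address.

```python
import socket


def make_listener(port, reuse):
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    if reuse:
        # Must happen before bind(); setting it afterwards has no effect
        # on the bind that already took place.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    try:
        sock.bind(("127.0.0.1", port))
        sock.listen(1)
    except OSError:
        sock.close()
        raise
    return sock


def restart_succeeds(reuse):
    """Simulate a server restart while one connection is in TIME_WAIT."""
    first = make_listener(0, reuse)  # port 0: let the OS pick a free port
    port = first.getsockname()[1]

    client = socket.create_connection(("127.0.0.1", port))
    conn, _ = first.accept()
    conn.close()    # the server sends the first FIN ...
    client.recv(1)  # ... the client sees end of stream ...
    client.close()  # ... and sends its own FIN; the server side owns TIME_WAIT
    first.close()   # "stop" the server

    try:
        second = make_listener(port, reuse)  # "restart" on the same port
    except OSError:
        return False
    second.close()
    return True


if __name__ == "__main__":
    print("without SO_REUSEADDR:", restart_succeeds(False))
    print("with SO_REUSEADDR:   ", restart_succeeds(True))
```

With the option set the second bind() should go through; without it the result depends on the network stack, and on many systems the bind is refused with "address already in use" until TIME_WAIT has expired.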
Interestingly SO_REUSEADDR has caused a few bugs in different operating systems.
The first is quite old and affected a number of systems including Linux. If a process opened a port with SO_REUSEADDR and bound to INADDR_ANY, another process could bind to the same port on a specific interface. This allowed the second process to steal traffic destined for the original process. This was fixed a while ago.
The second is documented by Microsoft.
Summarized: it allows a Windows program to intercept traffic meant for a
different application by opening the same port with SO_REUSEADDR set. Note that the first application
doesn't need to have it set for this to work. Also note the sentence 'No special privileges are required to use this option.'
The recommended way to avoid this problem appears to be to
use the SO_EXCLUSIVEADDRUSE option (available from Windows 2000 onwards). Unfortunately this requires
the user to be a member of the 'Administrators' security group on Windows 2000 and XP.
In other words, Microsoft is counting on application developers to fix an operating system bug and even then it took
them two releases to get the workaround usable.
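For code that has to run on both kinds of systems, one possible approach is to pick the option per platform. This is only a sketch: bind_server_socket is a hypothetical helper, and it relies on Python's socket module exposing SO_EXCLUSIVEADDRUSE on Windows only. Whether exclusive binding is acceptable for a given server (and whether the process has the privileges described above) is left to the application.

```python
import socket


def bind_server_socket(host, port):
    """Hypothetical helper: choose the bind option based on the platform."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    exclusive = getattr(socket, "SO_EXCLUSIVEADDRUSE", None)
    if exclusive is not None:
        # Windows: request exclusive use so no other socket can take the port.
        sock.setsockopt(socket.SOL_SOCKET, exclusive, 1)
    else:
        # Elsewhere: allow a restart while old connections are in TIME_WAIT.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    try:
        sock.bind((host, port))
    except OSError:
        sock.close()
        raise
    sock.listen(5)
    return sock


if __name__ == "__main__":
    server = bind_server_socket("127.0.0.1", 0)
    print("listening on port", server.getsockname()[1])
    server.close()
```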
posted at: 22:27