Windows: Network connections timing out too quickly on temporary connectivity loss

If you are on a rather unstable network where you frequently lose connectivity for a short time, you may notice that established connections (e.g. SSH connections using PuTTY) get dropped. You can reconnect immediately, but it’s still a pain.

The issue is not really with the software losing the connection (e.g. PuTTY) but rather with the Windows network configuration. A single application cannot change the system-wide network settings to prevent this problem. To solve it, you will need to tweak a few Windows network settings.

Basically tweaking these settings means increasing the TCP timeout in Windows. This can be done in the registry.

The relevant TCP/IP settings are:

  • KeepAliveTime
  • KeepAliveInterval
  • TcpMaxDataRetransmissions

These parameters are all located at the following registry location: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters.

On Windows versions which are not based on Windows NT (i.e. Windows 95, Windows 98 and Windows ME), these parameters are located under: HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\VxD\MSTCP.

KeepAliveTime

The KeepAliveTime parameter controls how long the TCP driver waits before a keep-alive packet is sent over an idle TCP connection. A TCP keep-alive packet is simply an ACK packet sent over the connection with the sequence number set to one less than the current sequence number for the connection. When the other end receives this packet, it responds with an ACK carrying the current sequence number. This exchange is used to make sure that the remote host at the other end of the connection is still available and that the connection is kept open.

Since TCP keep-alives are disabled by default, the application opening the connection needs to specifically enable them.
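As an illustration of what "enabling them" looks like from the application side, here is a minimal sketch in Python using the standard SO_KEEPALIVE socket option. The helper name open_keepalive_socket is made up for this example; the timing itself still comes from the system-wide settings described in this article.

```python
import socket


def open_keepalive_socket() -> socket.socket:
    """Create a TCP socket with keep-alives switched on (hypothetical helper)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Keep-alives are off by default; the application has to opt in per socket.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    return sock


sock = open_keepalive_socket()
# Read the option back to confirm it was applied (non-zero means enabled).
keepalive_enabled = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
sock.close()
```

Many clients expose this as a configuration option rather than requiring code, so it is worth checking the settings of the application you use first.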

The value is the number of milliseconds of inactivity before a keep-alive packet is sent. The default is 7,200,000 milliseconds (ms) i.e. 2 hours.

Note that the default of 2 hours might be too high in some cases. Having a high KeepAliveTime brings two problems:

  1. it may cause a delay before the machine at one end of the connection detects that the remote machine is no longer available
  2. many firewalls drop the session if no traffic occurs for a given amount of time

In the first case, even if your application can handle a reconnect scenario, it will take a very long time until it notices that the connection is dead. It could have handled the failure properly if the connection had failed fast.

In the second case, it’s the opposite: the connection is artificially closed by a firewall sitting in between.

If you encounter one of these cases on a regular basis, you should consider reducing the KeepAliveTime from 2 hours to 10 or 15 minutes (i.e. 600,000 or 900,000 milliseconds).

But also keep in mind that lowering the value for the KeepAliveTime:

  • increases network activity on idle connections
  • can cause active working connections to terminate because of latency issues.

Setting it to less than 10 seconds is not a good idea unless you have a network environment with very low latency.

KeepAliveInterval

If the remote host at the other end of the connection does not respond to the keep-alive packet, it is repeated. This is where the KeepAliveInterval is used. This parameter determines how often this retry mechanism will be triggered, i.e. the wait time before another keep-alive packet is sent. If at some point the remote host responds to the keep-alive packet, the next keep-alive packet will again be sent based on the KeepAliveTime parameter (assuming the connection is still idle).

The value is the number of milliseconds before a keep-alive packet is resent. The default is 1,000 milliseconds (ms) i.e. 1 second. If the network connectivity losses sometimes last a few minutes, it would make sense to increase this parameter to 60,000 milliseconds i.e. 1 minute.

TcpMaxDataRetransmissions

Of course, this retry process cannot go on forever. If the connection is not only temporarily lost but lost for good, then the connection needs to be closed. This is where the parameter TcpMaxDataRetransmissions is used. This parameter defines the number of keep-alive retries to be performed before the connection is aborted.

The default value is to perform 5 TCP keep-alive retransmits. If you experience network instability and lose connections too often, you should consider increasing this value to 10 or 15.

Note that starting with Windows Vista, this parameter doesn’t exist anymore and is replaced by a hard-coded value of 10. After 10 unanswered retransmissions, the connection will be aborted. But you can still control how long a connection can survive a temporary connectivity loss by adapting the KeepAliveInterval parameter.
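To get a feel for the numbers, the following Python sketch estimates how long an idle connection can ride out an outage once the first keep-alive goes unanswered. It assumes a simplified model in which the connection is aborted after the given number of unanswered retries, each sent one KeepAliveInterval apart; the function name is invented for this example and the real driver timing may differ slightly.

```python
def outage_tolerance_ms(keepalive_interval_ms: int, max_retries: int) -> int:
    """Approximate time between the first unanswered keep-alive and the abort.

    Simplified model (an assumption, not the exact driver behaviour):
    the probe is retried every keepalive_interval_ms, and the connection
    is dropped once max_retries retries have gone unanswered.
    """
    return keepalive_interval_ms * max_retries


# Defaults on Windows Vista and later: 1 second interval, 10 retries (hard-coded).
default_vista_ms = outage_tolerance_ms(1_000, 10)   # 10,000 ms = 10 seconds

# KeepAliveInterval raised to 1 minute, still 10 retries.
tuned_vista_ms = outage_tolerance_ms(60_000, 10)    # 600,000 ms = 10 minutes

# Older NT-based systems with the default of 5 retries and 1 second interval.
default_xp_ms = outage_tolerance_ms(1_000, 5)       # 5,000 ms = 5 seconds
```

Under this model, raising KeepAliveInterval from 1 second to 1 minute stretches the tolerated outage from roughly 10 seconds to roughly 10 minutes without touching the retry count.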

Also note that this parameter only exists in Windows NT based versions of Windows. On old systems running Windows 95, Windows 98 or Windows ME, the corresponding parameter is HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\VxD\MSTCP\MaxDataRetries.

Summary

Tweaking the parameters above, one can configure the Windows TCP driver so that connections can survive small connectivity losses. Remember that after changing these settings, you’ll need to reboot the machine (it’s Windows after all…).

If you cannot modify TcpMaxDataRetransmissions because you have a newer version of Windows, you can still reach the same results by increasing KeepAliveInterval instead.

Also note that issues with lost connections in unstable networks seem to especially affect Windows Vista and later. So if you move from Windows XP to, let’s say, Windows 7 and you experience such issues, you should first add the KeepAliveTime and KeepAliveInterval parameters to the registry, reboot, check whether it’s better and possibly increase the value of KeepAliveInterval if required.

All parameters above should be stored in the registry as DWORD (32-bit) values.
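One way to apply such values is a .reg file that can be reviewed and then imported. The Python sketch below only generates the text of such a file and does not touch the registry. The helper name render_reg_file and the example values (10 minutes and 1 minute) are assumptions for illustration, and the key used is the Services\Tcpip\Parameters key of NT-based Windows versions.

```python
from typing import Mapping

# Registry key holding the TCP/IP parameters on NT-based Windows versions.
TCPIP_PARAMETERS_KEY = (
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"
)


def render_reg_file(values: Mapping[str, int]) -> str:
    """Render DWORD values as the text of a .reg file (hypothetical helper)."""
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{TCPIP_PARAMETERS_KEY}]",
    ]
    for name, value in values.items():
        # A DWORD is an unsigned 32-bit integer.
        if not 0 <= value <= 0xFFFFFFFF:
            raise ValueError(f"{name}={value} does not fit in a DWORD")
        # .reg files store DWORDs as 8 hexadecimal digits.
        lines.append(f'"{name}"=dword:{value:08x}')
    return "\r\n".join(lines) + "\r\n"


# Example values only: 10 minutes of idle time, 1 minute between retries.
reg_text = render_reg_file({
    "KeepAliveTime": 600_000,
    "KeepAliveInterval": 60_000,
})
print(reg_text)
```

Note that the values appear in hexadecimal in the file (600,000 becomes 000927c0), so it is easy to misread them when checking the result by eye.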

