High CPU usage in the TCPIP process group on Edge SWG
Article ID: 165610

Products

ProxySG Software - SGOS

Issue/Introduction

The CPU Monitor is reporting high CPU utilization in the TCPIP process group on Edge SWG (formerly ProxySG).

Resolution

TIME_WAIT entries are very high

When a connection closes, the ProxySG places it in the TIME_WAIT state for 120 seconds. Each connection in this state consumes a small number of CPU cycles. As TIME_WAIT connections accumulate, CPU utilization climbs, processing time increases, and TIME_WAIT entries queue until they are processed. The relationship between outstanding TIME_WAIT entries and CPU utilization depends on the ProxySG platform. As a general rule, if the number of TIME_WAIT entries exceeds 10,000 and CPU utilization for TCPIP is high, the TIME_WAIT entries could be the cause. View the number of TIME_WAIT connections from the web management interface: https://x.x.x.x:8082/TCP/Statistics

If both the number of entries in the TCP time wait queue and the CPU/TCPIP utilization are very high, decrease the number of TIME_WAIT entries by lowering the 2MSL timeout (default is 120 seconds). To do this, access the ProxySG through SSH or a serial console and enter the following commands:

ProxySG> enable
ProxySG# configure terminal
ProxySG#(config) tcp-ip tcp-2msl <seconds>

Lower the 2MSL value to 30 seconds, then monitor TIME_WAIT at: https://x.x.x.x:8082/TCP/Statistics. If the number of entries remains high, gradually decrease the 2MSL value. Do not set the value lower than 10 seconds.
Important: The 2MSL value should never be lower than the value used by other devices the ProxySG interacts with, such as a firewall.
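
The tuning rules above can be sketched as a small helper. This is an illustrative sketch only, not part of SGOS: the function names, the 80% figure used for "high" CPU, and the 5-second step are assumptions chosen for the example. The 10,000-entry threshold, the 30-second starting value, and the 10-second floor come from the text above. The values still have to be read from the statistics page and applied with the tcp-2msl command by hand.

```python
def time_wait_is_suspect(time_wait_entries, tcpip_cpu_percent,
                         entry_threshold=10_000, cpu_threshold=80.0):
    """Apply the rule of thumb: many TIME_WAIT entries AND high TCPIP CPU.

    cpu_threshold is an assumed definition of "high"; adjust per platform.
    """
    return (time_wait_entries > entry_threshold
            and tcpip_cpu_percent >= cpu_threshold)


def next_2msl(current_seconds, peer_minimum_seconds=0,
              start=30, step=5, floor=10):
    """Suggest the next tcp-2msl value to try.

    - From any value above `start`, drop straight to `start` (30 seconds).
    - Otherwise decrease gradually by `step` seconds (assumed step size).
    - Never go below `floor` (10 seconds) or below the value used by a
      neighbouring device such as a firewall (`peer_minimum_seconds`).
    """
    lower_bound = max(floor, peer_minimum_seconds)
    if current_seconds > start:
        candidate = start
    else:
        candidate = current_seconds - step
    return max(candidate, lower_bound)
```

For example, starting from the default of 120 seconds the helper suggests 30, then 25, 20, and so on, stopping at 10 or at the neighbouring device's value, whichever is higher.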

ProxySG is processing excessive bandwidth

Each ProxySG platform is sized for a specific amount of bandwidth. High CPU utilization in HTTP can be caused by a routing loop or too many incoming requests. Verify bandwidth in the management interface under Statistics > Traffic Details. If high bandwidth corresponds with high CPU utilization, reduce the bandwidth the ProxySG processes and continue to monitor CPU utilization. Keep in mind that bypassed traffic still passes through the proxy.

ACK Storm

An ACK storm occurs when the sequence numbers in a TCP connection are not the ones the connection participants expect. This is often the result of an unsuccessful session hijack attempt in which an attacker guesses the sequence number incorrectly, or allows the original session participant to continue the transaction but changes the sequence numbers in the hijacked flow. The result is high TCP bandwidth utilization with few genuine HTTP/FTP proxy connections. CPU utilization may also spike to 100% at random during the storm, with the TCP component consuming the CPU. A packet capture (PCAP) from the ProxySG reveals a flood of unsolicited ACK packets during an ACK storm. The source of the attack can be identified, blocked, and removed using attack-detection on the ProxySG or network ACLs.

Additionally, the following CLI commands enable bad duplicate ACK detection, which resets a connection when an intermediate device keeps sending duplicate ACKs and so prevents an ACK storm. The default threshold before the connection is reset is 100:

#(config)tcp-ip tcp-bad-dupack-detect disable
ok
#(config)tcp-ip tcp-bad-dupack-detect enable
ok
#(config)tcp-ip tcp-bad-dupack-detect threshold ?
<bad duplicate ack threshold>
#(config)tcp-ip tcp-bad-dupack-detect threshold 5
ok

To view the threshold:

#show tcp-bad-dupack-detect
TCP bad duplicate ack detection: enabled
TCP bad duplicate ack threshold: 5

If the number of duplicate ACKs reaches the configured threshold, an entry is written to the event log. Example:
"Number of bad duplicate acks exceeded threshold, connection x.x.x.x:4166 to x.x.x.x:80 is reset"  0 3020C:1   ../main/event_logger.cpp:36

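To see which endpoints trigger resets most often, the event log entries can be tallied offline. The following is a minimal sketch that assumes the log has been exported as text lines and that each reset entry contains the message in the form shown in the example above. The function name and the choice to group by the first endpoint in the message are assumptions made for illustration.

```python
import re
from collections import Counter

# Matches the message text shown in the example event log entry.
RESET_RE = re.compile(
    r"Number of bad duplicate acks exceeded threshold, "
    r"connection (?P<first>\S+):(?P<first_port>\d+) "
    r"to (?P<second>\S+):(?P<second_port>\d+) is reset"
)


def count_resets_by_endpoint(lines):
    """Count reset events per first endpoint named in the message."""
    counts = Counter()
    for line in lines:
        match = RESET_RE.search(line)
        if match:
            counts[match.group("first")] += 1
    return counts
```

Passing the exported log lines to count_resets_by_endpoint() and calling most_common() on the result lists the addresses most often involved, which can help decide where to take packet captures.
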
Excessive Client Requests

High TCPIP CPU utilization can result when one or more clients send a high number of requests to the proxy, producing a loop of requests and authentication prompts. For example, a client may open multiple simultaneous sessions to the same destination, or run applications that do not handle proxy authentication. To identify this behavior, open the management console and navigate to Statistics > Sessions > Active Sessions. A single user can be seen trying to initiate a large number of connections to the same destination. The log file may also contain several "NULL character found in the request line from" entries.

Treat the offending client as potentially infected with a virus or malware, or as running unstable applications or third-party add-ons that make repeated calls to suspicious sites.
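
If the active session list is copied out of the management console, the pattern described above (one client, many connections to the same destination) can be found by counting pairs. This is a minimal sketch under assumptions: the input is a list of (client IP, destination) pairs prepared by hand, the function name is hypothetical, and the default limit of 100 is only an illustrative cut-off.

```python
from collections import Counter


def flag_heavy_clients(sessions, limit=100):
    """Return {(client_ip, destination): count} for pairs above `limit`.

    `sessions` is an iterable of (client_ip, destination) tuples, one per
    active session.
    """
    per_pair = Counter(sessions)
    return {pair: count for pair, count in per_pair.items() if count > limit}
```

A client that appears in the result is a candidate for closer inspection with a packet capture before any limits are enforced.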

The Attack Detection feature limits each client IP address to a maximum of 100 simultaneous and subsequent sessions. Use care when implementing this policy, as it may prevent legitimate applications from connecting correctly to the internet.

It is not recommended to enable this feature when:

  • The proxy is an upstream proxy for other proxies.
  • A group of clients connects to the proxy from a single IP address behind a NAT device.

To enable Attack Detection, issue the following commands at the CLI:

proxy#conf t
Enter configuration commands, one per line. End with CTRL-Z.
proxy#(config)attack-detection
proxy#(config attack-detection)client
proxy#(config client)enable-limits

Authentication Loops

An authentication loop occurs when clients send a large number of retransmits because of authentication issues. This can result in high TCPIP CPU utilization and unknown errors written to the event log.

The following policy enables detailed logging to help isolate the application/site causing the issue:

<Exception>
exception.id=internal_error action.log_internal_error(yes)
exception.id=authentication_failed action.log_internal_error(yes)
define action log_internal_error
   log_message("you are using $(request.header.User-Agent) from $(client.address) and as user $(user.name) in realm $(realm) going to $(url)")
end action log_internal_error
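
Once the policy has logged some entries, the messages can be grouped to find the application and site that appear most often. This is a minimal sketch, assuming the event log has been exported as text and that each message follows the log_message template in the policy above. The function name is hypothetical, and how unset substitutions (for example, an unauthenticated user) are rendered in the log is not assumed here.

```python
import re
from collections import Counter

# Mirrors the log_message template from the policy.
MESSAGE_RE = re.compile(
    r"you are using (?P<user_agent>.*) from (?P<client>\S+) "
    r"and as user (?P<user>.*?) in realm (?P<realm>.*?) "
    r"going to (?P<url>\S+)"
)


def summarize_logged_errors(lines):
    """Return (user_agent, url) pairs ordered from most to least frequent."""
    counts = Counter()
    for line in lines:
        match = MESSAGE_RE.search(line)
        if match:
            counts[(match.group("user_agent"), match.group("url"))] += 1
    return counts.most_common()
```

The pair at the top of the list is the first candidate to investigate for the authentication loop.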

Additional Information

Understand the effects of the TCP bad duplicate ACK detection and Attack Detection features before enabling them, and use them sparingly.

The root cause of a persistent ACK storm or of excessive client requests needs to be investigated and resolved using packet captures taken from different points in the network.