After applying updates in CMS/SMS SP2, computers suddenly stopped receiving packages and other information from the NS. An examination of the agent logs showed that the computers were not even receiving replies to agent configuration update requests.
The HTTP logs were heavily populated with 503 errors whenever GetClientPolicies.aspx was being called.
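To gauge how widespread the 503 errors are, the HTTP logs can be tallied with a short script. The following is a minimal sketch, assuming the logs are in W3C extended format with a "#Fields:" header that includes cs-uri-stem and sc-status. The sample lines, including the request path and timestamps, are made up for illustration and are not taken from a real server.

```python
def count_503(log_lines, page="GetClientPolicies.aspx"):
    """Return (total requests for page, how many of them returned 503)."""
    fields = []
    total = failed = 0
    for line in log_lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            # The W3C header names the columns of the lines that follow.
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#") or not fields:
            continue
        row = dict(zip(fields, line.split()))
        if page.lower() in row.get("cs-uri-stem", "").lower():
            total += 1
            if row.get("sc-status") == "503":
                failed += 1
    return total, failed


# Hypothetical sample data, for illustration only.
sample = [
    "#Fields: date time cs-method cs-uri-stem sc-status",
    "2000-01-01 10:00:00 POST /Agent/GetClientPolicies.aspx 503",
    "2000-01-01 10:00:05 POST /Agent/GetClientPolicies.aspx 200",
    "2000-01-01 10:00:07 GET /Agent/Other.aspx 503",
]

total, failed = count_503(sample)
print(f"{failed} of {total} policy requests returned 503")
```

In practice the lines would be read from the log file itself, for example by passing an open file object to count_503 instead of the sample list.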
The agent logs of the outlying computers contained errors and warnings similar to the following:
Process: aexnsagent.exe (8408)
Thread ID: 3204
Description: RequestPolicies failed: HTTP error: (-2147209951)
Other symptoms of the problem included extreme slowdown and poor response time of the Console, and very slow processing of NSE files in the event queue folders.
1. The largest factor appears to be a stored procedure for Patch Management Solution (spPMCore_GetResourceAssociation) that contains some costly SQL code. This stored procedure must be executed against each enabled bulletin for each computer checking in, and each lookup was taking 2-4 seconds to complete. This in turn caused the entire Client Configuration Update request to take 5-10 minutes to complete. By the time the query finished, the connection in the DefaultAppPool had been dropped, and the request was eventually registered in the HTTP logs as a 503 error.
2. All or most of the policies for Patch Management bulletins are enabled, going back to MS01..., even though they are not needed. When building the ClientPolicy.xml for the requesting computer, the NS must include associations to every enabled policy that applies to the computer (which is handled by the spPMCore_GetResourceAssociations stored procedure mentioned in #1). This includes policies for bulletins the computer already has installed, because the computer still remains in the intersect collections.
As a result, it takes the NS so long to generate the client policy for the requesting computer (sometimes more than 10 minutes) that the connection in the DefaultAppPool is dropped. The severed connection is what seems to generate the "malformed HTTP response" error in the logs. This behavior is compounded when many computers request policies at the same time. Eventually the maximum number of configuration requests is reached, leaving all computers standing in line and hammering the NS with requests that will never be answered. This also slows the NS, overwhelms its resources, and can cause high CPU utilization on the database server.
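The relationship between the number of enabled bulletins and the length of a policy request can be illustrated with simple arithmetic. The following is a rough sketch only: the 2-4 second lookup time comes from the observations above, while the bulletin counts (150 and 30) are hypothetical values chosen for illustration, and the model ignores every other cost in the request.

```python
def policy_request_seconds(enabled_bulletins, seconds_per_lookup):
    """Estimate time spent in per-bulletin lookups for one policy request."""
    return enabled_bulletins * seconds_per_lookup


def describe(enabled_bulletins):
    # 2 and 4 seconds are the observed lower and upper lookup times.
    low = policy_request_seconds(enabled_bulletins, 2) / 60
    high = policy_request_seconds(enabled_bulletins, 4) / 60
    return f"{enabled_bulletins} bulletins: {low:.1f}-{high:.1f} minutes"


# Hypothetical bulletin counts, for illustration only.
print(describe(150))  # many old bulletins still enabled
print(describe(30))   # after disabling unneeded bulletins
```

Under these assumptions, 150 enabled bulletins account for a 5-10 minute request, and reducing the count to 30 brings the same request down to 1-2 minutes.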
Note: This problem should be resolved after applying Symantec Management Platform SP5 and Patch Management Solution SP2 MR2. If this is not possible, perform the following steps.
1. The Patch portion of the issue has been resolved in MR1. There are additional fixes that will be implemented in SP5 for the Symantec Management Platform.
2. If policies for non-essential software updates are still enabled, disable as many as possible. A good practice would be to: (1) run the compliance reports and determine the oldest vulnerability still outstanding; (2) disable all other policies for updates that predate the oldest known vulnerability; (3) if a common image is being used to deploy new computers, make sure it is kept up to date with the latest OS Service Pack, if not the actual updates.
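Once the compliance reports have identified the oldest outstanding vulnerability, the list of policies to disable can be worked out from the bulletin IDs. The following is a minimal sketch, assuming bulletin IDs follow the MSyy-nnn pattern. The function names and the sample list of bulletins are hypothetical; the script only produces a list to review and does not touch the NS or its policies.

```python
import re


def bulletin_key(bulletin_id):
    """Turn an ID such as 'MS06-040' into a sortable (year, number) pair."""
    match = re.fullmatch(r"MS(\d{2})-(\d{3})", bulletin_id)
    if match is None:
        raise ValueError(f"unexpected bulletin ID: {bulletin_id}")
    year, number = int(match.group(1)), int(match.group(2))
    # Two-digit years: treat 90-99 as the 1990s, everything else as 20xx.
    year += 1900 if year >= 90 else 2000
    return year, number


def policies_to_disable(enabled, oldest_outstanding):
    """Return enabled bulletins that predate the oldest outstanding one."""
    cutoff = bulletin_key(oldest_outstanding)
    older = [b for b in enabled if bulletin_key(b) < cutoff]
    return sorted(older, key=bulletin_key)


# Hypothetical input, for illustration only.
enabled = ["MS08-067", "MS01-001", "MS06-040", "MS05-012", "MS09-001"]
print(policies_to_disable(enabled, "MS06-040"))
```

The bulletin named as the oldest outstanding vulnerability is itself kept enabled, along with everything newer.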