Troubleshooting: PGP Server Clustering (Symantec Encryption Management Server)

Article ID: 153412


Products

Desktop Email Encryption Drive Encryption Encryption Management Server Endpoint Encryption File Share Encryption Gateway Email Encryption PGP Command Line PGP Key Management Server PGP Key Mgmt Client Access and CLI API PGP SDK

Issue/Introduction

When Symantec Encryption Management Server (PGP Server) runs in a cluster, the following symptoms can indicate a clustering or replication problem:
 

  • PGP Desktop clients are unable to enroll on a given cluster member even with confirmed network connectivity.
  • PGP Desktop clients are unable to check policy on a given cluster member.
  • Symantec Encryption Management Servers that use High Availability Mode with Web Messenger do not replicate messages.
           For example, Web Email Protection messages found on one server are missing from the other servers in the cluster.
  • A user is not available on one member of the cluster.
  • A user's Whole Disk Recovery Token (WDRT) is not available on one member of the cluster.
  • The number of users on one server does not match the number of users on another server.
           If this discrepancy persists for more than a few days or keeps growing, it may be caused by a replication issue in the cluster.
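The rule of thumb above can be written as a small helper. A gap that holds steady or grows over several days deserves attention, and a gap that is shrinking suggests replication is catching up. This is a minimal Python sketch. It assumes you record the user count from two cluster members once a day yourself. The function name, the three-day window, and the tolerance are hypothetical choices, not product settings.

```python
from typing import Sequence


def classify_user_gap(
    counts_a: Sequence[int],
    counts_b: Sequence[int],
    window: int = 3,     # hypothetical: "a few days" of daily samples
    tolerance: int = 0,  # hypothetical: largest gap still treated as "in sync"
) -> str:
    """Classify the user-count gap between two cluster members.

    counts_a and counts_b are daily samples taken at the same times,
    oldest first. Returns "insufficient data", "in sync",
    "converging" or "review".
    """
    if len(counts_a) != len(counts_b):
        raise ValueError("both members need the same number of samples")
    gaps = [abs(a - b) for a, b in zip(counts_a, counts_b)]
    if len(gaps) < window:
        return "insufficient data"
    recent = gaps[-window:]
    if recent[-1] <= tolerance:
        return "in sync"
    if recent[-1] < recent[0]:
        # The gap shrank over the window: replication is likely catching up.
        return "converging"
    # The gap stayed the same or grew over the window: worth investigating.
    return "review"


print(classify_user_gap([1200, 1210, 1225], [1190, 1195, 1205]))  # review
```

A "review" result only flags the pattern described above. It does not prove that replication has failed.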

 

To check whether replication is working, add an administrator on each PGP Server cluster node and confirm that each one appears on every other node. This is typically the easiest way to confirm that replication is working.

 

Resolution




Section 1 of 4: General Troubleshooting

  1. Check the client logs on the PGP Server for cluster errors. Some errors occur because data is replicated before its prerequisite data has arrived. Replication data is sent in groups, and a group cannot be written until the group it depends on has been written, so it can take time for all of the pieces to arrive. For example, consider two pieces of data, "RepPartA" and "RepPartB", which together form one complete record. If a cluster member has received RepPartB but not RepPartA, it may log errors saying that it cannot write the data. Nightly replication "scans" resolve this: each scan goes through every table, checks for missing pieces, and replicates them to the other nodes as needed. These scans take time. If the errors persist after the scans have run, contact Symantec Encryption Support, who can confirm that the scans are completing and troubleshoot further.
     
  2. Review the Cluster logs on each Symantec Encryption Management Server cluster member. If the logs show no messages being exchanged between cluster members, this could indicate a cluster issue.

     
  3. Compare the user count across cluster members. If the counts differ and the gap stays constant or grows, this is probably a synchronization issue. That is not always the cause, and small discrepancies can be normal, but larger discrepancies should be reviewed.

  4. Running some of the read-only commands on the server can help test network connectivity, for example telnet to port 444 on another cluster member, or traceroute.
    Note: Traceroute is not included in 10.5.0 MP1 and MP2, but was added back in MP3.


  5. Although this is a more advanced check, validating the MTU size for network devices is critical. If the PGP Server is configured with the default MTU of 1500 but the network does not support 1500, replication can behave oddly and appear not to be working even though the replication service is running. As a good practice, you could try changing the MTU to 1396, but work with your network team to determine the best value for your environment.

  6. Make sure port 444 is open between all cluster nodes, in both directions.

  7. Make sure the PGP Server has a valid TLS certificate assigned to the interface and that all nodes accept it.

  8. Make sure the time is configured correctly on all nodes. A drift of 3 minutes can cause replication issues.

  9. Make sure each node is on the same version and build.  Different versions of the PGP server will not replicate to each other, and the logs will typically indicate this. 

  10. If none of the above helps, consult the following article:

    154069 - Best Practices: Environmental Requirements for Symantec Encryption Management Server clustering (AKA PGP Server)

    Working through the items in that article gives replication the best chance of working.
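Items 4 and 6 call for confirming that port 444 is reachable between nodes. Where telnet is not convenient, a plain TCP connection attempt performs the same basic test. The Python sketch below is a minimal version of that check. It assumes Python 3 is available on the machine you run it from, which may not be true of the appliance itself. The function name and the host names in the usage comment are hypothetical placeholders. Because the port must be open in both directions, run the check from each node toward the others.

```python
import socket
from typing import Dict, Iterable


def check_port(hosts: Iterable[str], port: int = 444, timeout: float = 5.0) -> Dict[str, str]:
    """Attempt a TCP connection to each host and report the outcome."""
    results: Dict[str, str] = {}
    for host in hosts:
        try:
            # create_connection resolves the name and completes the TCP handshake;
            # the connection is closed again as soon as the with-block exits.
            with socket.create_connection((host, port), timeout=timeout):
                results[host] = "open"
        except OSError as exc:  # refused, timed out, unresolvable name, ...
            results[host] = f"unreachable ({type(exc).__name__})"
    return results


# Usage (hypothetical peer names; replace with your other cluster members):
#   for host, status in check_port(["node2.example.com", "node3.example.com"]).items():
#       print(f"{host}:444 {status}")
```

A successful connection only shows that the TCP port is reachable. It does not validate the TLS certificate, the MTU, or replication itself.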


NTP or VMware time sync (not both)
All cluster members must have the same time. If you have configured "Local time", clock skew can occur, producing cluster log entries such as the following:

 

Using an NTP server helps ensure that all PGP Servers have the same time.
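When comparing clocks, what matters is the largest difference between any two nodes. The Python sketch below computes that difference. It assumes you have already collected a timezone-aware UTC timestamp from each node at roughly the same moment, for example by running `date -u` on each node over SSH. Collection delays add some error, so treat the result as approximate. The node names and the 60-second warning threshold are hypothetical. The threshold is simply set well below the 3-minute drift that this article notes can cause replication issues.

```python
from datetime import datetime, timedelta, timezone
from typing import Mapping, Tuple


def clock_skew(
    node_times: Mapping[str, datetime],
    warn_at: timedelta = timedelta(seconds=60),  # hypothetical threshold
) -> Tuple[timedelta, bool]:
    """Return (largest pairwise skew, whether it is below warn_at).

    All timestamps must be timezone-aware; mixing naive and aware
    datetimes raises TypeError.
    """
    if len(node_times) < 2:
        raise ValueError("need timestamps from at least two nodes")
    times = list(node_times.values())
    # The largest pairwise difference is simply latest minus earliest.
    skew = max(times) - min(times)
    return skew, skew < warn_at


t0 = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
skew, ok = clock_skew({"node1": t0, "node2": t0 + timedelta(seconds=95)})
print(skew, ok)  # 0:01:35 False
```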

Note: If you are running in VMware, you may not be able to use both NTP and VMware Time Sync.
Either disable VMware Time Sync or do not use NTP. See the following article for information on time as it relates to Symantec Encryption Management Server:



Section 2 of 4: Important Note on Breaking Cluster Members

If you still cannot get clustering to work, do not attempt to break the cluster. Breaking the cluster means the member must rejoin afterward, which is time-consuming if the database is large, and it typically introduces other complexities. Support can usually fix a clustering issue without breaking the cluster, so it is better to work with Support. You may need to configure SSH access to the server (for example, with PuTTY) for additional troubleshooting steps. See the following article for more information on setting up SSH access to the PGP Server:

153592 - Access Encryption Management Server by using SSH

 

 

Section 3 of 4: Enabling Web Email Protection on cluster nodes that did not have it previously enabled

Sometimes a PGP Server node needs to be added to an existing cluster for redundancy. When Web Email Protection (WEP) is in use, replication of its content can be handled in a few different ways.

Depending on how the WEP service is configured on each node, WEP content may not be replicated to all nodes. If the WEP service is disabled on a cluster node, users may appear on all cluster nodes, but their Web Email Protection content will not be available on that node. In that case, cluster nodes do not attempt to send WEP data to the nodes without the service. This creates multiple rings, in which some nodes have multiple upstream or downstream servers. This is expected. For best results and the best redundancy, enable the WEP service on all cluster nodes, and then set Message Replication to "All" on each node so that all WEP data is replicated around the ring:

 

As you can see in the screenshot above, the WEP service is enabled, but Message Replication is "Off". 

This means that WEP can be used, but messages that originate on this server will stay on this server and will not replicate to the other nodes.

For the best redundancy, enable the WEP service on all PGP Servers and set Message Replication to "All" on every node.
This ensures that messages are present on all nodes, so if one server is unavailable, the messages are still available on the others.
Each server should therefore be resourced equally, with the same amount of disk space, memory, and so on.

Once Message Replication is set to "All", messages are replicated to all nodes, and each cluster node will use roughly the same amount of disk space.

 

Keep in mind that when you enable WEP on all nodes and set Message Replication to "All", replication can take a long time, depending on the amount of data.
For example, if WEP is using 50 GB of space, it could take 8 hours to replicate all of the data to the other nodes. While this replication is in progress, other replication tasks may be delayed, so plan accordingly.
Assume that the node will not be fully functional until this replication completes.
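For planning, it helps to express the example as a rate. Moving 50 GB in about 8 hours corresponds to an effective rate of roughly 14 Mbit/s. The Python sketch below does that arithmetic in both directions. It assumes decimal gigabytes and a constant sustained rate, and the function names are hypothetical. Real replication speed also depends on link capacity, disk performance, and other replication traffic, so treat the numbers only as a rough planning estimate.

```python
GB = 1e9  # decimal gigabyte, in bytes (assumption)


def implied_rate_mbit_s(size_gb: float, hours: float) -> float:
    """Average rate (Mbit/s) needed to move size_gb in the given hours."""
    return size_gb * GB * 8 / (hours * 3600) / 1e6


def estimated_hours(size_gb: float, rate_mbit_s: float) -> float:
    """Hours to move size_gb at a sustained rate given in Mbit/s."""
    return size_gb * GB * 8 / (rate_mbit_s * 1e6) / 3600


rate = implied_rate_mbit_s(50, 8)             # the article's example: 50 GB in 8 hours
print(f"{rate:.1f} Mbit/s")                   # 13.9 Mbit/s
print(f"{estimated_hours(120, rate):.1f} h")  # 19.2 h for 120 GB at that rate
```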

Once you have enabled the WEP service and set Message Replication to "All", it is a good idea to restart services from the System\General tab of the PGP Server. This should cause replication to recalculate and start processing all of the replication data.

Section 4 of 4: Useful Replication Commands

The following commands may be useful during troubleshooting:


To check connectivity between cluster members:

pgprepctl topo

 

To verify the incoming and outgoing connections of a node:

pgprepctl debug list

 

To monitor the replication queues and confirm that replication is working:

pgprepctl info

 

For help interpreting this output, consult Symantec Encryption Support.
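Interpreting this output usually involves Symantec Encryption Support, so it can help to capture the output with timestamps. Snapshots taken at different times can then be compared. The Python sketch below runs only the three commands named in this section and writes each command's output to its own file. It assumes Python 3 is available wherever you run it. That may not be the case on the appliance; if it is not, run the commands manually over SSH and save the output. The function name and the file-naming scheme are hypothetical.

```python
import subprocess
from datetime import datetime, timezone
from pathlib import Path

# The three read-only commands listed in this section.
COMMANDS = [
    ["pgprepctl", "topo"],
    ["pgprepctl", "debug", "list"],
    ["pgprepctl", "info"],
]


def save_replication_snapshot(outdir, commands=COMMANDS, run=subprocess.run):
    """Run each command and save its exit code and output to a timestamped file.

    Raises FileNotFoundError if pgprepctl is not on the PATH.
    """
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    written = []
    for cmd in commands:
        result = run(cmd, capture_output=True, text=True)
        # e.g. "debug_list_20240101T120000Z.txt"
        path = outdir / ("_".join(cmd[1:]) + f"_{stamp}.txt")
        path.write_text(
            f"$ {' '.join(cmd)}\n(exit {result.returncode})\n"
            + result.stdout
            + result.stderr
        )
        written.append(path)
    return written
```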

Additional Information

EPG-26105 - Global_ID value missing after failed cluster join


For more information on clustering, see the following articles:

154069 - Best Practices: Environmental Requirements for Symantec Encryption Management Server clustering (AKA PGP Server)

153721 - Creating a Cluster with Symantec Encryption Management Server

153476 - How many PGP Servers are supported in a cluster (Symantec Encryption Management Server)?

153412 - Troubleshooting: Symantec Encryption Management Server Clustering

163930 - Using the Manual Join Process to add cluster when the PGP Server backups are large (larger than 5GBs) - Symantec Encryption Management Server

222372 - Encryption Management Server clustering and replication uses network Interface 1