This article contains recommendations and guidelines for configuring Discover servers to scan Microsoft SharePoint repositories efficiently.
Symantec recommends the following settings for each Discover server.
crawler.threadpoolsize = 30 (the default value, set in the crawler.properties file), where crawler.threadpoolsize is the maximum number of crawler threads. Note: Use the recommended value only if your setup conforms to the recommended hardware configuration in the table below.
MessageChain.NumChains = 1 * number of CPU cores (2 * number of cores if the cores are hyper-threaded), where MessageChain.NumChains is the number of messages that the FileReader processes in parallel.
MessageChain.CacheSize = 2 * MessageChain.NumChains where MessageChain.CacheSize represents the size of the Detection (MessageChain) queue.
FileReader.MaxFileSystemCrawlerMemory = (crawler.threadpoolsize + MessageChain.NumChains + MessageChain.CacheSize) * FileReader.MaxFileSize where FileReader.MaxFileSystemCrawlerMemory represents the total run-time memory for all running threads.
BoxMonitor.FileReaderMemory = 4 * FileReader.MaxFileSystemCrawlerMemory where BoxMonitor.FileReaderMemory represents a dynamic memory pool holding all run-time data about the FileReader. This value should be less than the assigned system memory.
crawler.grid.follower.queuesize = 2 * crawler.threadpoolsize where crawler.grid.follower.queuesize represents the maximum number of files for detection that can be added to the grid queue. This setting is applicable to grid scans only.
crawler.grid.queuesize.multiplier = 4 * crawler.threadpoolsize where crawler.grid.queuesize.multiplier represents the grid scan request queue size per detection server. This setting is applicable to grid scans only.
You can use the attached spreadsheet to calculate the recommended values for these parameters.
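To show how the formulas above depend on one another, here is a minimal Python sketch that computes the recommended values from the CPU core count and FileReader.MaxFileSize. The function and field names (recommend_settings, fits_in_memory, DiscoverTuning) are hypothetical and not part of the product. The sketch assumes all memory values use the same unit as FileReader.MaxFileSize, and the example figures (8 hyper-threaded cores, a 30 MB maximum file size, 16 GB of system memory) are illustrative assumptions, not recommendations.

```python
from dataclasses import dataclass


@dataclass
class DiscoverTuning:
    """Hypothetical container for the recommended values (not a product API)."""
    threadpool_size: int            # crawler.threadpoolsize
    num_chains: int                 # MessageChain.NumChains
    cache_size: int                 # MessageChain.CacheSize
    max_crawler_memory: int         # FileReader.MaxFileSystemCrawlerMemory
    filereader_memory: int          # BoxMonitor.FileReaderMemory
    grid_follower_queuesize: int    # crawler.grid.follower.queuesize (grid scans only)
    grid_queuesize_multiplier: int  # crawler.grid.queuesize.multiplier (grid scans only)


def recommend_settings(cpu_cores: int, hyperthreaded: bool, max_file_size: int,
                       threadpool_size: int = 30) -> DiscoverTuning:
    """Apply the formulas from this article.

    Assumption: memory results are in the same unit as max_file_size
    (FileReader.MaxFileSize), for example MB.
    """
    # One chain per core, or two per core when hyper-threading is enabled.
    num_chains = cpu_cores * (2 if hyperthreaded else 1)
    cache_size = 2 * num_chains
    max_crawler_memory = (threadpool_size + num_chains + cache_size) * max_file_size
    return DiscoverTuning(
        threadpool_size=threadpool_size,
        num_chains=num_chains,
        cache_size=cache_size,
        max_crawler_memory=max_crawler_memory,
        filereader_memory=4 * max_crawler_memory,
        grid_follower_queuesize=2 * threadpool_size,
        grid_queuesize_multiplier=4 * threadpool_size,
    )


def fits_in_memory(settings: DiscoverTuning, system_memory: int) -> bool:
    """BoxMonitor.FileReaderMemory should be less than the assigned system memory."""
    return settings.filereader_memory < system_memory


if __name__ == "__main__":
    # Illustrative inputs only: 8 hyper-threaded cores, 30 MB max file size.
    s = recommend_settings(cpu_cores=8, hyperthreaded=True, max_file_size=30)
    print(s)
    print("Fits in 16384 MB:", fits_in_memory(s, 16384))
```

With these inputs, MessageChain.NumChains is 16, MessageChain.CacheSize is 32, and FileReader.MaxFileSystemCrawlerMemory is (30 + 16 + 32) * 30 = 2340, so BoxMonitor.FileReaderMemory is 9360 in the same unit.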
Note: The grid scanning feature for Microsoft SharePoint Server target is available in Symantec Data Loss Prevention from version 15.1 onwards.
Scan target configuration guidelines
Symantec recommends the following guidelines for configuring SharePoint scan targets:
Where possible, divide the Microsoft SharePoint site collections and web applications among the deployed Discover servers so that each one is scanned by only one server.
To avoid scanning unnecessary files, configure filters on the File Type, Date Modified, and file size attributes so that only the expected items are scanned.
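The two guidelines above can be illustrated with a short Python sketch: a round-robin assignment that gives each site collection to exactly one Discover server, and a predicate that mirrors File Type, Date Modified, and file size filtering. This is a conceptual model only. Filters are configured on the scan target in the product, and the names used here (ScanItem, assign_site_collections, should_scan) are hypothetical. Round-robin assignment is an assumed strategy; it balances the number of site collections per server, not their size.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ScanItem:
    """Hypothetical model of a SharePoint item considered for scanning."""
    url: str
    extension: str
    modified: datetime
    size_bytes: int


def assign_site_collections(site_collections, servers):
    """Round-robin assignment: each site collection goes to exactly one server."""
    assignment = {server: [] for server in servers}
    for i, site in enumerate(site_collections):
        assignment[servers[i % len(servers)]].append(site)
    return assignment


def should_scan(item, include_extensions, modified_after, max_size_bytes):
    """Mimic File Type, Date Modified, and file size filters.

    include_extensions is assumed to contain lowercase extensions without dots.
    """
    return (item.extension.lower() in include_extensions
            and item.modified >= modified_after
            and item.size_bytes <= max_size_bytes)


if __name__ == "__main__":
    print(assign_site_collections(["siteA", "siteB", "siteC"], ["discover1", "discover2"]))
    doc = ScanItem("https://sharepoint.example/doc.docx", "DOCX", datetime(2020, 1, 1), 1000)
    print(should_scan(doc, {"docx", "xlsx"}, datetime(2019, 1, 1), 10_000_000))
```

Because no site collection appears in more than one server's list, no content is scanned twice across Discover servers.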
Scan mode guidelines:
When you select Grid as the scan mode, ensure that the grid scanning-specific tuning parameters are configured on all of the Discover servers in the grid.
To configure a grid scan, you must select at least 2 Discover servers.
To initialize a grid scan, at least 2 of the selected Discover servers must be available.
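The two server-count rules above can be expressed as a simple precondition check. This Python sketch is illustrative only; can_start_grid_scan is a hypothetical helper, and the product performs its own validation. It treats a server as counting toward the second rule only if it is both selected and available.

```python
def can_start_grid_scan(selected_servers, available_servers):
    """Return (ok, reason) for the grid scan prerequisites described above."""
    selected = set(selected_servers)
    # Rule 1: a grid scan must be configured with at least 2 Discover servers.
    if len(selected) < 2:
        return False, "select at least 2 Discover servers to configure a grid scan"
    # Rule 2: at least 2 of the selected servers must be available to start it.
    available = selected & set(available_servers)
    if len(available) < 2:
        return False, "at least 2 of the selected Discover servers must be available"
    return True, "ok"


if __name__ == "__main__":
    print(can_start_grid_scan(["discover1", "discover2", "discover3"], ["discover1"]))
```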
Summary of configuration recommendations
Be aware that:
Scan throughput is affected by the available network bandwidth, number of CPU cores, and the total system memory of the participating Discover servers.
Scan throughput is affected by the complexity of the configured policies.
Scan throughput is affected by the caching of scanned content on the SharePoint servers.
A higher active user count on a particular SharePoint server could reduce scan performance.
Scan performance is affected by the network distance between the participating Discover servers and the scanned SharePoint server.
In Grid scan mode, make sure Microsoft SharePoint Servers are configured to allow concurrent requests.
Table: Recommended Configuration (Single Server scan) and Recommended Configuration (Grid scan mode), including the number of CPU cores
For more information, refer to the grid scanning performance guidelines in the Symantec Data Loss Prevention 15.1 Administration Guide.