Does DLP have a list of "common" words that it ignores?

Article ID: 160359

Products

Data Loss Prevention Enforce

Issue/Introduction

Does Symantec DLP index all the words it is given, including common words like "the" and "and"?  If not, is there a list of words that Symantec DLP ignores?

Environment

Symantec Data Loss Prevention 15.8
Symantec Data Loss Prevention 16.0

Resolution

Yes. Symantec DLP ignores common words (often called stop words) such as "the" and "and".

Common words occur so frequently that they do not help identify protected data. Matching on them would instead produce a large number of false positives, because almost any document contains them. Ignoring common words therefore improves detection accuracy and reduces the time and resources needed to create the indexes.
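To illustrate why ignoring common words reduces false positives, the following minimal Python sketch compares two unrelated texts with and without a stop word filter. It is a generic illustration of stop word filtering, not Symantec DLP's actual indexing code; the STOPWORDS set, the tokenize and index_terms functions, and the sample texts are all hypothetical.

```python
import re

# Hypothetical stop word list for illustration only. Symantec DLP's real
# lists are stored in the config/stopwords directory on the Enforce server.
STOPWORDS = frozenset({"the", "and", "of", "a", "to", "is"})


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def index_terms(text, stopwords=STOPWORDS):
    """Return the set of terms that would be indexed, skipping stop words."""
    return {tok for tok in tokenize(text) if tok not in stopwords}


protected = "The merger terms of Project Falcon and the board vote"
unrelated = "The weather report and the traffic update"

# Without filtering, the two texts share common words and look related.
naive_overlap = set(tokenize(protected)) & set(tokenize(unrelated))

# With filtering, only meaningful terms are compared, so nothing matches.
filtered_overlap = index_terms(protected) & index_terms(unrelated)

print(sorted(naive_overlap))     # ['and', 'the']
print(sorted(filtered_overlap))  # []
```

Without the filter, the two texts appear to share the words "and" and "the"; with it, they share nothing, which is the behavior described above.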

The common words that Symantec DLP ignores are stored as files in the config/stopwords directory, under the DLP installation directory on the Enforce server.

Example directory (version 15.8):
C:\Program Files\Symantec\DataLossPrevention\EnforceServer\15.8.00000\Protect\config\stopwords