EDM tuning and the commonality threshold

Article ID: 162066

Updated On:

Products

Data Loss Prevention Enforce

Issue/Introduction

The commonality threshold is defined by the term_commonality_threshold setting
in Indexer.properties. This parameter determines which terms in the index are
treated as common and which as uncommon. Every EDM match contains at least
one uncommon term.
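
The following Python sketch illustrates the idea of splitting index terms by a commonality threshold. It is not the product's implementation. The function names and the toy data are hypothetical, and the rule that a term becomes common once its appearance count reaches the threshold is an assumption made for illustration.

```python
from collections import Counter


def split_terms(term_counts, threshold):
    """Partition terms into (common, uncommon) sets.

    Assumption: a term is common when its appearance count
    is greater than or equal to the threshold.
    """
    common = {term for term, count in term_counts.items() if count >= threshold}
    uncommon = set(term_counts) - common
    return common, uncommon


def has_uncommon_term(candidate_terms, uncommon):
    """Return True if at least one candidate term is uncommon."""
    return any(term in uncommon for term in candidate_terms)


# Toy index: "smith" appears 150 times, the other terms only once or twice.
index_terms = ["smith"] * 150 + ["123-45-6789", "jones", "jones"]
counts = Counter(index_terms)

common, uncommon = split_terms(counts, threshold=100)
```

In this toy data, only "smith" is common at a threshold of 100. A candidate made up solely of "smith" has no uncommon term, while a candidate that also includes "jones" has one.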
 

Resolution

We recommend that you do not change this parameter. If you must change
term_commonality_threshold, make sure the value is either no more than 100 or
large enough that none of your terms are common terms. A need for a commonality
threshold above 100 is a sign of a low-entropy index. If you need most or all
of the terms in the index to be uncommon, raise the commonality threshold above the
appearance count of the most common term.

For example, if the most common term appears 50,000 times in your index and you
must raise the commonality threshold above 100, set it to 51,000. In this case,
an intermediate value such as 10,000 (between 100 and 51,000) reduces your
throughput substantially, by a factor of five or more.
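
The guidance above can be sketched as a small Python check. It is an illustration only. The function names are hypothetical, and the roughly 2% headroom is an assumption chosen to reproduce the 50,000 to 51,000 example, not a documented rule.

```python
def is_recommended(threshold, max_term_count):
    """Return True if the threshold is either at most 100 or above
    the appearance count of the most common term."""
    return threshold <= 100 or threshold > max_term_count


def threshold_above_all_terms(max_term_count):
    """Suggest a threshold above the most common term's count.

    Assumption: about 2% headroom (at least 1), which mirrors the
    50,000 -> 51,000 example; the article gives no exact margin.
    """
    return max_term_count + max(1, max_term_count // 50)


max_count = 50_000  # appearance count of the most common term
suggested = threshold_above_all_terms(max_count)

checks = {
    100: is_recommended(100, max_count),              # at most 100
    10_000: is_recommended(10_000, max_count),        # intermediate value
    suggested: is_recommended(suggested, max_count),  # above every term
}
```

Here `checks` flags 10,000 as not recommended, while 100 and the suggested value of 51,000 both pass.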

 

Additional Information

DLP Best practices for using Data and Document profiles (broadcom.com)