DLP Best Practices for Using Data and Document Profiles
Last Updated May 16, 2019
Exact Data Matching (EDM) is the most accurate form of detection. It is also the most complex to set up and maintain. To ensure that your EDM policies are as accurate as possible, consider the recommendations in this document when you are implementing your EDM profiles and policies.
Indexed Document Matching (IDM) is designed to protect document content and images. IDM relies on an index of fingerprinted documents to perform partial and derivative text-based content matching. You can also use IDM to match indexed documents exactly based on their binary stamp, including not only text-based documents but also graphics and media files.
Because of the broad range of matching supported by IDM, you should consider the best practices in this document to implement IDM policies that accurately match the data you want to protect.
EDM Best Practices:
Ensure that the data source file contains at least one column of unique data.
Eliminate duplicate rows and blank columns before indexing.
To reduce false positives, avoid single characters, quotes, abbreviations, numeric fields with fewer than 5 digits, and dates.
Understand multi-token indexing and clean up as necessary.
Use the pipe (|) character to delimit columns in your data source.
Review an example cleansed data source file.
Map data source columns to system fields to leverage validation during indexing.
Leverage EDM policy templates whenever possible.
Include the column headers as the first row of the data source file.
Check the system alerts to tune Exact Data Profiles.
Use stopwords to exclude common words from matching.
Automate profile updates with scheduled indexing.
Match on two or three columns in an EDM rule.
Leverage exception tuples to avoid false positives.
Use a where clause to detect records that meet specific criteria.
Use the minimum matches field to fine-tune EDM rules.
Consider using Data Identifiers in combination with EDM rules.
Include an email address field in the Exact Data Profile for profiled DGM.
Use profiled DGM for Network Prevent for Web identity detection.
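Several of the data source recommendations above (at least one column of unique data, no duplicate rows or blank columns, pipe-delimited columns, and a header row first) can be checked before a file is handed to the indexer. The following is a minimal sketch of such a preprocessing step, not part of the DLP product. The function name `cleanse_data_source` and the assumption that the source arrives as a comma-separated export are illustrative only.

```python
import csv
import io


def cleanse_data_source(raw_csv: str) -> str:
    """Return a pipe-delimited, de-duplicated copy of a comma-separated export.

    Hypothetical helper: assumes the first row holds the column headers.
    """
    rows = [[cell.strip() for cell in row] for row in csv.reader(io.StringIO(raw_csv))]
    rows = [row for row in rows if any(row)]  # skip completely empty lines
    if len(rows) < 2:
        raise ValueError("expected a header row and at least one data row")
    header, body = rows[0], rows[1:]
    if any(len(row) != len(header) for row in body):
        raise ValueError("every row must have the same number of fields as the header")

    # Eliminate duplicate rows, keeping the first occurrence of each.
    seen = set()
    unique_rows = []
    for row in body:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique_rows.append(row)

    # Eliminate columns that are blank in every data row.
    keep = [i for i in range(len(header)) if any(row[i] for row in unique_rows)]
    header = [header[i] for i in keep]
    unique_rows = [[row[i] for i in keep] for row in unique_rows]

    # Require at least one column whose values are all distinct.
    has_unique_column = any(
        len({row[i] for row in unique_rows}) == len(unique_rows)
        for i in range(len(header))
    )
    if not has_unique_column:
        raise ValueError("no column contains unique data")

    # Write the header first, then the data, delimited by the pipe character.
    lines = []
    for row in [header] + unique_rows:
        if any("|" in cell for cell in row):
            raise ValueError("a field contains the pipe delimiter")
        lines.append("|".join(row))
    return "\n".join(lines) + "\n"
```

The sketch raises an error rather than silently repairing a file that has no unique column or that contains the delimiter inside a field, because those problems usually need a decision from whoever owns the data.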
IDM Best Practices:
Reindex IDM profiles after upgrade.
Do not compress documents whose content you want to fingerprint.
Prefer partial matching over exact matching on the DLP Agent.
Do not index text-based documents without content.
Be aware of the limitations of exact matching.
Use whitelisting to exclude partial file contents from matching and reduce false positives.
Filter non-critical documents from indexing to reduce false positives.
Change the index max size to index more than 1,000,000 documents.
Use remote indexing for large document sets.
Use scheduled indexing to automate profile updates.
Use multiple IDM rules in parallel to establish and tune match thresholds.
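Some of the IDM recommendations above (do not compress documents you want fingerprinted, do not index documents without content, filter non-critical documents) can be applied when staging a folder for indexing. The following is a rough sketch of such a staging check, run outside the DLP product. The function name `select_for_indexing`, the list of archive extensions, and the default exclude pattern are all assumptions made for illustration.

```python
import fnmatch
from pathlib import Path

# Assumed list of extensions that indicate a compressed archive.
ARCHIVE_SUFFIXES = {".zip", ".gz", ".tar", ".7z", ".rar"}


def select_for_indexing(root, exclude_patterns=("*.tmp",)):
    """Split the files under root into (selected, skipped) lists.

    Hypothetical helper: skipped entries are (path, reason) pairs.
    """
    selected, skipped = [], []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        if path.suffix.lower() in ARCHIVE_SUFFIXES:
            # Compressed files hide the content that should be fingerprinted.
            skipped.append((path, "compressed"))
        elif path.stat().st_size == 0:
            # A zero-byte file has no content to index.
            skipped.append((path, "empty"))
        elif any(fnmatch.fnmatch(path.name, pattern) for pattern in exclude_patterns):
            # Non-critical files are filtered out by name pattern.
            skipped.append((path, "filtered"))
        else:
            selected.append(path)
    return selected, skipped
```

A zero-byte check only catches the simplest case of a document without content. A file that has formatting but no extractable text would need a format-aware check.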
Data and Document Profiles in the Cloud Best Practices:
Best practices for these profiles in the cloud are the same as for on-premises detection servers, with one exception: with Cloud Detectors, all two-tier indexes have to be perfect (at least for the first profile uploaded to a new Detector). This requirement also applies to Active Directory indexes, which are stored as an EDM profile.