FAQ - The EDM Indexing Process



Article ID: 160187


Products

Data Loss Prevention Endpoint Prevent, Data Loss Prevention Network Monitor, Data Loss Prevention Network Prevent for Email, Data Loss Prevention Enforce, Data Loss Prevention Network Discover, Data Loss Prevention Network Protect, Data Loss Prevention Endpoint Discover, Data Loss Prevention

Issue/Introduction

Questions about the EDM indexing process, such as:

What files are created during the EDM indexing process? 

Where are they created? 

How much space do they take up? 

How much memory is used in creating them?

Resolution

What files are created during the EDM indexing process?

Two kinds of files are created during the EDM indexing process:  one .idx file and one or more .rdx files.

  • The .idx file is temporary. It's removed once the indexing completes.

Where are they created?

They are stored in the index directory.

Windows: \ProgramData\Symantec\DataLossPrevention\ServerPlatformCommon\15.8.00000\index

Linux: /var/Symantec/DataLossPrevention/ServerPlatformCommon/15.8.00000/index

The original data file remains in the datafiles directory during the indexing process.

Once indexing completes, the original data file is removed.

    • It is recommended to retain copies of the originals as a roll-back/DR strategy, where feasible and applicable.

How much space do they take up?

.idx

The calculation for the size of the .idx file is:  

.idx file size = # rows * # columns * 25

This file takes up disk space only for the duration of the indexing process.

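As a rough illustration, the formula above can be turned into a quick disk-space estimate. This is a minimal sketch, not part of any DLP tool: the function name estimate_idx_size is hypothetical, and the result is assumed to be in bytes because the formula does not state a unit.

```python
def estimate_idx_size(rows: int, columns: int, factor: int = 25) -> int:
    """Estimate the temporary .idx file size as rows * columns * factor.

    The unit is assumed to be bytes; the article gives only the factor 25.
    """
    if rows < 0 or columns < 0:
        raise ValueError("rows and columns must be non-negative")
    return rows * columns * factor


# Example: a data source with 10 million rows and 5 columns.
size = estimate_idx_size(10_000_000, 5)
print(f"{size:,} (assumed bytes), about {size / 1024**3:.2f} GiB")
```

Comparing the estimate against the free space on the volume that holds the index directory gives an early warning before a large indexing job starts.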

.rdx

The .rdx files can get very large but in general they are smaller than the .idx file. 

Symantec DLP may need to create several of these files due to 32-bit platform limitations.

The file size limit is defined by the max_loaded_index_memory setting in the Indexer.properties file on the Enforce server.

The theoretical maximum is 2 GB on Windows and 3 GB on Linux. The difference comes from how the 4 GB virtual address space of a 32-bit process is split between kernel mode and user mode. The default Symantec DLP installation does not take advantage of the extra 1 GB on Linux.

However, the Indexer.properties file can be changed to make use of that additional 1 GB on Linux.
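The 2 GB and 3 GB figures follow from simple arithmetic on a 32-bit address space. The sketch below assumes the conventional default splits (2 GB reserved for the kernel on 32-bit Windows, 1 GB on 32-bit Linux); the names used are illustrative only and do not correspond to any DLP setting.

```python
GIB = 1024 ** 3

# A 32-bit pointer can address 2**32 bytes, i.e. 4 GiB in total.
TOTAL_ADDRESS_SPACE = 2 ** 32

# Assumed conventional kernel reservations for a 32-bit process.
KERNEL_RESERVED = {
    "Windows": 2 * GIB,
    "Linux": 1 * GIB,
}


def user_space_limit(platform: str) -> int:
    """Return the bytes of virtual address space left for user mode."""
    return TOTAL_ADDRESS_SPACE - KERNEL_RESERVED[platform]


for name in KERNEL_RESERVED:
    print(f"{name}: {user_space_limit(name) // GIB} GiB user-mode address space")
```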

When the EDM profile index doesn't fit in a single .rdx file, the .rdx files are numbered with suffixes from 0 to n.

For example: DataSource.#.#.rdx.0, DataSource.#.#.rdx.1, DataSource.#.#.rdx.2, etc. 
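When auditing the index directory, it can help to group the numbered parts of each index together. The following is a minimal sketch that assumes part files end in .rdx.<number>, as in the example above. The helper group_rdx_parts is hypothetical, the file names in the usage example are made up, and files without a numeric suffix are simply ignored.

```python
import re
from collections import defaultdict

# Matches names such as "DataSource.1.2.rdx.0": base name, then a numeric part suffix.
RDX_PART = re.compile(r"^(?P<base>.+\.rdx)\.(?P<part>\d+)$")


def group_rdx_parts(filenames):
    """Map each .rdx base name to its part numbers, sorted ascending."""
    groups = defaultdict(list)
    for name in filenames:
        match = RDX_PART.match(name)
        if match:
            groups[match.group("base")].append(int(match.group("part")))
    return {base: sorted(parts) for base, parts in groups.items()}


# Hypothetical directory listing, for illustration only.
names = [
    "DataSource.1.2.rdx.1",
    "DataSource.1.2.rdx.0",
    "DataSource.1.2.idx",
    "DataSource.1.2.rdx.2",
]
print(group_rdx_parts(names))
```

In practice the list of names could come from os.listdir() on the index directory; a gap in the part numbers would suggest an incomplete index.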

How much memory is used in creating them?

EDM indexes are created in two phases.

Phase 1 (.idx file produced) does not use much memory. 

Phase 2 (.rdx files produced) is more memory intensive. 

This second phase of indexing needs about 1.9 GB of RAM in addition to what Enforce normally requires.
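For capacity planning, the 1.9 GB figure can be added to whatever the Enforce server normally uses. The sketch below is only an illustration: the function name is hypothetical, and the Enforce baseline is a value the administrator must supply from their own monitoring, since the article does not give one.

```python
INDEXING_OVERHEAD_GB = 1.9  # extra RAM needed by phase 2, per the figure above


def has_indexing_headroom(total_ram_gb: float, enforce_baseline_gb: float) -> bool:
    """Return True if RAM covers the Enforce baseline plus the phase 2 overhead."""
    return total_ram_gb >= enforce_baseline_gb + INDEXING_OVERHEAD_GB


# Example with made-up numbers: 16 GB of RAM and an 8 GB Enforce baseline.
print(has_indexing_headroom(16.0, 8.0))
```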

Additional information:

Phase 1 reads in the input file sequentially, finds the patterns, and normalizes the data. 

The hashed files (*.rdx) are also stored in the same index directory, then later replicated to a similar path on all Symantec Data Loss Prevention detection servers.