Data source query watermarks

Article ID: 171269

Products

Information Centric Analytics
Data Loss Prevention Core Package

Issue/Introduction

What is a watermark for an Integration Wizard (IW) data source query? What considerations should be taken into account when defining a watermark column?

Environment

Release : 6.x

Component : Integration Wizard

Resolution

In Information Centric Analytics (ICA), data source query watermarks determine how far back in time a data source query will look when it executes against a source system.

When configuring a data source query's watermark, you will be presented with a list of the columns selected by the query. For example, if the data source query is this:

SELECT Column1, Column2, Column3 FROM Table1;

The list of columns in the Watermark Column drop-down menu will be:

Column1
Column2
Column3

The watermark column should be either date-based or integer-based. During processing, ICA sets the watermark value from the last records pulled by applying the T-SQL MAX() function to the watermark column. Subsequent staging procedures then return only records whose watermark column value is greater than the stored watermark date or integer. Because this filter is a greater-than comparison, these two data types are much more efficient to use than others.
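
The watermark behavior described above can be sketched in T-SQL. This is only an illustration, not ICA's actual staging code, which this article does not show. It assumes Column1 from the example query is a date-based column; the variable @LastWatermark and its starting value are hypothetical.

```sql
-- Illustrative sketch only. @LastWatermark is a hypothetical variable,
-- and the starting date is an arbitrary example value.
DECLARE @LastWatermark datetime2 = '2024-01-01T00:00:00';

-- Return only records newer than the stored watermark.
SELECT Column1, Column2, Column3
FROM Table1
WHERE Column1 > @LastWatermark;

-- Advance the watermark to the highest value just pulled.
-- COALESCE keeps the previous watermark when no new records exist,
-- because MAX() returns NULL over an empty set.
SELECT @LastWatermark = COALESCE(MAX(Column1), @LastWatermark)
FROM Table1
WHERE Column1 > @LastWatermark;
```

An integer-based watermark, such as an ever-increasing identity column, would follow the same pattern with an int or bigint variable.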

When a data source query is configured with a watermark column, its staging table is not truncated and reloaded during the nightly RiskFabric Processing job; instead, new records are appended to the existing records in the staging table. Conversely, when no watermark column is defined, ICA truncates the staging table during processing and reloads it with the unfiltered results of the data source query.
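
The difference between the two loading modes can be sketched as follows. The staging table name Staging_Table1 and the variable @LastWatermark are hypothetical, the staging table is assumed to already exist with matching columns, and the statements only approximate the behavior described above rather than reproduce ICA's own procedures.

```sql
-- Illustrative sketch only. Staging_Table1 and @LastWatermark are
-- hypothetical names; the date is an arbitrary example value.
DECLARE @LastWatermark datetime2 = '2024-01-01T00:00:00';

-- Mode 1: watermark column defined.
-- Existing staging rows are kept and only newer records are appended.
INSERT INTO Staging_Table1 (Column1, Column2, Column3)
SELECT Column1, Column2, Column3
FROM Table1
WHERE Column1 > @LastWatermark;

-- Mode 2: no watermark column defined.
-- The staging table is emptied and reloaded with the unfiltered results.
TRUNCATE TABLE Staging_Table1;

INSERT INTO Staging_Table1 (Column1, Column2, Column3)
SELECT Column1, Column2, Column3
FROM Table1;
```

The two modes are alternatives; a given data source query would follow one or the other depending on whether a watermark column is configured.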