On heavily populated systems there has always been a bottleneck between the NSE registration and processing logic, caused by the shared statistics table [EventQueue], which is populated and modified by both sides.
In this release (ITMS 8.0) we are removing the absolute (real-time) synchronization of this table for both sides. Instead, the table will be updated on a timed basis and, in some corner cases, by an explicit call to the “update query”.
There will be no SQL locks between the registration and processing sides
Transactions will be shorter and faster
Fewer consistency problems
Statistics table content will not be real time
EventQueueEntry – processing state column
“ProcessingState” of the event entry in this table will only have two values:
0 – pending
1 – processing
Event Registration without metadata
It will be a simple insert of a single row into the “EventQueueEntry” table.
Note: This call will be executed without a transaction, but with deadlock retries.
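The single-row registration with deadlock retries could be sketched roughly as follows. This is only an illustration in Python with an in-memory SQLite database, not the product's code: the column names (QueueId, ProcessingState, Payload), the DeadlockError class, the retry count and the helper names are all assumptions made for the example.

```python
import sqlite3
import time


class DeadlockError(Exception):
    """Hypothetical stand-in for the driver's deadlock-victim error."""


def with_deadlock_retries(operation, max_retries=3, delay_s=0.0):
    """Run operation(); if it fails as a deadlock victim, try again."""
    for attempt in range(1, max_retries + 1):
        try:
            return operation()
        except DeadlockError:
            if attempt == max_retries:
                raise  # give up after the last allowed attempt
            time.sleep(delay_s)


def register_event(conn, queue_id, payload):
    """Insert a single pending row; no explicit transaction is opened."""
    def insert():
        cur = conn.execute(
            "INSERT INTO EventQueueEntry (QueueId, ProcessingState, Payload) "
            "VALUES (?, 0, ?)",
            (queue_id, payload),
        )
        return cur.lastrowid
    return with_deadlock_retries(insert)


def make_db():
    # isolation_level=None puts sqlite3 in autocommit mode (no implicit BEGIN).
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE EventQueueEntry ("
        "Id INTEGER PRIMARY KEY, QueueId INTEGER, "
        "ProcessingState INTEGER, Payload TEXT)"
    )
    return conn
```

Because only one row is written, the statement is atomic on its own, so the retry loop is the only protection the call needs.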
Event Registration with metadata
It will be an insert of two rows:
1 row into “EventQueueEntry”
1 row into “EventQueueEntryMetadata”
Note: Because this is a multi-row operation, the call will always run inside a transaction.
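To illustrate why the two-row registration needs a transaction, here is a minimal sketch in Python with SQLite. The table layouts, the EntryId and Metadata columns and the function names are assumptions for the example; only the two table names come from the text.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE EventQueueEntry (
            Id INTEGER PRIMARY KEY, QueueId INTEGER,
            ProcessingState INTEGER, Payload TEXT);
        CREATE TABLE EventQueueEntryMetadata (
            EntryId INTEGER NOT NULL, Metadata TEXT NOT NULL);
        """
    )
    return conn


def register_event_with_metadata(conn, queue_id, payload, metadata):
    """Insert the entry row and its metadata row as one atomic unit."""
    # "with conn" commits on success and rolls back if an exception escapes,
    # so a failed metadata insert never leaves an orphaned entry row behind.
    with conn:
        cur = conn.execute(
            "INSERT INTO EventQueueEntry (QueueId, ProcessingState, Payload) "
            "VALUES (?, 0, ?)",
            (queue_id, payload),
        )
        entry_id = cur.lastrowid
        conn.execute(
            "INSERT INTO EventQueueEntryMetadata (EntryId, Metadata) "
            "VALUES (?, ?)",
            (entry_id, metadata),
        )
    return entry_id
```

If the second insert fails, the first one is rolled back with it, which is the consistency guarantee the note refers to.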
Event Pull for processing
The queue pull will be triggered in the following cases:
When an event is being registered: the SMP API will notify the Dispatcher by setting global events, forcing the Dispatcher to wake up
When the Dispatcher has received events from a queue: it will query the same queue once again after all events in the current round have been dispatched
When the Dispatcher has “completed” some events for particular queues: these queues will be queried in the current round
When the Dispatcher is notified that some of the worker threads are free
When the Dispatcher enters “idle” mode for the first time (no work after the last event was processed)
When the Dispatcher performs a pull, it will:
Not update the statistics table
Pull pending events from the DB in “batch” mode, with a batch size of 4x the number of processing threads of the queue
Mark pulled events as “ProcessingState = 1” until completion
Run the pull only after pending event completions have been processed
Note: Because this is a multi-row operation, the call will always run inside a transaction.
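The batch pull described above might look like the following sketch (Python with SQLite). The 4x factor and the ProcessingState values come from the text; the schema, the function names and the use of BEGIN IMMEDIATE are assumptions for illustration only.

```python
import sqlite3

BATCH_FACTOR = 4  # from the text: batch size is 4x the queue's thread count


def make_db(queue_id, pending_count):
    # Autocommit mode, so the transaction below is opened explicitly.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE EventQueueEntry ("
        "Id INTEGER PRIMARY KEY, QueueId INTEGER, ProcessingState INTEGER)"
    )
    conn.executemany(
        "INSERT INTO EventQueueEntry (QueueId, ProcessingState) VALUES (?, 0)",
        [(queue_id,)] * pending_count,
    )
    return conn


def pull_batch(conn, queue_id, thread_count):
    """Select up to 4x thread_count pending rows and mark them as processing."""
    limit = BATCH_FACTOR * thread_count
    conn.execute("BEGIN IMMEDIATE")  # select + update form one transaction
    try:
        ids = [row[0] for row in conn.execute(
            "SELECT Id FROM EventQueueEntry "
            "WHERE QueueId = ? AND ProcessingState = 0 "
            "ORDER BY Id LIMIT ?",
            (queue_id, limit),
        )]
        conn.executemany(
            "UPDATE EventQueueEntry SET ProcessingState = 1 WHERE Id = ?",
            [(entry_id,) for entry_id in ids],
        )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return ids
```

Note that the sketch never touches a statistics table: the pull only reads and marks rows in EventQueueEntry.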
Fixing stale queue entries
It is possible (but highly unlikely) that a service crash or a code bug will lead to a situation where events are marked as “processing” while the Dispatcher has no activity actually running for them.
To fix stale queue entries, a timed action (adjustable, Core Setting: “EvtQueueFixupMinutes”) looks for queues without any processing activity in the Dispatcher.
The default interval for this action is 10 minutes, and it can be triggered under these conditions:
Once, on Dispatcher start
On any Dispatcher wake-up: the fix-up timeout is checked at the end
On every “idle” cycle: the fix-up timeout is checked
Note: Fix-up is performed for a particular queue only when the queue has no pending events, no completions are queued, and no workers are active.
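The fix-up conditions can be sketched as follows (Python with SQLite). The 10-minute default and the three "quiet queue" conditions are taken from the text. The QueueActivity structure, the function names and, in particular, the repair action itself (resetting “processing” rows back to “pending”) are assumptions, since the text does not spell out what the fix does.

```python
import sqlite3
from dataclasses import dataclass

FIXUP_MINUTES = 10  # default of Core Setting "EvtQueueFixupMinutes" per the text


@dataclass
class QueueActivity:
    """Hypothetical in-memory view the Dispatcher keeps for one queue."""
    pending_events: int = 0
    queued_completions: int = 0
    active_workers: int = 0

    def is_quiet(self):
        # All three conditions from the note must hold at once.
        return (self.pending_events == 0
                and self.queued_completions == 0
                and self.active_workers == 0)


def fixup_due(last_fixup_s, now_s, fixup_minutes=FIXUP_MINUTES):
    """True once the fix-up interval has elapsed since the last run."""
    return now_s - last_fixup_s >= fixup_minutes * 60


def fix_stale_entries(conn, queue_id, activity):
    """Assumed repair: return stale 'processing' rows to 'pending'."""
    if not activity.is_quiet():
        return 0  # the queue is still active, so nothing can be stale
    with conn:
        cur = conn.execute(
            "UPDATE EventQueueEntry SET ProcessingState = 0 "
            "WHERE QueueId = ? AND ProcessingState = 1",
            (queue_id,),
        )
    return cur.rowcount
```

The return value is the number of rows that were repaired, which makes it easy to log how often the fix-up actually finds something.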
The Dispatcher will complete events in batch mode
Completion will be done in the same thread as “Query Candidates”
How we update statistics
Since the Dispatcher’s logic also depends on knowing what is really “pending”, it is still a good idea to update the statistics in the “EventQueue” table.
The recalculation of the queue statistics will be done automatically by all parties, both registration and processing, but no longer in the same queries as before (spRegister.. / spGetCandidates…).
Recalculation will occur in the following situations:
On a timed basis: every 5 minutes (adjustable, Core Setting: “EvtQueueReloadTimeout”)
In flood situations: when the processed data size exceeds 250 MB (adjustable, Core Setting: “EvtQueueReloadFlowMB”), on either the registration or the processing side
Note: The main Core service (AeXSvc) will always use a reload timeout 1 minute shorter than the Core Setting value, which will effectively make it the “update master”.
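The two recalculation triggers can be expressed as a small tracker, shown below as a Python sketch. The 5-minute and 250 MB defaults and the 1-minute offset for AeXSvc come from the text; the ReloadTracker class, its fields and the tracker_for helper are hypothetical names introduced for the example.

```python
from dataclasses import dataclass

MB = 1024 * 1024


@dataclass
class ReloadTracker:
    """Hypothetical per-process tracker for the two recalculation triggers."""
    reload_timeout_s: float = 5 * 60     # "EvtQueueReloadTimeout" default
    flood_limit_bytes: int = 250 * MB    # "EvtQueueReloadFlowMB" default
    last_reload_s: float = 0.0
    bytes_since_reload: int = 0

    def record(self, size_bytes):
        """Account for event data registered or processed by this process."""
        self.bytes_since_reload += size_bytes

    def should_recalculate(self, now_s):
        timed_out = now_s - self.last_reload_s >= self.reload_timeout_s
        flooded = self.bytes_since_reload > self.flood_limit_bytes
        return timed_out or flooded

    def reset(self, now_s):
        """Call after a recalculation, or after receiving fresh statistics."""
        self.last_reload_s = now_s
        self.bytes_since_reload = 0


def tracker_for(process_name, timeout_minutes=5):
    # AeXSvc uses a timeout 1 minute shorter than the Core Setting,
    # so in the normal case its timer fires before anyone else's.
    if process_name == "AeXSvc":
        timeout_minutes -= 1
    return ReloadTracker(reload_timeout_s=timeout_minutes * 60)
```

With the defaults, the AeXSvc tracker fires at 4 minutes while every other process would fire at 5, so the other processes normally have their timers reset before they ever reach the timeout.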
Cross process statistics update
There are quite a number of event sources:
Main service (AeXSvc – hosting Receiver and Dispatcher)
IIS (w3wp) – hosting web API for agent to post events
Task Management service (AtrsHost) – some tasks produce events as a result
Any other custom code in any process that calls the SMP API to register events
All these processes can be a significant source of events, so logic is needed to minimize the pressure caused by statistics recalculation.
It is accomplished by the “update master” approach:
When one of the sources detects either an update timeout or a flood, it will recalculate the statistics in the EventQueue table
The result of the recalculation will be sent to all parties interested in this information (a party is bound automatically, for example, when it registers an event)
When parties receive the statistics, they will update their own local cache and reset their “update” and “flood” counters.
Effectively, there should be only one “update master”, AeXSvc, but in some cases any party can trigger the logic if it becomes a flooding source.
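The “update master” flow can be modelled in a few lines of Python. This is an in-process simulation only: the real parties live in separate processes, and the transport between them is not described in the text. The Party and StatisticsBus classes and all of their members are hypothetical.

```python
class Party:
    """Hypothetical event source (for example AeXSvc, w3wp, AtrsHost)."""

    def __init__(self, name):
        self.name = name
        self.stats_cache = {}
        self.seconds_since_update = 0  # the "update" counter
        self.flood_bytes = 0           # the "flood" counter

    def on_statistics(self, stats):
        # Take over the fresh numbers and reset both local counters, so
        # this party does not trigger a redundant recalculation itself.
        self.stats_cache = dict(stats)
        self.seconds_since_update = 0
        self.flood_bytes = 0


class StatisticsBus:
    """Hypothetical stand-in for the cross-process notification channel."""

    def __init__(self, recalculate):
        self.recalculate = recalculate  # callable returning {queue: pending}
        self.parties = []
        self.recalc_count = 0

    def bind(self, party):
        self.parties.append(party)

    def trigger(self, source):
        """Called by whichever party detected a timeout or a flood."""
        stats = self.recalculate()
        self.recalc_count += 1
        for party in self.parties:  # the source receives the result too
            party.on_statistics(stats)
        return stats
```

One trigger refreshes every bound party, so the recalculation runs once rather than once per process.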