Pega Adaptive Snapshot
Hi Team,
I would like to understand the internal architecture of adaptive models. Pega currently uploads all responses into the adaptive models, and it creates a new snapshot record in Data-Decision-ADM-ModelSnapshot once every day. What is the purpose of this class? Why doesn't it create a new entry after every run, or at least after every update? Is there a reason for this?
Problem statement:
We have adaptive models that are trained every day on web responses, using a delayed learning architecture. Based on my understanding, Pega doesn't store the individual responses in the adaptive models; it calculates the outcomes using ML and adds one record as a snapshot entry. What happens if the ADMSnapshot agent is down on a particular day? How can we recover the past responses? How do we know which responses it has processed up to that point? Is there any way to do disaster recovery? Thanks in advance.
Regards,
Nizam
Hi Nizam, sorry for the delay.
So, reporting (taking snapshots to enable the model reports you view in the Analytics Center) is separate from the process of training the models with responses.
With the latter, from 7.3 onwards, we store all responses that arrive. Once we've received a number of them specific to that model rule, an ADM Service node applies all of those responses to the relevant model, increasing its learning.
When this is complete, the entire new state of the model is saved to the SQL database in data.pr_data_adm_factory. We also build a smaller scoring model, which is distributed around the cluster and used to make decisions (i.e. when a model rule is executed in a Make Decision strategy).
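To make the training flow concrete, here is a minimal Python sketch of the behaviour described above. It is only a conceptual model, not Pega code: AdmModel, ModelFactory, update_threshold and the "Accepted" outcome label are hypothetical names, "Accepted" is assumed to count as positive and anything else as negative, and building/distributing the scoring model is left out.

```python
from dataclasses import dataclass, field


@dataclass
class ModelFactory:
    """Hypothetical stand-in for a model's full learned state
    (what gets saved to data.pr_data_adm_factory)."""
    positives: int = 0
    negatives: int = 0


@dataclass
class AdmModel:
    update_threshold: int  # hypothetical: the rule-specific number of responses
    factory: ModelFactory = field(default_factory=ModelFactory)
    pending: list = field(default_factory=list)          # stored, not yet learned from
    saved_factories: list = field(default_factory=list)  # stand-in for the SQL table

    def receive(self, outcome: str) -> None:
        # Every response is stored first; the model does not learn yet.
        self.pending.append(outcome)
        if len(self.pending) >= self.update_threshold:
            self.apply_pending()

    def apply_pending(self) -> None:
        # Apply the whole batch at once, as an ADM Service node would.
        for outcome in self.pending:
            if outcome == "Accepted":
                self.factory.positives += 1
            else:
                self.factory.negatives += 1
        self.pending.clear()
        # Persist a copy of the entire new state.
        self.saved_factories.append(
            ModelFactory(self.factory.positives, self.factory.negatives)
        )


m = AdmModel(update_threshold=3)
for outcome in ("Accepted", "Rejected", "Accepted", "Rejected"):
    m.receive(outcome)
print(m.factory, len(m.pending), len(m.saved_factories))
# ModelFactory(positives=2, negatives=1) 1 1
```

The fourth response stays pending until the next batch fills up, which is why a model's state only changes in steps.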
Reporting does not interfere with the learning process above. The snapshot agent will look at the factory for each model and create a snapshot of the information: the model state at that moment in time. This is ONLY used in the reports of the Analytics Center, not for learning or any data recovery.
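As a rough illustration of why the snapshot agent cannot affect learning, here is a small Python sketch. It assumes (hypothetically) that each factory can be represented as a plain dict; take_snapshot and the model name "CardOffers" are made up for the example. The agent only reads the factories and stores a point-in-time copy.

```python
import copy
from datetime import date


def take_snapshot(factories: dict, snapshots: list, day: date) -> None:
    """Hypothetical snapshot agent: record a point-in-time copy of every
    model's current state for reporting. The factories are only read."""
    for model_id, state in factories.items():
        snapshots.append({"model": model_id, "day": day, "state": copy.deepcopy(state)})


factories = {"CardOffers": {"positives": 5, "negatives": 2}}
snapshots = []
take_snapshot(factories, snapshots, date(2018, 1, 1))

# Learning carries on afterwards; the snapshot keeps the old values.
factories["CardOffers"]["positives"] += 1
print(snapshots[0]["state"]["positives"], factories["CardOffers"]["positives"])  # 5 6
```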
So...
It is creating a new snapshot record in the Data-Decision-ADM-ModelSnapshot for every single day. What is the purpose of this class?
The class is the data model underlying each row of the table on the Model Overview page of the Analytics Center (the first page you see in the Monitoring tab when you open an Adaptive Model there). It stores model-level information so users can see how well their model is performing.
Why doesn't it create a new entry after every run, or at least after every update? Is there a reason for this?
Both of these are great ideas and are already on our product roadmap. Running the snapshot agent at intervals unrelated to model updates is a leftover from releases prior to 7.3, when models were updated in memory on a separate server as each response arrived, so there was no 'best' time to take snapshots.
What happens if the ADMSnapshot agent is down on one particular day?
The training and use of your models to make decisions will not be affected. If you are storing historical report data (configured via 'Edit Settings' on the ADM Service page), you will be missing report data for that particular day. If you've chosen to store only the most recent reporting snapshot, your reports will not be updated for that day; you'll just continue to see the previous day's information.
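The difference between the two storage settings can be sketched in a few lines of Python. This is a hypothetical model only: record_snapshot and its keep_history flag stand in for the 'Edit Settings' choice and are not Pega names. The agent is assumed to be down on 2 January.

```python
from datetime import date


def record_snapshot(store: list, day: date, state: dict, keep_history: bool) -> None:
    """Hypothetical reporting store. keep_history=True keeps one snapshot per
    run; keep_history=False keeps only the most recent one."""
    if not keep_history:
        store.clear()
    store.append((day, dict(state)))


history, latest = [], []
state = {"positives": 10, "negatives": 4}
for day in (date(2018, 1, 1), date(2018, 1, 3)):  # agent down on 2 Jan
    record_snapshot(history, day, state, keep_history=True)
    record_snapshot(latest, day, state, keep_history=False)

print([d.day for d, _ in history])  # [1, 3]: no row for 2 Jan
print([d.day for d, _ in latest])   # [3]: only the newest snapshot
```

With history kept, the missed day shows up as a gap; with only the latest snapshot kept, the 1 January data simply stays on screen until the next successful run.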
How can we recover the past responses?
Responses used for learning will only be lost if the model is not updated with them within 48 hours, and lost responses cannot be recovered. If you're referring to the scenario above, where the snapshot agent misses a day, this will not result in any responses being lost. However, you cannot 'roll back' the state of your model factory and take a snapshot of the missed day for reporting purposes, so there is no way to 'backfill' the reporting data you've missed.
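The 48-hour rule can be pictured as an age check on stored, not-yet-applied responses. The Python sketch below is an assumption-laden illustration: split_by_age and the (arrival_time, outcome) tuple layout are hypothetical, and whether exactly 48 hours counts as expired is not stated above, so the boundary handling is a guess.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(hours=48)  # assumed: the 48-hour window described above


def split_by_age(pending, now):
    """pending: list of (arrival_time, outcome) pairs not yet applied to a model.
    Returns (still_usable, lost). The exact boundary handling is an assumption."""
    usable = [(t, o) for t, o in pending if now - t <= RETENTION]
    lost = [(t, o) for t, o in pending if now - t > RETENTION]
    return usable, lost


now = datetime(2018, 1, 3, 12, 0)
pending = [
    (datetime(2018, 1, 1, 11, 0), "Accepted"),  # 49 hours old
    (datetime(2018, 1, 2, 9, 0), "Rejected"),   # 27 hours old
]
usable, lost = split_by_age(pending, now)
print(len(usable), len(lost))  # 1 1
```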
How do we know which responses it has processed till that time?
The model instances under a rule are usually differentiated by their combination of Issue, Group, Name, Direction and Channel values. From 7.3, you can see how many positive and negative responses each of these model instances has processed, both on the Adaptive Model management page within Designer Studio and in the model report in the Analytics Center.
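A simple way to picture this is a per-instance tally keyed on those five context values. The Python sketch below is hypothetical: ModelContext, count_responses and the example values are illustrative names, not Pega classes, and outcomes are assumed to arrive already labelled "Positive" or "Negative".

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class ModelContext(NamedTuple):
    """The context values that usually tell model instances of one rule apart."""
    issue: str
    group: str
    name: str
    direction: str
    channel: str


def count_responses(responses):
    """responses: iterable of (ModelContext, outcome), outcome 'Positive' or 'Negative'.
    Returns one Counter of outcomes per model instance."""
    counts = defaultdict(Counter)
    for context, outcome in responses:
        counts[context][outcome] += 1
    return counts


web = ModelContext("Sales", "CreditCards", "GoldCard", "Inbound", "Web")
email = ModelContext("Sales", "CreditCards", "GoldCard", "Outbound", "Email")
counts = count_responses([(web, "Positive"), (web, "Negative"), (web, "Positive"), (email, "Negative")])
print(counts[web]["Positive"], counts[web]["Negative"], counts[email]["Positive"])  # 2 1 0
```

Because only the Direction and Channel differ, the web and email responses land in two separate model instances of the same rule.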
Is there any way to do a disaster recovery?
Responses and scoring models are replicated across up to 3 of your DDS nodes by default, so several of these nodes would have to have their storage destroyed before all replicas of that data were lost. However, the factories in the SQL database are the ultimate source of truth for Adaptive Models and are not replicated, so they should be backed up like any other data in the Pega DB.
The worst case would be if you were unable to bring up an ADM Service node for more than 48 hours: any unprocessed responses from before then would be lost when a service node does come up. If this is deemed unacceptable, you could back up your DDS data and replay the responses in another way at a later date, although this is not currently supported out of the box (OOTB).
Let me know if any of this needs clarifying!
Thanks!
Ben