But even after doing that, because of the way the SMSInboundMaster activity was written, only one node (probably the one that started last) would have picked up the Inbound SMS account with status Connecting (set by PegaMKTStartUpAgent during node startup) and would have been solely responsible for fetching SMSs. However, if that node went down, the other nodes couldn't receive SMSs, because they would always have found the Inbound SMS account status as Running.
So we made a small change in the SMSInboundMaster activity to pick up Inbound SMS accounts with status Connecting or Running, so that any node can receive SMSs even if another node goes down. We have tested it and it is working fine.
The question is: can there be any issues due to this small fix, and due to the fact that we are running this agent on multiple nodes, against the prescribed Pega guidelines?
Any opinion will be helpful.
***Edited by Moderator Marissa to update categories***
That is, when an inbound SMS account is picked up by the agent running on Node1 (let's say), the agent running on a second node, Node2 (let's say), would still pick up the account and check its status, and if it is Running it just gracefully exits the InboundConnector activity without doing anything.
Hence, even when Node1 goes down, the agent running on Node2 still picks up the inbound account and checks the status, and if it is still Running, I believe it gracefully exits the child requestor activity InboundConnector without consuming the inbound messages.
Could you please confirm: when Node1 goes down while it is handling inbound SMS accounts that are Running, will Node2 take them over and successfully consume inbound messages for those accounts, or will it just pick them up, check the status, and exit the activity gracefully?
I ask because once the account gets into Running status, it enters an infinite loop that breaks only when a user stops it from the UI or a DB save fails while it is running.
The OOTB activity SMSInboundMaster wasn't achieving that in Pega Marketing 7.22. It was doing an Obj-Browse for Inbound SMS accounts with status Connecting, and if it found one it spawned parallel threads for batch execution of the InboundConnector activity. When you start a node, PegaMKTStartUpAgent temporarily sets the status of the Inbound SMS account to Connecting; the first node to pick it up at that point takes control of it for subsequent execution and sets its status to Running. Here lies the problem: if this node happens to go down, the status still remains Running, and since the agents running on the other nodes are still looking for accounts with status Connecting, they can't find any and hence can't process SMSes.
This is where we made the change: we changed the Obj-Browse to look for Connecting as well as Running, so now even if the node in control goes down, any other node will still pick up the SMS account with Running status and process SMSes.
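The effect of widening the browse filter can be sketched in a few lines of Python. This is only an illustration of the filtering logic described above, not Pega code: `InboundSmsAccount`, `browse_accounts` and the two filter sets are hypothetical names standing in for the account instances and the Obj-Browse criteria.

```python
from dataclasses import dataclass


@dataclass
class InboundSmsAccount:
    name: str
    status: str  # "Connecting", "Running" or "Stopped"


def browse_accounts(accounts, statuses):
    """Hypothetical stand-in for the Obj-Browse filter: keep matching accounts."""
    return [a for a in accounts if a.status in statuses]


ORIGINAL_FILTER = {"Connecting"}             # OOTB behaviour as described
MODIFIED_FILTER = {"Connecting", "Running"}  # the local change

# Situation after the controlling node crashed: the status is stuck at Running.
accounts = [InboundSmsAccount("acct-1", "Running")]

found_by_original = browse_accounts(accounts, ORIGINAL_FILTER)
found_by_modified = browse_accounts(accounts, MODIFIED_FILTER)

print("original filter finds:", [a.name for a in found_by_original])
print("modified filter finds:", [a.name for a in found_by_modified])
```

With the original filter a surviving node never sees the account again; with the modified filter it does. What happens after the account is picked up depends on InboundConnector, which this sketch does not model.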
Please note that InboundConnector has no logic to skip an Inbound SMS account that is found in Running status. It ideally looks for Running status; if it finds the account in Connecting status, it tries to change it to Running and then receives SMSes.
In PM 7.22, if you check the second step (Obj-Open) of the InboundConnector activity, there is a jump condition: if the inbound account status is either 'Stopped' or 'Running', the activity jumps directly to its last step (Commit) and ends.
With your local changes in the SMSInboundMaster activity, whenever another node starts the agent and picks up an inbound account whose status is 'Running', InboundConnector will jump from its 2nd step directly to its last step for that account, which skips the processing of SMSes.
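If the jump condition is as described in this reply, the combined behaviour of the modified master activity and the connector can be sketched as follows. This is a simplified Python model, not Pega code; the function names and return strings are hypothetical and only mirror the steps named in the thread.

```python
def inbound_connector(status):
    """Model of the described step-2 jump condition in InboundConnector."""
    if status in ("Stopped", "Running"):
        # Jump straight to the last step (Commit); nothing is consumed.
        return "jumped to Commit, no SMS processed"
    # Otherwise (Connecting): mark the account Running and start polling.
    return "set to Running, polling started"


def master_then_connector(status, browse_statuses):
    """Model of SMSInboundMaster browsing and then calling InboundConnector."""
    if status not in browse_statuses:
        return "not picked up by SMSInboundMaster"
    return inbound_connector(status)


# Account left in Running by a node that has gone down:
print(master_then_connector("Running", {"Connecting"}))
print(master_then_connector("Running", {"Connecting", "Running"}))
# Account freshly reset to Connecting at node startup:
print(master_then_connector("Connecting", {"Connecting", "Running"}))
```

Under this model, widening the browse filter alone only changes where the Running account is dropped: it is now picked up by the master activity but still exits at the jump condition.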
For this issue we have implemented a custom solution in which we allow execution to proceed even if the Inbound SMS account is Running.
What we do is this: when a particular requestor on a particular node manages to start the infinite loop that polls the SMSC server and fetches SMSes, we save that node ID and requestor ID in the Inbound SMS account blob, in custom-defined properties. After that, whenever a new requestor on some other node tries to start the same infinite loop, it first checks whether the node ID and requestor ID currently saved in the inbound account blob are still active, using the below code
Here we are essentially using the OOTB data pages D_pzNodesInCluster and D_pzRequestors, which Pega uses in its operations landing page. If the saved node and requestor are found to be active, the new requestor simply exits; otherwise it enters the infinite loop and saves its own node ID and requestor ID in the SMS account blob, so that no other requestor can start the loop.
In this way, if the node on which the SMS thread is running crashes for some reason, another requestor on some other active node will automatically trigger the SMS fetching thread.
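The ownership check described above can be sketched roughly as follows. This is not the poster's original code and not Pega code: it is a minimal Python model in which `active_nodes` and `active_requestors` are hypothetical stand-ins for what would be read from D_pzNodesInCluster and D_pzRequestors, and `owner_node_id` / `owner_requestor_id` stand in for the custom properties saved in the account blob.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Account:
    status: str = "Running"
    owner_node_id: Optional[str] = None       # hypothetical custom property
    owner_requestor_id: Optional[str] = None  # hypothetical custom property


def owner_is_active(account, active_nodes, active_requestors):
    """True if the saved node and requestor are both still listed as active."""
    if account.owner_node_id is None or account.owner_requestor_id is None:
        return False
    owner = (account.owner_node_id, account.owner_requestor_id)
    return account.owner_node_id in active_nodes and owner in active_requestors


def try_take_over(account, node_id, requestor_id, active_nodes, active_requestors):
    """Record this requestor as owner unless the saved owner is still alive."""
    if owner_is_active(account, active_nodes, active_requestors):
        return False  # someone else is already polling: exit quietly
    account.owner_node_id = node_id
    account.owner_requestor_id = requestor_id
    return True  # caller would now enter the polling loop


account = Account()
nodes = {"node1", "node2"}
requestors = {("node1", "B1"), ("node2", "B2")}

first = try_take_over(account, "node1", "B1", nodes, requestors)
second = try_take_over(account, "node2", "B2", nodes, requestors)

# node1 crashes: it disappears from both lists.
nodes = {"node2"}
requestors = {("node2", "B2")}
third = try_take_over(account, "node2", "B2", nodes, requestors)

print(first, second, third, account.owner_node_id)
```

The sketch ignores concurrency: two requestors running the check at the same moment could both conclude that the owner is gone, so a real implementation would presumably need the check and the save to happen under a lock on the account.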
The downside of this solution is that I am using the D_pzNodesInCluster and D_pzRequestors data pages, which are final, internal data pages; if Pega decommissions them in a future release, this solution will break. Secondly, in terms of performance, multiple threads will be spawned and then terminated repeatedly; however, that shouldn't have a big impact.
Hope I was able to explain my solution, let me know your thoughts about this.