Posted: 12 Aug 2015 0:16 EDT Last activity: 29 Oct 2015 9:09 EDT
How does Pega PRPC ensure that queued items to be processed by an agent are processed by only one agent in a multinode environment?
Here is a scenario involving an agent.
We have created one standard agent whose agent activity processes incoming email and creates work objects in the application. The agent rule is configured to process at most 5 items per run.
My production environment is a multinode system with 4 nodes. The above agent rule has created 4 schedule instances, and all four are ready to process incoming email.
What I would like to know at this point is: how do Pega agents select items from the queue? I believe this is based on the pyminimumdatetimeforprocessing property. But if one agent instance has selected a particular set of items from the queue, will the same items be picked up by other instances? How do the other instances identify that a particular item has already been selected for processing?
Example: let's say there are 25 items in the queue, and one agent instance has selected the first 5 items for processing. On what basis will the second instance select items from the queue? Will it also select the first 5 items, or the next 5, i.e. items 6 to 10? If the second instance also selects the first 5 items, there will be resource contention, and this could become a performance issue.
The same question applies to SLA agents. How do such standard agents work in a multinode environment? Does the scheduled agent mark items when selecting them from the queue so that other agents don't select those items?
A queue item is marked when one agent picks it up for processing.
If you are using AQM, PRPC manages the statuses of queue items. In this case, an agent can pick only those queue items whose status is 'New'. When an agent picks up a queue item for processing, its status is changed to 'Now-Processing'.
If you are not using AQM, the agent's processing activity should take care of modifying the status of the queue item.
A Standard-mode agent uses the Auto-Queue-Management (AQM) feature. All entries for such an agent are stored in the pr_sys_queues table, which has a column named pyItemStatus. Items are picked from the queue based on this status.
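The status-based selection described above can be sketched as a toy simulation. This is a hedged sketch, not Pega internals: the list of dicts stands in for the pr_sys_queues table, a Python lock stands in for database-level locking, and the item count is illustrative.

```python
# Minimal simulation of AQM-style, status-driven item selection:
# an agent picks only items whose pyItemStatus is 'New' and flips
# them to 'Now-Processing' so other instances skip them.

import threading

queue = [{"pyItemId": i, "pyItemStatus": "New"} for i in range(1, 26)]
queue_lock = threading.Lock()  # stands in for database row locking

def pick_batch(max_items=5):
    """Atomically pick up to max_items 'New' entries and mark them
    'Now-Processing' so other agent instances do not select them."""
    picked = []
    with queue_lock:
        for item in queue:
            if item["pyItemStatus"] == "New":
                item["pyItemStatus"] = "Now-Processing"
                picked.append(item["pyItemId"])
                if len(picked) == max_items:
                    break
    return picked

batch1 = pick_batch()  # first agent instance
batch2 = pick_batch()  # second instance sees only the remaining 'New' items
print(batch1)  # [1, 2, 3, 4, 5]
print(batch2)  # [6, 7, 8, 9, 10]
```

Because the status flip and the selection happen under one lock, two instances can never pick the same item.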
From my learning: for example, if we have 20 items in the queue and 4 nodes, the agent on node 1 acquires a lock on 5 items and processes those 5 items unless it fails. Node 2 will process the next 5 items, i.e. items 6 to 10, because node 1 holds the lock on items 1 to 5.
Your observation is correct; in that case resource contention will happen.
If the first instance has picked the first 5 items, PRPC starts working on them and changes the status of the 1st item to 'Now-Processing'; the remaining four still have the 'New' status. If a second Pega instance looks into the queue at the same time, it will select items 2 to 6 (based on the 'New' status). If this is the expected behaviour, then resource contention is happening and the system becomes inefficient, because the same items are identified by two agents for processing even though only one agent can process them.
Apart from the pyItemStatus column, there is one more column named pxProcessingNodeId; by default its value is NULL. If node 1 starts processing an item (say, the first item), the value is filled with node 1's ID. If node 2 then starts processing, the first item is locked by node 1, so node 2 gets the 2nd item and the column value is changed to node 2's ID; processing continues the same way on node 3, node 4, and so on.
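The node-ID claiming described above can be sketched like this. It is an illustrative model, not Pega code: the dicts stand in for pr_sys_queues rows, and 'Node1'/'Node2' are placeholder node IDs.

```python
# Sketch of per-item claiming via a node-id column (pxProcessingNodeId):
# NULL (None) means unclaimed; a node stamps its ID into the column to
# claim the item, and other nodes skip stamped items.

items = [{"pyItemId": i, "pxProcessingNodeId": None} for i in range(1, 6)]

def claim_next(node_id):
    """Scan for the first unclaimed item and stamp it with this node's ID."""
    for item in items:
        if item["pxProcessingNodeId"] is None:
            item["pxProcessingNodeId"] = node_id  # claim it
            return item["pyItemId"]
    return None  # nothing left to claim

first = claim_next("Node1")   # Node1 claims item 1
second = claim_next("Node2")  # item 1 is taken, so Node2 gets item 2
print(first, second)  # 1 2
```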
I am using PRPC 6.1 SP2 and simulated item selection by the standard agent:
1> Disable the standard agent
2> Populate the queue
3> Enable only this agent, to check item selection from the queue
4> Run a database query to check the status
I have a doubt here. As per your suggestions, the agent should pick the first five items in the Pega-defined order (I have not listed the results in that order) and should mark all five items so that other agent instances don't select the same items.
------------Results before enabling the standard agent------
My understanding of freedom from resource contention is that after the first agent instance's execution, the second instance should select items 6 to 10 from the queue.
Yes, you are right. The node ID is also populated when processing individual items. In this way the second instance is selecting items from the first instance's list. This is a resource contention issue, right?
But one question, Rantjith: an efficient system would be one where agent 2 selects items 6 to 10, which would be absolutely free from resource contention. At present, agent 2 also selects items 1 to 5 and checks the status of item 1, which is currently locked and being processed by agent 1. So agent instance 2 is wasting time checking the status of items already selected by agent instance 1.
Scenario: basically, while agent 1 is processing items, agent 2 checks the status of item 1 (which is locked by agent 1), skips it, and selects row 2 to process. When agent 1 is done with one item, it goes for the next item, which is locked by agent 2, so it skips that and selects item 3.
So basically both agents waste time checking the status of each item and skipping it if locked. The same processing also happens for advanced agents in a multinode system, which is why advanced agents are recommended to be enabled on only one node.
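The skip-if-locked overhead described in this scenario can be illustrated with a small sketch. It is purely illustrative; the `wasted_checks` counter is mine for demonstration, not a metric Pega exposes.

```python
# Illustration of the overhead described above: on every scan an agent
# walks the candidate list from the top and skips items locked by the
# other agent, so lock checks are wasted on items it can never process.

locked_by = {}       # item id -> agent currently holding its lock
wasted_checks = 0    # how many locked items had to be checked and skipped

def next_unlocked(agent, candidates):
    """Return the first candidate this agent can lock, counting skips."""
    global wasted_checks
    for item in candidates:
        owner = locked_by.get(item)
        if owner is not None and owner != agent:
            wasted_checks += 1  # checked an item locked by the other agent
            continue
        if owner is None:
            locked_by[item] = agent
            return item
    return None

a = next_unlocked("agent1", [1, 2, 3])  # agent1 locks item 1
b = next_unlocked("agent2", [1, 2, 3])  # checks item 1, skips it, locks item 2
c = next_unlocked("agent1", [1, 2, 3])  # skips item 2 (agent2's), locks item 3
print(a, b, c, wasted_checks)  # 1 2 3 2
```

Both agents make progress, but two of the lock checks were pure overhead, which is exactly the inefficiency being discussed.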
How can we design this in Pega so that agent instance 2 selects items 6 to 10?
As per my knowledge, if the agent on node 1 processes the 1st item, NodeID=Node1 is set in the pr_sys_queues table (let's assume 10 records). When the agent on node 2 then queries the database, only 9 records are visible to it, because node 1 has already acquired a lock on the 1st item; the process continues the same way for the remaining nodes. In this way it is free from resource contention. I think we can't guarantee which items are processed by which agent; that depends on resource availability. A Standard-mode agent uses the AQM (Auto-Queue-Management) feature, and processing of items is taken care of automatically by AQM.
How can we design this in Pega so that agent instance 2 selects items 6 to 10?
I guess we can't achieve this using a standard agent with AQM. You can write your own logic to get it done without AQM, using advanced/standard mode.
The setting SLARefreshEachInteration can be used to prevent additional inter-agent contention. If this setting is true, then each time an SLA agent successfully processes an assignment, it refreshes the retrieved item list. So for example, if SLAUnitsToRetrieve is set to 3 and SLAUnitsToProcess is set to 10, the system will retrieve 3 assignments and try to process one of them. The quantity of 3 is chosen to give a good chance that at least one of the assignments won't be locked by another agent or a user, and so can be processed. If only one assignment were retrieved, some problem with it could prevent it from being processed, and the retrieval would have to be done again, wasting resources.
After the agent has successfully processed one of the three assignments, it "refreshes" its list by going back to the full queue of assignments to retrieve another three. If the very first assignment was processed and no other agents have processed any assignments in the meantime, the 2nd and 3rd assignments picked up in the first retrieve may be picked up again; however, it is much more likely that agents on the other nodes have run, so it will actually pick up completely new assignments. It processes one of these new three assignments successfully, then refreshes again, following this procedure until it has successfully processed 10 assignments. At that point it stops and waits for the next agent interval.
The refreshing procedure means that the agent has more up-to-date information on the assignments to process. For example, the Node A agent may pick up assignments 1, 2, and 3. It successfully processes #1, and then refreshes. In the meantime, the Node B agent has also picked up assignments 1, 2, and 3. It can’t process #1 (as it is locked for processing by Node A), so it processes #2, and then refreshes. When Node A refreshes, it now picks up #3, #4, and #5, because Node B processed #2. This means that Node A doesn’t waste time trying to process #2, which was locked and then completed by Node B.
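The retrieve/process/refresh cycle described in the last three paragraphs can be sketched as a toy simulation. The setting names follow the thread; the queue model, the lock set, and the numbers are all illustrative assumptions.

```python
# Sketch of the SLA retrieve/refresh cycle described above: retrieve
# SLA_UNITS_TO_RETRIEVE candidates, process the first unlocked one,
# then refresh the candidate list, until SLA_UNITS_TO_PROCESS are done.

SLA_UNITS_TO_RETRIEVE = 3
SLA_UNITS_TO_PROCESS = 10

pending = list(range(1, 26))  # assignment ids still awaiting processing
locks = set()                 # assignments locked by other agents/users

def run_agent():
    processed = []
    while len(processed) < SLA_UNITS_TO_PROCESS and pending:
        # "refresh": re-read the current head of the queue each iteration
        batch = pending[:SLA_UNITS_TO_RETRIEVE]
        for a in batch:
            if a in locks:
                continue  # locked elsewhere; try the next candidate
            pending.remove(a)
            processed.append(a)
            break  # process one assignment, then refresh the list
        else:
            break  # whole batch was locked; wait for the next interval
    return processed

locks.add(2)  # pretend another node holds a lock on assignment 2
done = run_agent()
print(done)  # ten assignments processed, #2 skipped
```

Because the list is refreshed after every successful assignment, the agent keeps working around locked items instead of repeatedly retrying them, which is the behaviour the paragraphs above describe.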
I have been studying multinode agent settings and came across this thread. Very interesting and informative. Some questions came to my mind after reading a few posts.
The settings mentioned above should be changed depending on the environment and node setup. I checked my system and found that they are system settings, not Dynamic System Settings, which makes them even more difficult to change. My question: how do I change the values of these settings? We know that these settings can be changed in an individual application's ruleset. So if I change these settings in my app ruleset, how will they get picked up at runtime? Given that fetching a setting's value via the getRuleSystemSetting function (wherever it is used) would require the ruleset name and setting name as input. Appreciate a quick response.
Thanks Gaurav. I found the above activity, and I am guessing that this activity is called internally within the agent SLA activity, which provides the 3 parameters required for processing. So at runtime, to confirm that my changes to the system settings are picked up, I would need to access the values separately, I believe. Please correct me if I'm wrong.
I believe the SLARefreshEachInteration, SLAUnitsToRetrieve, and SLAUnitsToProcess settings are related to the OOTB SLA process. How can we plug the same concept into custom agents defined in our application?