What is the best practice for monitoring Pega Agents?
On March 11th, we received a PEGA0010 alert in the logs, and subsequently a PDC PEGA0010 alert for "ProcessFlowDependencies". We also received several dozen other PEGA0010 alerts on the same day. How do you recommend we fix these types of issues?
I have heard that the platform tries to restart agents when they go down, but this does not seem to be completely foolproof. We have tried using PDC for agent monitoring, but its PEGA0010 alerts are not always reliable because many of them are false positives. We have also tried extracting agent status through a REST service, which works to some degree. However, this approach is cumbersome to maintain because the agent list has to be tracked outside the application.
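The REST-based approach described above amounts to reconciling an externally maintained list of expected agents against the statuses the service reports. Below is a minimal sketch of that reconciliation. It assumes the status service can be reduced to (node, agent, enabled) rows. The `AgentStatus` type, the function name, and the node and agent names other than ProcessFlowDependencies are all hypothetical. No real Pega endpoint is called here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentStatus:
    """One row of agent status, as a hypothetical REST status service might report it."""
    node: str
    agent: str
    enabled: bool


def find_unhealthy_agents(expected, statuses):
    """Compare the externally maintained agent list with the reported statuses.

    expected: iterable of (node, agent) pairs that should be running.
    statuses: iterable of AgentStatus rows from the status service.
    Returns a sorted list of (node, agent, reason) tuples, where reason is
    "disabled" (reported but not enabled) or "missing" (not reported at all).
    """
    reported = {(s.node, s.agent): s.enabled for s in statuses}
    problems = []
    for node, agent in sorted(set(expected)):
        enabled = reported.get((node, agent))
        if enabled is None:
            problems.append((node, agent, "missing"))
        elif not enabled:
            problems.append((node, agent, "disabled"))
    return problems


# Hypothetical inventory kept outside the application.
expected = {
    ("node-a", "ProcessFlowDependencies"),
    ("node-a", "ExampleAgent"),
    ("node-b", "ExampleAgent"),
}
statuses = [
    AgentStatus("node-a", "ProcessFlowDependencies", False),
    AgentStatus("node-a", "ExampleAgent", True),
]
print(find_unhealthy_agents(expected, statuses))
# [('node-a', 'ProcessFlowDependencies', 'disabled'), ('node-b', 'ExampleAgent', 'missing')]
```

The `expected` set is the part that has to be kept in sync by hand, which is the maintenance burden mentioned above.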
Do you have any suggestions to alleviate this problem?
PEGA0010 means that agent processing is disabled. If you are seeing instances where a PEGA0010 alert is generated while that agent's processing continues, that is a product defect, and you should submit a bug with evidence.
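Gathering that evidence means cross-checking each PEGA0010 alert against an independent status check. The sketch below is one hypothetical way to triage alerts into confirmed, suspected false positive, and unknown. It assumes alerts and statuses have already been reduced to (node, agent) keys. The names are illustrative only. Timing matters here: a status check taken after the platform has restarted an agent will also show "enabled", so a "suspected false positive" is only a lead, not proof of a defect.

```python
def classify_pega0010_alerts(alerts, enabled_by_agent):
    """Split PEGA0010 alerts by what an independent status check reports.

    alerts: list of (node, agent) pairs taken from PEGA0010 alerts.
    enabled_by_agent: dict mapping (node, agent) -> bool from a separate status check.
    Returns a dict with three lists:
      "confirmed"                -> status check also says the agent is disabled
      "suspected_false_positive" -> status check says the agent is enabled
      "unknown"                  -> no status available for that agent
    """
    result = {"confirmed": [], "suspected_false_positive": [], "unknown": []}
    for key in alerts:
        state = enabled_by_agent.get(key)
        if state is None:
            result["unknown"].append(key)
        elif state:
            result["suspected_false_positive"].append(key)
        else:
            result["confirmed"].append(key)
    return result


alerts = [
    ("node-a", "ProcessFlowDependencies"),
    ("node-b", "ExampleAgent"),
    ("node-c", "ExampleAgent"),
]
enabled_by_agent = {
    ("node-a", "ProcessFlowDependencies"): False,
    ("node-b", "ExampleAgent"): True,
}
print(classify_pega0010_alerts(alerts, enabled_by_agent))
```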
Thanks for the info. I have submitted INC-99470 for one instance where an agent appears to be enabled but a PEGA0010 alert was reported. That said, my question is more generally about what action to take when a PEGA0010 alert is received. After events like production maintenance, we have sometimes seen large volumes of PEGA0010 alerts raised across our fleet of applications. Using the new PDC API, we maintain a repository of agent-related errors reported by PDC so that we can address them later. If a PEGA0010 alert truly indicates that an agent is down, do you recommend using the Pega API to restart those agents, or restarting the application node? I imagine there are nuances to any fleet-wide approach, but I am curious what the best practices are.
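When a burst of alerts follows maintenance, grouping the confirmed ones by node at least shows whether the problem is a few isolated agents or most of a node. The sketch below is purely illustrative and is not Pega guidance. The `node_restart_ratio` threshold, the two action labels, and the agent names are hypothetical. The code only produces a plan and does not call any restart API.

```python
from collections import defaultdict


def plan_remediation(confirmed_alerts, agents_per_node, node_restart_ratio=0.5):
    """Suggest a per-node action for agents confirmed as down.

    confirmed_alerts: iterable of (node, agent) pairs confirmed as disabled.
    agents_per_node: dict mapping node -> total number of agents expected on it.
    node_restart_ratio: hypothetical threshold; if at least this fraction of a
        node's agents are down, flag the whole node rather than individual agents.
    Returns a dict mapping node -> (action, sorted list of down agents).
    """
    down = defaultdict(set)
    for node, agent in confirmed_alerts:
        down[node].add(agent)

    plan = {}
    for node, agents in sorted(down.items()):
        # If the node's agent count is unknown, assume every listed agent is all it has.
        total = agents_per_node.get(node, len(agents))
        if total and len(agents) / total >= node_restart_ratio:
            plan[node] = ("restart_node", sorted(agents))
        else:
            plan[node] = ("restart_agents", sorted(agents))
    return plan


confirmed = [
    ("node-a", "AgentA"), ("node-a", "AgentB"), ("node-a", "AgentC"),
    ("node-b", "AgentA"),
]
print(plan_remediation(confirmed, {"node-a": 4, "node-b": 10}))
```

Here node-a has 3 of 4 agents down and would be flagged for a node restart, while node-b has 1 of 10 down and would be flagged for an agent-level restart. Where that line should sit, and whether a node restart is acceptable at all, is exactly the judgment call the question is asking about.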