Agent Failure status not captured correctly in my activity for the class "Data-Agent-Queue"

Question

jagadeeshm2569

Member since 2015

2 posts

cisco

Posted: 24 Sep 2018 15:19 EDT
Last activity: 5 Dec 2018 13:36 EST

Closed

Hi,

We have written an automation agent that checks whether an agent is down on the current node and, if it is, restarts it. We use the "Data-Agent-Queue" class to get the list of all agents for a specific ruleset on that node, and the "Embed-Rule-FutureQueue" class to get the details of each agent in that ruleset. If .pyFatalMessage contains an error, we restart the agent. However, in some cases SMA shows a fatal message for the agent while the .pyFatalMessage property in the tracer/clipboard is null. In these cases we are unable to validate the agent status.

I checked a particular application in which two specific agents are down. While tracing my activity, one agent has a fatal message in .pyFatalMessage, whereas the other has a null value in .pyFatalMessage even though SMA shows an error message for it. My analysis is that the agent with the null .pyFatalMessage has 60,000 characters in "exception Info" in SMA, whereas the agent with a populated .pyFatalMessage has around 8,000 characters in "exception Info".

I am not sure whether the length has anything to do with it.
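
For illustration only, the check described above can be sketched in plain Java, outside Pega. The AgentEntry class and the needsRestart and agentsToRestart helpers are hypothetical names, not Pega APIs, and the sample values are made up. The sketch only shows why an agent whose .pyFatalMessage is null or blank is skipped, even when SMA reports an exception for it.

```java
import java.util.ArrayList;
import java.util.List;

public class AgentFailureCheck {

    // Hypothetical stand-in for one agent entry read from Data-Agent-Queue;
    // fatalMessage mirrors the .pyFatalMessage property.
    static final class AgentEntry {
        final String name;
        final String fatalMessage;

        AgentEntry(String name, String fatalMessage) {
            this.name = name;
            this.fatalMessage = fatalMessage;
        }
    }

    // An agent is flagged for restart only when a non-blank fatal message is present.
    static boolean needsRestart(AgentEntry entry) {
        return entry.fatalMessage != null && !entry.fatalMessage.trim().isEmpty();
    }

    // Collects the names of all agents that the check above would restart.
    static List<String> agentsToRestart(List<AgentEntry> entries) {
        List<String> names = new ArrayList<>();
        for (AgentEntry entry : entries) {
            if (needsRestart(entry)) {
                names.add(entry.name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        List<AgentEntry> entries = new ArrayList<>();
        entries.add(new AgentEntry("AgentA", "java.lang.NullPointerException ..."));
        entries.add(new AgentEntry("AgentB", null)); // failed per SMA, but the property is null
        System.out.println(agentsToRestart(entries)); // prints [AgentA]
    }
}
```

AgentB never reaches the restart list, which matches the behaviour we see: the activity has no way to tell a healthy agent from a failed one whose property was not populated.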

***Edited by Moderator Marissa to update SR Details; update platform capabilities***

Pega Platform

Data Integration

Support Case Exists

Posted: 28 Sep 2018 14:14 EDT

MikeTownsend_GCS replied to jagadeeshm2569

Hello,

When you check the pyFatalMessage value, are you reading it from an exposed column, or do you open the object onto the clipboard and read the BLOB data? I would not be surprised if the exposed column has a length limit in the database that your 60,000 characters exceed. I would be surprised if the value were missing from the BLOB, though; presumably that is why you can see the stack trace in SMA. If the property has no value on the clipboard after an Obj-Open-By-Handle, then I'd say you are ultimately looking at the wrong property, but having the object to review should help you track down the one you need.

Thanks,
Mike

Posted: 1 Oct 2018 12:08 EDT

jagadeeshm2569

cisco

replied to MikeTownsend_GCS

Hi Mike,

Thanks for taking time to reply.

Let me explain further and add my latest observation as well.

We are using Obj-Open-By-Handle to get the data, so we are not using exposed columns. Also, today I monitored two of our QA regions. One of the agents has failed in both regions. My activity reports the agent failure in one region because .pyFatalMessage has a value there. In the other region, .pyFatalMessage is blank, so my activity skips that failure. I am attaching a trace screenshot; it shows the trace of the same agent from the two QA regions.

Posted: 1 Oct 2018 14:04 EDT

MikeTownsend_GCS replied to jagadeeshm2569

Do both environments have the same error in the log, or are they failing for different reasons, one of which might qualify as a fatal message while the other does not? I don't know exactly how the system populates that property, but presumably not every message is "fatal", and there could be other scenarios where the agent doesn't process the work.

Thanks,
Mike

Posted: 4 Oct 2018 16:03 EDT
Updated: 4 Oct 2018 16:02 EDT

Jagdish.K

Capgemini

replied to MikeTownsend_GCS

Hi Mike,

It's the same error message in both regions.

Posted: 11 Oct 2018 11:38 EDT

Jagdish.K

Capgemini

replied to jagadeeshm2569

Hi Mike,

There is one more agent whose failure is not captured. The error message is shown in SMA, but not in the .pyFatalMessage property. I am attaching the error message log.

Posted: 29 Nov 2018 13:15 EST

jagadeeshm2569

cisco

replied to jagadeeshm2569

Hi Manoj,

As discussed, I followed your suggestion. Please find attached my observations on the issue. Kindly let me know the next steps.

Posted: 30 Nov 2018 5:59 EST

yerrm

PEGA

replied to jagadeeshm2569

Hi Jagadish,

How did you get the node IDs? Did you use the cluster API?

Below is sample code to get the list of nodes in the cluster:

// Cluster information map; each key is a node ID.
Map<String, String> clusterInfo = pega.getSystemOperationsProvider().getClusterManagementAPI().getClusterInformation();
// Log the ID of every node in the cluster.
for (Map.Entry<String, String> nodeInfo : clusterInfo.entrySet()) {
    oLog.infoForced("Node ID : " + nodeInfo.getKey());
}

Sample output:

Node ID : Proprietary information hidden_envhyd82-web-3
Node ID : Proprietary information hidden_envhyd82-web-1
Node ID : Proprietary information hidden_envhyd82-web-2
Node ID : Proprietary information hidden_envhyd82-search-137
Node ID : Proprietary information hidden_envhyd82-search-138

Below is sample code to get the status of an agent on the current node:

// ID of the node this code runs on.
String nodeID = PegaRULES.getEngine().getNodeUniqueID();
// Look up the agent details for the given arguments on that node.
Map<String, Object> agentDetails = tools.getAgentUtils().getAgentDetailsForNode("Pega-RULES", "SystemPulse", nodeID);
oLog.infoForced("AgentDetails: " + agentDetails.get("agentStatus"));
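
As a rough sketch only, the returned map could be read defensively so that a missing entry is reported rather than silently skipped. This is plain Java with no Pega calls. The AgentStatusReader class, the readStatus helper, and the "UNKNOWN" placeholder are hypothetical; the only name taken from the snippet above is the "agentStatus" key, and the sample value is made up.

```java
import java.util.HashMap;
import java.util.Map;

public class AgentStatusReader {

    // Returns the "agentStatus" entry as text, or a placeholder when the map
    // or the entry is missing. The possible status values are not known here,
    // so none are assumed.
    static String readStatus(Map<String, Object> agentDetails) {
        if (agentDetails == null) {
            return "UNKNOWN";
        }
        Object status = agentDetails.get("agentStatus");
        return status == null ? "UNKNOWN" : String.valueOf(status);
    }

    public static void main(String[] args) {
        Map<String, Object> sample = new HashMap<>();
        sample.put("agentStatus", "sample-status"); // made-up value for illustration
        System.out.println(readStatus(sample));
        System.out.println(readStatus(new HashMap<>()));
    }
}
```

Returning a placeholder instead of null makes it explicit in the log when no status was available for an agent.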

If you are still facing the issue, let's have a screen share and fix it.

Thanks,

Manoj Yerra

Posted: 30 Nov 2018 15:59 EST

jagadeeshm2569

cisco

replied to yerrm

Hi Manoj,

Thanks for the response. I tried the code, but I get a compilation error when I save the activity. I may be missing something. Can we have a one-on-one call and close this? I am available 9 AM - 9 PM EST. Kindly let me know a time that works for you so that we can catch up.

Regards,

Jagdish
