Posted: 24 Sep 2018 15:19 EDT Last activity: 5 Dec 2018 13:36 EST
Agent Failure status not captured correctly in my activity for the class "Data-Agent-Queue"
We have written an automation agent that checks whether an agent is down on the current node and, if it's down, restarts it. We use the "Data-Agent-Queue" class to get the list of all agents for the specific ruleset on that node, and the "Embed-Rule-FutureQueue" class to get the details of each agent in that ruleset. If .pyFatalMessage contains an error, we restart the agent. However, in some cases we have observed that although SMA shows a fatal message for the agent, the .pyFatalMessage property in the tracer/clipboard is null. In these cases we are unable to validate the agent status.
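Roughly, the check behaves like the sketch below. This is plain Java written only to illustrate the logic, not the actual activity steps or any Pega engine API: the map-based agent pages, the class and method names, and the agent names AgentA and AgentB are all hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the check; each agent page is a simple property map.
class AgentFailureCheck {

    // An agent counts as failed only when pyFatalMessage holds non-blank text.
    static boolean hasFatalMessage(Map<String, String> agentPage) {
        String msg = agentPage.get("pyFatalMessage");
        return msg != null && !msg.trim().isEmpty();
    }

    // Returns the names of the agents this check would restart.
    static List<String> agentsToRestart(Map<String, Map<String, String>> agents) {
        List<String> failed = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> entry : agents.entrySet()) {
            if (hasFatalMessage(entry.getValue())) {
                failed.add(entry.getKey());
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        // AgentA: the fatal message is visible on the page.
        Map<String, String> agentA = new HashMap<>();
        agentA.put("pyFatalMessage", "some fatal error text");

        // AgentB: the agent is down, but pyFatalMessage is null on the page.
        Map<String, String> agentB = new HashMap<>();
        agentB.put("pyFatalMessage", null);

        Map<String, Map<String, String>> agents = new LinkedHashMap<>();
        agents.put("AgentA", agentA);
        agents.put("AgentB", agentB);

        // AgentB is skipped even though it is down.
        System.out.println(agentsToRestart(agents)); // prints [AgentA]
    }
}
```

Because the check relies on a single property, any agent whose .pyFatalMessage comes back null or blank is treated as healthy, which is exactly the case we are unable to catch.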
I checked a particular application in which 2 specific agents are down. When tracing my activity, one agent has the fatal message in its pyFatalMessage property, whereas the other has a null value in pyFatalMessage even though SMA shows an error message for it. My analysis is that the agent with the null pyFatalMessage has 60,000 characters in "exception Info" in SMA, whereas the agent with a populated pyFatalMessage has around 8,000 characters in "exception Info" in SMA.
Not sure if the length has anything to do with it.
***Edited by Moderator Marissa to update SR Details; update platform capabilities***
When you check the pyFatalMessage value, is it via an exposed column or do you open the object into the clipboard and get the blob data? I would not be surprised if the exposed column has a limit in the DB and your 60 thousand characters exceeds it. I would be surprised if it was not present in the blob, though. Presumably that's why you can see the stack trace in SMA. If the property has no value on the clipboard when you do an obj-open-by-handle, then I'd say you are ultimately looking at the wrong property, but having the object to review should hopefully help you track down the one you need.
Let me explain further and add my latest observation as well.
We are using Obj-Open-By-Handle to get the data, so we are not using exposed columns. Also, today I was monitoring two of our QA regions. One of the agents has failed in both regions. My activity is able to report the agent failure in one region because .pyFatalMessage has a value there. In the other region, .pyFatalMessage is blank, so my activity skips that failure. I am attaching the trace screenshot, which shows the trace of the same agent in the two QA regions.
Do both environments have the same error in the log or are they failing for different reasons, one that might qualify as a fatal message and another that may not? I don't know exactly how the system populates that property, but presumably not every message is "fatal" and there could be other scenarios where the agent doesn't process the work.
Posted: 4 Oct 2018 16:03 EDT Updated: 4 Oct 2018 16:02 EDT
Thanks for the response. However, I tried the code and I get a compilation error when saving the activity. There may be something I am missing. Can we have a 1-1 and close this? I will be available 9 AM - 9 PM EST. Kindly let me know a time so that we can catch up.