Could you please review the attached AWR report for the AES database, along with the PegaRules and PegaAlert logs, and provide recommendations?
AES version: 731; monitored nodes version: 731.
The following errors and slow SQL queries appear frequently:
From the PegaRules log:
1) 2018-12-04 00:01:02,646 [ PegaRULES-Batch-3] [ STANDARD] [ ] [ AES:07.30] ( internal.mgmt.Executable) ERROR Rule-Connect-SOAP.PegaAES-Data-RSSnapshot.GetRuleSetSnapshot - Exception
com.pega.pegarules.pub.services.ResourceUnavailableException: SOAP service failed
Caused by: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
2) 2018-12-04 00:30:17,662 [ttp-nio-8080-exec-62] [ STANDARD] [ ] [ AES:07.30] ( internal.util.PRParseUtils) ERROR va10n40604|10.100.106.74|SOAP|PegaAES|Events|logAlert|A2R1EOW0OWA9AFZ6S082F2EH1GSH5KFVM - XML parsing failed
com.pega.pegarules.pub.PRException: Failed to parse XML. Error: Content is not allowed in prolog.
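"Content is not allowed in prolog" is the message Java's XML parser reports when it meets something other than markup before the root element. Commonly reported causes are a UTF-8 byte-order mark that the reader does not strip, or a body that is not XML at all, such as a plain-text or HTML error page. Once the raw request body has been captured (for example with the http.wire.content logging mentioned in reply b-), a small helper like the one below can show what precedes the first `<`. This is a hypothetical diagnostic sketch, not part of Pega or AES, and it does not replicate the Java parser's exact rules.

```python
UTF8_BOM = b"\xef\xbb\xbf"


def describe_prolog(payload: bytes) -> str:
    """Describe what precedes the first '<' in a captured request body.

    Hypothetical helper for triage only; it is not a Pega or AES API.
    """
    notes = []
    body = payload
    # A leading UTF-8 byte-order mark is a frequent cause of prolog errors in Java.
    if body.startswith(UTF8_BOM):
        notes.append("UTF-8 byte-order mark")
        body = body[len(UTF8_BOM):]
    first_lt = body.find(b"<")
    if first_lt == -1:
        # No markup at all, e.g. a plain-text error message instead of SOAP.
        notes.append("no XML markup at all")
    else:
        leading = body[:first_lt]
        # Whitespace alone is ignored here; visible text before '<' is reported.
        if leading.strip():
            notes.append(f"text before first '<': {leading[:40]!r}")
    return "; ".join(notes) if notes else "clean prolog"


for sample in (
    b"<soapenv:Envelope/>",
    b"\xef\xbb\xbf<?xml version='1.0'?><a/>",
    b"Service Unavailable",
):
    print(describe_prolog(sample))
```

Running it over a handful of captured bodies, rather than one, helps answer whether the failure is frequent or only occasional.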
From the PegaAlert log:
1) Database operation took more than the threshold of 500 ms: 58,857 ms SQL: DELETE FROM DEVKM_PEGADATA.pegaam_alert "PC0" WHERE ( ( "PC0"."CLUSTERNAME" = ? ) AND ( "PC0"."GENERATEDDATETIME" < ? ) ) AND "PC0"."PXOBJCLASS" = ?
Last Input : PegaAES-.CallcleanupAESAgent
2) Database operation took more than the threshold of 500 ms: 8,406 ms SQL: SELECT COUNT(ASTERISK) AS "pySummaryCount(1)" , MIN("PC0"."GENERATEDDATETIME") AS "pySummaryDateTime(1)" , MAX("PC0"."GENERATEDDATETIME") AS "pySummaryDateTime(2)" FROM DEVKM_PEGADATA.pegaam_exception "PC0" WHERE ( "PC0"."PROBLEMCORRELATION" = ? ) AND "PC0"."PXOBJCLASS" = ?
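One way to check whether the cleanup DELETE in the first entry can use an index is to inspect its execution plan. The sketch below is an illustration only. It uses an in-memory SQLite database and a simplified, hypothetical stand-in for pegaam_alert (just the three filtered columns plus a made-up `id`). The index name `alert_cluster_time_idx` and the parameter values are invented. The comparison on GENERATEDDATETIME is assumed to be `<`, since the agent removes old rows. On the production database, the equivalent check is that database's own execution plan for the statement.

```python
import sqlite3

# Hypothetical, simplified stand-in for pegaam_alert: only the columns the
# cleanup statement filters on, plus a made-up surrogate key.
DDL = """
CREATE TABLE pegaam_alert (
    id INTEGER PRIMARY KEY,
    clustername TEXT,
    generateddatetime TEXT,
    pxobjclass TEXT
)
"""

# Same WHERE shape as the slow cleanup DELETE in the PegaAlert log
# (the "<" comparison on GENERATEDDATETIME is an assumption).
CLEANUP = (
    "DELETE FROM pegaam_alert "
    "WHERE clustername = ? AND generateddatetime < ? AND pxobjclass = ?"
)
PARAMS = ("example-cluster", "2018-11-04 00:00:00", "example-class")


def plan(con: sqlite3.Connection) -> str:
    """Return SQLite's query plan for the cleanup DELETE without running it."""
    rows = con.execute("EXPLAIN QUERY PLAN " + CLEANUP, PARAMS).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 holds the plan text


con = sqlite3.connect(":memory:")
con.execute(DDL)
before = plan(con)  # no usable index yet, so SQLite scans the whole table
con.execute(
    "CREATE INDEX alert_cluster_time_idx "
    "ON pegaam_alert (clustername, generateddatetime)"
)
after = plan(con)  # equality on clustername, then a range on generateddatetime
print("before:", before)
print("after: ", after)
```

Putting CLUSTERNAME first and GENERATEDDATETIME second matches the equality-then-range shape of the WHERE clause. The second entry's COUNT query filters on PROBLEMCORRELATION, so the same plan check applies to pegaam_exception with that column.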
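a- You have enabled features that make SOAP calls back to the monitored nodes to get rule information. Make sure the URLs in class pegaaes-data-nodes are valid, and debug the "call back" feature by enabling debugging and running Tracer from the AES Enterprise tab; for example, try fetching the requestor list. The PKIX error itself means the AES JVM could not build a trusted certificate path to the monitored node's HTTPS endpoint, so also confirm that the node's certificate, or the CA that issued it, is in the truststore used by AES.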
b- XML parsing failed. Does the exception include a stack trace with activities, or is it failing at the service layer? Do you see it often, or only randomly or occasionally? The error message suggests that a monitored node has sent an invalid SOAP/XML request, with something other than XML markup before the root element. Enable http.wire.content and verify the data sent to AES.
c- Deleting old data from pegaam_alert is typically very slow. How many rows does the table have? Is table storage being cleaned, vacuumed, or compacted appropriately to reclaim space from deleted rows? Is the table indexed on GENERATEDDATETIME?
d- For the report that summarizes the count and the first and last events of an exception correlation: is it slow all the time, or only for a specific exception? Can you get a count by correlation and report the "top ten" exceptions?
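The "top ten" report suggested in d- amounts to a GROUP BY on the correlation column, ordered by count. A minimal sketch, again using in-memory SQLite with a hypothetical two-column stand-in for pegaam_exception and made-up sample rows; the helper name `top_correlations` is invented. On the real table you would also keep the PXOBJCLASS filter that the logged query uses.

```python
import sqlite3


def top_correlations(con: sqlite3.Connection, limit: int = 10):
    """Return (correlation, count, first_seen, last_seen), most frequent first."""
    sql = (
        "SELECT problemcorrelation, COUNT(*) AS occurrences, "
        "MIN(generateddatetime) AS first_seen, MAX(generateddatetime) AS last_seen "
        "FROM pegaam_exception "
        "GROUP BY problemcorrelation "
        "ORDER BY occurrences DESC "
        "LIMIT ?"
    )
    return con.execute(sql, (limit,)).fetchall()


# Hypothetical stand-in table with made-up rows, for illustration only.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE pegaam_exception (problemcorrelation TEXT, generateddatetime TEXT)"
)
sample = (
    [("corr-A", f"2018-12-04 00:0{i}:00") for i in range(5)]
    + [("corr-B", f"2018-12-04 01:0{i}:00") for i in range(3)]
    + [("corr-C", "2018-12-04 02:00:00")]
)
con.executemany("INSERT INTO pegaam_exception VALUES (?, ?)", sample)

for row in top_correlations(con):
    print(row)
```

If one or two correlations dominate the result, that would also explain why the summary report is slow for specific exceptions only.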