We recently migrated to the new PegaAES v7.3.1 and are seeing a lot of critical CPU alerts, but when we check the same servers in other monitoring tools or directly on the box, CPU utilization is minimal. We even raised the critical threshold to 85, and it is still being breached.
Can someone please explain on what basis AES calculates CPU utilization? Is it based on the app server process's CPU consumption, or does it consider overall CPU on the box?
I checked the PegaALERT.log file on the monitoring node and didn't see anything that correlates with high CPU utilization. Please understand that none of the monitoring tools on the servers show CPU spikes. I am looking for how AES calculates utilization.
Pega Platform does not actually have any alerts for CPU utilization. In the platform, the management daemon runs every two minutes and queries the Java runtime environment for (a) the total CPU usage of the process [in seconds] and (b) the number of CPUs available. The management daemon tracks the process CPU usage and the time of the last check. The daemon calculates CPU usage as:
(current process CPU usage in seconds - previous process CPU usage in seconds) / (time elapsed since last measurement) / (number of processors) * 100%
Example - let's say at 12:00:00 the daemon observed total CPU usage of 512.25 seconds and 4 CPUs available. At 12:02:00 the daemon observed CPU usage of 632.25 seconds. The daemon would calculate
(632.25 cpu secs - 512.25 cpu secs) / 120 clock seconds / 4 cpu * 100% --> 25% CPU. That is 120 CPU seconds used in 120 clock seconds, on a box that should be able to provide 4 CPU seconds per clock second.
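As a rough illustration of the formula above, here is a minimal Python sketch. It is not Pega's code; the function name `cpu_percent` and the sample readings are made up for the example. It only shows that the result is the process's CPU time normalized by the number of processors, so a single fully busy thread on a 2-CPU box reads as 50%, not 100%.

```python
def cpu_percent(prev_cpu_secs, curr_cpu_secs, elapsed_secs, processors):
    """Process CPU usage as a percentage of total box capacity.

    prev_cpu_secs / curr_cpu_secs: cumulative process CPU time at the
    previous and current checks (seconds).
    elapsed_secs: wall-clock seconds between the two checks.
    processors: number of CPUs the runtime reports.
    """
    if elapsed_secs <= 0 or processors <= 0:
        raise ValueError("elapsed_secs and processors must be positive")
    return (curr_cpu_secs - prev_cpu_secs) / elapsed_secs / processors * 100


if __name__ == "__main__":
    # Hypothetical readings: 60 CPU seconds consumed over a 120-second
    # interval on a 2-CPU box.
    print(cpu_percent(1000.0, 1060.0, 120, 2))  # 25.0
```

Note that if the runtime reports fewer processors than the operating system actually has (or the other way round), the percentage shifts by the same factor, which is one way the two views can disagree.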
The management daemon sends an HLTH0001 message to AES / PDC, which then evaluates whether that CPU usage is to be considered critical, warning or normal. In AES you can override those thresholds at node, system or global level.
So - check the data. If the Java runtime and the operating system do not agree, check whether your JVM and OS are properly hotfixed.
CPU utilization as reported to AES is accurate to what the Java runtime reports.
What does the data in AES show?
If your monitoring tools report maximum CPU utilization of 2.5%, I question the monitoring tools. I'm not saying that the system is running high CPU, but it's equally unlikely that it runs so low all the time.
What is the hardware / virtual hardware platform? What operating system / virtual container? How many CPUs does the operating system report? How are your monitoring tools gathering data?
Do you have operating system access? Can you use ps or other commands to check the start-to-now CPU utilization (in seconds) of the Java process and compare it to the clock time elapsed?
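One way to do that comparison is sketched below in Python. It assumes a Linux or other POSIX-style ps that accepts `-o time= -o etime= -p <pid>` and prints durations as `[dd-]hh:mm:ss` (elapsed time may be just `mm:ss`); other platforms format these fields differently. The helper names are made up for the example. The result is an average over the whole life of the process, not the two-minute window the daemon uses, so run it twice and take the difference if you want a comparable interval.

```python
import os
import subprocess
import sys


def parse_ps_duration(text):
    """Convert a ps duration such as '1-02:03:04' or '05:30' to seconds."""
    text = text.strip()
    days = 0
    if "-" in text:
        day_part, text = text.split("-", 1)
        days = int(day_part)
    parts = [float(p) for p in text.split(":")]
    while len(parts) < 3:          # pad missing hours (and minutes) with zero
        parts.insert(0, 0.0)
    hours, minutes, seconds = parts
    return days * 86400 + hours * 3600 + minutes * 60 + seconds


def average_cpu_percent(cpu_secs, elapsed_secs, processors):
    """Lifetime-average CPU usage of a process, normalized by CPU count."""
    return cpu_secs / elapsed_secs / processors * 100


def sample_process(pid):
    """Return (cpu_seconds, elapsed_seconds) for pid as reported by ps."""
    out = subprocess.run(
        ["ps", "-o", "time=", "-o", "etime=", "-p", str(pid)],
        capture_output=True, text=True, check=True,
    ).stdout.split()
    return parse_ps_duration(out[0]), parse_ps_duration(out[1])


if __name__ == "__main__":
    pid = int(sys.argv[1])          # PID of the Java process
    cpu, elapsed = sample_process(pid)
    cpus = os.cpu_count() or 1      # CPU count as the OS reports it
    print(f"{average_cpu_percent(cpu, elapsed, cpus):.1f}% average since start")
```

If this figure is far from what AES shows over a long, steady period, compare the CPU count the script sees with the count the JVM reports.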
It's not urgent that the data sent to AES be fully accurate - or that you even let AES monitor CPU utilization [you can edit the decision table from the UI to always return "normal"] - but the major discrepancy between what Java and your other tools report is concerning.