You can run 'top -H' on Linux to find the threads consuming the most CPU. If one or more of them are Java threads from the JVM, take a thread dump from the JVM and map the native thread ID shown by 'top -H' back to the corresponding thread in the dump. That will identify the call stack of the thread(s) consuming CPU.
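To make the mapping step concrete, here is a minimal sketch. It assumes a HotSpot-style thread dump that prints the native thread ID as a hexadecimal nid= field (older JDKs do this; some newer ones print it differently, so check your own dump first). The class name, the sample thread ID and the sample dump line are all made up for illustration.

```java
// Hypothetical helper: maps the decimal thread ID shown in the PID column
// of 'top -H' to the nid= field of a HotSpot thread dump line.
class ThreadIdMapper {

    // Convert a decimal native thread ID to the hex form, e.g. 12345 -> "0x3039".
    static String toNid(long decimalTid) {
        return "0x" + Long.toHexString(decimalTid);
    }

    // True if the dump line carries exactly this native thread ID.
    // The character after the match must not be another hex digit,
    // otherwise nid=0x3039 would also match nid=0x30391.
    static boolean matches(String dumpLine, long decimalTid) {
        String needle = "nid=" + toNid(decimalTid);
        int start = dumpLine.indexOf(needle);
        if (start < 0) {
            return false;
        }
        int end = start + needle.length();
        return end == dumpLine.length()
                || Character.digit(dumpLine.charAt(end), 16) < 0;
    }

    public static void main(String[] args) {
        long tidFromTop = 12345; // illustrative value, not real data
        String dumpLine = "\"worker-7\" #42 daemon prio=5 os_prio=0 "
                + "tid=0x00007f3c1c0a1000 nid=0x3039 runnable";
        System.out.println(toNid(tidFromTop) + " matches: "
                + matches(dumpLine, tidFromTop));
    }
}
```

In practice you would search the thread dump file for the nid value and read the stack trace printed under the matching header line.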
Every two minutes the 'management daemon' running on a node takes a snapshot of requestor count, agent count, heap usage and CPU usage as reported by the application server and sends it to AES. It is quite possible that CPU usage is below 40% on average but had spiked to 100% at the moment the daemon took the snapshot.
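A small worked example of how an average and a point-in-time snapshot can disagree. The numbers are synthetic and chosen only to illustrate the arithmetic: one two-minute window of per-second readings that sits at 30% and spikes to 100% for the last five seconds. This is not how the daemon is known to sample; it is just a sketch of the effect.

```java
// Illustrative only: synthetic per-second CPU readings for one 120 s window.
class SnapshotVsAverage {

    // 115 seconds at 30% followed by a 5 second spike at 100%.
    static double[] syntheticWindow() {
        double[] perSecond = new double[120];
        for (int i = 0; i < perSecond.length; i++) {
            perSecond[i] = (i >= 115) ? 100.0 : 30.0;
        }
        return perSecond;
    }

    // Arithmetic mean of all readings in the window.
    static double mean(double[] samples) {
        double sum = 0.0;
        for (double s : samples) {
            sum += s;
        }
        return sum / samples.length;
    }

    public static void main(String[] args) {
        double[] window = syntheticWindow();
        // A snapshot taken at the end of the window sees only the last reading.
        double snapshot = window[window.length - 1];
        System.out.println("average over window = " + mean(window));
        System.out.println("snapshot at end     = " + snapshot);
    }
}
```

Here the average over the window stays under 40% while the single snapshot reads 100%, so both reports can be true at once.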
What are you using to measure and record server CPU? What data drives the assertion "never crossed 40%"? How granular is the CPU usage data you are using for comparison, and how often do you sample CPU load for your reporting?
What is the nature of your system architecture? Is it physical or virtual? How many virtual CPUs are available to the operating system hosting your application server / Pega application?
Typical things outside the application that may contribute to 'spikiness' are garbage collection and JIT compilation. I suggest you review your alerts and verbose GC files to see if anything interesting is happening around the times your system reports 100% CPU usage.
I've contacted an SME to look into how the management daemon measures CPU.
What Java version are you using? What operating system? What is your hardware topology (physical and virtual)? Are you using Docker or any other secondary virtualization?
I do know that the Linux 'top' command measures CPU relative to a single CPU, so a multi-threaded process such as Java can exceed 100%. We will need to see which Java API the management daemon uses, then cross-check your specific JVM implementation to see whether it returns CPU utilization in terms of a single processor or the entire server.
If we do find that there are instances where Java can legitimately report CPU usage well in excess of 100%, we'll need to be sure that AES can handle the higher threshold.
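While we wait to confirm which API the daemon actually uses, here is a rough sketch of the two scales in question. It is not the daemon's code. The helper toMachinePercent is hypothetical and simply divides a 'top'-style figure (100 = one full CPU) by the CPU count. The main method prints what this JVM reports through the standard java.lang.management bean and, where available, the com.sun.management extension, whose getProcessCpuLoad() is documented to return a value between 0.0 and 1.0 across all CPUs (negative if unavailable).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Sketch only: compares per-CPU and whole-machine CPU scales.
class CpuScale {

    // Hypothetical helper: convert a 'top'-style percentage, where 100 means
    // one fully busy CPU, into a percentage of the whole machine.
    static double toMachinePercent(double topPercent, int cpus) {
        return topPercent / cpus;
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();

        System.out.println("availableProcessors = " + cpus);
        // Load average counts runnable entities, so it is not a percentage
        // and can exceed the number of CPUs; it is -1 where unsupported.
        System.out.println("systemLoadAverage   = " + os.getSystemLoadAverage());

        // The extended bean is JVM specific, so check before casting.
        if (os instanceof com.sun.management.OperatingSystemMXBean) {
            com.sun.management.OperatingSystemMXBean ext =
                    (com.sun.management.OperatingSystemMXBean) os;
            System.out.println("processCpuLoad      = " + ext.getProcessCpuLoad());
        }

        // Example: 400% in 'top' on an 8 CPU host is half the machine.
        System.out.println("400% on 8 CPUs      = "
                + toMachinePercent(400.0, 8) + "% of machine");
    }
}
```

Running this on the affected node should show which scale your JVM reports on, which we can then compare against what shows up in AES.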