We have 4 servers distributed across two sites, and we use Pega Call integrated with Avaya 6.3.
Usually everything works fine, but when we shut down the servers at one site during peak system load, most users complained about poor performance, mainly in the telephony functionality. Monitoring the performance of the Pega servers (CPU, memory, disk, etc.), all indicators looked good.
We suspect the bottleneck could be the number of CTIAACCLinkEvent connectors available.
If you are shutting down one server, which would include one Pega CTI engine, is your current capacity designed to support that server's worth of users across the remaining 3 nodes, for both PRPC and Pega Call?
Have you looked through the PegaRules alert log to ensure there are no other bottlenecks?
How does GC / heap look with the additional load?
You should be able to use SMA to review the service packages and see how many instances are in use. Open SMA and go to:
If all active connections are in use, you should see all the instances CTIAACCLinkEvent0 through 10 showing up. If the longest wait / timeout columns have values, then you are probably undersized and need to increase the number of instances.
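The check described above can be sketched as a simple rule (the instance names and wait/timeout readings below are hypothetical examples, not actual SMA output):

```python
# Hedged sketch: a requestor pool is likely undersized if any listener
# instance shows a non-zero longest wait or any timeouts in SMA.
def pool_undersized(longest_wait_ms: int, timeouts: int) -> bool:
    """Flag an instance whose requests had to wait or timed out."""
    return longest_wait_ms > 0 or timeouts > 0

# Hypothetical per-instance readings: (longest wait in ms, timeouts)
rows = {
    "CTIAACCLinkEvent0": (0, 0),
    "CTIAACCLinkEvent1": (13, 1),
}
flagged = [name for name, (wait, t) in rows.items() if pool_undersized(wait, t)]
print(flagged)  # ['CTIAACCLinkEvent1']
```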
If you do add instances, make small changes, such as adding 2 new instances at a time, since the rule form is shared across all nodes. Adding 2 would ultimately mean 2 new instances per JVM, so 6 in total across the 3 remaining nodes.
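The arithmetic above can be written out explicitly; because the rule form is shared, any change multiplies across every JVM in the cluster:

```python
# Instances added on the shared rule form apply to every JVM/node.
added_per_jvm = 2    # instances added on the rule form
active_nodes = 3     # JVMs still running and sharing that rule form
total_new_instances = added_per_jvm * active_nodes
print(total_new_instances)  # 6 new listener instances cluster-wide
```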
Hi, we have increased the number of instances to 20 as suggested, but we are still suffering the very same performance issue. We have noticed in SMA -> Administration -> Requestor Pools that some CTIAACCLinkEventXX entries show "Longest Wait" values from 1 to 13, and in only one case Timeouts = 1; what does that mean?
PegaRules alert log does not seem to show any performance issue or bottleneck.
With two instances on each of the 4 nodes everything works fine, but if 2 of the nodes are powered off we start having problems. The customer expects this distributed 4-node configuration, spread over two geographical sites, to provide a highly available service. The customer is running fault-tolerance tests: if one site goes down, the other site should be able to carry the whole load without problems.
Could it be a wrong configuration of the WebSphere thread pool (web container), datasource connection pool, or WorkManager pool? The current values are: max thread pool = 50, max datasource connection pool = 50, and max WorkManager pool = 20. We don't know whether these values are sufficient when we are running on two nodes instead of four.
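One way to reason about whether those pool sizes survive a site failover is the rough capacity check below. Only the web container pool size of 50 comes from this thread; the cluster-wide peak load figure is an assumption for illustration:

```python
# Hedged sketch: when a site fails over, the surviving nodes must absorb
# the whole peak load, so the per-node demand on each pool roughly doubles
# when going from 4 nodes to 2.
total_peak_concurrent_requests = 160  # ASSUMED cluster-wide peak, hypothetical

def per_node_load(nodes: int) -> float:
    """Peak concurrent requests each surviving node must absorb."""
    return total_peak_concurrent_requests / nodes

web_container_max = 50  # from the thread: Max Thread Pool = 50
for nodes in (4, 2):
    load = per_node_load(nodes)
    fits = load <= web_container_max
    print(f"{nodes} nodes: {load:.0f} req/node, fits pool of {web_container_max}: {fits}")
```

Under these assumed numbers, 4 nodes leave headroom (40 of 50 threads), but 2 nodes would demand 80 threads against a pool of 50, which would show up as queuing and poor response times rather than high CPU, consistent with the symptoms described.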