Greetings. We are on Pega 7.3.1 with a PostgreSQL database and Tomcat as the application server, running on Amazon AWS. We are trying to shut down RDS while keeping the EC2 instances running, but when RDS is started again and fully available, the Pega application does not recover. Many errors like the one below appear in the logs:
2019-06-19 15:40:07,790 [egaRULES-MasterAgent] [ STANDARD] [ ] [ ] (luster.external.SystemNodesDAO) ERROR - Failed to update System-Status-Node instance: Database-General Problem encountered when getting connection for database pegarules 0 08003 This connection has been closed.
DatabaseException caused by prior exception: org.postgresql.util.PSQLException: This connection has been closed.
| SQL Code: 0 | SQL State: 08003
Thanks Brad. So Pega does not support database failover/switchover without restarting the application server? One of our requirements is to support failover, with the application recovering once the failover completes successfully, but I don't see that happening in our case unless I have missed some configuration.
To Brad's point, Pega does have some retry logic (which has issues, but that's a different story), but best practice is for the application server to manage database connections. If the application server is not testing, killing and replacing bad connections, Pega is stuck. The 'testOnBorrow' option causes bad connections to be detected and replaced when they are borrowed from the pool. In PegaCloud, those options are enabled by default. I have had Postgres restart both intentionally (RDS tuning) and unintentionally (an actual EC2 crash), and Pega recovered.
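To make the 'testOnBorrow' behaviour concrete, here is a minimal Java sketch of what a connection pool does with the option on versus off. It does not use Tomcat's real pool classes: `SimConnection` and `BorrowValidatingPool` are hypothetical stand-ins, and `validate()` plays the role of `Connection.isValid()` or a validation query. After a simulated database restart every idle pooled connection is dead. With testOnBorrow on, the pool discards them and opens a fresh connection; with it off, the caller is handed a closed connection, which is the "This connection has been closed" situation in the logs above.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for a JDBC connection; "alive" becomes false when the DB restarts.
class SimConnection {
    boolean alive = true;

    // Plays the role of Connection.isValid(timeout) or running a validation query.
    boolean validate() {
        return alive;
    }
}

// Minimal pool illustrating testOnBorrow: each idle connection is validated before
// it is handed out, and dead ones are discarded instead of being returned to the caller.
class BorrowValidatingPool {
    private final Deque<SimConnection> idle = new ArrayDeque<>();
    private final boolean testOnBorrow;
    int discarded = 0;

    BorrowValidatingPool(int size, boolean testOnBorrow) {
        this.testOnBorrow = testOnBorrow;
        for (int i = 0; i < size; i++) {
            idle.push(new SimConnection());
        }
    }

    SimConnection borrow() {
        while (!idle.isEmpty()) {
            SimConnection c = idle.pop();
            if (!testOnBorrow || c.validate()) {
                return c; // without testOnBorrow a dead connection is returned as-is
            }
            discarded++; // stale connection (e.g. closed by an RDS restart): drop it
        }
        // No usable idle connection: open a new physical one (assumes the DB is back up).
        return new SimConnection();
    }

    void giveBack(SimConnection c) {
        idle.push(c);
    }

    // Simulates the database going away: every pooled connection is now dead.
    void simulateDatabaseRestart() {
        for (SimConnection c : idle) {
            c.alive = false;
        }
    }
}
```

In a real Tomcat deployment the equivalent switch is typically the `testOnBorrow="true"` attribute on the JDBC `Resource` definition; the sketch only shows why that setting lets the application recover once the database is reachable again.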
That having been said, I don't think anyone ever expected that you would intentionally keep Pega running for a protracted period with no database. If you are trying to save money by powering down idle components, why not power down the application servers too?
Thanks Werda, I had some success with testOnBorrow. I don't think we need to set the validationQuery, am I correct? Setting that query would have a performance impact. Has it been set in Pega Cloud? Also, I assume testOnBorrow will also work for a database failover/switchover?
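On the performance concern: as far as I know, Commons DBCP2 (Tomcat's default pool) falls back to the driver's `Connection.isValid()` when no validationQuery is set, and the separate tomcat-jdbc pool offers a `validationInterval` setting that skips revalidating a connection checked within the last N milliseconds. Both are worth confirming in the docs for the pool your Tomcat version actually uses. The Java sketch below only illustrates the throttling idea; `ThrottledValidator` is a hypothetical class, not a Tomcat API, and the `probe` argument stands in for `isValid()` or a validation query.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BooleanSupplier;

// Hypothetical sketch of interval-throttled validation: a connection that passed
// validation recently is handed out without paying for another round trip.
class ThrottledValidator {
    private final long intervalMillis;
    private final Map<Integer, Long> lastValidated = new HashMap<>();
    int checksRun = 0;

    ThrottledValidator(long intervalMillis) {
        this.intervalMillis = intervalMillis;
    }

    // probe stands in for Connection.isValid() or a validation query such as "SELECT 1".
    boolean isUsable(int connectionId, long nowMillis, BooleanSupplier probe) {
        Long last = lastValidated.get(connectionId);
        if (last != null && nowMillis - last < intervalMillis) {
            return true; // validated recently: skip the check
        }
        checksRun++;
        boolean ok = probe.getAsBoolean();
        if (ok) {
            lastValidated.put(connectionId, nowMillis);
        } else {
            lastValidated.remove(connectionId); // dead: force a real check next time
        }
        return ok;
    }
}
```

The trade-off is that within the interval a connection that died after its last check can still be handed out, so a failover is only detected once the interval expires; a shorter interval catches it sooner at the cost of more validation traffic.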