Hazelcast / Pega "Too many open files" error - a few clarifications ...
Config: Pega 7.1.7, RHEL, WAS 7.x, Oracle
Issue: our QA environment is reporting "Too many open files" errors (auto-restart of JVMs).
General recommendation found: increase the ulimit value for open files to 65000.
Observation: even after the ulimit increase, we noticed the open-file count kept increasing (via the lsof command).
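For reference, the observation above can be reproduced with standard Linux tooling. A minimal sketch - `$$` (the current shell's own PID) stands in for the WAS JVM's PID, which you would normally find with `ps`:

```shell
# Stand-in for the WAS JVM's PID; substitute the real one in practice.
PID=$$

# A raised ulimit only applies to processes started AFTER the change;
# check the limit the running process actually inherited:
grep "Max open files" /proc/"$PID"/limits

# Count the file descriptors the process currently holds open:
ls /proc/"$PID"/fd | wc -l
```

Running the count periodically (e.g. under `watch`) makes a leak visible as a steadily climbing number, independent of what the limit is set to.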
On checking, we found that Hazelcast had 19 members in its cluster, although the QA environment had only 8 nodes altogether.
The extra members were from an environment we had created long ago from a DB copy of our QA environment. The Status Nodes table still had entries for them, with old dates. Even after removing those entries from the Status Nodes table, we still saw 19 members.
Action taken: we cleared the Status Nodes table and the DB cache on both environments, and service was restored.
1) Where does Pega store the Hazelcast cluster information?
2) How does Hazelcast identify the members of its cluster during startup - is there a Pega linkage here, or is it purely the Hazelcast framework?
3) If the cluster members need to be cleared for whatever reason - are there any other steps we can follow instead of clearing the DB cache tables / Status Nodes table?
4) Is Hazelcast still a feature in all the latest versions of Pega?
**Moderation Team has archived post**
This post has been archived for educational purposes. Contents and links will no longer be updated. If you have the same/similar question, please write a new post.
We deleted the Status Nodes table entry, but during JVM startup we still saw it looking for an IP that is not in Status Nodes. Clearing the DB cache table is what resolved the issue. Can you confirm whether the membership is stored in the DB cache, and if so, in which table?
I learnt during the investigation that we can make Hazelcast work the Pega way or the Hazelcast way. The Pega way uses the entries in the Status Nodes table; the Hazelcast way is to update the settings below in prconfig, which overrides the Status Nodes table lookup and uses the settings below instead ...
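The "Hazelcast way" described above corresponds to Hazelcast's explicit TCP/IP join configuration. The exact prconfig entry names vary by Pega version and are not reproduced here, but as a sketch, the underlying Hazelcast setting they map to looks like this in plain hazelcast.xml terms (the member IPs are placeholders):

```xml
<hazelcast>
  <network>
    <port auto-increment="true">5701</port>
    <join>
      <!-- Discovery is either multicast or an explicit member list. -->
      <multicast enabled="false"/>
      <tcp-ip enabled="true">
        <member>10.0.0.11</member>
        <member>10.0.0.12</member>
      </tcp-ip>
    </join>
  </network>
</hazelcast>
```

With an explicit member list like this, nodes only attempt to join the addresses listed, rather than whatever a discovery table (such as Status Nodes) reports.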
Can you clarify something here: Hazelcast provides an in-memory cluster ('data grid'); so although it might create files (and network sockets may count as 'files' on a Linux system, I think), I'm not sure it is known to be a 'file-hungry' application?
Did your 'lsof' report indicate any more detail about which types of files we are talking about here?
Did you also run something like an `lsof` listing against the JVM process?
This should give a list of open files (and network sockets and other 'pseudo' files, I believe).
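To answer the "which types of files" question, a sketch of that kind of check - `$$` (the current shell's PID) stands in here for the WAS JVM's PID:

```shell
# Stand-in PID; use the real WAS JVM PID in practice.
PID=$$

# lsof's TYPE column (REG, DIR, IPv4, IPv6, sock, FIFO, unix, ...)
# shows whether regular files or network sockets dominate the count:
lsof -p "$PID" 2>/dev/null | awk 'NR>1 {print $5}' | sort | uniq -c | sort -rn
```

If the IPv4/IPv6/sock rows dominate and keep growing, the leak is in network connections rather than in files on disk, which points at the clustering layer rather than, say, log or temp files.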
We experienced a similar issue in our 7.1.7 environment, where thousands of threads were being spawned by the application. The threads were trying to reach out and communicate with decommissioned cluster members using Hazelcast 3.2, even though those servers were shut down. The threads never closed properly; they just kept growing and growing. Apparently this is a known issue with the version of Hazelcast we were using.

I suspect that clearing the cluster nodes database table would have fixed our issue, as the cluster membership would have repopulated with only the remaining servers, forgetting about the decommissioned ones.

The only way to reach this conclusion was to perform network captures, which showed endless connection attempts from healthy nodes to nodes that had been decommissioned. Unless a connection attempt was met with a RST packet (the case where the server is up and running but not listening on the Hazelcast port), the threads would not close properly.
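The behavior described above can be spotted without a full capture: connection attempts to dead peers sit in the SYN-SENT state while the kernel retries the handshake. A sketch, assuming Hazelcast's default port 5701 (Pega deployments may configure a different one):

```shell
# Sockets stuck mid-handshake toward unreachable peers show up here:
ss -tan state syn-sent

# A packet capture (run as root) makes the endless retries visible;
# per the observation above, only a RST from the far end lets the
# connecting thread give up:
#   tcpdump -nn 'tcp port 5701 and (tcp[tcpflags] & (tcp-syn|tcp-rst) != 0)'
```

An empty SYN-SENT list on a healthy cluster is expected; a persistent set of entries pointing at decommissioned IPs is the signature of the stale-member problem discussed in this thread.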