PARAVAMU

American Express Technologies

Posted: 16 Aug 2017 10:51 EDT
Last activity: 16 Oct 2018 12:03 EDT

Closed

Solved

Hazelcast / Pega and (Too many open files error) - Few clarifications ...

Config: Pega 7.1.7, RHEL, WAS 7.x, Oracle

Issue: Our QA environment is reporting "Too many open files" errors, followed by automatic restarts of the JVMs.

General recommendation found: Increase the ulimit value for open files to 65000.

Observation: Even after the ulimit increase, the number of open files (as reported by the lsof command) kept growing.

On checking, we found that Hazelcast had 19 members in the cluster, although the QA environment has only 8 members altogether.

The other members were from an environment we created a long time ago from a DB copy of our QA environment. The status nodes table still had entries for them, though with old dates. Even after removing those entries from the status nodes table, the cluster still showed 19 members.

Action taken: We cleared the status nodes table on both environments, along with the DB cache, and we were able to restore service.

Questions:

1) Where does Pega store the Hazelcast cluster information?

2) How does Hazelcast identify the members of its cluster during startup? Is there a Pega linkage here, or is it purely the Hazelcast framework?

3) If cluster members need to be cleared for whatever reason, are there other steps we can follow instead of clearing the DB cache tables and the status nodes table?

4) Will Hazelcast continue to be a feature in the latest versions of Pega?

Thank you,

Prem.

**Moderation Team has archived post**

This post has been archived for educational purposes. Contents and links will no longer be updated. If you have the same/similar question, please write a new post.

System Administration

Posted: 17 Aug 2017 6:42 EDT

BaigHabeeb

Virtusa IT Consulting

replied to PARAVAMU

1) Where is Pega storing the Hazelcast cluster information?

- In the pr_sys_statusnodes DB table in the Pega rulebase.

3) In case of clearing the cluster members for whatever reason, are there any other steps we can follow instead of clearing the DB cache tables / status nodes?

- I'm not sure why you want to clear cluster members, but I think you mean clearing them from the pr_sys_statusnodes DB table. That remedy is mostly specific to cases where the engine is failing on startup.

4) Is Hazelcast a continued feature in all the latest versions of Pega?

- I believe yes; it has been in use since 7.1.7.

Posted: 18 Aug 2017 17:23 EDT

PARAVAMU

American Express Technologies

replied to BaigHabeeb

Thank you for your time and response.

System A points to its own status nodes table in the Pega data schema (split schema).

System B points to its own status nodes table.

In our case, System A was listing System B's nodes as cluster members.

Hazelcast was introduced in 7.1.7. Is it still used in 7.2 / 7.3, or will it be sunset in Pega?

Posted: 17 Aug 2017 10:50 EDT

YSudhakarReddy

JPMorgan Chase & Company

replied to PARAVAMU

Hi,

Hazelcast identifies the members for its cluster during startup based on the pr_sys_statusnodes table.

Posted: 23 Aug 2017 11:20 EDT

PARAVAMU

American Express Technologies

replied to YSudhakarReddy

Thank you for your time and response.

In my case:

We deleted the entries from the status nodes table, but during JVM startup we noticed it was still looking for an IP that was not in the status nodes table. Clearing the DB cache tables, however, helped resolve the issue. Can someone confirm whether the member information is also stored in the DB cache, and if so, in which table?

During the investigation I learnt that Hazelcast can be made to work either the Pega way or the Hazelcast way. The Pega way uses the entries in the status nodes table. The Hazelcast way is to set the entries below in prconfig, which override the status nodes lookup:

<env name="cluster/hazelcast/ports" value="5701-5750" />
<env name="cluster/hazelcast/interface" value="xx.xx.xx.xxx"/>
<env name="cluster/hazelcast/members" value="xx.xx.xx.xxx"/>

Thank you,

Posted: 17 Aug 2017 11:00 EDT

JOHNPW_GCS replied to PARAVAMU

Can you clarify something here? Hazelcast provides an in-memory cluster ('data grid'), so although it might create files (and I think network sockets count as 'files' on a Linux system), I'm not sure it is known to be a 'file-hungry' application.

Did your 'lsof' report give any more detail about which types of files we're talking about here?

Did you also try something like:


cd /proc/<pid-of-prpc-jvm>/fd
ls -l

This should give a list of open files (and, I believe, network sockets and other 'pseudo' files).
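
For a busy JVM that listing can run to thousands of lines. As a rough sketch (assuming a Linux host with /proc and permission to read the target process's fd directory; the function names classify_fd_targets and fd_summary are made up for illustration), the symlink targets can be grouped by type to see whether sockets dominate:

```shell
#!/bin/sh
# Read one fd symlink target per line on stdin and print a count per category,
# largest first. The categories follow the prefixes Linux uses for
# non-file descriptors: socket:[inode], pipe:[inode], anon_inode:...
classify_fd_targets() {
    while IFS= read -r target; do
        case "$target" in
            socket:*)     echo socket ;;
            pipe:*)       echo pipe ;;
            anon_inode:*) echo anon_inode ;;
            *)            echo file ;;
        esac
    done | sort | uniq -c | sort -rn
}

# Resolve every descriptor of the given PID and summarize it.
# Descriptors that vanish while the loop runs are silently skipped.
fd_summary() {
    pid="$1"
    for fd in /proc/"$pid"/fd/*; do
        readlink "$fd" 2>/dev/null
    done | classify_fd_targets
}
```

Calling fd_summary with the PID of the PRPC JVM prints one line per category. A socket count far above the other categories would point at network connections rather than ordinary files.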

Posted: 18 Aug 2017 17:10 EDT

PARAVAMU

American Express Technologies

replied to JOHNPW_GCS

Thank you for your time and response. Yes, we traced it to Hazelcast, as the lsof output showed all the related socket information.

Posted: 21 Aug 2017 4:30 EDT

JOHNPW_GCS replied to PARAVAMU

Hi Paravamu,

Did you find that Hazelcast was opening hundreds, thousands, or tens of thousands of sockets? I'm just wondering why the ulimit needed to be set so high (65k).

Thanks again,

John
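
One way to put numbers on that question is to compare the JVM's current descriptor count with its soft limit. This is only a minimal sketch, assuming Linux and read access to /proc/<pid>/limits; soft_nofile_limit and fd_usage are hypothetical helper names:

```shell
#!/bin/sh
# Extract the soft "Max open files" value from the contents of a
# /proc/<pid>/limits file supplied on stdin.
# Line layout: "Max open files  <soft>  <hard>  files"
soft_nofile_limit() {
    awk '/^Max open files/ { print $4 }'
}

# Print the current number of open descriptors next to the soft limit
# for the given PID.
fd_usage() {
    pid="$1"
    used=$(ls /proc/"$pid"/fd 2>/dev/null | wc -l | tr -d ' ')
    limit=$(soft_nofile_limit < /proc/"$pid"/limits)
    echo "pid=$pid open_fds=$used soft_limit=$limit"
}
```

Running fd_usage against the JVM's PID every few minutes would show whether the count is climbing steadily towards the limit or has levelled off.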

Accepted Solution

Posted: 23 Aug 2017 11:15 EDT

PARAVAMU

American Express Technologies

replied to JOHNPW_GCS

Action taken: We cleared the status nodes table on both environments, along with the DB cache, and we were able to restore service.

Posted: 26 Mar 2018 14:21 EDT

KennethK8377

American Express

replied to JOHNPW_GCS

John,

We experienced a similar issue in our 7.1.7 environment, where the application was spawning thousands of threads. The threads were trying to communicate with decommissioned cluster members using Hazelcast 3.2, even though those servers had been shut down. The threads never closed properly; their number just kept growing. Apparently this is a known issue with the version of Hazelcast we were using. I suspect that clearing the cluster nodes database table would have fixed our issue, as the cluster membership would have been repopulated with only the remaining servers, forgetting the decommissioned ones. The only way we came to this discovery was by performing network captures, which showed endless connection attempts from healthy nodes to nodes that had been decommissioned. Unless a connection attempt was met with a RST packet (as happens when the server is up and running but not listening on the Hazelcast port), the threads would not close properly.
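
Short of a full network capture, lsof can give a rough view of the same pattern. The sketch below assumes the usual column layout of `lsof -nP -iTCP` output (protocol in column 8, local->remote in column 9, connection state in column 10) and a command name without spaces; count_tcp_peers is a hypothetical helper name:

```shell
#!/bin/sh
# Read `lsof -nP -iTCP` output on stdin and count connections per
# remote endpoint and TCP state, most frequent first.
# Listening sockets have no "->" in the NAME column and are skipped.
count_tcp_peers() {
    awk '$8 == "TCP" && $9 ~ /->/ {
        split($9, ends, "->")   # ends[1] = local, ends[2] = remote
        print ends[2], $10
    }' | sort | uniq -c | sort -rn
}
```

A possible invocation is `lsof -nP -iTCP -a -p <pid-of-prpc-jvm> | count_tcp_peers`. A large number of entries in the SYN_SENT state towards an address that is no longer part of the cluster would be consistent with the behaviour described above.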
