More details on Elasticsearch in a multi-node environment when the primary search node is shut down or crashes
I am trying to find out more about how Pega (7.1.8) and Elasticsearch work together when the primary search node in a multi-node environment (with a load balancer in front) goes down in one of the following scenarios:
1. It is taken offline for a planned outage (say, 1 hour at most) during a code migration, hotfix, Pega upgrade, OS patching, etc.
2. It crashes unexpectedly overnight and is unavailable for hours.
Given the scenarios above:
1. How is searching supposed to work on node 2+ when this occurs? Is it just a matter of adding multiple nodes to the search list?
2. Does the environment re-elect a primary search node, or does it wait until the primary comes back online? What happens if a new node has been built to replace the node that crashed (and hence has a new node ID), which would mean the original primary search node will never be added back to the node list?
3. My fear is that manual intervention is required (in particular for item 2). Please assure me that no manual intervention is required!
How is searching supposed to work on node 2+ when this occurs? Is it just a matter of adding multiple nodes to the search list?
If you add multiple nodes to the index host node list on the search landing page, each of those nodes is master-eligible and can therefore handle search and indexing requests. If two nodes are configured and one goes down, the other can handle the search requests.
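To illustrate the idea only, here is a minimal Python sketch of "try the next configured node when one is down". It is not Pega's or Elasticsearch's actual implementation; `search_with_failover`, `fake_send`, and the host names `node1`/`node2` are hypothetical names made up for this example.

```python
from typing import Callable, Sequence


class NoSearchNodeAvailable(Exception):
    """Raised when every configured search node fails to answer."""


def search_with_failover(
    hosts: Sequence[str],
    query: str,
    send: Callable[[str, str], list],
) -> list:
    """Try each configured host in order; return the first successful result."""
    failed = []
    for host in hosts:
        try:
            return send(host, query)
        except ConnectionError:
            # This node is unreachable, so fall through to the next one.
            failed.append(host)
    raise NoSearchNodeAvailable(f"all nodes failed: {failed}")


def fake_send(host: str, query: str) -> list:
    """Hypothetical transport: node1 is down, any other node answers."""
    if host == "node1":
        raise ConnectionError("node1 is offline")
    return [f"{host} matched {query}"]


# node1 is down, so the request is served by node2.
print(search_with_failover(["node1", "node2"], "case-123", fake_send))
```

With only one node in the list, the same outage would raise `NoSearchNodeAvailable`, which is why a single index node is a single point of failure for search.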
Does the environment re-elect a primary search node, or does it wait until the primary comes back online? What happens if a new node has been built to replace the node that crashed (and hence has a new node ID), which would mean the original primary search node will never be added back to the node list?
If multiple nodes are configured on the search landing page and other nodes are available, the environment does not wait for the failed node to come back online; the remaining nodes service search and indexing requests. If you know that the original node will never be added back, you must update the search landing page to remove the unavailable node and add the new node that replaces it.
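The manual step described above amounts to swapping one entry in the configured host list. A small Python sketch of that bookkeeping follows; it is only an illustration of the logic, not a Pega API, and `replace_search_node` and the node IDs are hypothetical.

```python
from typing import List


def replace_search_node(hosts: List[str], dead_node: str, new_node: str) -> List[str]:
    """Return a new host list with dead_node removed and new_node added."""
    if dead_node not in hosts:
        raise ValueError(f"{dead_node} is not in the configured host list")
    # Drop the node that will never come back.
    updated = [host for host in hosts if host != dead_node]
    # Add the replacement once, even if it was already listed.
    if new_node not in updated:
        updated.append(new_node)
    return updated


# The crashed node "node-a" is replaced by a freshly built "node-c".
print(replace_search_node(["node-a", "node-b"], "node-a", "node-c"))
```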
My fear is that manual intervention is required (in particular for item 2). Please assure me that no manual intervention is required!
As of now, manual intervention is required. We are working on making this more automated in future releases of the Pega platform.
We have a similar situation in production. After we upgraded from 7.1.9 to 7.2.2, with a single index node, the index file became corrupted for an unknown reason. Ours is a multi-node environment with 5 VMs and 4 instances on each server, and a few of the online instances started hanging 5-10 minutes after a restart. GCS found that the hangs were caused by the corrupted index file on the index node: instances hang when they try to contact the index node to connect or search. As suggested by GCS, we updated the DSS parameters to disable index cross-node references and shut down the index node. We have a very large WO volume, and the index file is 52 GB in size. The business now badly needs the Elasticsearch feature back.
We are planning to use a new node to rebuild the index. Please advise what precautions we need to take to avoid any impact on production before and after the index rebuild.
Is there an option to rebuild the index on a new node without any impact on the other online nodes, and enable search later?
Is there an option to rebuild the index only for WOs from a specific date onward, say from 01-May-2017, to save time? The business normally searches for cases created in the last couple of months.
I have a question. Once multiple nodes are configured on the search landing page, how will the DSS values "indexing/hostid" and "indexing/hostname", and Index Management in SMA, behave, given that there are multiple index nodes?
If the DSS values are ignored, then on what criteria are requests from the non-index nodes distributed among the multiple index nodes?