Rule Index File Unavailable Or Corrupted in Landing Page
Our environment runs on multiple nodes, and some servers share the same Pega Temp folder and index folders. For some reason, the search box doesn't work unless we type an "old:" prefix.
I'm going to follow the PDN article below to fix the issue, but resolution steps 7 and 8 confuse me. Why do we need to cancel the started indexing and then do it again in step 8? After step 5, do we have to wait for indexing on the primary search node to complete before starting the secondary node? And how do we define the primary and secondary search indexing nodes?
First, you should not have more than one Pega node sharing temp and index folders. This can cause issues, so I recommend changing that.
I think this SA may need some editing, but let's see how this works for you before we do that:
Before performing step 1, I would suggest removing all nodes from the Search Index Host Node Settings and disabling indexing. That way, when the nodes start, they will not begin indexing.
Steps 5 and 6 only cover starting the nodes themselves. You do not need to wait for the indexing to complete (it shouldn't start if you've disabled it as I mentioned above).
In step 7, if you've disabled indexing as I mentioned, you can skip the part about canceling the started indexing, as it shouldn't have started.
Replace step 8 with the following:
8. Enter the primary (first) Search Index Host Node ID and the Search Index File Directory, enable indexing and click submit. After that, begin the rebuilding of the Rules, Data & Work indexes by clicking "Re-Index". Do not define additional host nodes until the index files have been completely built on the first node.
Lastly, the first Host Node listed under Search Index Host Node Settings will be the 'primary' indexing node. I think that all this means is that when other nodes are added, their indexes will be replicated based on the primary node's indexes.
>>I'm going to follow the article on PDN below to fix the issue, but resolution steps 7 and 8 confuse me. Why do we need to cancel the started indexing and do it again in step 8?
The search landing page will be read-only if a node was shut down while indexing was still in progress. That is why step 7 asks you to go to the search landing page and cancel the indexing before re-indexing.
Note that if there was no indexing in progress when the shutdown happened, this step is not required.
>>after step 5, do we have to wait the primary search indexing completed then start secondary node? and how to define the Primary search indexing node and secondary?
It would be a good idea to do so.
As Nicholas Loving suggested, no two nodes should share their Pega Temp or Index directories.
Also, point number 3 is referring to the wrong table in the PDN article you pasted. The table name should be pr_log_cluster.
First, the thread does not say which Pega 7 release this applies to, so some of the steps above may not be correct for it. There were changes from the time Elasticsearch was introduced up to and including 7.2GA to get it working properly without causing startup issues in multi-node clusters.
The article was written for Pega 7.1.9GA. Assuming you are on the same release, you first need to set up all the nodes correctly, each with its own temp directory. Note that the nodes are not all required to have their own index folder unless you are setting up a High Availability environment and need the indexes to remain available to other nodes when the primary indexing node is shut down or restarted for maintenance.
If more than one node points to the same explicitTempDir folder and the same explicitIndexDir folder, your environment is not set up properly and needs to be fixed before you address the indexing location.
explicitTempDir value set in the prconfig.xml file - update the file and change the <env name="initialization/explicitTempDir" value="E:\Node1Temp" /> value to a non-shared folder location.
explicitTempDir value set in the Data-Admin-System-Settings - update the value for the Pega-Engine prconfig/initialization/explicittempdir/default instance. Note that this setting is GLOBAL (all nodes) unless the prconfig.xml is updated to override internal settings such as this, or there is a separate instance defined for each node in the Data-Admin-System-Settings.
If the System Settings Search Landing Page indicates that an index is currently in progress, cancel that index request. If there is more than one row showing for index file locations, remove them all and wait for confirmation that each node has been removed. This alone can take 5-15 minutes, so wait for the email confirmation, assuming you added your email address in the Automated Search Alerts and all 3 boxes have been checked.
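To spot nodes that point at the same folder, the prconfig.xml from each node can be compared offline. The following is only a rough sketch, not a Pega tool: the function names are made up for illustration, it matches any <env> entry whose name ends in explicitTempDir or explicitIndexDir (the exact name of the index setting is an assumption here), and it only sees values set in prconfig.xml, not those set through Data-Admin-System-Settings.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Setting-name suffixes to compare across nodes (matched case-insensitively).
# The index-directory suffix is an assumption; adjust it to your prconfig.xml.
WATCHED_SUFFIXES = ("explicittempdir", "explicitindexdir")


def read_dirs(prconfig_xml):
    """Return {suffix: value} for the watched <env> entries of one prconfig.xml."""
    root = ET.fromstring(prconfig_xml)
    found = {}
    for env in root.iter("env"):
        name = env.get("name", "").lower()
        for suffix in WATCHED_SUFFIXES:
            if name.endswith(suffix):
                found[suffix] = env.get("value", "")
    return found


def find_shared_dirs(configs):
    """Map (suffix, folder) -> node labels for each folder used by more than one node.

    `configs` maps a node label you choose to the text of that node's prconfig.xml.
    """
    users = defaultdict(list)
    for node, xml_text in configs.items():
        for suffix, value in read_dirs(xml_text).items():
            if not value:
                continue
            # Normalize Windows-style paths so E:\Temp and e:/temp/ compare equal.
            # On a case-sensitive file system, drop the .lower() call.
            key = value.replace("\\", "/").rstrip("/").lower()
            users[(suffix, key)].append(node)
    return {k: sorted(v) for k, v in users.items() if len(v) > 1}
```

An empty result means no two of the supplied files name the same folder. It cannot tell whether two different paths resolve to the same physical share, so that still needs to be checked by hand.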
Shut down ALL nodes - not just the ones pointing to the wrong folder. Your cluster was not configured correctly and you need to get it back in line where it should be.
Delete the files in the current explicitTempDir and explicitIndexDir so that each node starts up with its own clean folders to work with.
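If you would rather script that cleanup than do it by hand, here is a minimal sketch. It is a hypothetical helper, not something Pega ships. It assumes all nodes are already shut down, it empties the folder but keeps the folder itself, and it defaults to a dry run so you can review the list before anything is deleted.

```python
import shutil
from pathlib import Path


def clear_directory(folder, dry_run=True):
    """Delete everything inside `folder` but keep the folder itself.

    Returns the paths that were removed (or would be removed in a dry run).
    """
    root = Path(folder)
    if not root.is_dir():
        raise NotADirectoryError(folder)
    targets = sorted(root.iterdir())
    if not dry_run:
        for entry in targets:
            if entry.is_dir() and not entry.is_symlink():
                shutil.rmtree(entry)  # real sub-folder: remove it recursively
            else:
                entry.unlink()  # plain file or symlink
    return [str(p) for p in targets]
```

Call it once with the default dry_run=True for each node's temp and index folder, check the returned paths, and only then repeat with dry_run=False.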
Decide which node will be the "Primary Indexing" node and start that node first. You MUST wait for the node to complete the startup process before starting any other nodes. Log into the node, open the Clipboard and copy (double click) the pxProcess.pxSystemNodeID value to an editor or just your local clipboard (ctrl + c). You can look up this node ID in SMA to confirm that the two values match.
Open the System Settings Search Landing Page and expand the "Search index Host Node Setting" section. This will contain the node ID you just copied and the location of the search index files. Note: it is okay to use the same folder name for multiple nodes ONLY if the nodes are on separate virtual, logical or physical servers. A consistent naming convention makes it easier to find the folder no matter which server it lives on. If there is nothing listed here, paste the node ID value into the Search Index Host Node ID box on the row and then type in the name of the folder where the index files for this node will be stored. Depending on how much PRPC data is in your database and how fast your servers are, it can take a couple of hours to fully index Rules, Data & Work. You should wait until all indexes complete before you add any other nodes to index.
As for pr_log_cluster versus pr_sys_statusnodes, they both contain information about indexing. The latter (pr_sys_statusnodes) indicates which node is CURRENTLY indexing, which you need to make sure is correct. The pr_log_cluster table records all the "requests" sent to the server to index, and pyStatusWork shows whether each one completed (Resolved-Completed) or was cancelled (Resolved-Cancelled), for example. For the purpose of this document and the other, we are referring to pr_sys_statusnodes, as we want to make sure the correct node is being indexed and running.
This discussion spans multiple topics (environment setup, high availability, indexing, temp files, system startup sequences and multi-node clusters) and areas of the system, so it may be best to create a support request to help get the environment working properly if the above steps do not help.