Question
About pr_sys_workIndexer_queue table & Lucene Search
Our use case is to replace manual search with Lucene search.
I'm fairly new to the search functionality, so I have just gone through the articles on the PDN, and I have a couple of questions about Lucene search and the PR_SYS_WorkIndexer_Queue table.
PR_SYS_WorkIndexer_Queue:
1) Why does an index file need to be created in an external/internal directory, and how does it help in searching for objects?
2) How are entries added to the queue table?
Does it happen when a work object is created? I can see that it also happens during a "ReIndex" operation.
3) Why is a queue table required for the work index, and does the "SystemWorkIndexer" activity create the index files in the directory?
Lucene Search:
As per my understanding, a single node should be the host for search indexing, and there is a web service call to get the indexed records when a work object is searched for from a different node,
but how does the system consume the internal web service for search?
Please help me understand the search functionality.
Thanks in advance,
Brahmesh.
Hi Brahmeswara,
Search and indexing are vast topics, but if you are familiar with database technology, I will try to draw an analogy between the two.
When you write an SQL query to retrieve data from a database, it works well if the number of records is small, but as the number of records increases, performance drops. The simplest suggestion to overcome this is to create an index on the column used in the WHERE clause of the SQL statement. Indexes consume more space in the database, but they speed up retrieval, so your queries run faster.
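To make the analogy concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name work, the column status, and the index name idx_work_status are made up for illustration and have nothing to do with the actual Pega schema. It asks SQLite for its query plan before and after the index is created.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's plan description for a query as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


def demo():
    conn = sqlite3.connect(":memory:")
    # Hypothetical table, not a Pega table.
    conn.execute(
        "CREATE TABLE work (id INTEGER PRIMARY KEY, status TEXT, payload TEXT)"
    )
    conn.executemany(
        "INSERT INTO work (status, payload) VALUES (?, ?)",
        [("Open" if i % 10 == 0 else "Resolved", f"case {i}") for i in range(1000)],
    )
    sql = "SELECT id FROM work WHERE status = ?"
    before = query_plan(conn, sql, ("Open",))  # no index on status yet
    conn.execute("CREATE INDEX idx_work_status ON work (status)")
    after = query_plan(conn, sql, ("Open",))   # planner can now use the index
    conn.close()
    return before, after


if __name__ == "__main__":
    before, after = demo()
    print("without index:", before)
    print("with index:   ", after)
```

The exact plan wording differs between SQLite versions, but the second plan should mention idx_work_status while the first describes a scan of the table.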
So if databases already have this feature, why do I need full text search?
Lucene is one of the libraries that provide full-text search. In the Pega platform, we don't expose each and every property as a column in the database, so we can't write SQL statements that perform well when they have to refer to values in the storage stream. Also, since the data stored in the storage stream is hierarchical, it is not easy for an RDBMS to provide efficient retrieval using SQL. Full-text search engines instead build inverted indices, which map each term to the documents that contain it. You can read more about inverted indices and full-text search at the Lucene website: http://lucene.apache.org
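As a rough illustration of the idea (this is not Lucene's or Pega's actual implementation), the sketch below flattens a nested document into text, builds a term-to-keys map, and answers a query by intersecting the key sets. The function names and the sample instance keys are hypothetical.

```python
import re
from collections import defaultdict


def flatten_text(value):
    """Collect every scalar value from nested dicts/lists into one string."""
    if isinstance(value, dict):
        return " ".join(flatten_text(v) for v in value.values())
    if isinstance(value, (list, tuple)):
        return " ".join(flatten_text(v) for v in value)
    return str(value)


def tokenize(text):
    """Lower-case the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(documents):
    """Map each term to the set of document keys that contain it."""
    index = defaultdict(set)
    for key, doc in documents.items():
        for term in tokenize(flatten_text(doc)):
            index[term].add(key)
    return index


def search(index, query):
    """Return the keys of documents containing every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical hierarchical instances, standing in for storage-stream data.
docs = {
    "C-1": {"Customer": {"Name": "Alice Rao", "City": "Pune"},
            "Notes": ["refund requested"]},
    "C-2": {"Customer": {"Name": "Bob Lee", "City": "Austin"},
            "Notes": ["refund approved", "closed"]},
}
index = build_inverted_index(docs)
print(sorted(search(index, "refund")))           # keys containing "refund"
print(sorted(search(index, "refund approved")))  # keys containing both terms
```

Note that the lookup never scans the documents themselves: it only reads the per-term key sets, which is why the nesting depth of the original data does not matter at query time.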
So how does Pega use Lucene?
The Pega platform takes the data stored in the stream and indexes the content so that the search control can retrieve the details of any instance where the search string is found anywhere in the document. The search landing page shows the details of the existing indices, and you can re-index from there. The full-text search index is maintained on the file system. Because the file system is specific to each node, only one node maintains the index. With Elasticsearch, from Pega 7.1.7 onwards, we can provide failover as well.
Why do we need the pr_sys_workindexer_queue?
The data in the database is not static. It keeps changing as instances are created, updated, and deleted, which means that the Lucene index files also need to be kept up to date with these changes. So as instances change in the Pega platform, we make a note of the pzInsKey of the instance in the pr_sys_workindexer_queue table. Subsequently, the SystemWorkIndexer agent picks up the entry, gets the latest changes, and modifies the index files.
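The queue-then-agent pattern can be sketched in a few lines. This is an in-memory toy under stated assumptions, not how Pega implements it: the class IndexQueueSketch and its methods are hypothetical, a dict stands in for the database, a deque stands in for the queue table, and another dict stands in for the index files.

```python
from collections import deque


class IndexQueueSketch:
    """Toy model: changes enqueue a key; an agent later updates the index."""

    def __init__(self):
        self.database = {}       # instance key -> current text
        self.queue = deque()     # pending instance keys (the "queue table")
        self.index = {}          # term -> set of instance keys
        self.indexed_terms = {}  # instance key -> terms currently indexed

    def save(self, key, text):
        """Create or update an instance and note its key for re-indexing."""
        self.database[key] = text
        self.queue.append(key)

    def delete(self, key):
        """Delete an instance and note its key so the index drops it."""
        self.database.pop(key, None)
        self.queue.append(key)

    def run_agent(self):
        """Drain the queue, re-reading the latest state of each instance."""
        processed = 0
        while self.queue:
            key = self.queue.popleft()
            self._remove_from_index(key)
            text = self.database.get(key)
            if text is not None:  # still exists: index its current content
                terms = set(text.lower().split())
                for term in terms:
                    self.index.setdefault(term, set()).add(key)
                self.indexed_terms[key] = terms
            processed += 1
        return processed

    def _remove_from_index(self, key):
        """Remove all index entries previously recorded for this key."""
        for term in self.indexed_terms.pop(key, set()):
            keys = self.index[term]
            keys.discard(key)
            if not keys:
                del self.index[term]

    def search(self, term):
        return set(self.index.get(term.lower(), set()))
```

A consequence of this design is that the index lags behind the database until the agent next runs: a search issued between save() and run_agent() does not yet see the change.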
What does search do?
When you search for specific text, the Lucene index is looked up and the records that contain this text are returned. Since the index is hosted on one node, we use SOAP to connect to the search node if the node initiating the search is not the search node. This is internal to the Pega platform, and a Pega developer using the platform to build an application does not need to worry about it.
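The routing decision described above can be pictured as follows. This is only a conceptual sketch: the Node class, search_from, and fake_remote_call are hypothetical names, and the "remote call" is a plain function call standing in for the internal SOAP request, whose actual details are not covered here.

```python
class Node:
    """A cluster node; only the search node holds an index."""

    def __init__(self, name, index=None):
        self.name = name
        self.index = index  # term -> list of instance keys, or None

    def hosts_index(self):
        return self.index is not None

    def local_search(self, term):
        return sorted(self.index.get(term.lower(), []))


def search_from(node, search_node, term, remote_call):
    """Search locally on the search node, otherwise delegate to it."""
    if node.hosts_index():
        return node.local_search(term)
    return remote_call(search_node, term)


def fake_remote_call(search_node, term):
    """Stand-in for the internal web service request (assumption)."""
    return search_node.local_search(term)


search_node = Node("node-A", index={"refund": ["C-2", "C-1"]})
other_node = Node("node-B")

# Both nodes return the same result; only the path to the index differs.
print(search_from(search_node, search_node, "refund", fake_remote_call))
print(search_from(other_node, search_node, "refund", fake_remote_call))
```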
Hope this answers your questions.
-Rajiv