We have designed a Queue Processor to run a large-scale data migration from a CMIS IBM FileNet server to AWS S3. The process itself works fine, but we have a performance issue: we need to move around 8 TB of data in Production. Our staging environment holds close to 2 TB of data, and the end-to-end process takes over 20 hours there, which won't be acceptable for Production. We looked at the various performance suggestions for Queue Processors, such as https://community.pega.com/knowledgebase/articles/system-administration/queue-processor-faq and https://community.pega.com/knowledgebase/articles/decision-management-overview/advanced-configurations-stream-service, but the information seems conflicting and confusing, so here are our major questions:
1. Can we scale out to n nodes to improve performance? Right now we run the QP on 3 nodes, with 5 threads on each node. It seems the maximum partition size each QP can support is 20. If that's the case and I add 2 more nodes, can I achieve (5 * 5) = 25 threads running in parallel, or would it still be capped at 20 threads?
2. What is the difference between partition size and thread count? In one place the articles above use the two terms interchangeably, while in another they state that partition size = no. of nodes * no. of threads per QP.
3. Can I increase the partition count beyond the default of 20? If I add more nodes, do I have to keep increasing the partition count every time?
***Edited by Moderator Marissa to update Support Case Details***
We have a call scheduled with Pega on 09/23 to talk about our concerns around scaling and load balancing for Queue Processors. Once that discussion has happened and we have received satisfactory explanations from Pega on this topic, I will close out this question.
Posted: 11 May 2021 12:29 EDT
Ryan Taylor (rwtaylor)
BPM-Pega Solution Architect
Apart from optimizing the infrastructure (adding dedicated Kafka stream nodes, memory, and thread configuration), I would also suggest a design change.
Design change: try to optimize the design by using dedicated queues through the Queue-for-processing method instead of the standard queue, which by default writes message entries to the kafka-data file system, where you don't have dedicated partitions.
After this change, your Queue Processor no longer needs to scan the entire kafka-data directory in the file system to identify the messages that need to be queued, which will improve performance.
We recently had a similar requirement. The default is 20, and if you increase the partition count, the change will apply to all queue processors. You surely don't want more than 20 partitions for the Pega OOTB queue processors.
I would prefer to follow the Pega support instructions to extend the partitions for your specific queue only and to increase the number of threads. Also check whether commits can be done in groups of 10-20; this will improve performance further.
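To illustrate the "commit in groups of 10-20" idea, here is a minimal, generic sketch in Python. It is not Pega code: `process` and `commit` are hypothetical callbacks standing in for whatever moves one document and whatever commits the transaction in your activity, and the batch size of 20 is only taken from the range suggested above.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def migrate_in_batches(
    items: Iterable[T],
    process: Callable[[T], None],   # hypothetical: moves one document
    commit: Callable[[], None],     # hypothetical: commits the open work
    batch_size: int = 20,
) -> int:
    """Process every item, committing once per batch; return the commit count."""
    commits = 0
    for batch in batched(items, batch_size):
        for item in batch:
            process(item)
        commit()  # one commit per group instead of one per item
        commits += 1
    return commits
```

With 45 items and a batch size of 20, this issues 3 commits instead of 45. The trade-off is that a failure in the middle of a batch means the whole group has to be retried.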
I think the maximum number of concurrent threads is 20 across the cluster, so adding nodes might not help. You could consider duplicating the queue processor, so that you have 20 threads per queue processor.
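The arithmetic behind that cap can be sketched as follows. This assumes the queue processor behaves like a standard Kafka consumer group, where each partition is read by at most one consumer thread at a time, so any threads beyond the partition count sit idle. The function names are made up for illustration, and the time estimate assumes ideal linear scaling with thread count, which real migrations rarely achieve.

```python
def effective_threads(nodes: int, threads_per_node: int, partitions: int) -> int:
    """Threads that can actually consume in parallel for one queue processor.

    Assumption: one consumer thread per partition at most, as in a
    Kafka consumer group.
    """
    return min(nodes * threads_per_node, partitions)


def estimated_hours(
    baseline_hours: float,
    baseline_tb: float,
    baseline_threads: int,
    target_tb: float,
    target_threads: int,
) -> float:
    """Rough estimate under an idealized linear-scaling assumption."""
    return baseline_hours * (target_tb / baseline_tb) * (baseline_threads / target_threads)


current = effective_threads(nodes=3, threads_per_node=5, partitions=20)
scaled = effective_threads(nodes=5, threads_per_node=5, partitions=20)
print(current, scaled)

# Staging figures from the question: about 2 TB in a bit over 20 hours.
print(estimated_hours(20, 2, current, 8, scaled))
```

With the numbers from the question, 3 nodes give 15 usable threads and 5 nodes give 20, not 25. Under the linear-scaling assumption, 8 TB would still take roughly 60 hours, which is why raising the partition count for this specific queue or duplicating the queue processor matters more than adding nodes alone.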