Question
Steps to use partitioning in pega marketing
Our client's current pega marketing 7.12 on base pega 7.1.7 application has performance issues, as in to run a campaign and send offer SMS to a target customer population of 1.3 million it takes 7-8 hours may be more, no partitioning was implemented then.
So post upgarde to pega marketing 7.22 on base pega 7.3 we decided to use partition so to ensure parallel processing, we followed the below steps
1. We used database partitioning (underlying DB is oracle) for Customer table, Installed base table and defined 10 partitions in application settings, considering we will be having 4 jvms on two physical boxes hence 2*4 =8, hence kept 10 partitions, partition key ranging from 0-9 for both customer and Installed base.
2. Resaved all our existing segments, used partition keys in all data sets.
3. Added all the nodes in Data Flow service.
4. Also for Interaction history partitions we followed the instructions in the link https://pdn.pega.com/interaction-history-partitions/interaction-history-partitions and defined around 16 partitions, partition key ranging from 0-15.
I have following questions
1. Are there any other steps we need to follow to implement for partitioning?
2. Should the partition range be kept same i.e 0-9 for Interaction history? I mean keeping different partition key range for Customer and Intercation history will it create any problem?
3. How to test if partitioning is working in development environment? In dev env we have two nodes, created a segment containing avg 100 customers belonging to different partitions, my expectation if I print customer id and partition keys in pega logs I will get half of the segment customer is will be printed in node 1 log and half in node 2 logs. But that's not how it's happening most of the time all the customer ids are printed in node 1 and some time few of them in node 2, some times I can see same customer ids are printed in both logs. Not sure how partitioning behaves, need some clarification, how to test the performance gains from it.
4. If partitioning works corrrectly as expected in production having 4 nodes, then can we expect 1/4 the time it takes currently to run a campaign for 1.3 million i.e 1/4 * 8 = 2 hours or less.
5. If tomorrow we increase number of nodes and want to increase partitioning range to 0-16, do we need to resave all segmemts(100+) again?
Regards
Abhi
Hi Abhi,
1. i have following comments on below point.
"1. DB partitioning" - the number of partitions seems less, and i think it is not just a factor of how many nodes you have but also how powerful your hardware is, in terms of cpu, memory and disk IO, so that you could run multiple threads per node.
The Dataflow execution splits the amount of work into multiple assignments and one of the factor is partition key, more number of partitions the better. however below is the formula that will limit how many assignents will get created, so you can achieve better parallelism if number of partitions are slightly higher than below.
nos of Assignments = number of nodes in data flow cluster * number of configured threads * 2
"2. Resaved all our existing segments, used partition keys in all data sets.". - When you change the partition mode, you need to recreate the segments from scratch (instead of resave). To validate if the segment is picking partition information, please check the output Segment table, they should now have 2 columns customerid and partition key.
2. i am not sure of any relation between partitions on Customer and IH fact table.
3. & 4. one method i can suggest to troubleshoot Campaign performance problems is to enable debug logging level on "ExecuteDDFInBatch" actitivy.
The campaign execution runs a dataflow in the background and when DF run completes successfully it deletes it.
Enabling debug on "ExecuteDDFInBatch" activity prevents this deletion.
Hence when the campaign completes, you can open the corresponding PR-xxx Dataflow run from Designer Studio -> Decisioning ->Decisions ->Dataflows ->Batch Processing landing page.
In the opened page, you will have 2 tabs in the DF execution page, "Run Details", and "Distribution details"
In the Run Details tab, you can view the component statistics and inspect the % of total time.
From this you can deduce which components are running slower and taking most % of the time from overall time.
in the distribution tab you can see how many records were processed on each node. In case of partitioned mode you should see multiple nodes here.
When running small customer segment, you wouldnt be covering records from all the partitions. You have to inspect the segment table check how many distinct partitions got picked using your segment criteria. you could also inspect the batchoutput table. Both segment and batch output table will have parition key column.
5. I could think of 2 possible functionality that could be affected.
a. Segments - you should rerun all the segments, no need to recreate, but segment table should be regenerated.
b. existing Campaign runs - you have to make sure your existing campaign run that are in wait state are completed as wait flow processing makes a join of batch output table with customer table.
Regards
Manu