Our client's current pega marketing 7.12 on base pega 7.1.7 application has performance issues, as in to run a campaign and send offer SMS to a target customer population of 1.3 million it takes 7-8 hours may be more, no partitioning was implemented then.
So post upgarde to pega marketing 7.22 on base pega 7.3 we decided to use partition so to ensure parallel processing, we followed the below steps
1. We used database partitioning (underlying DB is oracle) for Customer table, Installed base table and defined 10 partitions in application settings, considering we will be having 4 jvms on two physical boxes hence 2*4 =8, hence kept 10 partitions, partition key ranging from 0-9 for both customer and Installed base.
2. Resaved all our existing segments, used partition keys in all data sets.
1. Are there any other steps we need to follow to implement for partitioning?
2. Should the partition range be kept same i.e 0-9 for Interaction history? I mean keeping different partition key range for Customer and Intercation history will it create any problem?
3. How to test if partitioning is working in development environment? In dev env we have two nodes, created a segment containing avg 100 customers belonging to different partitions, my expectation if I print customer id and partition keys in pega logs I will get half of the segment customer is will be printed in node 1 log and half in node 2 logs. But that's not how it's happening most of the time all the customer ids are printed in node 1 and some time few of them in node 2, some times I can see same customer ids are printed in both logs. Not sure how partitioning behaves, need some clarification, how to test the performance gains from it.
4. If partitioning works corrrectly as expected in production having 4 nodes, then can we expect 1/4 the time it takes currently to run a campaign for 1.3 million i.e 1/4 * 8 = 2 hours or less.
5. If tomorrow we increase number of nodes and want to increase partitioning range to 0-16, do we need to resave all segmemts(100+) again?
"1. DB partitioning" - the number of partitions seems less, and i think it is not just a factor of how many nodes you have but also how powerful your hardware is, in terms of cpu, memory and disk IO, so that you could run multiple threads per node.
The Dataflow execution splits the amount of work into multiple assignments and one of the factor is partition key, more number of partitions the better. however below is the formula that will limit how many assignents will get created, so you can achieve better parallelism if number of partitions are slightly higher than below.
nos of Assignments = number of nodes in data flow cluster * number of configured threads * 2
"2. Resaved all our existing segments, used partition keys in all data sets.". - When you change the partition mode, you need to recreate the segments from scratch (instead of resave). To validate if the segment is picking partition information, please check the output Segment table, they should now have 2 columns customerid and partition key.
2. i am not sure of any relation between partitions on Customer and IH fact table.
3. & 4. one method i can suggest to troubleshoot Campaign performance problems is to enable debug logging level on "ExecuteDDFInBatch" actitivy.
The campaign execution runs a dataflow in the background and when DF run completes successfully it deletes it.
Enabling debug on "ExecuteDDFInBatch" activity prevents this deletion.
Hence when the campaign completes, you can open the corresponding PR-xxx Dataflow run from Designer Studio -> Decisioning ->Decisions ->Dataflows ->Batch Processing landing page.
In the opened page, you will have 2 tabs in the DF execution page, "Run Details", and "Distribution details"
In the Run Details tab, you can view the component statistics and inspect the % of total time.
From this you can deduce which components are running slower and taking most % of the time from overall time.
in the distribution tab you can see how many records were processed on each node. In case of partitioned mode you should see multiple nodes here.
When running small customer segment, you wouldnt be covering records from all the partitions. You have to inspect the segment table check how many distinct partitions got picked using your segment criteria. you could also inspect the batchoutput table. Both segment and batch output table will have parition key column.
5. I could think of 2 possible functionality that could be affected.
a. Segments - you should rerun all the segments, no need to recreate, but segment table should be regenerated.
b. existing Campaign runs - you have to make sure your existing campaign run that are in wait state are completed as wait flow processing makes a join of batch output table with customer table.
Thanks for your reply, please find my comments below
1. We will see if we can increase the partitions , as per your instruction based on our proposed hardware we should be having 4(no of nodes)* 5 (number of thread) *2 = 40 partitions or more. But after increasing the partitions do you think only executing the segment once more will do, because we saw sql query with partition column are embedded in segment rule blob which changes when we resave the segment, please confirm this.
2. Actually what we did was we deleted the segment and restored again with same name that did the trick for us.
3. The relationship between Interaction table partition and customer partition is very important, can you point us to some one who can answer that.
4. We will use technique you mentioned to check performance post partitioning in dev environment.
5. Also can you please confirm if segment execution also uses partitioning, because segment also takes lot of time, so I was wondering the DeferredSegment agent which I believe is responsible for segment execution uses data flow and partitioning.
we are also in the process of introducing partitioning in our CAR tables. As you mentioned that you did table partitioning, can you let me know what type of partitioning was done? like hash, range or just using a MOD function on the primary key and store the value in a new partition key column?
The application settings landing page asks for the range of the partitions, for us it would be around 64 based on the same formula you are using, how do we specify the range in the landing page?
Well if you are using DB partitioning then physical table has to have physical partition which is a DB feature. In our case the ETL process which populates the table is generating those partitions using MOD 25 on the last two digits our Customer ID which leads to 25 partitions i..e 00 -24, which is a reasonable number.
Then we specify this ranges in the application setting page.
Thanks for the response. In our case we need to do the partitioning on the existing customer data and then probably include the function in the ETL ( our case its also a ETL feed) for subsequent updates. What I wanted to know is whether you had a column in the table with the partition key values from 00-24 or were they mentioned as partition names in the table definition ( not visible as columns).