In Pega Marketing, on top of the Customer data set, we have a lot of data associated with the Customer, such as Discount Data, Usage Data, etc. Since we use this data in our campaign strategies, we add it to the CustomerData data flow (using Compose shapes). Our worry is that this flow keeps getting bigger and is starting to affect performance. Is there a better way of utilizing associated data sets?
I was also wondering if we could limit which data sets are composed, based on what the campaign strategy needs. However, my understanding is that the CustomerData flow is executed first so that the data is available to the strategy. Can someone confirm this?
This is one of the things we have in mind for a future release: conditionally or optionally bringing in only the data that is needed. For now, you will have to limit your optimization to making sure the right indexes exist on the properties/columns used in those compose conditions (that is, the PK and FK relationships).
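To illustrate why the index on the compose key matters, here is a minimal sketch using Python's built-in sqlite3 module rather than Pega itself. It assumes a compose behaves roughly like a keyed lookup against the secondary table; the table name discount_data, the column customer_id, and the index name are all made up for the example.

```python
import sqlite3

def lookup_plan(conn, customer_id):
    """Return SQLite's query plan for a keyed lookup on the (hypothetical) discount_data table."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM discount_data WHERE customer_id = ?",
        (customer_id,),
    ).fetchall()
    # The fourth column of each plan row holds the human-readable detail.
    return " | ".join(row[3] for row in rows)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE discount_data ("
    "discount_id INTEGER PRIMARY KEY, customer_id TEXT, discount_pct REAL)"
)
# Made-up sample rows: 10,000 discounts spread over 1,000 customers.
conn.executemany(
    "INSERT INTO discount_data (customer_id, discount_pct) VALUES (?, ?)",
    [(f"C{i % 1000}", 5.0) for i in range(10000)],
)

# Without an index on the foreign key, the lookup scans the whole table.
plan_before = lookup_plan(conn, "C42")

# With an index on the foreign key, the same lookup becomes an index search.
conn.execute("CREATE INDEX idx_discount_customer ON discount_data (customer_id)")
plan_after = lookup_plan(conn, "C42")

print("before:", plan_before)
print("after: ", plan_after)
```

On your own database, the equivalent check would be to look at the execution plan of the query behind each compose and confirm that it uses an index on the join column.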
Also, please benchmark by running the customer data flow on its own with all these composes, then analyze the component-level statistics of the data flow for clues about which compose is not optimal. That will help you optimize each individual compose as well.
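When comparing composes in those statistics, it can help to normalize the time by the number of records each one handled, since a large data set will take a bigger share of the run time even when its lookup is efficient. The sketch below is only an illustration: the ComponentStat structure and all the numbers are made up, and you would copy the real figures by hand from the run statistics.

```python
from dataclasses import dataclass

@dataclass
class ComponentStat:
    name: str        # compose shape name (hypothetical)
    records: int     # records handled by the component
    time_ms: float   # total time spent in the component, in milliseconds

def rank_by_cost_per_record(stats):
    """Sort components from most to least expensive per record, skipping empty ones."""
    used = [s for s in stats if s.records > 0]
    return sorted(used, key=lambda s: s.time_ms / s.records, reverse=True)

# Made-up figures: UsageData takes most of the total time, but only
# because it handles far more records than DiscountData.
stats = [
    ComponentStat("Compose UsageData", records=5_000_000, time_ms=600_000),
    ComponentStat("Compose DiscountData", records=200_000, time_ms=150_000),
]

ranked = rank_by_cost_per_record(stats)
for s in ranked:
    print(f"{s.name}: {s.time_ms / s.records:.3f} ms per record")
```

In this made-up case the DiscountData compose has the smaller share of total time but the higher cost per record, so it would be the first one to check for a missing index.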
Hi, thanks for the response. So for the latest version of Pega Marketing (which I believe is 8.1), all the associated data still needs to be composed in the CustomerData flow?
Also, can you elaborate on the second paragraph? To see which compose is not optimal, should we simply run the data flow? I know it will show the % of time taken by each component, but the size of the data set will also be a factor, right? More time taken does not necessarily mean the compose is suboptimal.
Currently, our practice is to create an index on the properties used in the compose. Is there anything more we can do besides this?
We face a similar issue. We run different campaigns, and some campaigns do not need all of the composes in the CustomerData data flow. Is it possible to run different campaigns with different versions of the CustomerData data flow? E.g., campaign 1 with very minimal customer data and campaign 2 with all the customer data in the data flow. Right now, campaign 1 unnecessarily pulls all the customer info, which it does not need. Appreciate any feedback.