We are using the out-of-the-box simulation process in Pega Marketing 8.1.7 on a large customer base (around 10M). The requirement is to run the simulation for all customers against all NBAs to check the customer count for each NBA. The simulation takes too long to complete, and a huge amount of data is generated in the output database. All OOTB and customized reports are also failing with timeouts, probably because of the data volume.
Are there any suggestions for improving performance and getting the reports to run?
So these are straightforward distribution tests? Running on 10 million customers seems a lot for that. Have you thought about building a dataset with a random percentage of customers and running the simulation on that? I can also help with changing the report settings so that the reports open, if you like.
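As a rough illustration of the random-percentage idea, here is a minimal Python sketch of one way to pick a repeatable pseudo-random subset of customer IDs by hashing them. It is not a Pega feature or API; the function name `in_sample`, the salt value and the `CUST-` ID format are all invented for the example.

```python
import hashlib

def in_sample(customer_id: str, percent: float, salt: str = "sim-run-1") -> bool:
    """Return True if this customer falls into the pseudo-random sample.

    Hypothetical helper: the same ID and salt always give the same answer,
    so the sample can be rebuilt later for a comparison run.
    """
    digest = hashlib.sha256(f"{salt}:{customer_id}".encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a bucket from 0 to 9999.
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    # percent=10.0 keeps buckets 0..999, i.e. roughly 10% of customers.
    return bucket < percent * 100

# Made-up customer IDs, for illustration only.
customer_ids = [f"CUST-{i:07d}" for i in range(100_000)]
sample = [cid for cid in customer_ids if in_sample(cid, 10.0)]
print(len(sample), "of", len(customer_ids), "customers selected")
```

Changing the salt gives a different, independent sample, which is handy if you want to repeat the small run a few times and see how much the counts move.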
This is probably a great opportunity to prove the customer wrong :-) Repeat the test on a much smaller number of customers and compare. I imagine the percentages extrapolated from the smaller random sample will be very close to those from the huge run.
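The comparison could look something like the Python sketch below. It uses purely synthetic data as a stand-in for the two simulation outputs: the NBA names, their weights and the population size are invented, and it assumes a simple random sample. It says nothing about your real data; the point is only to show the shape of the check you would do with the actual results.

```python
import random
from collections import Counter

random.seed(42)

# Hypothetical NBA names and assignment probabilities, invented for illustration.
nbas = ["GoldCard", "HomeLoan", "TravelInsurance", "NoOffer"]
weights = [0.05, 0.20, 0.30, 0.45]

population_size = 1_000_000
sample_fraction = 0.10

# Stand-in for the full simulation output: one NBA per customer.
full_run = random.choices(nbas, weights=weights, k=population_size)
full_counts = Counter(full_run)

# Stand-in for the small run: a simple random sample of the same customers.
sample_size = int(population_size * sample_fraction)
sample_run = random.sample(full_run, sample_size)
sample_counts = Counter(sample_run)

errors_pct = {}
for nba in nbas:
    # Scale the sample count back up to the full population.
    extrapolated = sample_counts[nba] / sample_fraction
    actual = full_counts[nba]
    errors_pct[nba] = 100.0 * (extrapolated - actual) / actual
    print(f"{nba:16s} actual={actual:8d} "
          f"extrapolated={extrapolated:10.0f} error={errors_pct[nba]:+.2f}%")
```

With real outputs you would replace `full_run` and `sample_run` with the per-customer results of the two simulation runs and look at the same error column.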
There are some other things to consider with your simulation. I take it you are not running it in production, so is the data you are running on a replica of prod? If not, the numbers from the big run may not be that accurate in the first place. Remember that we encourage running simulations on data that is as production-like as possible (including contextual data, adaptive models and interaction history).
For the reports, find the report definition that is used for your report and go to the Data Access tab. On the far right, under the general data access settings, review the numbers. The default time to wait is 30 seconds (try a higher number); also review the other settings there to see what else might need changing for the data to load.
Thanks for the detailed reply. Please find my responses below:
1. How accurately does a smaller random percentage of customers represent the entire population? Is there any empirical evidence? We want to see approximately how many customers a certain proposition will reach. If we go with the sample approach and extrapolate from the output, what deviation can we expect between the extrapolated results and the actual ones? Do you have any data on its accuracy?
2. We are running the simulation on production-like data in a replica of prod.
3. I was able to run one of the reports after increasing the default time to wait to around 300 seconds. Thanks!
Great news, this is a problem with report definitions sadly...
Yes, I would suggest collecting your own evidence to get buy-in on the sample size. I have rarely found a need to go above 20%, as long as the data is sampled correctly; this is particularly true for distribution tests.
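On the question of how far an extrapolated count might deviate from the actual one, a textbook starting point is the normal-approximation margin of error for a proportion, with the finite population correction. The Python sketch below assumes a simple random sample with each customer counted at most once per proposition; the function name `extrapolated_reach` and the input numbers are made up for illustration, not taken from a real run.

```python
import math

def extrapolated_reach(sample_hits, sample_size, population_size, z=1.96):
    """Estimate reach in the full population with an approximate margin of error.

    z=1.96 corresponds to a roughly 95% confidence level under the
    normal approximation.
    """
    p_hat = sample_hits / sample_size
    # Finite population correction: the sample is a sizeable share of the population.
    fpc = (population_size - sample_size) / (population_size - 1)
    std_error = math.sqrt(p_hat * (1 - p_hat) / sample_size * fpc)
    estimate = population_size * p_hat
    margin = population_size * z * std_error
    return estimate, margin

# Made-up example: 50,000 of 1,000,000 sampled customers get the proposition,
# out of a population of 10,000,000.
estimate, margin = extrapolated_reach(
    sample_hits=50_000, sample_size=1_000_000, population_size=10_000_000
)
print(f"estimated reach: {estimate:,.0f} +/- {margin:,.0f} customers")
```

For those made-up inputs the margin works out to roughly 4,000 customers on an estimate of 500,000, so under 1% either way. Rare propositions have a larger relative margin, and the formula only holds if the sample really is random, so it is still worth checking against your own comparison runs.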