We are planning to use IH Summary Datasets in our application. Which option would you recommend for them: Materialized or Not-Materialized? I also have the following questions:
1. When a Materialized or Not-Materialized IH Summary Dataset is deployed to production, will it be populated with the complete (historical) IH data, or will it read IH only from the date of deployment?
2. We have had issues in Production with Cassandra nodes: they have gone down in the past and may go down again in the future. In that case, can a Materialized or Not-Materialized IH Summary Dataset still be populated with IH data while Cassandra is down?
Allow me first to explain the difference between Materialized and Un-Materialized.
Un-Materialized: whenever you need to retrieve aggregated IH summary values, we go to the IH relational DB and aggregate the summaries from there. This implies intensive read operations on the relational DB as well as slower retrieval of the IH summaries.
Materialized: as data comes into IH, it is aggregated on the fly and then stored in Cassandra. IH summaries are then retrieved from Cassandra, which offloads the intensive reads from the IH relational DB and gives faster retrieval than the un-materialized case.
Given the above, I recommend always using materialization.
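To make the difference concrete, here is a minimal Python sketch of the two approaches. It is not Pega code and does not use any Pega API: the record layout, the function name `summarize_on_read`, and the class `MaterializedSummary` are all hypothetical, and a plain in-memory dictionary stands in for the Cassandra store.

```python
from collections import defaultdict

# Hypothetical raw interaction records: (customer_id, outcome).
interactions = [
    ("C1", "Accepted"),
    ("C1", "Rejected"),
    ("C2", "Accepted"),
    ("C1", "Accepted"),
]


def summarize_on_read(records, customer_id):
    """Un-materialized style: scan the raw records on every request."""
    counts = defaultdict(int)
    for cid, outcome in records:
        if cid == customer_id:
            counts[outcome] += 1
    return dict(counts)


class MaterializedSummary:
    """Materialized style: keep running counts, updated as records arrive."""

    def __init__(self):
        # customer_id -> outcome -> count (stand-in for the summary store)
        self._counts = defaultdict(lambda: defaultdict(int))

    def record(self, customer_id, outcome):
        self._counts[customer_id][outcome] += 1

    def get(self, customer_id):
        # Reading is a lookup; no scan of the raw records is needed.
        return dict(self._counts[customer_id])


# Feed the same records into the materialized summary as they "arrive".
materialized = MaterializedSummary()
for cid, outcome in interactions:
    materialized.record(cid, outcome)

on_read_c1 = summarize_on_read(interactions, "C1")
materialized_c1 = materialized.get("C1")
```

Both approaches produce the same counts; the difference is where the work happens. The first pays the aggregation cost on every read, the second pays a small cost on every write and makes reads cheap.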
1. When setting up your IH summaries, you specify the start date of aggregation. It can be from the beginning of time or from a specific date (the default is from the beginning of time). Please check this configuration.
2. Yes, this is the current behavior: when Cassandra is down, we retrieve data from the IH relational DB.
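The two answers above can be pictured with a short Python sketch. Again, this is only an illustration under stated assumptions, not how Pega implements it: `StoreUnavailable`, `read_summary`, `aggregate_from_source`, and the record layout are hypothetical names, with a list standing in for the IH relational DB and a callable standing in for the Cassandra read.

```python
from datetime import date


class StoreUnavailable(Exception):
    """Raised by the hypothetical summary store when it cannot be reached."""


# Hypothetical raw interaction records: (customer_id, interaction_date).
records = [
    ("C1", date(2020, 1, 15)),
    ("C1", date(2021, 3, 2)),
    ("C1", date(2021, 6, 20)),
    ("C2", date(2021, 4, 9)),
]


def aggregate_from_source(customer_id, start_date=None):
    """Count interactions from the raw records.

    start_date=None means "from the beginning of time"; otherwise only
    records on or after start_date are included.
    """
    return sum(
        1
        for cid, day in records
        if cid == customer_id and (start_date is None or day >= start_date)
    )


def read_summary(customer_id, read_precomputed, start_date=None):
    """Prefer the precomputed summary; fall back to the raw records."""
    try:
        return read_precomputed(customer_id)
    except StoreUnavailable:
        return aggregate_from_source(customer_id, start_date)


def store_is_down(customer_id):
    # Simulates the summary store being unreachable.
    raise StoreUnavailable("summary store unreachable")


all_history = read_summary("C1", store_is_down)
since_2021 = read_summary("C1", store_is_down, start_date=date(2021, 1, 1))
```

With no start date, the fallback counts every record for the customer; with a start date, earlier records are left out of the aggregate.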
I tested this in our Dev and Test environments today. We are using Pega 8.4.3 and Pega Marketing 8.
I created some IH summary datasets for aggregating data, without specifying a start date for aggregation. By default the datasets were Not-Materialized, and I updated some of them to Materialized. Later, we deployed these dataset rules from the Dev environment to the Test environment. After the deployment, we observed that all the datasets were Not-Materialized, so I set some of them to Materialized again. One more observation: both types of dataset considered the complete (historical) IH data for aggregation.
1. Why were the Materialized datasets deployed as Not-Materialized? Are we missing anything here, and what is the expected behavior?
2. As I did not specify a start date for aggregation, the IH summary datasets considered the complete IH for aggregation in the Test environment. I think this is the expected behavior, right?
Could you please answer the above questions?