Yes, so I'm a bit confused about what should guide our decision to change this setting. In other words: what is the expected grouping structure going to be? It's not clear to me what I should base the decision on when setting a value of, say, 0.5, or what PRPC will do in the background to group the predictors I have listed based on this setting.
Those settings are usually best left alone: the defaults are fine for most (if not all) situations. They allow an analytics expert to tweak the models in case the models don't behave as expected, but in practice this is very rare.
To give a bit of background: the values of every predictor are grouped into statistically significant groups. For example, assume a predictor 'age'. This predictor might be grouped into intervals of 0-20 years, 20-45 years and 45-80 years. Each group shows statistically distinct behavior. In practice, that might mean that people in the 0-20 years group are very unlikely to accept a certain offer, people in the 20-45 group are very likely to accept it, and the 45-80 group has a probability of accepting somewhere in between.
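The post doesn't say how PRPC computes these groups internally, so what follows is only a toy Python sketch of the idea: once interval boundaries exist, each group simply gets its own observed acceptance rate. The function names (`group_index`, `group_rates`), the boundary convention and the data are all hypothetical, made up for illustration.

```python
from bisect import bisect_right

def group_index(value, cuts):
    """Return the index of the interval containing value.

    cuts holds the inner boundaries (hypothetical convention), so cuts=[20, 45]
    means groups 0-20, 20-45 and 45+, with the lower bound inclusive.
    """
    return bisect_right(cuts, value)

def group_rates(observations, cuts):
    """Observed acceptance rate per group for (value, accepted) pairs."""
    n_groups = len(cuts) + 1
    accepted = [0] * n_groups
    totals = [0] * n_groups
    for value, outcome in observations:
        g = group_index(value, cuts)
        totals[g] += 1
        accepted[g] += int(outcome)
    # None marks a group with no observations
    return [a / t if t else None for a, t in zip(accepted, totals)]

# Made-up observations: (age, accepted the offer?)
data = [(15, False), (18, False), (19, True),
        (25, True), (30, True), (40, False),
        (50, True), (60, False)]
rates = group_rates(data, [20, 45])
```

Here `rates` holds one acceptance rate per age group (0-20, 20-45, 45+). A real model works with far more cases per group than this.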
The 'grouping granularity' setting controls the threshold for what is considered statistically significant. A more granular setting could result in more groups, e.g. 0-12, 12-16, 16-24, 24-38, 38-65 and 65-80, i.e. 6 groups instead of 3. More groups might give better predictions, but they also increase the risk of 'overfitting': if the groups become too granular, they start to capture noise in the observed data rather than real patterns, and they become less predictive for new customers.
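To make the "threshold for statistical significance" idea concrete, here is a hedged sketch of one well-known way to do this, bottom-up merging of adjacent groups in the spirit of ChiMerge. This is not necessarily the algorithm PRPC uses. It starts from fine groups given as (accepted, total) counts and repeatedly merges the most similar adjacent pair, as long as a chi-square test says the pair's acceptance rates are not significantly different. The `alpha` parameter is a hypothetical stand-in for the granularity setting: a larger `alpha` merges less and leaves more groups.

```python
import math

def chi2_pvalue(g1, g2):
    """p-value of a 2x2 chi-square test (1 df, no continuity correction)
    comparing the acceptance rates of two groups given as (accepted, total)."""
    a, n1 = g1
    c, n2 = g2
    b, d = n1 - a, n2 - c
    denom = n1 * n2 * (a + c) * (b + d)
    if denom == 0:
        return 1.0  # no variation to test: treat the groups as indistinguishable
    stat = (n1 + n2) * (a * d - b * c) ** 2 / denom
    # Chi-square survival function with 1 df: P(X > x) = erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(stat / 2))

def merge_groups(groups, alpha):
    """Merge adjacent groups until every neighbouring pair differs at level alpha."""
    groups = list(groups)
    while len(groups) > 1:
        pvals = [chi2_pvalue(groups[i], groups[i + 1]) for i in range(len(groups) - 1)]
        i = max(range(len(pvals)), key=pvals.__getitem__)  # most similar pair
        if pvals[i] <= alpha:
            break  # all remaining neighbours are significantly different
        (a1, n1), (a2, n2) = groups[i], groups[i + 1]
        groups[i:i + 2] = [(a1 + a2, n1 + n2)]
    return groups

# Made-up fine groups of 100 cases each, in interval order
fine = [(5, 100), (6, 100), (50, 100), (52, 100), (20, 100), (21, 100)]
coarse = merge_groups(fine, alpha=0.05)   # 3 groups
granular = merge_groups(fine, alpha=0.9)  # nothing merged: 6 groups
```

With this toy data, `alpha=0.05` collapses the six groups into three. The looser threshold keeps all six, including splits that the data does not really support.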
For example, assume there is a group for only 23-year-olds, and only a single 23-year-old customer was observed, who accepted the offer. The model would then assume that *all* 23-year-olds will accept the offer, i.e. the model no longer generalizes. The default settings usually give the best balance between predictive power and 'robustness' (resilience against overfitting).
The 'minimum number of cases' setting controls the minimum fraction of cases that should end up in each group. For example, a value of 0.05 means that each group should contain at least 5% of all cases, which implies that there will be at most 20 groups (1 / 0.05 = 20).
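The arithmetic behind "at most 20 groups", and one possible way to enforce a minimum group size, can be sketched as follows. This is only an illustration under stated assumptions: `max_groups` and `enforce_min_fraction` are hypothetical names, and the rule of merging an undersized group into the neighbour with the closer acceptance rate is an assumed policy, not necessarily what PRPC does.

```python
import math

def max_groups(min_fraction):
    """Upper bound on the number of groups when each must hold min_fraction of all cases."""
    # k groups * min_fraction <= 1; the tiny tolerance guards against binary float rounding
    return math.floor(1 / min_fraction + 1e-9)

def _rate(group):
    accepted, total = group
    return accepted / total if total else 0.0

def enforce_min_fraction(groups, min_fraction):
    """Merge undersized (accepted, total) groups, kept in interval order,
    into a neighbour until every group holds at least min_fraction of all cases."""
    groups = list(groups)
    n_total = sum(total for _, total in groups)
    while len(groups) > 1:
        i = min(range(len(groups)), key=lambda j: groups[j][1])  # smallest group
        if groups[i][1] >= min_fraction * n_total:
            break  # every group is large enough
        if i == 0:
            j = 1
        elif i == len(groups) - 1:
            j = i - 1
        else:
            # Assumed policy: merge into the neighbour with the closer acceptance rate
            r = _rate(groups[i])
            left, right = _rate(groups[i - 1]), _rate(groups[i + 1])
            j = i - 1 if abs(r - left) <= abs(r - right) else i + 1
        lo, hi = min(i, j), max(i, j)
        (a1, n1), (a2, n2) = groups[lo], groups[hi]
        groups[lo:hi + 1] = [(a1 + a2, n1 + n2)]
    return groups

# Made-up groups (1,000 cases in total); the first holds a single customer who accepted,
# like the lone 23-year-old above
groups = [(1, 1), (30, 400), (2, 9), (200, 590)]
result = enforce_min_fraction(groups, 0.05)
```

With a minimum fraction of 0.05 (50 of the 1,000 cases), the single-customer group and the 9-case group both get absorbed by neighbours, which is exactly the kind of protection against overfitting described above.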