The client is exporting around 75,000 records to PDF and is now having issues with the export. The parsing takes around 2 GB of memory and we are seeing 100% CPU utilization for a long period of time. Even if we chunk the PDF into smaller batches of records and then merge, we see the same behavior.
I am not sure what the possible cause of the high CPU utilization could be.
Assuming you are running a report which fetches 75,000 records, and using the 'export to PDF' functionality to generate a PDF (correct me if my assumption is wrong here):
PRPC will first create an in-memory HTML version of those 75,000 records and then apply the HTMLTOPDF activity to it to generate the PDF; I would guess that this is fairly memory-intensive for this many records.
How many columns does your report contain? (So we can try and estimate the total number of 'cells' in your report.)
Do you get any issues displaying 75,000 rows on-screen just prior to using 'export to PDF'? Or is your report set up to display the first page only?
Can you also elaborate on what you did to achieve the 'chunking' you mention:
"even if we chunk the pdf with smaller records and then merge, we are seeing the same behavior."
And lastly: you should check your ALERTs file and see if you can spot any obvious bottlenecks in there (are there DB alerts? Unexposed columns being fetched? etc.).
There is also another thing that *might* be coming into play here:
Are there any (super) large ALERT lines in there? I don't mean the 'kpiValue' - I mean the number of characters that actually make up the alert text.
We have seen in the past that, because of (custom?) configuration of the ALERT file format, the entire parameter page can be dumped in text form in the ALERTs; and (if I remember correctly) generating PDF/XLS files from the 'export to <xxx>' function may place a very large value (basically the whole report text) on the parameter page. So you can get a vicious circle occurring in that scenario: the system logs an alert; the alert text itself takes a while to write out, which causes further performance problems, making it more likely that other alerts are logged, and so on.
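As a quick sanity check for that very-large-alert-line scenario, something along these lines can flag suspiciously long lines. This is just a sketch: the log file path and the 10,000-character threshold are my assumptions for illustration, not PRPC defaults.

```python
def long_alert_lines(lines, threshold=10_000):
    """Return (line_number, length) pairs for lines longer than threshold.

    The threshold is an arbitrary cut-off for this sketch; a normal alert
    line should be nowhere near 10,000 characters long.
    """
    return [(i, len(line)) for i, line in enumerate(lines, start=1)
            if len(line) > threshold]

# Typical use (the file name is an assumption -- adjust to your environment):
# with open("PegaRULES-ALERT.log", encoding="utf-8") as f:
#     print(long_alert_lines(f))
```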
Thanks Ritesh: do you have an estimate of how big the resultant HTML version of the full 75,000-row (how many columns?) report is in bytes?
It could simply be the case that the [75,000 x number-of-columns] cells in the report are sufficiently large in memory to cause adverse performance.
One other thing: are there any 'controls' on any of the columns? These could be using up a lot of cycles if so - particularly if the controls are looking up formatting information (which may not even be used in the final PDF).
And (I have to ask): do the business users really want a report that consists of seventy five thousand rows of data ? Wouldn't they die of boredom after the first few pages ? ;-)
[I'm only half-joking here in fact - is there a genuine use-case for this?]
In any case: I'm speculating here - you should start with the Alerts and make sure we understand the nature of the performance bottleneck. (And check for that very-large-alert-log-lines issue I mentioned above.)
The business shouldn't be using Pega as a BI tool, or using Pega reports to replace their blue-screen mainframe where they can see large amounts of data. This application was designed and developed by a partner, and now, just before going to Prod with it, they started seeing this issue.
I am tasked with helping them so they can move to Prod, but this is a really interesting problem.
OK - to summarize my thoughts on this, I think the following tasks are the way to get going:
1. Calculate an estimate for how big the HTML version of the pre-converted PDF is in bytes.
So: we know the number of rows is 'r'; we need to find out the number of columns 'c'.
Then, for each column, work out the maximum (and I guess minimum, median, etc.) size of data for that specific column - in bytes if possible, so take into account the character set in use.
Then multiply that out to get an estimate of the data size for a report where all the columns contain the maximum amount of data.
You will also need to add in a few extra bytes for each data 'cell'; probably (I would have to check) something like a '<td>...</td>' wrapper per cell, plus a '<tr>...</tr>' wrapper per row.
Then figure out if any columns relate to PRPC Properties which have special controls - if they do, they might add time (fetching/rendering) and bytes (depending on the output) to the final report.
Also: identify any PRPC Properties which are 'Declarative Targets'. (I seem to remember that the 'Inference Engine' is actually disabled when reports are generated, to increase performance - but I'm not sure if this is an option you have to enable or not.) If there are Declaratives in the report, they will add processing time to the report generation (I'm not sure of the effect on memory usage, though).
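The back-of-the-envelope calculation in step 1 can be sketched like this. All the figures are illustrative, and the per-cell markup overhead is the guess I made above, not a measured PRPC value:

```python
def estimate_html_bytes(rows, cols, max_cell_bytes, cell_markup_overhead=9):
    """Rough upper bound on the size of the intermediate HTML report.

    cell_markup_overhead approximates a '<td>...</td>' wrapper per cell
    (9 bytes of tags); a '<tr></tr>' wrapper (9 bytes) is added per row.
    """
    per_row = cols * (max_cell_bytes + cell_markup_overhead) + len("<tr></tr>")
    return rows * per_row

# e.g. 75,000 rows x 10 columns, worst-case 50 bytes of data per cell:
size = estimate_html_bytes(75_000, 10, 50)
print(f"{size / 1024 / 1024:.1f} MiB")  # tens of MiB for the raw HTML alone
```

Even this crude estimate makes it obvious why holding the whole report in memory (plus the PDF conversion working set on top) gets expensive quickly.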
2. Check the Alerts.
Use the Alerts to identify actual bottlenecks encountered in running the reports: this can help with the theories above about declaratives/controls etc.
3. Create a DEBUG version of the HTMLTOPDF conversion?
Instead of (or as well as) estimating the size of the input HTML given to HTMLTOPDF, you could actually check the size of it; I think the HTML is handed over as a single parameter String to HTMLTOPDF, so you can use standard Java tools to see how big it is just prior to converting it.
You could even write the HTML out to a separate file and take a look at it; you should definitely sanity-check it against the estimates made from the number of rows/columns/max-size-of-data-per-column.
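If you can get hold of that HTML parameter string, remember that its size in bytes depends on the character set, not just the character count. A minimal sketch of both the measurement and the dump-to-file idea (function names are mine, purely for illustration):

```python
def html_byte_size(html, encoding="utf-8"):
    """Byte size of an HTML string in a given character set."""
    return len(html.encode(encoding))

def dump_html(html, path="report_debug.html", encoding="utf-8"):
    """Write the intermediate HTML to a file so it can be inspected by hand."""
    with open(path, "w", encoding=encoding) as f:
        f.write(html)
    return html_byte_size(html, encoding)

# Non-ASCII data makes bytes != characters:
sample = "<td>café</td>"
print(len(sample), html_byte_size(sample))  # 13 characters, 14 bytes in UTF-8
```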
You could also try running reports with varying numbers of rows to see how the performance of the system changes (checking the alerts again): run, say, 10 rows, 100 rows, 1,000 rows, 10,000 rows.
See what sizes of HTML you get, and measure how long it takes to generate a PDF in each case.
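That scaling test can be scripted along these lines. This is only a skeleton: generate_html here builds a dummy table standing in for the real report rendering, and you would swap in the actual report/HTMLTOPDF calls where indicated:

```python
import time

def generate_html(n_rows, n_cols=10):
    """Placeholder for the real report rendering -- builds a dummy table."""
    cell = "<td>data</td>"
    row = "<tr>" + cell * n_cols + "</tr>"
    return "<table>" + row * n_rows + "</table>"

def benchmark(row_counts):
    """Record HTML size in bytes and elapsed time at each row count."""
    results = []
    for n in row_counts:
        start = time.perf_counter()
        html = generate_html(n)  # swap in the real report + PDF conversion
        elapsed = time.perf_counter() - start
        results.append((n, len(html.encode("utf-8")), elapsed))
    return results

for n, size, secs in benchmark([10, 100, 1_000, 10_000]):
    print(f"{n:>6} rows -> {size:>9} bytes in {secs:.4f}s")
```

Plotting size and time against row count should show quickly whether the cost grows linearly or blows up somewhere between 10,000 and 75,000 rows.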
Additionally: do you see the same sort of thing when you convert to Excel rather than PDF?