Poor batch performance with the SQL Server JDBC driver
Environment: PRPC 7.1.3 with Pega Marketing and SQL Server 2016, using the sqljdbc42.jar JDBC driver, running on WebSphere 8.5 on virtualized RHEL servers.
We have observed widely varying performance when retrieving data from SQL Server via a DataFlow in Batch mode: roughly 96% of the DataFlow run time is spent reading data from SQL Server (5 tables) against a data set of roughly 20 million rows, making the overall process unacceptably slow.
The tables are properly indexed, and with the batch size reduced to 40 or 50 the performance is very good, with queries returning in 30-60 ms. However, if we increase the batch size to 100 or more, the same queries take in excess of 400,000 ms.
For a manually created DataFlow (reading from SQL Server, building a class, and saving the data to Cassandra DDS) we have full control over performance, since we can set defaultBatchSize to a suitable value (say 50, which works fine).
However, when running Pega Outbound Marketing, defaultBatchSize is not used (or so it appears). It seems to calculate its own value based on the partition size and passes it into the DataFlow run as pyBatchSize, leaving us no control over the process.
For example, using a partition key with 100 values: if we send in a segment of 5,000 customers across 100 partitions, each partition gets 50 customers, which is assigned to a DataFlow thread to process. In the log files (with com.pega.dsm.dnode set to DEBUG) we can see that the resulting queries use a WHERE clause with 50 OR'd customer IDs; this performs fine. If we increase the segment to 8,000 (80 customers per partition), the WHERE clause contains 80 OR'd customer IDs and performance slows down by a factor of roughly 7,000-8,000.
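The per-partition arithmetic above can be sketched as follows. This is a minimal illustration of the observed behavior, not Pega's actual implementation; the method name is hypothetical, and the ceiling division is an assumption for segment sizes that do not divide evenly.

```java
public class PartitionMath {
    // Hypothetical helper: customers assigned to each DataFlow partition,
    // i.e. ceil(segmentSize / partitionCount). This value appears to become
    // the number of OR'd customer IDs in the generated WHERE clause.
    static int customersPerPartition(int segmentSize, int partitionCount) {
        return (segmentSize + partitionCount - 1) / partitionCount;
    }

    public static void main(String[] args) {
        // 5,000 customers over 100 partitions -> 50 per partition (fast)
        System.out.println(customersPerPartition(5000, 100));
        // 8,000 customers over 100 partitions -> 80 per partition (slow)
        System.out.println(customersPerPartition(8000, 100));
    }
}
```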
We tried altering defaultBatchSize, and it is definitely ignored in Outbound Marketing.
When the WHERE clause grows to read a larger batch, the JDBC access path becomes drastically less efficient; it is almost as if the index is ignored and a table scan of the 20 million customers is performed instead.
Running the exact same query via SQL Server Management Studio is very fast (as expected), and the execution plan confirms that the index is being used properly. It is only the combination of JDBC and a large WHERE clause that triggers the issue.
We have not seen the issue with Oracle or PostgreSQL; it seems to be unique to SQL Server.
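One commonly reported cause of "fast in Management Studio, slow via JDBC" on SQL Server is the Microsoft JDBC driver's sendStringParametersAsUnicode property, which defaults to true: string parameters are sent as nvarchar, and if the indexed column is varchar, the implicit conversion can prevent an index seek. Whether that applies here depends on the customer ID column's type, so treat this as a hypothesis to test, not a diagnosis. The sketch below uses placeholder host/database names and a simplified version of the OR'd-ID WHERE clause described above.

```java
import java.util.Collections;

public class CustomerBatchQuery {
    // Builds a simplified version of the OR'd customer-ID WHERE clause the
    // DataFlow generates, using parameter placeholders (hypothetical shape).
    static String buildWhereClause(int batchSize) {
        return "WHERE " + String.join(" OR ",
                Collections.nCopies(batchSize, "CustomerID = ?"));
    }

    // Connection URL that disables Unicode string parameters, so the driver
    // sends them as varchar instead of nvarchar. Host and database names
    // are placeholders; the property name is a documented driver setting.
    static String jdbcUrl(String host, String db) {
        return "jdbc:sqlserver://" + host + ":1433;databaseName=" + db
                + ";sendStringParametersAsUnicode=false";
    }
}
```

If the customer ID column is varchar, comparing the execution plan for a prepared statement run with this URL against the default URL (e.g. via SQL Server's plan cache or an Extended Events session) should confirm or rule out the implicit-conversion theory.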
We are pursuing two fronts:
1. Can we manually fix the batch size for an Outbound Marketing DataFlow run?
2. Is anyone in the community aware of JDBC configuration settings that affect this, so that we can address the problem at the source?
Which Marketing dataflows are you describing here (I see you are running PM 7.31)? The newer version of PM allows you to set the batch size through an application setting; I wonder if you can edit the dataflows in question to set pyBatchSize there.
Outbound Marketing generates its own dataflow run (as you know), which is what we were referring to.
Since I posted this we have spent some time looking at the controlling activities for DataFlows, and have seen exactly what you are referring to in ExecuteDDFInBatch: a manually run DataFlow uses the DSS value Pega-DecisionEngine • structureddata/defaultBatchSize, but an Outbound Marketing DataFlow uses the Marketing data page value Declare_MKTSystemSetting.DecisionBatchSize to set .pyBatchSize.
Both of these default to 250, and of course if the partition holds fewer than 250 customers, the effective batch size is limited to the number of customers in the partition, up to the maximum of 250.
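The effective batch size described above reduces to a simple minimum of the configured value and the partition size. A minimal sketch, assuming this is the whole rule (the method name is hypothetical, not Pega's):

```java
public class EffectiveBatch {
    // Hypothetical model of the behavior described above: the batch size
    // actually used is the configured value (default 250), capped by the
    // number of customers in the partition.
    static int effectiveBatchSize(int configuredBatchSize, int partitionSize) {
        return Math.min(configuredBatchSize, partitionSize);
    }
}
```

This is why lowering the configured value (DSS or Marketing data page) works as a cap: a partition of 80 customers with the default of 250 still yields an 80-ID WHERE clause, but with the setting at 50 it is split into smaller reads.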
So based on this we now have a work-around, but I would still like to understand what goes wrong in SQL Server when a query is run via JDBC with a WHERE clause containing 80 or more OR'd conditions, given that we know the same query works fine in the SQL Server Management Studio console, just not via JDBC.
We are setting the DSS value to 40 (as we observed that higher numbers stress Cassandra when writing to it) and the Marketing data page value to 50.
I will update this post later today, once we have run more comprehensive tests with larger segment sizes.