Posted: 11 Dec 2019 18:32 EST Last activity: 27 Feb 2020 4:54 EST
Controlling where a DataFlow runs
I will preface this question by noting that we are still running PRPC 7.3.1, which has only a single DataFlow Service type. I am aware that when we finally upgrade to 8.3.1 we will have multiple DataFlow Service types to configure.
I am well versed in the concept of a batch DataFlow, whether it is used purely for data manipulation with a DataSet in and a DataSet out, or generated by Marketing with a segment DataSet in and a strategy run with multiple output options. These runs are load-balanced across the defined set of DataFlow Service nodes, provided you use a PartitionKey in the inbound DataSet. Without a PartitionKey they are single-threaded and run on one of the DataFlow Service nodes.
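To make the batch behaviour I am describing concrete, here is a small Python sketch of it. This is only a model of my understanding, not Pega code: `plan_batch_run`, the round-robin assignment, and the node names are all hypothetical, and the real service may distribute partitions differently.

```python
def plan_batch_run(records, nodes, partition_key=None):
    """Model which DataFlow Service node handles which slice of a batch run.

    Assumption (not Pega's actual algorithm): partitions are handed out
    round-robin across the nodes that host the DataFlow Service.
    """
    if not nodes:
        raise ValueError("no DataFlow Service nodes are defined")

    # No PartitionKey: the run cannot be split, so one node takes everything.
    if partition_key is None:
        return {nodes[0]: ["<all records>"]}

    # With a PartitionKey: each distinct key value becomes one unit of work.
    partitions = sorted({record[partition_key] for record in records})
    plan = {}
    for index, partition in enumerate(partitions):
        node = nodes[index % len(nodes)]
        plan.setdefault(node, []).append(partition)
    return plan


# Hypothetical inbound DataSet and node names, for illustration only.
records = [
    {"customer": "A", "region": "east"},
    {"customer": "B", "region": "west"},
    {"customer": "C", "region": "north"},
    {"customer": "D", "region": "east"},
]
nodes = ["df1", "df2"]

partitioned_plan = plan_batch_run(records, nodes, partition_key="region")
unpartitioned_plan = plan_batch_run(records, nodes)
```

With the PartitionKey the work is spread over both nodes; without it the whole run lands on a single node.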
However, I am seeking clarification on how DataFlows run for either a Real-Time Decisioning Service call or an Event (ESM) stream trigger.
If you are not using a Real-Time Container for the DSM Service call, you create your own Service end-point, and the Activity uses the "Call ExecuteSingleCaseDF" method to invoke the DataFlow rather than the DataFlow-Execute method, although I am not fully aware of all the differences between the two.
Similarly, when an Event arrives via an ESM stream, the DataFlow is used to invoke an Event Strategy.
In both of these cases the DataFlow is not run in batch mode and has an Abstract source at the start of the flow.
Now, in these two cases, what determines where the DataFlow will run? (Remember, this is 7.3.1.)
My understanding is that the run will be assigned to a node with a DataFlow Service, and that if there is a DataFlow Service on the local node, that is where it will run. Is this correct?
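Expressed as a Python sketch, the rule I am assuming looks like this. It is the hypothesis I would like confirmed or corrected, not documented Pega behaviour; `pick_node`, the fallback choice, and the node names are made up for illustration.

```python
def pick_node(local_node, dataflow_nodes):
    """Return the node a single-case DataFlow run would use under my assumption.

    Assumed rule: prefer the node that received the request if it hosts a
    DataFlow Service; otherwise hand the run to some other DataFlow node.
    The fallback order is arbitrary here (first name alphabetically).
    """
    if not dataflow_nodes:
        raise ValueError("no node in the cluster hosts a DataFlow Service")
    if local_node in dataflow_nodes:
        return local_node
    return sorted(dataflow_nodes)[0]


# Request arrives on a node that also hosts a DataFlow Service.
local_choice = pick_node("web1", {"web1", "df1"})

# Request arrives on a node without a DataFlow Service.
remote_choice = pick_node("web1", {"df1", "df2"})
```

If the real behaviour differs (for example, if the DataFlow always runs in the requestor's own thread regardless of the DataFlow Service configuration), that is exactly the detail I am trying to pin down.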
Now, when we get to 8.3.1, I know that we will have a host of defined DataFlow Service types.
I would guess that the defined batch DataFlow type replaces what we now have in 7.3.1 as the DataFlow Service, but what determines how the other DataFlow Service types are used? Is it new properties on Data-Decision-DDF-RunOptions, new method calls for an Activity, or something else?
I ask this because I would like the cluster architecture to be built so that we can separate the processing load: outbound marketing campaign DataFlows should not impact inbound Real-Time DataFlow processing. I would also like to know whether the DataFlow processing for inbound Events can be separated as well.
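As a rough picture of the separation I am aiming for, the Python sketch below checks a planned topology for nodes that would serve more than one kind of workload. The workload labels and node names are placeholders of my own, not Pega service type names.

```python
def find_shared_nodes(topology):
    """Return nodes that host more than one workload.

    `topology` maps a workload label to the set of node names serving it.
    An empty result means the workloads are fully separated.
    """
    workloads_by_node = {}
    for workload, nodes in topology.items():
        for node in nodes:
            workloads_by_node.setdefault(node, set()).add(workload)
    return {
        node: sorted(workloads)
        for node, workloads in workloads_by_node.items()
        if len(workloads) > 1
    }


# Hypothetical plan: rt2 is accidentally listed for two workloads.
topology = {
    "outbound_batch": {"batch1", "batch2"},
    "inbound_real_time": {"rt1", "rt2"},
    "inbound_events": {"evt1", "rt2"},
}
shared = find_shared_nodes(topology)
```

In this example `rt2` would be flagged, because it serves both inbound real-time calls and inbound events.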
I have looked for documents on PDN and Mesh and scanned the help files, but I have not found a definitive answer yet. Is there an internal Pega document that discusses how to configure this properly?