I have an agent scheduled on a particular node (let's call it node G), and it runs an activity that in turn executes a data flow. However, when I check the data flow landing page, I see that the data flow ran on a different node (node C). This surprised me, since I expected the data flow to execute on the same node where the activity was triggered. Is this expected behaviour in Pega 7.3.1? In particular, does the DataFlow-Execute method have to do its work on the same node where the invoking activity ran, or can it do its work on whichever node it chooses?
This is expected behavior. Data flow batch runs (where the source is not abstract) are always distributed across whichever nodes are designated to run data flows. In newer versions, this can also be controlled more granularly with node type configurations.
It's interesting that you mention data flows with non-abstract sources always do a distributed run, because our data flow always runs all of its processing on a single node, even though we have 8 data flow nodes available. When you say distributed run, do you mean that the agent can run the activity on any of the 8 nodes, or that the actual data flow processing is distributed among our 8 nodes?
The run also needs enough partitions in the source (please check whether the data set source has a partition key defined). Depending on the number of partitions in the source and the number of threads configured on the data flow settings landing page, it should scale accordingly.
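To illustrate why a source with too few partitions can end up running on a single node, here is a rough toy model in Python. It is only a sketch and not Pega's actual scheduling logic: the function names, the node labels, and the assumption that partitions are handed out one worker thread at a time, interleaved across nodes, are all hypothetical.

```python
def assign_partitions(num_partitions, nodes, threads_per_node):
    """Toy model (assumption, not Pega's real algorithm): each partition is
    processed by exactly one worker thread, and worker slots are handed out
    interleaved across nodes."""
    # One slot per (node, thread) pair, ordered so that every node gets its
    # first thread used before any node gets its second thread used.
    slots = [(node, thread) for thread in range(threads_per_node) for node in nodes]
    assignment = {}
    for partition in range(num_partitions):
        if partition >= len(slots):
            break  # in this model, extra partitions wait for a free slot
        assignment[partition] = slots[partition]
    return assignment


def active_nodes(assignment):
    """Return the sorted list of nodes that received at least one partition."""
    return sorted({node for node, _thread in assignment.values()})


if __name__ == "__main__":
    nodes = list("ABCDEFGH")  # 8 hypothetical data flow nodes
    for partitions in (1, 3, 20):
        used = active_nodes(assign_partitions(partitions, nodes, threads_per_node=5))
        print(f"{partitions} partition(s) -> {len(used)} active node(s): {used}")
```

In this model, a source with no partition key behaves like a single partition, so only one thread on one node does any work, no matter how many nodes or threads are available. With more partitions than nodes, all 8 nodes become busy.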