Posted: 6 Nov 2017 7:33 EST Last activity: 17 Nov 2017 2:42 EST
Use Kafka data set as input to Predictive model
For R&D purposes, we created a predictive model from some sample telecom customer data (input: CSV). We are now trying to change the input to a data set that gets its data from Kafka. I've created a Kafka data instance and a Kafka data set. When I reference the Kafka data set in the Data selection step, I see the following error: "Unable to load data from dataset com.pega.dsm.dnode.impl.dataset.kafka.KafkaBrowseOperation cannot be cast to com.pega.dsm.dnode.api.dataset.operation.BrowseAllRecordsOperation". This happens after I pushed the same CSV file to the topic I created. The consumer terminal does show the CSV content, so the data is reaching the topic.
Can you please suggest how to get the same data with the Kafka data set that we had when using the CSV as input? Please find attached a snapshot of the previous data selection page.
You should be able to use a Kafka (or any) dataset to score your models, i.e. use it as the source in a dataflow that then runs the model that's in a strategy.
Your screenshot shows that you're trying to use a stream (Kafka) dataset as the source in Predictive Analytics Director. That is not a supported scenario. PAD is designed to work with static (finite) data.
To build a PAD model on data coming in through Kafka, you could run a data flow (DF) that takes the Kafka data set as its source and writes to a DDS data set, let that data flow run for a certain amount of time, and then source PAD from this DDS data set.
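The idea behind this workaround (turn an unbounded stream into a finite snapshot before modelling) can be sketched outside Pega. The Python below is only an illustration under assumptions: `endless_csv_lines` is a hypothetical stand-in for records arriving on a Kafka topic, and `bounded_snapshot` is a hypothetical helper, not a Pega or Kafka API. In Pega itself the equivalent is configured through the data flow and the DDS data set, not through code like this.

```python
import csv
import itertools
import time
from typing import Iterable, Iterator, List


def endless_csv_lines() -> Iterator[str]:
    """Hypothetical stand-in for a stream source: it never ends by itself."""
    for i in itertools.count(1):
        yield f"{i},customer_{i},{i % 2}"


def bounded_snapshot(stream: Iterable[str], max_records: int,
                     max_seconds: float) -> List[str]:
    """Collect records until a record limit or a time limit is reached.

    The time limit is only checked when a record arrives, so a source
    that blocks without producing anything is not interrupted here.
    """
    if max_records < 1:
        raise ValueError("max_records must be at least 1")
    deadline = time.monotonic() + max_seconds
    snapshot: List[str] = []
    for record in stream:
        snapshot.append(record)
        if len(snapshot) >= max_records or time.monotonic() >= deadline:
            break
    return snapshot


def parse_rows(lines: Iterable[str]) -> List[List[str]]:
    """Parse the finite snapshot as CSV rows, as a static source would be."""
    return list(csv.reader(lines))


if __name__ == "__main__":
    lines = bounded_snapshot(endless_csv_lines(), max_records=3, max_seconds=5.0)
    for row in parse_rows(lines):
        print(row)
```

The snapshot is an ordinary finite list, which is the kind of input a tool built for static data can browse from start to end; the stream itself has no "end" to browse to.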
I do agree that PAD should give a clearer indication of the issue, or even prevent you from selecting unsupported datasets.