In our project, customer data is stored in an external Cassandra database.
We have a table with millions of records, and we want to retrieve a selected subset of them by supplying values for the table's partition keys. We want to do this in a Data flow so that the work runs multi-threaded for performance. Below are the options we tried; none of them works for us:
i) Using Data sets in a Data flow: Data sets can retrieve the complete data, but there is no option to specify filters or keys, so this option won't work.
ii) Using a Report definition in a Data flow: When used standalone, report definitions can retrieve data from the external Cassandra table by key, but they do not work with Data flows. We raised an SR and got this confirmed.
iii) Using Data set execute in Activities: This works, but the retrieved data is too large to hold on the clipboard, and looping over approximately 300,000 records to act on each one performs poorly.
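
To make the requirement concrete, here is a minimal sketch, outside Pega, of the access pattern we are after: one lookup per partition key, spread over a small thread pool, with each row handled as it arrives instead of being collected into one large in-memory list. It is plain Java with illustrative names only. `processByKey`, `fetchPartition` and `handleRow` are hypothetical; `fetchPartition` stands in for a Cassandra query restricted to a single partition key and is not a Pega or Cassandra driver API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;
import java.util.function.Function;

public class PartitionKeyFanOut {

    /**
     * Runs one lookup per partition key on a fixed-size thread pool and
     * passes every returned row to handleRow. Returns the number of rows handled.
     * fetchPartition is a placeholder for "select rows where partition key = key".
     * handleRow is called from several threads, so it must be thread-safe.
     */
    public static <K, R> long processByKey(List<K> keys,
                                           int threads,
                                           Function<K, List<R>> fetchPartition,
                                           Consumer<R> handleRow)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> pending = new ArrayList<>();
            for (K key : keys) {
                // One task per partition key; only that partition's rows are held at a time.
                Callable<Long> task = () -> {
                    long count = 0;
                    for (R row : fetchPartition.apply(key)) {
                        handleRow.accept(row);
                        count++;
                    }
                    return count;
                };
                pending.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> result : pending) {
                total += result.get(); // waits for the task and rethrows its failure, if any
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in data: every key "returns" two rows.
        List<String> keys = List.of("cust-1", "cust-2", "cust-3");
        long handled = processByKey(
                keys,
                2,
                key -> List.of(key + ":row-a", key + ":row-b"),
                row -> { /* per-row action would go here */ });
        System.out.println("rows handled: " + handled);
    }
}
```

What we are looking for is a way to get this behaviour from a Data flow, with the partition key values supplied as input rather than reading the whole table.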