Posted: 7 Jan 2018 23:38 EST Last activity: 14 Jan 2019 10:59 EST
Data Flows - Partition Key
I am trying to understand how the partition key works in a data set. My question is in the context of Pega 7.2.1.
My data set is mapped to a database table.
My first question: can the column defined as the partition key contain null values?
Second, is there any guidance on the best number of distinct values?
Note on the form:
Define a property to distribute read operation across different nodes. To achieve best performance, make sure to use a partitioning key property with not too many and not too few distinct values.
The partition key is really only of use in a data set when that data set is used in a data flow that processes batch reads. The partition key allows batch processing in data flows to be distributed vertically and horizontally within the multi-threaded / multi-node system. In such environments, you will typically have several threads (about 5) processing work on any given node, which lets the work scale vertically. If the batch job runs across multiple nodes for horizontal scaling (pretty typical for large batch jobs), then all of those nodes will help process the work.
Let's say you have 5 nodes: that means up to 5 nodes x 5 threads = 25 threads could be working on your job concurrently. Each of these should be given a slice of the work, and the right way to slice it up is to give the system a key to slice on. This is where the partition key plays a role.
A good starting point for the number of distinct partition key values is about 100, but each system's data volumes, deployment topology, and use case might warrant examining what that number should be.
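To make the "slice of the work" idea concrete, here is a minimal Python sketch of the 5 nodes x 5 threads example. It is not Pega's implementation: the round-robin assignment of whole partitions to workers, the function names `group_by_key` and `records_per_worker`, and the `region`/`bucket` columns are all hypothetical, chosen only to show why the number of distinct key values matters.

```python
# Hypothetical sketch, NOT Pega's implementation: whole partitions are handed
# to workers round-robin purely to show why distinct-value count matters.

NODES = 5             # example from the answer: 5 nodes
THREADS_PER_NODE = 5  # "typically about 5" threads per node
WORKERS = NODES * THREADS_PER_NODE  # 5 x 5 = 25 concurrent workers


def group_by_key(records, key):
    """Build one partition per distinct value of the partition key."""
    partitions = {}
    for rec in records:
        partitions.setdefault(rec[key], []).append(rec)
    return partitions


def records_per_worker(partitions, workers=WORKERS):
    """Give each whole partition to one worker, round-robin; return record counts per worker."""
    load = [0] * workers
    for i, rows in enumerate(partitions.values()):
        load[i % workers] += len(rows)
    return load


if __name__ == "__main__":
    # 10,000 records keyed on a column with only 3 distinct values.
    few = [{"id": i, "region": i % 3} for i in range(10_000)]
    # The same number of records keyed on a column with 100 distinct values.
    many = [{"id": i, "bucket": i % 100} for i in range(10_000)]

    for name, recs, key in (("3 values", few, "region"), ("100 values", many, "bucket")):
        load = records_per_worker(group_by_key(recs, key))
        busy = sum(1 for n in load if n > 0)
        print(f"{name}: {busy}/{WORKERS} workers busy, max {max(load)} records on one worker")
```

With only 3 distinct values, 3 of the 25 workers do all the work (up to 3334 records each) while the other 22 sit idle; with 100 distinct values, every worker gets 4 partitions of 100 records. The sketch only illustrates the "too few" side of the form's advice; it does not model per-partition overhead, which is one plausible reason the form also warns against too many distinct values.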
No. Cassandra does not allow null values in partition key columns. If you want to group these records together (for whatever reason), you'll need to substitute a non-null value, such as an empty string or some special value.
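If you control how the rows are prepared, the substitution can happen before the write. A minimal Python sketch, assuming a hypothetical `customer_type` column as the partition key and a hypothetical `"__NULL__"` sentinel. It uses a non-empty sentinel rather than an empty string, since whether an empty string is accepted as a key value may depend on the store and version.

```python
from typing import Optional

# Hypothetical sentinel: choose a value that can never occur as a real key.
NULL_KEY_SENTINEL = "__NULL__"


def non_null_key(raw: Optional[str], sentinel: str = NULL_KEY_SENTINEL) -> str:
    """Return the raw key value, or the sentinel when the value is None."""
    return sentinel if raw is None else raw


rows = [
    {"id": 1, "customer_type": None},  # null: not allowed as a partition key
    {"id": 2, "customer_type": "GOLD"},
    {"id": 3, "customer_type": None},
]

# Normalize before writing, so all formerly-null rows share one partition.
for row in rows:
    row["customer_type"] = non_null_key(row["customer_type"])
```

After the loop, rows 1 and 3 both carry `"__NULL__"` and so end up in the same partition, while row 2 keeps `"GOLD"`.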
Thank you for posting your query on PSC. This looks like an inactive post, so we suggest you create a new post for your query. Click the Write a Post button at the top of this screen or on our Pega Support Community homepage. Once it is created, please reply here with the URL of the new post.
We have also sent you a private message opening up a communication channel in case you have any further questions.