Implementation of Apache Kafka® with Apache AVRO serialization in Pega
The attached document captures the lessons learned, for future reuse, with the implementation details of integrating the Pega Platform with the Apache Kafka streaming platform using Apache AVRO serialization.
The goal is to demonstrate the implementation of the Apache Kafka event streaming platform with Apache AVRO serialization, consuming AVRO messages through a Pega real-time Data Flow run.
Apache Kafka® is a distributed streaming platform with three key capabilities:
publish and subscribe to streams of records, like a message queue or enterprise messaging system
store streams of records in a fault-tolerant, durable way
process streams of records as they occur
Kafka is generally used for two broad classes of applications:
building real-time streaming data pipelines that reliably move data between systems
building real-time streaming applications that transform or react to streams of data
The Pega Exchange component is not officially available yet. The described component was delivered and tested on Pega 8.2.x and sanity-checked on Pega 8.3. I have asked the product team when there is a plan to release it as an official Pega Exchange component and whether the current version can be shared here; I will update you as soon as I know.
I did not test writing into the topic, but I think it should work too, since reading works. To use AVRO serialization in Pega, you need to define classes in Pega with properties that represent the AVRO message. Having that, when you create the Kafka Data Set in the class that is the root of the message, you should be able to read from and write into AVRO messages. To write, I assume you need to prepare correct messages (matching the AVRO schema) in the flow, e.g. using a Convert shape or a data transform.
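To make that mapping concrete, here is a minimal sketch in plain Java with the Apache Avro library. The Customer schema, its field names, and the sample values are hypothetical; the point is only that each Avro field corresponds to a Pega property of the same name in the class that is the root of the message.

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;

public class AvroMessageSketch {

    // Hypothetical schema for illustration; in a real integration the
    // schema is dictated by the producing system.
    static final String SCHEMA_JSON =
        "{\"type\": \"record\", \"name\": \"Customer\","
        + " \"fields\": ["
        + "   {\"name\": \"customerId\", \"type\": \"string\"},"
        + "   {\"name\": \"balance\", \"type\": \"double\"}"
        + " ]}";

    public static void main(String[] args) {
        Schema schema = new Schema.Parser().parse(SCHEMA_JSON);

        // Each Avro field maps to a Pega property with the same name in
        // the class on which the Kafka Data Set is created.
        GenericRecord record = new GenericData.Record(schema);
        record.put("customerId", "C-1001");
        record.put("balance", 125.50);

        System.out.println(record); // {"customerId": "C-1001", "balance": 125.5}
    }
}
```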
When the Kafka Data Set is used on the input (left) side of the Data Flow, i.e. reading data, the Data Flow run can only be a real-time run: incoming messages arrive from the stream, so Pega automatically treats it as a real-time Data Flow run, not a batch run.
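For reference, outside Pega, this is roughly what reading Avro messages from a topic involves: a plain Kafka consumer receives the binary payload, and Avro's GenericDatumReader decodes it against the schema. The broker address, group id, and topic name below are assumptions for illustration; the Kafka Data Set and the real-time Data Flow run do the equivalent work internally.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class AvroConsumerSketch {

    // Same hypothetical Customer schema as in the previous sketch.
    static final String SCHEMA_JSON =
        "{\"type\": \"record\", \"name\": \"Customer\","
        + " \"fields\": [{\"name\": \"customerId\", \"type\": \"string\"},"
        + " {\"name\": \"balance\", \"type\": \"double\"}]}";

    public static void main(String[] args) throws Exception {
        Schema schema = new Schema.Parser().parse(SCHEMA_JSON);

        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("group.id", "avro-demo");                // hypothetical group id
        props.put("key.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
            "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        try (KafkaConsumer<String, byte[]> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("customer-topic")); // hypothetical topic

            GenericDatumReader<GenericRecord> reader = new GenericDatumReader<>(schema);
            while (true) {
                ConsumerRecords<String, byte[]> batch = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, byte[]> rec : batch) {
                    // Decode the Avro binary payload back into a record.
                    BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(rec.value(), null);
                    GenericRecord message = reader.read(null, decoder);
                    System.out.println(message);
                }
            }
        }
    }
}
```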
You can create a batch run when you use a non-stream input to the Data Flow, such as a Data Set of type Table. In that case you should be able to write into the Kafka Data Set. When a Kafka Data Set is configured as both the input and the output of the Data Flow, the Data Flow run will only be of the real-time type.
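The writing direction, again sketched outside Pega as a plain Java illustration: an Avro record matching the schema is serialized to binary and published with a standard Kafka producer. The topic name and broker address are hypothetical; this only shows what "preparing a correct message matching the AVRO schema" amounts to at the byte level.

```java
import java.io.ByteArrayOutputStream;
import java.util.Properties;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.EncoderFactory;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class AvroProducerSketch {

    // Same hypothetical Customer schema as in the previous sketches.
    static final String SCHEMA_JSON =
        "{\"type\": \"record\", \"name\": \"Customer\","
        + " \"fields\": [{\"name\": \"customerId\", \"type\": \"string\"},"
        + " {\"name\": \"balance\", \"type\": \"double\"}]}";

    public static void main(String[] args) throws Exception {
        Schema schema = new Schema.Parser().parse(SCHEMA_JSON);

        // Build a record that matches the schema, field by field.
        GenericRecord record = new GenericData.Record(schema);
        record.put("customerId", "C-1001");
        record.put("balance", 125.50);

        // Serialize the record to Avro binary.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(schema).write(record, encoder);
        encoder.flush();

        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("key.serializer",
            "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
            "org.apache.kafka.common.serialization.ByteArraySerializer");

        try (KafkaProducer<String, byte[]> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("customer-topic", "C-1001", out.toByteArray()));
            producer.flush();
        }
    }
}
```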
I do not know the official timeline for when this component will be shared on the marketplace, as it has to be tested and prepared for different Pega versions. The AVRO package described in my article was delivered for a project need as a first version of this component and was tested on Pega Platform 8.2.1-8.2.3.
Out-of-the-box AVRO support is challenging, as every single customer uses it in a different way. This component has been tested on 8.2.x and sanity-checked on 8.3, and it is not supported in production by Pega GCS. The component will not be released via the Marketplace, primarily due to the very long time it takes to publish a component and keep it up to date. That being said, all new versions and fixes will be delivered via GitHub. Tentatively, there is a plan to add native platform Avro support in 8.6.
This component is published in the Pega GitHub repository:
I think this message closes the AVRO conversation in this topic for now, as I am not working on further development or maintenance of this component. As I mentioned, the code can be reused, but with the risk of no production support for now.