Question
Pega 8.4 Kafka stability issues: CharlatanExceptions in logs
We recently upgraded from 7.2.2 to 8.4.1, and since then we have constantly been seeing a lot of Kafka-related issues on our Dev and SIT servers.
Recently we set up multiple nodes in our test environments (2 separate servers) with the default node classification (i.e. Web, BackgroundProcessing, Search and Stream).
1) We started seeing strange issues like the log below, which gets written every 10 seconds.
We tried truncating PR_SYS_STATUSNODES and restarting the nodes; things looked fine for a couple of days before the errors surfaced again.
Stream status shows normal for both nodes.
2) Another strange behaviour is that Admin Studio shows '0 nodes as running', even though both nodes are running fine and we are able to log into the applications on both of them.
Communication between the two nodes also seems to be working - I telnetted from one node to the other on the different ports used by Pega, and the connections were established okay (unless I overlooked something).
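For anyone wanting to script this kind of check instead of telnetting by hand, here is a minimal Java sketch that probes TCP connectivity with a timeout. The host name and the port numbers (9092 for Kafka, 2181 for the Charlatan/ZooKeeper emulation layer) are assumptions for illustration - confirm the actual ports your Pega nodes use from prconfig and the startup logs.

```java
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortCheck {

    // Returns true if a TCP connection to host:port succeeds within timeoutMs.
    static boolean reachable(String host, int port, int timeoutMs) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (Exception e) {
            // Connection refused, timeout, unknown host, invalid port, etc.
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical host name and example ports - replace with your own.
        String otherNode = "other-node.example.com";
        for (int port : new int[] {9092, 2181}) {
            System.out.println(otherNode + ":" + port
                    + " reachable = " + reachable(otherNode, port, 3000));
        }
    }
}
```

Running this from each node against the other gives a quick, repeatable record of which ports are open, which is easier to compare across environments than ad-hoc telnet sessions.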
Has anyone faced this and any help is really appreciated?
=================================
2020-08-11 22:35:55,215 [ New I/O worker #65] [ STANDARD] [ ] [ ] (ion.service.SessionServiceImpl) ERROR - Failed to accept in coming connection for the session '-1375665877'
com.pega.charlatan.utils.CharlatanException$SessionExpiredException: KeeperErrorCode = Session expired
	at com.pega.charlatan.session.service.SessionServiceImpl.handleConnectRequest(SessionServiceImpl.java:129) ~[charlatan-server.jar:?]
	at com.pega.charlatan.session.service.SessionServiceImpl.processRequest(SessionServiceImpl.java:84) ~[charlatan-server.jar:?]
	at com.pega.charlatan.server.CharlatanNettyConnection.receiveMessage(CharlatanNettyConnection.java:78) ~[charlatan-server.jar:?]
	at com.pega.charlatan.server.CharlatanNettyServer$CharlatanChannelHandler.processMessage(CharlatanNettyServer.java:213) ~[charlatan-server.jar:?]
	at com.pega.charlatan.server.CharlatanNettyServer$CharlatanChannelHandler.messageReceived(CharlatanNettyServer.java:207) ~[charlatan-server.jar:?]
	at org.jboss.netty.channel.SimpleChannelHandler.handleUpstream(SimpleChannelHandler.java:88) ~[netty-3.10.6.Final.jar:?]
	at com.pega.charlatan.server.CharlatanNettyServer$CharlatanChannelHandler.handleUpstream(CharlatanNettyServer.java:200) ~[charlatan-server.jar:?]
	at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) ~[netty-3.10.6.Final.jar:?]
	at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791) ~[netty-3.10.6.Final.jar:?]
	at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296) ~[netty-3.10.6.Final.jar:?]
	at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462) ~[netty-3.10.6.Final.jar:?]
	at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443) ~[netty-3.10.6.Final.jar:?]
	at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303) ~[netty-3.10.6.Final.jar:?]
	at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) ~[netty-3.10.6.Final.jar:?]
	at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) ~[netty-3.10.6.Final.jar:?]
	at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559) ~[netty-3.10.6.Final.jar:?]
	at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268) ~[netty-3.10.6.Final.jar:?]
	at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255) ~[netty-3.10.6.Final.jar:?]
	at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88) ~[netty-3.10.6.Final.jar:?]
	at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108) ~[netty-3.10.6.Final.jar:?]
	at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337) ~[netty-3.10.6.Final.jar:?]
	at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89) ~[netty-3.10.6.Final.jar:?]
	at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178) ~[netty-3.10.6.Final.jar:?]
	at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) ~[netty-3.10.6.Final.jar:?]
	at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) ~[netty-3.10.6.Final.jar:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[?:1.8.0_77]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[?:1.8.0_77]
	at com.pega.dsm.dnode.util.PrpcRunnable$1.run(PrpcRunnable.java:59) ~[d-node.jar:?]
	at com.pega.dsm.dnode.util.PrpcRunnable$1.run(PrpcRunnable.java:56) ~[d-node.jar:?]
	at com.pega.dsm.dnode.util.PrpcRunnable.execute(PrpcRunnable.java:67) ~[d-node.jar:?]
	at com.pega.dsm.dnode.impl.prpc.PrpcThreadFactory$PrpcThread.run(PrpcThreadFactory.java:124) ~[d-node.jar:?]
===============================
***Edited by Moderator Marissa to change type from General to Upgrade, update Platform Capability tags***
Hi Sreedhar,
Are you on cloud or on-premise?
If on-premise, could you please check whether the following tag is present in standalone-full-ha.xml:
<default-missing-method-permissions-deny-access value="true"/>
This setting blocks some of the platform functionality; if it is set to false, Admin Studio will correctly display the nodes.
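For reference, a sketch of where that attribute normally lives: it is a child element of the ejb3 subsystem in a JBoss EAP / WildFly standalone-full-ha.xml. The subsystem namespace version below is an example and varies by server release, so check your own file.

```xml
<subsystem xmlns="urn:jboss:domain:ejb3:4.0">
    <!-- ... other ejb3 settings ... -->
    <!-- Set to false so EJB methods without explicit permission
         annotations are not denied access by default. -->
    <default-missing-method-permissions-deny-access value="false"/>
</subsystem>
```

Restart the server after editing the file so the change takes effect.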
Thank you,
Kanchan