AlexeyL8 Member since 2014 3 posts
SBT
Posted: December 6, 2018
Last activity: December 10, 2018
Closed

Enabled service tracing affects service functionality when there is trouble with Hazelcast

Hello!

My team ran into a strange issue related to Hazelcast and integrations.

Prod Env:

Pega Platform 7.3.0

WebSphere 8.5.5.12 + Oracle 12c

Production Level = 5

14 nodes in cluster.

Dynamic System Setting (DSS) trace/cluster/ServiceRuleWatchMaxProductionLevel = 5

One of the nodes had trouble: it became unhealthy and logged that it had left the cluster and returned a couple of times. There were also Hazelcast log entries like this:

java.util.concurrent.TimeoutException: MemberCallableTaskOperation failed to complete within 30 SECONDS.

After that, all our REST services logged errors like this:

java.lang.IllegalMonitorStateException: Current thread is not owner of the lock! -> Owner: 0a311d34-35d4-48cc-aad3-1e54d47d0e8f, thread-id: 862961
    at com.hazelcast.concurrent.lock.operations.UnlockOperation.unlock(UnlockOperation.java:74)
    at com.hazelcast.concurrent.lock.operations.UnlockOperation.run(UnlockOperation.java:63)
    at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.run(OperationRunnerImpl.java:186)
    at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.run(OperationRunnerImpl.java:401)
    at com.hazelcast.spi.impl.operationexecutor.impl.OperationThread.process(OperationThread.java:117)
    at com.hazelcast.spi.impl.operationexecutor.impl.OperationThread.run(OperationThread.java:102)
    at ------ submitted from ------.(Unknown Source)
    at java.lang.Thread.getStackTrace(Thread.java:1117)
    at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.resolve(InvocationFuture.java:114)
    at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.resolveAndThrowIfException(InvocationFuture.java:75)
    at com.hazelcast.spi.impl.AbstractInvocationFuture.get(AbstractInvocationFuture.java:155)
    at com.hazelcast.spi.impl.AbstractInvocationFuture.join(AbstractInvocationFuture.java:136)
    at com.hazelcast.concurrent.lock.LockProxySupport.unlock(LockProxySupport.java:149)
    at com.hazelcast.map.impl.proxy.MapProxyImpl.unlock(MapProxyImpl.java:280)
    at com.pega.pegarules.cluster.internal.PRHazelcastDistributedMapImpl.unlock(PRHazelcastDistributedMapImpl.java:430)
    at com.pega.pegarules.monitor.internal.tracer.DistributedRuleWatchImpl.getClientRequestorID(DistributedRuleWatchImpl.java:413)
    at com.pega.pegarules.monitor.internal.tracer.TracerSessionRackImpl.doINeedDibsFlag(TracerSessionRackImpl.java:436)
    at com.pega.pegarules.monitor.internal.tracer.TracerSessionRackImpl.getTracerSessionIfEnabled(TracerSessionRackImpl.java:414)
    at com.pega.pegarules.integration.engine.internal.services.ServiceAPI.getTracerSession(ServiceAPI.java:2651)
    at com.pega.pegarules.integration.engine.internal.services.ServiceAPI.initializeThreadContext(ServiceAPI.java:2635)
    at com.pega.pegarules.integration.engine.internal.services.ServiceAPI.withLockSetup(ServiceAPI.java:1318)
    at com.pega.pegarules.session.external.engineinterface.service.EngineAPI.processRequestInner(EngineAPI.java:379)
    at sun.reflect.GeneratedMethodAccessor153.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
    at java.lang.reflect.Method.invoke(Method.java:508)
    at com.pega.pegarules.session.internal.PRSessionProviderImpl.performTargetActionWithLock(PRSessionProviderImpl.java:1315)
    at com.pega.pegarules.session.internal.PRSessionProviderImpl.doWithRequestorLocked(PRSessionProviderImpl.java:1052)
    at com.pega.pegarules.session.internal.PRSessionProviderImpl.doWithRequestorLocked(PRSessionProviderImpl.java:907)
    at com.pega.pegarules.session.external.engineinterface.service.EngineAPI.processRequest(EngineAPI.java:334)
    at com.pega.pegarules.integration.engine.internal.services.StatelessServiceAPI.processRequest(StatelessServiceAPI.java:46)
    at com.pega.pegarules.integration.engine.internal.services.http.HTTPService.invoke(HTTPService.java:463)
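To illustrate the kind of failure the trace reports (an unlock attempted by a thread that does not own the lock), here is a minimal sketch. It uses the JDK's ReentrantLock as a stand-in, which is an assumption for illustration only: it is not the Hazelcast or Pega code, and the class and method names (UnlockOwnershipSketch, nonOwnerUnlockFails, unlockIfOwner) are hypothetical.

```java
import java.util.concurrent.locks.ReentrantLock;

public class UnlockOwnershipSketch {

    /** Returns true if an unlock attempted by a non-owner thread threw IllegalMonitorStateException. */
    public static boolean nonOwnerUnlockFails() {
        ReentrantLock lock = new ReentrantLock();
        lock.lock(); // the calling thread is now the owner
        boolean[] failed = {false};
        Thread other = new Thread(() -> {
            try {
                lock.unlock(); // this thread never acquired the lock
            } catch (IllegalMonitorStateException e) {
                failed[0] = true;
            }
        });
        other.start();
        try {
            other.join(); // join makes the write to failed[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            lock.unlock(); // the owner releases normally
        }
        return failed[0];
    }

    /** Releases the lock only if the current thread holds it; returns whether it was released. */
    public static boolean unlockIfOwner(ReentrantLock lock) {
        if (lock.isHeldByCurrentThread()) {
            lock.unlock();
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("non-owner unlock fails: " + nonOwnerUnlockFails());
        System.out.println("guarded unlock without ownership: " + unlockIfOwner(new ReentrantLock()));
    }
}
```

In a cluster the ownership can presumably also be lost while the thread still believes it holds the lock, for example when the owning member is considered to have left, so a local guard like unlockIfOwner would not necessarily be enough there.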

After that, almost all WebContainer threads on most of the nodes became parked, each with a Hazelcast operation at the top of its stack.
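As a rough way to quantify this from inside a JVM, the sketch below counts threads that have a frame from a given package near the top of their stack. It relies only on the standard Thread.getAllStackTraces() call; the class name ParkedThreadCounter, the method countThreadsWithFrame, and the depth of 10 frames are assumptions made for illustration. On WebSphere, a javacore thread dump would typically give the same information without deploying any code.

```java
import java.util.Map;

public class ParkedThreadCounter {

    /**
     * Counts the threads in the dump that have at least one frame whose class name
     * starts with classPrefix within the first `depth` frames of the stack.
     */
    public static int countThreadsWithFrame(Map<Thread, StackTraceElement[]> dump,
                                            String classPrefix, int depth) {
        int count = 0;
        for (StackTraceElement[] frames : dump.values()) {
            int limit = Math.min(depth, frames.length);
            for (int i = 0; i < limit; i++) {
                if (frames[i].getClassName().startsWith(classPrefix)) {
                    count++;
                    break; // count each thread at most once
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Snapshot of all live threads in this JVM
        Map<Thread, StackTraceElement[]> dump = Thread.getAllStackTraces();
        int inHazelcast = countThreadsWithFrame(dump, "com.hazelcast.", 10);
        System.out.println(inHazelcast + " of " + dump.size() + " threads are in com.hazelcast.* frames");
    }
}
```

Comparing this count across nodes and over time could show whether the blocked threads are concentrated on the unhealthy node or spread across the cluster.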

The incident was resolved by setting the DSS above (trace/cluster/ServiceRuleWatchMaxProductionLevel) to 4 and restarting the cluster nodes.
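Why lowering the setting helped can be sketched as a simple gate. This is an assumption inferred from the setting's name and the observed behavior, not the actual Pega implementation: the distributed rule-watch check on each service call is assumed to run only while the system's production level is at or below the configured maximum. The names RuleWatchGate and ruleWatchEnabled are hypothetical.

```java
public class RuleWatchGate {

    /**
     * Assumed gating rule: the service rule-watch (tracer) check is active only
     * when the production level does not exceed the configured maximum.
     */
    public static boolean ruleWatchEnabled(int productionLevel, int maxProductionLevel) {
        return productionLevel <= maxProductionLevel;
    }

    public static void main(String[] args) {
        int productionLevel = 5; // value from the environment description
        System.out.println("max=5: " + ruleWatchEnabled(productionLevel, 5));
        System.out.println("max=4: " + ruleWatchEnabled(productionLevel, 4));
    }
}
```

Under that assumption, with the maximum at 4 and the production level at 5, service requests would skip the tracer check and so would no longer touch the distributed lock seen in the stack trace.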

So, are there any hotfixes for the Hazelcast functionality, or any configuration tips? Or perhaps some debugging recommendations for Hazelcast issues, and recovery methods?

Data Integration
Moderation Team has archived post