Do people monitor ERROR / FATAL entries in the PegaRULES log file?
This is not a technical question but an operations-management practice question. I understand that Pega writes ERROR or FATAL entries when an error occurs in the application. You can view them in the log file directly or through a monitoring tool such as ELK. In my previous project, the customer's monitoring system watched the log file for the strings ERROR and FATAL. Whenever one was recorded, an alert went to the system administrators, who looked into the details. With this kind of operation, a large number of ERRORs had to be reviewed during development, and many of them had to be added to an "ignore list"; otherwise, the phone would ring every day. We did this time-consuming job anyway: for some errors, developers fixed the code, and for others, we decided to ignore them. That was my approach, but is this something people do around the world? It was very time-consuming, so I doubt everyone does the same. Maybe people just develop the application and don't monitor the log file at all? If there is any standard practice for log-watch policy with Pega, please let me know.
Pega writes ERRORs more casually than people think. Custom ERRORs written by developers obviously have to be removed, but Pega also throws many ERRORs for legitimate behavior where it shouldn't. One example I can think of is ARO. If I set an approval privilege on an approval flow, and someone without that privilege tries to open the approval assignment, Pega writes an ERROR. To me, that is not an error, because the restriction is working exactly as intended. There are other similar cases. Having no ERRORs at all in production is very difficult, but do people really make this huge effort to eliminate all ERROR log entries during development?
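For illustration, the "watch for ERROR/FATAL, then suppress what is on the ignore list" approach described above can be sketched in a few lines of Python. This is a minimal sketch, not a Pega feature: it assumes the log level appears as a standalone token in each line, and the log lines and ignore patterns below are hypothetical examples, not real PegaRULES messages.

```python
import re
from typing import Iterable, Iterator, List, Pattern

# Hypothetical ignore list: regex patterns for ERROR entries the team reviewed
# and decided are expected behavior. These are NOT real Pega message texts.
IGNORE_PATTERNS: List[Pattern[str]] = [
    re.compile(r"Access denied for approval assignment"),
    re.compile(r"Known harmless startup message"),
]

# Assumption: the level (ERROR or FATAL) appears as a whole word in each line.
LEVEL_RE = re.compile(r"\b(ERROR|FATAL)\b")


def alertable_lines(lines: Iterable[str],
                    ignore: List[Pattern[str]] = IGNORE_PATTERNS) -> Iterator[str]:
    """Yield ERROR/FATAL lines that do not match any ignore-list pattern."""
    for line in lines:
        if not LEVEL_RE.search(line):
            continue  # not an ERROR or FATAL entry
        if any(p.search(line) for p in ignore):
            continue  # reviewed and accepted as expected behavior
        yield line.rstrip("\n")


if __name__ == "__main__":
    # Hypothetical sample log lines.
    sample = [
        "2024-01-01 10:00:00 [main] INFO  Startup complete",
        "2024-01-01 10:01:00 [worker-1] ERROR Access denied for approval assignment A-1",
        "2024-01-01 10:02:00 [worker-2] ERROR NullPointerException in custom activity",
        "2024-01-01 10:03:00 [worker-3] FATAL Database connection pool exhausted",
    ]
    for alert in alertable_lines(sample):
        print(alert)  # these are the lines that would page an administrator
```

The design choice here is to keep the ignore list as data (a list of patterns), so each reviewed ERROR becomes one new entry rather than a code change, which mirrors the review-then-ignore workflow described in the question.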
Pega has a service for that! Please try Predictive Diagnostic Cloud (PDC). It is free for all customers. One of its features is setting up email notifications for errors. Please refer to this link to get started:
As Szymon mentioned, most of our customers use PDC to monitor their environments. The system displays errors and alerts that are currently active or that occurred in the past. If you think some of the alerts are not relevant, you can classify them into the appropriate group. For crucial alerts, you can create notifications that are sent by email. There are also many other metrics that show the health of the cluster, the application, the database, and so on.