Sunday, 4 March 2018

Introduction to Reaction Engine

The first question to clarify is what the Reaction Engine application is for.

In short it is an automatic incident detector and resolver.

Yes, it sounds pretty fancy and it would look pretty cool on a flyer or something. However, the truth is that this is exactly what it does.
It doesn't have an artificial intelligence or machine learning module; it can only do what has been specified in it (so John Connor can still feel safe ...). What it has is as follows:
  • a background application (reader worker) that monitors the log files of business applications and sends an alert to the Reaction Engine when an incident occurs
  • a server application (Reaction Engine) that listens to the reader worker and starts an execution flow (which is basically a chain of operating system commands)
  • another background application (executor worker) that polls the Reaction Engine for the OS command to be executed and executes it on the specified host
  • a management web application where all the data (e.g. the execution flows) can be maintained and where the running flows can be monitored and controlled
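To make the reader worker's job more concrete, here is a minimal sketch of the log-scanning idea in Python. It is only an illustration and not the actual Reaction worker code: the function name find_incidents, the pattern and the sample log lines are all my assumptions.

```python
import re

# Hypothetical pattern; a real reader worker would get it from its configuration.
ERROR_PATTERN = re.compile(r"java\.lang\.OutOfMemoryError")


def find_incidents(log_lines, pattern=ERROR_PATTERN):
    """Return (line_number, line) pairs for every log line matching the pattern."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(log_lines, start=1)
        if pattern.search(line)
    ]


# Made-up sample log, for illustration only.
sample_log = [
    "2018-03-04 10:15:01 INFO  Hermes started\n",
    "2018-03-04 11:02:47 ERROR java.lang.OutOfMemoryError: Java heap space\n",
    "2018-03-04 11:02:48 WARN  request timed out\n",
]

incidents = find_incidents(sample_log)
```

In this sketch every match would become one alert sent to the engine; a real worker would also have to follow the file as it grows instead of reading a fixed list of lines.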
So how can it be made to work? 

First, let's imagine there is an application (called Hermes) that suffers from a memory leak, so 2 - 3 times a week the dreaded OutOfMemoryError appears in its log file and Hermes hangs. The solution is to restart the application server that hosts Hermes.

How is the memory problem fixed in the normal way? The business users realize that they cannot use their beloved system, so they call the service desk. The service desk calls the middleware administrators, who log in to the server machine and check the log file. They realize that the memory leak has struck again, restart the application server and let the service desk know when the restart has finished. The service desk then notifies the business users that they can work again until ... they can't.
The problem here is that there are many human interactions, which usually take a lot of time and cost the firm a lot of money.

How should the incident resolution work? The log file of the Hermes application should be monitored. If the OutOfMemoryError appears in the log file, the application server should be restarted automatically (perhaps after a confirmation step by the middleware administrator), and a mail should be sent to the business users before and after the restart.
In order to do that, the following steps are needed:
  • installing the reader worker on the machine where the log file of the application resides
  • installing the executor worker on the machine where the application server is
  • creating the database schema of Reaction and installing the Reaction management web application
  • creating the reference data of the Hermes application (e.g. where its log file is located, what commands should be executed to remedy the memory error, etc.) with the Reaction management web application
  • installing the Reaction Engine on Weblogic 12c, Tomcat 8 or Wildfly 10
  • configuring the workers (e.g. specifying where the Reaction Engine is, etc.) and starting them
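To give a feeling for the reference data mentioned in the list above, here is a rough sketch of it as Python data classes. All the class names, field names, the host name and the paths are hypothetical and only there for illustration; the real data is maintained in the Reaction management web application and its structure may well differ.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Command:
    host: str        # machine where an executor worker runs
    os_command: str  # operating system command to execute there


@dataclass
class ExecutionFlow:
    name: str
    commands: List[Command] = field(default_factory=list)


@dataclass
class MonitoredSystem:
    name: str
    log_file: str       # file watched by the reader worker
    error_pattern: str  # text that signals an incident
    flow: ExecutionFlow


# Hypothetical values for the Hermes example.
hermes = MonitoredSystem(
    name="Hermes",
    log_file="/var/log/hermes/server.log",
    error_pattern="OutOfMemoryError",
    flow=ExecutionFlow(
        name="Restart application server",
        commands=[Command("apphost01", "/opt/appserver/bin/restart.sh")],
    ),
)
```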
It is important to note that if a new application has to be monitored by the Reaction Engine, then all that has to be done is
  • to create the reference data of this new application with the Reaction management web application
  • to install, configure and start the workers
The reader worker will notice the error in the log file of the Hermes application and send an alert to the Reaction Engine. The engine will select the execution flow that can remedy the problem (restarting the application server) and hand the commands over to the executor worker. The workers are always the clients, so the engine cannot send a message to a worker directly; instead, the workers poll the engine.
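The polling model described above can be sketched in a few lines of Python. This is a simplified in-memory illustration and not the real engine: the class Engine, its methods, the host name and the script names are my assumptions, and the sketch ignores things like waiting for the result of one command before releasing the next, confirmation steps and mails.

```python
from collections import deque


class Engine:
    """Toy engine: it never contacts a worker, it only answers worker calls."""

    def __init__(self, flows):
        # flows maps an error text to an ordered list of (host, command) pairs
        self.flows = flows
        self.pending = {}  # host -> queue of commands waiting to be picked up

    def report_incident(self, log_line):
        """Called by the reader worker; queues the matching flow, if any."""
        for error_text, commands in self.flows.items():
            if error_text in log_line:
                for host, command in commands:
                    self.pending.setdefault(host, deque()).append(command)
                return True
        return False

    def poll(self, host):
        """Called by the executor worker; returns the next command or None."""
        queue = self.pending.get(host)
        return queue.popleft() if queue else None


# Hypothetical flow for Hermes: stop and then start the application server.
engine = Engine({
    "OutOfMemoryError": [
        ("apphost01", "stop-server.sh"),
        ("apphost01", "start-server.sh"),
    ],
})

nothing_yet = engine.poll("apphost01")
accepted = engine.report_incident("ERROR java.lang.OutOfMemoryError: Java heap space")
first = engine.poll("apphost01")
second = engine.poll("apphost01")
done = engine.poll("apphost01")
```

The point of the design is that only the engine has to be reachable over the network: both workers open the connection themselves, so nothing has to listen on the machines of the business applications.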

That is the basic idea behind the Reaction system. However, it can do much more, which I will show in subsequent blog entries.
In the next entry I will show how to install the worker.
