In short, it is an automatic incident detector and resolver.
Yes, it sounds pretty fancy and it would look pretty cool on a flyer or something. However, the truth is that this is exactly what it does.
It doesn't have an artificial intelligence or machine learning module; it can only do what has been specified in it (so John Connor can still feel safe ...). What it has is the following:
- a background application (reader worker) that monitors the log files of business applications and sends an alert to the Reaction Engine if an incident occurs
- a server application (Reaction Engine) that listens to the background application (reader worker) and starts an execution flow (which is basically a chain of operating system commands)
- another background application (executor worker) that gets the OS command to be executed from the Reaction Engine (it polls the engine) and executes it on the specified host
- a management web application where all the data (e.g. the execution flows) can be maintained and where the running flows can be monitored and controlled
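To make the interplay of the components a bit more tangible, here is a minimal in-memory sketch in Python. It is not the code of Reaction: the class `ReactionEngine`, the functions `reader_worker` and `poll`, the flow name `restart-hermes` and the host name `app01` are all made up for illustration, and the real workers talk to the engine over the network rather than through direct calls.

```python
import re
from collections import deque


class ReactionEngine:
    """Hypothetical in-memory stand-in for the Reaction Engine."""

    def __init__(self, flows):
        self.flows = flows      # flow name -> list of (host, OS command)
        self.pending = deque()  # commands waiting for an executor worker

    def report_incident(self, flow_name):
        # Called by a reader worker: queue every command of the flow in order.
        self.pending.extend(self.flows[flow_name])

    def poll(self, host):
        # Called by an executor worker: hand out the next command only if
        # it is meant for the polling host, otherwise return None.
        if self.pending and self.pending[0][0] == host:
            return self.pending.popleft()[1]
        return None


def reader_worker(log_lines, pattern, flow_name, engine):
    """Scan log lines and report an incident on the first matching line."""
    regex = re.compile(pattern)
    for line in log_lines:
        if regex.search(line):
            engine.report_incident(flow_name)
            return True
    return False


# The command is a placeholder; a real flow would hold the actual restart command.
engine = ReactionEngine(
    {"restart-hermes": [("app01", "echo restart application server")]}
)
log = ["INFO started", "ERROR java.lang.OutOfMemoryError: Java heap space"]
detected = reader_worker(log, r"OutOfMemoryError", "restart-hermes", engine)
command = engine.poll("app01")  # what the executor worker on app01 would run
```

The point of the sketch is the direction of the calls: the reader worker pushes an alert to the engine, while the executor worker pulls its work by polling.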
First, let's imagine there is an application (called Hermes) that suffers from a memory leak, so 2-3 times a week the dreadful OutOfMemoryError appears in the log file of the application and Hermes hangs. The solution is to restart the application server that hosts Hermes.
How is the memory problem fixed in the normal way? The business users realize that they cannot use their beloved system, so they call the service desk. The service desk calls the middleware administrators, who log in to the server machine and check the log file. They realize that the memory leak has struck again, restart the application server and let the service desk know when the restart has finished. Finally the service desk notifies the business users that they can work until ... they can.
The problem here is that there are many human interactions, which usually take a lot of time and cost the firm a lot of money.
How should the incident resolution work? The log file of the Hermes application should be monitored. If the OutOfMemoryError appears in the log file, then the application server should be restarted automatically (perhaps including a confirmation step by the middleware administrator), and a mail should be sent to the business users before and after the restart.
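As a rough illustration, such an execution flow can be thought of as an ordered list of OS commands with an optional confirmation. The Python sketch below is an assumption of how this could look, not the way Reaction stores or runs its flows: `Step`, `run_flow` and `run_shell` are hypothetical names, and the `echo` commands are placeholders for the real mail and restart commands.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class Step:
    description: str
    command: str
    needs_confirmation: bool = False


def run_shell(command):
    # Run the command through the local shell and return its exit code.
    return subprocess.run(command, shell=True).returncode


def run_flow(steps, confirm, run=run_shell):
    """Execute the steps in order; stop on a refusal or a non-zero exit code.

    Returns the descriptions of the steps that completed successfully.
    """
    executed = []
    for step in steps:
        if step.needs_confirmation and not confirm(step):
            break
        if run(step.command) != 0:
            break
        executed.append(step.description)
    return executed


# Placeholder commands: the real mail and restart commands are site-specific.
hermes_flow = [
    Step("mail users before restart", "echo 'Hermes will be restarted'",
         needs_confirmation=True),
    Step("restart application server", "echo 'restarting'"),
    Step("mail users after restart", "echo 'Hermes is available again'"),
]
```

In this sketch the confirmation sits on the first step, so that the business users are not told about a restart that the administrator then declines. Calling `run_flow(hermes_flow, confirm=lambda step: True)` would run all three placeholder commands in order.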
In order to do that, the following has to be done:
- installing the reader worker on the machine where the log file of the application resides
- installing the executor worker on the machine where the application server is
- creating the database schema of Reaction and installing the Reaction management web application
- creating the reference data of the Hermes application (e.g. where its log file is located, what commands should be executed to remedy the memory error, etc.) with the Reaction management web application
- installing the Reaction Engine on WebLogic 12c, Tomcat 8 or WildFly 10
- configuring the workers (e.g. specifying where the Reaction Engine is, etc.) and starting them
For every further application that should be monitored later, much less is needed:
- creating the reference data of this new application with the Reaction management web application
- installing, configuring and starting the workers
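To give an idea of what the reference data mentioned in the steps above has to cover, here is a small Python sketch. The field names, the path and the host names are invented for illustration; in Reaction this data is maintained through the management web application, not in a Python dictionary.

```python
# Hypothetical field names: the real reference data model may differ.
REQUIRED_KEYS = {
    "system", "log_file", "error_pattern",
    "reader_host", "executor_host", "commands",
}


def missing_keys(reference_data):
    """Return the required keys that are absent from a record, sorted by name."""
    return sorted(REQUIRED_KEYS - reference_data.keys())


hermes = {
    "system": "Hermes",
    "log_file": "/path/to/hermes.log",      # placeholder path
    "error_pattern": "OutOfMemoryError",
    "reader_host": "loghost",               # where the reader worker runs
    "executor_host": "apphost",             # where the executor worker runs
    "commands": ["echo restart application server"],  # placeholder command
}

problems = missing_keys(hermes)  # an empty list means the record is complete
```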
That is the basic idea behind the Reaction system. However, it can do much more, which I will show in subsequent blog entries.
In the next blog entry I will show how to install the worker.