Yes, at the end of the last (which was the first) blog entry I promised to give some details on how to install the worker, but it is worth spending some time on the architecture first (the main reason is that I have a picture of the architecture that I want to share; also, the previous blog entry didn't contain any images...).
So first let's see the diagram:
Nice, isn't it? Well, please don't answer... :)
HOST 0, HOST 1 and HOST 2 are the machines where the observed business applications run (more precisely, where the log files of the applications are). As you can see, the reader worker (the one that observes the log files) has to be started only on those machines where the logs reside. Likewise, the executor worker (the one that executes the OS commands one by one) has to be started on those machines where operating system commands have to be executed. The workers are Java applications (JDK 8 is needed). There is only one worker installation pack, but there are different start/stop commands for the executor and the reader.
The reader worker calls 2 REST services:
- getting the log file locations to be monitored (how often the REST service is called can be configured; the more often it is called, the faster a change is propagated to the reader worker)
- reporting an incident
The executor worker calls 2 REST services too:
- getting the commands to be executed (the more often the service is called, the less delay there will be in the execution flow)
- sending back the result/output of the OS command
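To make the polling idea more tangible, here is a minimal sketch of what one executor worker loop could look like. It is not the real worker (which is a Java application): the function name `poll_and_execute`, the `id` / `argv` fields and the shape of the reported result are all assumptions made up for the illustration, and the two REST calls are passed in as plain callables so that no endpoint has to be invented.

```python
import subprocess
import time
from typing import Callable, Dict, List


def poll_and_execute(
    fetch_commands: Callable[[], List[Dict]],
    report_result: Callable[[Dict], None],
    interval_seconds: float,
    rounds: int,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll for commands `rounds` times and run them one by one.

    fetch_commands stands in for the "get the commands" REST call and
    report_result for the "send back the result" REST call (both hypothetical).
    Returns the number of commands that were executed.
    """
    executed = 0
    for round_number in range(rounds):
        for command in fetch_commands():
            # Run the OS command and capture what it printed.
            completed = subprocess.run(
                command["argv"], capture_output=True, text=True
            )
            report_result(
                {
                    "id": command["id"],
                    "exit_code": completed.returncode,
                    "output": completed.stdout,
                }
            )
            executed += 1
        # A command that arrives right after a poll waits up to
        # interval_seconds before it is picked up.
        if round_number < rounds - 1:
            sleep(interval_seconds)
    return executed
```

The `interval_seconds` parameter is the knob mentioned above: a smaller value means more REST calls but a shorter wait before a new command is picked up.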
HMAC authentication is used when a call is made. This means that the hash is not static (it is computed from the password, the current date, the HTTP verb used, the endpoint, etc.) and, being a one-way hash, it cannot be reversed to obtain the password. The username and password must exist both in the worker and in the Reaction Engine. The authentication is mandatory.
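If you have never seen HMAC signing, the sketch below shows the general idea. It is only an illustration under assumptions: the exact fields, their order, the newline separator and the SHA-256 algorithm are my choices for the example and not necessarily what the worker and the Reaction Engine really use. The function names `sign_request` and `verify_request` are hypothetical too.

```python
import hashlib
import hmac


def sign_request(password: str, username: str, verb: str, endpoint: str, date: str) -> str:
    """Return a hex HMAC-SHA256 signature over the request metadata.

    The layout of the signed string is an assumption for this sketch.
    """
    message = "\n".join([username, verb.upper(), endpoint, date])
    return hmac.new(
        password.encode("utf-8"), message.encode("utf-8"), hashlib.sha256
    ).hexdigest()


def verify_request(
    password: str, username: str, verb: str, endpoint: str, date: str, signature: str
) -> bool:
    """Recompute the signature on the receiving side and compare it."""
    expected = sign_request(password, username, verb, endpoint, date)
    # compare_digest does a constant-time comparison.
    return hmac.compare_digest(expected, signature)
```

Because the date, the verb and the endpoint are part of the signed string, the signature is different for every kind of call, and the password itself never travels over the network. Both sides must know the same username and password to produce the same hash.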
The HTTP message can also be encrypted, based either on the username and password (the ones used in the HMAC authentication) or on the public / private keys of a certificate. The advantage of this encryption over HTTPS is that there is no need to rebuild the secure channel when the message goes through network devices.
The administration web application is a Python/Django application (Python 3 is needed) and its main tasks are:
- to maintain the reference data
- to monitor the run of the execution flows
- to start / to schedule the execution
- to give statistics
- to provide a user management module
It calls the REST services of the Reaction Engine for:
- approving the start of a flow
- starting / scheduling a flow
- restarting / skipping a task of an execution flow
The Reaction Engine is a Java web application (JDK 8 is needed) that can be deployed on Tomcat 8, WildFly 10 or WebLogic 12c (separate WAR files are provided in the download section).
What it does is as follows:
- it provides a REST interface for the workers and the management web application (see above)
- it decides whether an event reported by the reader worker is a real incident for which a flow has to be started
- it performs the execution flow, i.e. it gets the first task in the flow, executes it, then gets the second one, etc. Based on the type of the task (OS command, if-else operation, mail sending) the engine executes it differently:
  - if it is an OS command, the engine provides it to the worker (i.e. it just waits until the specific executor worker calls the REST service to get the command to be executed); after the execution it saves the output (if the command wasn't executed successfully, the flow fails) and jumps to the next task in the flow
  - if it is an if-else operation, the engine takes the output of the preceding task (which must be an OS command that produced an output) and uses this value to evaluate the condition of the if-else; if the condition is true it jumps to the true branch (if it exists), if it is false it jumps to the false branch (if it exists)
  - if it is a mail sending task, the engine sends the mail and jumps to the next task in the flow
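The task handling described above can be sketched in a few lines. Please take it as a simplified model only, not the engine's real code (the engine is a Java application): the `Task` class, the `run_flow` function and the `run_command` / `send_mail` callables are hypothetical names, and I assume here that after a branch finishes the flow simply continues with the next task.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Task:
    kind: str  # "command", "if_else" or "mail"
    payload: str = ""  # the OS command or the mail text
    condition: Optional[Callable[[str], bool]] = None
    true_branch: Optional[List["Task"]] = None
    false_branch: Optional[List["Task"]] = None


def run_flow(
    tasks: List[Task],
    run_command: Callable[[str], Tuple[bool, str]],
    send_mail: Callable[[str], None],
    last_output: str = "",
) -> bool:
    """Run the tasks in order; return False as soon as a command fails."""
    for task in tasks:
        if task.kind == "command":
            # In the real setup the executor worker runs the command;
            # run_command stands in for that round trip.
            ok, last_output = run_command(task.payload)
            if not ok:
                return False
        elif task.kind == "if_else":
            # The condition is evaluated on the output of the preceding command.
            if task.condition(last_output):
                branch = task.true_branch
            else:
                branch = task.false_branch
            # A missing branch is simply skipped.
            if branch and not run_flow(branch, run_command, send_mail, last_output):
                return False
        elif task.kind == "mail":
            send_mail(task.payload)
        else:
            raise ValueError(f"unknown task kind: {task.kind}")
    return True
```

For example, a flow could run a disk usage command first, then use an if-else task to start a cleanup command and send a mail only when the reported usage is above a limit.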