3.5 Autonomic Manager

How can an appropriate action be automated based on dynamic monitoring of the operational health of an IT system?

3.5.1 Problem

IT systems increasingly consist of highly distributed, interactive components that operate in highly variable, even unpredictable environments. Changes in an IT environment are increasingly dynamic as services are consumed from private, public, or hybrid cloud environments. Automation is generally more reliable and repeatable than work conducted manually by humans, so it yields more consistent quality and tighter tolerances. Data-driven decisions are preferable to decisions based on feelings or hunches.

In a complex, distributed IT system, how can actions be automated when the telemetry data shows abnormal events?

3.5.2 Solution

Use an orchestrator to change the behavior of the managed resources of the IT service, based on the actions recommended by the <ServiceMonitor> and <AutomatedRCA> patterns.

An orchestrator in the context of an Autonomic Manager provides a means of aggregating a set of workflow tasks into a process. The process can be scheduled to run on a tenant, and its results or runtime status can be persisted for visibility, audit trail, and tracking.
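The idea of aggregating tasks into a process and persisting its status can be sketched as follows. This is a minimal in-memory illustration, not a definitive design: the names Orchestrator, ProcessRun, drain_connections, and restart_server are hypothetical, the "persistence" is a plain list, and scheduling is left out.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A task takes the shared process context and returns an updated context.
Task = Callable[[Dict[str, object]], Dict[str, object]]


@dataclass
class ProcessRun:
    """Record of one process execution, kept for audit and tracking."""
    process: str
    tenant: str
    status: str = "PENDING"
    steps: List[str] = field(default_factory=list)


class Orchestrator:
    """Hypothetical orchestrator; a real one would persist runs durably."""

    def __init__(self) -> None:
        self.processes: Dict[str, List[Task]] = {}
        self.history: List[ProcessRun] = []

    def define(self, name: str, tasks: List[Task]) -> None:
        # Aggregate an ordered set of workflow tasks into a named process.
        self.processes[name] = list(tasks)

    def run(self, name: str, tenant: str, context: Dict[str, object]) -> ProcessRun:
        tasks = self.processes[name]  # unknown process names raise KeyError
        run = ProcessRun(process=name, tenant=tenant, status="RUNNING")
        self.history.append(run)  # recorded up front so failed runs stay visible
        try:
            for task in tasks:
                context = task(context)
                run.steps.append(f"{task.__name__}: ok")
            run.status = "COMPLETED"
        except Exception as exc:
            run.steps.append(f"{task.__name__}: failed ({exc})")
            run.status = "FAILED"
        return run


# Illustrative tasks (assumptions, not part of any real library).
def drain_connections(ctx: Dict[str, object]) -> Dict[str, object]:
    return {**ctx, "drained": True}


def restart_server(ctx: Dict[str, object]) -> Dict[str, object]:
    if not ctx.get("drained"):
        raise RuntimeError("server not drained")
    return {**ctx, "restarted": True}
```

A caller would define a process once, for example orch.define("restart", [drain_connections, restart_server]), and then run it per tenant; each run leaves a ProcessRun entry in orch.history whether it completes or fails.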

3.5.3 Application

There are many types of processes an Autonomic Manager might need:

  1. Remediation

A remediation usually involves restoring a service, e.g., restarting a server or increasing the size of a JVM. The decision tree required to identify the remediation action is evaluated by <AutomatedRCA> in collaboration with <ServiceMonitor>.

  2. Automated Administration

Infrastructure resources require numerous administrative tasks that must be carried out repeatedly and efficiently. When performed manually, these tasks are prone to human error and slow response times. Administrative tasks can include provisioning a resource, scaling a resource up or down, sending usage notifications for metering and billing at periodic intervals, and evaluating and reporting the capacity utilization of a resource.

These tasks can be automated with scripts, wired into a standard workflow, and placed in the Autonomic Manager library. The workflows can be invoked by <ServiceMonitor> with the appropriate inputs and schedule.
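Both process types above can be pictured as entries in a workflow library that a monitor invokes by name. The sketch below is illustrative only: WORKFLOW_LIBRARY, the workflow decorator, invoke, and the three workflows are hypothetical names, the workflow bodies merely return a description instead of touching real infrastructure, and "size of a JVM" is assumed to mean a memory size in MB.

```python
from typing import Callable, Dict

# A workflow receives its inputs and returns a short result description.
Workflow = Callable[[Dict[str, object]], str]

# Hypothetical stand-in for the Autonomic Manager workflow library.
WORKFLOW_LIBRARY: Dict[str, Workflow] = {}


def workflow(name: str) -> Callable[[Workflow], Workflow]:
    """Register a function in the library under the given action name."""
    def register(fn: Workflow) -> Workflow:
        WORKFLOW_LIBRARY[name] = fn
        return fn
    return register


# Remediation workflows (service restoration).
@workflow("restart_server")
def restart_server(inputs: Dict[str, object]) -> str:
    return f"restarted {inputs['server']}"


@workflow("resize_jvm")
def resize_jvm(inputs: Dict[str, object]) -> str:
    # Assumption: the JVM "size" is a memory size expressed in MB.
    return f"resized JVM on {inputs['server']} to {inputs['size_mb']} MB"


# Automated administration workflow.
@workflow("scale_resource")
def scale_resource(inputs: Dict[str, object]) -> str:
    return f"scaled {inputs['resource']} to {inputs['instances']} instances"


def invoke(action: str, inputs: Dict[str, object]) -> str:
    """Entry point a monitor could call with a recommended action."""
    try:
        selected = WORKFLOW_LIBRARY[action]
    except KeyError:
        raise ValueError(f"no workflow registered for action {action!r}") from None
    return selected(inputs)
```

Keeping the mapping from action name to workflow in one registry means the component that recommends an action only needs to know the name and the inputs, not how the workflow is implemented.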

3.5.4 Examples / Use-cases

Netflix’s Conductor, an open source microservices orchestration engine, orchestrates more than 2.6 million process flows, ranging from simple linear workflows to very complex dynamic workflows that run over multiple days. The Conductor project documentation includes a kitchensink workflow example.