3.1 Autonomic Instrumentation

How do you instrument a distributed IT system for automated monitoring?

3.1.1      Problem

Today’s distributed systems traverse multiple infrastructure elements, software components, network segments, platforms, and often data centers or locations during end-to-end execution.

How do you instrument a distributed IT system, consisting of components such as business processes, applications, databases, middleware, and IoT devices, together with the infrastructure it runs on, to enable autonomic behaviour and customer insights?

3.1.2      Solution

  • Insert agents that instrument the components of the IT system to enable distributed tracing of a transaction’s whole journey. A distributed trace consists of one or more spans; a span is one individual hop along the trace. To contextualize each span, spans can carry common tags such as start and stop timestamps, or semantically relevant tags such as a business entity ID.
  • Automatically discover the system map and the data flow between the different components of the system.
  • A Controller aggregates and makes sense of the data from the distributed traces and forwards the results to a Telemetry system.
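The span model described above can be sketched in a few lines. This is a minimal illustration, not a real tracing library: the class and field names (`Span`, `trace_id`, `entity.id`, the component names) are hypothetical, and timing uses wall-clock time for simplicity.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Span:
    """One hop in a distributed trace, carrying timing and context tags."""
    trace_id: str                      # shared by every span in the same trace
    name: str                          # component that executed this hop
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex[:16])
    parent_id: Optional[str] = None    # links the span to the hop that called it
    tags: dict = field(default_factory=dict)
    start: float = 0.0
    stop: float = 0.0

    def begin(self):
        self.start = time.time()
        return self

    def end(self):
        self.stop = time.time()
        return self

    @property
    def duration_ms(self) -> float:
        return (self.stop - self.start) * 1000.0

# One trace crossing two hypothetical components: an API and its database.
trace_id = uuid.uuid4().hex
root = Span(trace_id, "order-api", tags={"entity.id": "ORD-1001"}).begin()
child = Span(trace_id, "inventory-db", parent_id=root.span_id).begin()
child.end()
root.end()
```

Sharing the `trace_id` across spans while chaining `parent_id` to the caller's `span_id` is what lets a Controller reassemble the end-to-end journey from independently reported spans.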

3.1.3      Application

With a fully instrumented system, you know exactly what is going on in each layer of the system, where your performance bottlenecks are, and how you can debug effectively across the stack.
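As a sketch of how instrumentation exposes a bottleneck, the agents' span durations can simply be compared per component. The span records and component names below are invented for illustration.

```python
# Hypothetical span records exported by the agents: (component, duration_ms)
spans = [
    ("web-frontend", 12.0),
    ("order-api", 45.0),
    ("inventory-db", 230.0),   # the slow hop
    ("payment-gateway", 80.0),
]

# The slowest span points at the layer to debug first.
bottleneck = max(spans, key=lambda s: s[1])
print(bottleneck)  # ('inventory-db', 230.0)
```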

3.1.4      Examples / Use-cases

When a system is monolithic, with all of the application code in one large deployable .war or .ear file on a single host, it is much easier to reason about where things have gone wrong. But when an IT system’s components are distributed in a cloud architecture, the system behaves differently under load and at scale, so it is important to instrument it to understand and monitor its actual behaviour.
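For a trace to survive the hop from one distributed component to the next, the trace context must be propagated with each call, commonly as a W3C Trace Context `traceparent` HTTP header. The helper below is a minimal sketch of building such a header; the function name is an assumption, not a standard API.

```python
import uuid

def make_traceparent(trace_id=None, span_id=None):
    """Build a W3C Trace Context 'traceparent' header value:
    version (00) - trace-id (32 hex) - parent-id (16 hex) - flags (01 = sampled).
    The next service reads this header and attaches its span to the same trace."""
    trace_id = trace_id or uuid.uuid4().hex        # 32 hex chars
    span_id = span_id or uuid.uuid4().hex[:16]     # 16 hex chars
    return "00-{}-{}-01".format(trace_id, span_id)

header = make_traceparent()
version, trace_part, parent_part, flags = header.split("-")
```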