# Distributed tracer sample application

This is a sample application for the distributed trace aggregator.

DO NOT use this for production. It is meant to be used as a demo app to illustrate how to aggregate trace metrics.
For this demo, the perf metrics are printed onto the standard output.

For production, please write your own custom aggregator.

This application provides a service at the route "distributed.trace.processor" to receive trace information.

In addition to printing the trace information as log messages, it acts as a WebSocket server that distributed trace
UI applications can connect to.

# IMPORTANT - distributed trace logging vs trace aggregator

There are two approaches to trace collection:

1. Decentralized

    You can implement a function with route "distributed.tracing" in each application to intercept perf metrics.

2. Centralized

    You can implement a separate application with a service called "distributed.trace.processor" to collect
    traces from all application instances in the system.

The advantage of the decentralized approach is that it is more lightweight. It is like having an agent in each
application instance to report perf metrics.

The advantage of the centralized approach is that you can aggregate traces from all application instances and decide
how to process them. For example, you can keep them in a database or search engine for further analysis. The
disadvantage is that it generates more traffic in the network event stream system, so you must ensure the
network has sufficient capacity to handle the additional workload.
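
To make the centralized idea more concrete, here is a minimal Python sketch of what "aggregate and decide how to
process" could look like. It is not part of this project: `TraceAggregator`, `handle` and `summary` are hypothetical
names, the records are assumed to have the shape shown in the sample trace metrics later in this README, and how
records reach `handle` (for example, from a "distributed.trace.processor" service) is left out.

```python
from collections import defaultdict
from statistics import mean


class TraceAggregator:
    """Hypothetical in-memory collector; an illustration, not a framework API."""

    def __init__(self):
        # service route -> list of exec_time values as reported in the traces
        self._exec_times = defaultdict(list)
        # application instance IDs ("origin") that have reported at least once
        self._origins = set()

    def handle(self, record):
        """Accept one trace record, however it was delivered."""
        trace = record["trace"]
        self._exec_times[trace["service"]].append(trace["exec_time"])
        self._origins.add(trace["origin"])

    def summary(self):
        """Return the call count and mean exec_time for each service route."""
        return {
            service: {"count": len(times), "mean_exec_time": mean(times)}
            for service, times in self._exec_times.items()
        }

    def origin_count(self):
        """Number of distinct application instances seen so far."""
        return len(self._origins)
```

A real aggregator would typically write to a database or search engine instead of keeping everything in memory.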

## Turning on tracing

Distributed traces are initiated at the edge by the REST automation system.

To enable distributed trace, please set "tracing=true" for the REST endpoints in the "rest.yaml" file that
you want to trace. For details, please refer to the REST automation application subproject in the "extensions" packages.

## Transaction journaling

Optionally, you may enable transaction journaling for selected services. To enable journaling, define
the service routes in the journal config YAML file. Journaling is a superset of distributed tracing. You would need
to write your own distributed trace aggregator.

## Sample trace metrics

The following is a sample output when the browser hits the "hello.world" service provided by a Python service.
The trace shows that the event passes through three services: "hello.world" at the language-connector,
the "hello.world" service in the Python script and "async.http.response" in the rest-automation system.

```
{
  "trace": {
    "path": "GET /api/hello/world",
    "service": "async.http.response",
    "success": true,
    "origin": "2020051088c413a3a33c4d6082be287b1d51a0d8",
    "start": "2020-05-10T23:44:19.290Z",
    "exec_time": 0.418,
    "id": "fee3d82fd3dd47fc883aefb61f2f2fe8"
  },
  "annotations": {},
  "type": "trace"
}
{
  "trace": {
    "path": "GET /api/hello/world",
    "service": "hello.world",
    "success": true,
    "origin": "py0356ba1413324686b2828439634a4d37",
    "start": "2020-05-10T23:44:19.283Z",
    "exec_time": 0.191,
    "id": "fee3d82fd3dd47fc883aefb61f2f2fe8"
  },
  "annotations": {},
  "type": "trace"
}
{
  "trace": {
    "path": "GET /api/hello/world",
    "service": "hello.world",
    "success": true,
    "origin": "202005109b77436f7d1141078fd1a6d65b2bd7bf",
    "start": "2020-05-10T23:44:19.278Z",
    "exec_time": 0.218,
    "id": "fee3d82fd3dd47fc883aefb61f2f2fe8"
  },
  "annotations": {
    "version": "language-connector 2.6.0",
    "target": "py0356ba1413324686b2828439634a4d37"
  },
  "type": "trace"
}
```
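
All three records share the same trace "id", so a custom aggregator can use it to stitch the hops of one request
back together. The following Python sketch shows one way to do that, assuming the records arrive as JSON objects
written back to back, as in the output above. The function names and the `SAMPLE` string (a condensed copy of the
sample, keeping only the fields used here) are illustrative and not part of this project.

```python
import json
from collections import defaultdict


def parse_records(text):
    """Yield each JSON object from a string of objects written back to back."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so skip it here
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        record, pos = decoder.raw_decode(text, pos)
        yield record


def group_by_trace_id(records):
    """Map each trace ID to its trace entries, sorted by start time."""
    grouped = defaultdict(list)
    for record in records:
        if record.get("type") == "trace":
            grouped[record["trace"]["id"]].append(record["trace"])
    for hops in grouped.values():
        # The timestamps share one fixed-width UTC format,
        # so sorting the strings sorts them chronologically
        hops.sort(key=lambda trace: trace["start"])
    return dict(grouped)


SAMPLE = """
{"trace": {"service": "async.http.response", "start": "2020-05-10T23:44:19.290Z",
           "exec_time": 0.418, "id": "fee3d82fd3dd47fc883aefb61f2f2fe8"},
 "annotations": {}, "type": "trace"}
{"trace": {"service": "hello.world", "start": "2020-05-10T23:44:19.283Z",
           "exec_time": 0.191, "id": "fee3d82fd3dd47fc883aefb61f2f2fe8"},
 "annotations": {}, "type": "trace"}
{"trace": {"service": "hello.world", "start": "2020-05-10T23:44:19.278Z",
           "exec_time": 0.218, "id": "fee3d82fd3dd47fc883aefb61f2f2fe8"},
 "annotations": {"target": "py0356ba1413324686b2828439634a4d37"}, "type": "trace"}
"""

for trace_id, hops in group_by_trace_id(parse_records(SAMPLE)).items():
    print(trace_id)
    for hop in hops:
        print("  ", hop["start"], hop["service"], hop["exec_time"])
```

Sorting by "start" lists the two "hello.world" hops first and "async.http.response" last, which matches the order
in which the event passes through the services.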

## UI application

If you save the perf metrics into a search engine, you can then render the metrics with a dashboard such as
Kibana or Grafana.
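
As an illustration of the "save into a search engine" step, the sketch below turns trace records into the
newline-delimited JSON body used by Elasticsearch-style bulk indexing. Elasticsearch is only an assumed example of
a search engine here, the index name "traces" and the function name are hypothetical, and sending the body over
HTTP is deliberately left out.

```python
import json


def to_bulk_ndjson(records, index_name="traces"):
    """Serialize trace records as newline-delimited JSON for a bulk request."""
    lines = []
    for record in records:
        # Flatten the trace fields and keep the annotations alongside them
        doc = dict(record["trace"], annotations=record.get("annotations", {}))
        # Each document is preceded by an action line naming the target index
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(doc))
    # The bulk format expects a newline after the last line as well
    return "\n".join(lines) + "\n"
```

Keeping the "start" timestamp and "exec_time" as top-level fields of each document is a design choice meant to make
time-based charts easier to build in a dashboard.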

If you want to do your own visualization, you may implement a single-page application (React, Angular, etc.)
to render the metrics retrieved from the search engine.