<!--
title: "Log"
custom_edit_url: https://github.com/netdata/netdata/edit/master/src/libnetdata/log/README.md
sidebar_label: "Log"
learn_status: "Published"
learn_topic_type: "Tasks"
learn_rel_path: "Developers/libnetdata"
-->

# Netdata Logging

This document describes how Netdata generates its own logs, not how Netdata manages and queries log databases.

## Log sources

Netdata supports the following log sources:

1. **daemon**, logs generated by the Netdata daemon.
2. **collector**, logs generated by Netdata collectors, including internal and external ones.
3. **access**, API requests received by Netdata.
4. **health**, all alert transitions and notifications.

## Log outputs

For each log source, Netdata supports the following output methods:

- **off**, to disable this log source
- **journal**, to send the logs to systemd-journal.
- **syslog**, to send the logs to syslog.
- **system**, to send the output to `stderr` or `stdout` depending on the log source.
- **stdout**, to write the logs to Netdata's `stdout`.
- **stderr**, to write the logs to Netdata's `stderr`.
- **filename**, to send the logs to a file.

For `daemon` and `collector` the default is `journal` when systemd-journal is available.
To decide if systemd-journal is available, Netdata checks:

1. `stderr` is connected to systemd-journald
2. `/run/systemd/journal/socket` exists
3. `/host/run/systemd/journal/socket` exists (`/host` is configurable in containers)

If any of the above is detected, Netdata will select `journal` for `daemon` and `collector` sources.

All other sources default to a file.
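
The detection order above can be sketched in shell. This is only an illustration, not Netdata's actual code: the function name `journal_available`, the use of the `JOURNAL_STREAM` environment variable (which systemd sets for services whose output goes to the journal) as the test for check 1, and the second `root` argument (present only so the sketch can be tried against a scratch directory) are all assumptions.

```bash
# Sketch only: mirrors the three checks listed above, not Netdata's real code.
# $1: host prefix used in containers (default: /host)
# $2: hypothetical filesystem root, handy for trying this in a scratch directory
journal_available() {
    local host_prefix="${1:-/host}"
    local root="${2:-}"

    # 1. Assumption: a non-empty JOURNAL_STREAM means stderr goes to journald.
    if [ -n "${JOURNAL_STREAM:-}" ]; then
        return 0
    fi

    # 2. The local journald socket exists.
    if [ -e "${root}/run/systemd/journal/socket" ]; then
        return 0
    fi

    # 3. The host's journald socket is visible (for example, inside a container).
    if [ -e "${root}${host_prefix}/run/systemd/journal/socket" ]; then
        return 0
    fi

    return 1
}
```

For example, `journal_available && echo "daemon and collector default to journal"` prints the message only when one of the three checks succeeds.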

## Log formats

| Format  | Description                                                                                            |
|---------|--------------------------------------------------------------------------------------------------------|
| journal | journald-specific log format. Automatically selected when logging to systemd-journal.                  |
| logfmt  | logs data as a series of key/value pairs. The default when logging to any output other than `journal`. |
| json    | logs data in JSON format.                                                                              |

## Log levels

Each time Netdata logs, it assigns a priority to the log entry. It can be one of the following (in order of importance):

| Level     | Description                                                                            |
|-----------|----------------------------------------------------------------------------------------|
| emergency | a fatal condition, Netdata will most likely exit immediately after.                    |
| alert     | a very important issue that may affect how Netdata operates.                           |
| critical  | a very important issue the user should know about, which Netdata thinks it can survive. |
| error     | an error condition indicating that Netdata is trying to do something, but it fails.    |
| warning   | something unexpected has happened that may or may not affect the operation of Netdata. |
| notice    | something that does not affect the operation of Netdata, but the user should notice.   |
| info      | the default log level about information the user should know.                          |
| debug     | these are more verbose logs that can be ignored.                                       |
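
When filtering journal entries by `PRIORITY`, it helps to know the numeric value behind each level. The sketch below assumes that Netdata's levels map one-to-one onto the standard syslog severities (0 for emergency through 7 for debug), which is the scale journald's `PRIORITY` field uses. The function name `nd_level_to_priority` is hypothetical.

```bash
# Hypothetical helper: translate a Netdata log level name into the numeric
# syslog severity, assuming the standard 0 (emergency) to 7 (debug) scale.
nd_level_to_priority() {
    case "$1" in
        emergency) echo 0 ;;
        alert)     echo 1 ;;
        critical)  echo 2 ;;
        error)     echo 3 ;;
        warning)   echo 4 ;;
        notice)    echo 5 ;;
        info)      echo 6 ;;
        debug)     echo 7 ;;
        *)         echo "unknown log level: $1" >&2; return 1 ;;
    esac
}
```

The result can be passed to `journalctl -p`, which accepts numeric priorities and shows entries at that level or more important, for example `journalctl -p "$(nd_level_to_priority warning)"`.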

## Logs Configuration

In `netdata.conf`, there are the following settings:

```
[logs]
    # logs to trigger flood protection = 1000
    # logs flood protection period = 1m
    # facility = daemon
    # level = info
    # daemon = journal
    # collector = journal
    # access = /var/log/netdata/access.log
    # health = /var/log/netdata/health.log
```

- `logs to trigger flood protection` and `logs flood protection period` enable log flood protection for the `daemon` and `collector` sources. They can also be configured per log source.
- `facility` is used only when Netdata logs to syslog.
- `level` defines the minimum [log level](#log-levels) of logs that will be logged. This setting is applied only to `daemon` and `collector` sources. It can also be configured per source.

### Configuring log sources

Each of the sources (`daemon`, `collector`, `access`, `health`) accepts the following:

```
source = {FORMAT},level={LEVEL},protection={LOGS}/{PERIOD}@{OUTPUT}
```

Where:

- `{FORMAT}` is one of the [log formats](#log-formats),
- `{LEVEL}` is the minimum [log level](#log-levels) to be logged,
- `{LOGS}` is the number of `logs to trigger flood protection` configured per output,
- `{PERIOD}` is the equivalent of `logs flood protection period` configured per output,
- `{OUTPUT}` is one of the [log outputs](#log-outputs).

All parameters can be omitted, except `{OUTPUT}`. If `{OUTPUT}` is the only given parameter, `@` can be omitted.
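
A few illustrative combinations of this syntax follow. The values (levels, thresholds, file path) are arbitrary examples rather than recommended settings, and they have not been validated against a particular Netdata version.

```
[logs]
    # only the output is given, so `@` is omitted
    daemon = journal

    # format, level and flood protection (triggered at 500 logs per 1m), written to stderr
    collector = json,level=warning,protection=500/1m@stderr

    # format and output only
    access = logfmt@/var/log/netdata/access.log

    # disable this source
    health = off
```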

### Logs rotation

Netdata comes with a `logrotate` configuration to rotate its log files periodically.

The default is usually found in `/etc/logrotate.d/netdata`.

Sending a `SIGHUP` to Netdata will instruct it to re-open all its log files.
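
If you rotate the files yourself, the signal can be sent from a shell. The sketch below assumes Netdata runs as a systemd service named `netdata`, so that its main PID can be read with `systemctl show`. The function name `reopen_netdata_logs` and its optional PID argument are hypothetical.

```bash
# Send SIGHUP to Netdata so it re-opens its log files.
# $1: PID to signal; when omitted, the main PID of the `netdata` systemd unit is used.
reopen_netdata_logs() {
    local pid="${1:-}"

    if [ -z "$pid" ]; then
        pid="$(systemctl show --value --property=MainPID netdata)"
    fi

    # systemd reports MainPID=0 when the unit is not running.
    if [ -z "$pid" ] || [ "$pid" = "0" ]; then
        echo "netdata does not appear to be running" >&2
        return 1
    fi

    kill -HUP "$pid"
}
```

Call `reopen_netdata_logs` with no arguments on a systemd host, or pass a PID explicitly when Netdata is started some other way.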

## Log Fields

<details>
<summary>All fields exposed by Netdata</summary>

|                journal                 |             logfmt             |              json              |                                                Description                                                |
|:--------------------------------------:|:------------------------------:|:------------------------------:|:---------------------------------------------------------------------------------------------------------:|
|      `_SOURCE_REALTIME_TIMESTAMP`      |             `time`             |             `time`             |                                        the timestamp of the event                                         |
|          `SYSLOG_IDENTIFIER`           |             `comm`             |             `comm`             |                                       the program logging the event                                       |
|            `ND_LOG_SOURCE`             |            `source`            |            `source`            |                                  one of the [log sources](#log-sources)                                   |
|         `PRIORITY`<br/>numeric         |        `level`<br/>text        |      `level`<br/>numeric       |                                   one of the [log levels](#log-levels)                                    |
|                `ERRNO`                 |            `errno`             |            `errno`             |                                       the numeric value of `errno`                                        |
|            `INVOCATION_ID`             |               -                |               -                |   a UUID of the Netdata session, reset on every Netdata restart, inherited from systemd when available    |
|              `CODE_LINE`               |               -                |               -                |                           the line number of the source code logging this event                           |
|              `CODE_FILE`               |               -                |               -                |                            the filename of the source code logging this event                             |
|            `CODE_FUNCTION`             |               -                |               -                |                          the function name of the source code logging this event                          |
|                 `TID`                  |             `tid`              |             `tid`              |                              the thread id of the thread logging this event                               |
|              `THREAD_TAG`              |            `thread`            |            `thread`            |                                 the name of the thread logging this event                                 |
|              `MESSAGE_ID`              |            `msg_id`            |            `msg_id`            |                                      see [message IDs](#message-ids)                                      |
|              `ND_MODULE`               |            `module`            |            `module`            |                                   the Netdata module logging this event                                   |
|             `ND_NIDL_NODE`             |             `node`             |             `node`             |                             the hostname of the node the event is related to                              |
|           `ND_NIDL_INSTANCE`           |           `instance`           |           `instance`           |                             the instance of the node the event is related to                              |
|           `ND_NIDL_CONTEXT`            |           `context`            |           `context`            |    the context the event is related to (this is usually the chart name, as shown on Netdata dashboards)   |
|          `ND_NIDL_DIMENSION`           |          `dimension`           |          `dimension`           |                                   the dimension the event is related to                                   |
|           `ND_SRC_TRANSPORT`           |        `src_transport`         |        `src_transport`         |                  when the event happened during a request, this is the request transport                  |
|              `ND_SRC_IP`               |            `src_ip`            |            `src_ip`            |          when the event happened during an inbound request, this is the IP the request came from          |
|             `ND_SRC_PORT`              |           `src_port`           |           `src_port`           |         when the event happened during an inbound request, this is the port the request came from         |
|        `ND_SRC_FORWARDED_HOST`         |      `src_forwarded_host`      |      `src_forwarded_host`      |                            the contents of the HTTP header `X-Forwarded-Host`                             |
|         `ND_SRC_FORWARDED_FOR`         |      `src_forwarded_for`       |      `src_forwarded_for`       |                             the contents of the HTTP header `X-Forwarded-For`                             |
|         `ND_SRC_CAPABILITIES`          |       `src_capabilities`       |       `src_capabilities`       |          when the request came from a child, this is the communication capabilities of the child          |
|           `ND_DST_TRANSPORT`           |        `dst_transport`         |        `dst_transport`         |        when the event happened during an outbound request, this is the outbound request transport         |
|              `ND_DST_IP`               |            `dst_ip`            |            `dst_ip`            |       when the event happened during an outbound request, this is the IP of the request destination       |
|             `ND_DST_PORT`              |           `dst_port`           |           `dst_port`           |      when the event happened during an outbound request, this is the port of the request destination      |
|         `ND_DST_CAPABILITIES`          |       `dst_capabilities`       |       `dst_capabilities`       |          when the request goes to a parent, this is the communication capabilities of the parent          |
|          `ND_REQUEST_METHOD`           |          `req_method`          |          `req_method`          |       when the event happened during an inbound request, this is the method of the received request       |
|           `ND_RESPONSE_CODE`           |             `code`             |             `code`             |                          when responding to a request, this is the response code                          |
|           `ND_CONNECTION_ID`           |             `conn`             |             `conn`             |            when there is a connection id for an inbound connection, this is the connection id             |
|          `ND_TRANSACTION_ID`           |         `transaction`          |         `transaction`          |                               the transaction id (UUID) of all API requests                               |
|        `ND_RESPONSE_SENT_BYTES`        |          `sent_bytes`          |          `sent_bytes`          |                                      the bytes sent in API responses                                      |
|        `ND_RESPONSE_SIZE_BYTES`        |          `size_bytes`          |          `size_bytes`          |                                the uncompressed bytes of the API responses                                |
|      `ND_RESPONSE_PREP_TIME_USEC`      |           `prep_ut`            |           `prep_ut`            |                                   the time needed to prepare a response                                   |
|      `ND_RESPONSE_SENT_TIME_USEC`      |           `sent_ut`            |           `sent_ut`            |                                    the time needed to send a response                                     |
|     `ND_RESPONSE_TOTAL_TIME_USEC`      |           `total_ut`           |           `total_ut`           |                               the total time needed to complete a response                                |
|             `ND_ALERT_ID`              |           `alert_id`           |           `alert_id`           |                                   the alert id this event is related to                                   |
|          `ND_ALERT_EVENT_ID`           |        `alert_event_id`        |        `alert_event_id`        |                          a sequential number of the alert transition (per host)                           |
|          `ND_ALERT_UNIQUE_ID`          |       `alert_unique_id`        |       `alert_unique_id`        |                          a sequential number of the alert transition (per alert)                          |
|        `ND_ALERT_TRANSITION_ID`        |     `alert_transition_id`      |     `alert_transition_id`      |                                 the unique UUID of this alert transition                                  |
|           `ND_ALERT_CONFIG`            |         `alert_config`         |         `alert_config`         |                                    the alert configuration hash (UUID)                                    |
|            `ND_ALERT_NAME`             |            `alert`             |            `alert`             |                                              the alert name                                               |
|            `ND_ALERT_CLASS`            |         `alert_class`          |         `alert_class`          |                                         the alert classification                                          |
|          `ND_ALERT_COMPONENT`          |       `alert_component`        |       `alert_component`        |                                            the alert component                                            |
|            `ND_ALERT_TYPE`             |          `alert_type`          |          `alert_type`          |                                              the alert type                                               |
|            `ND_ALERT_EXEC`             |          `alert_exec`          |          `alert_exec`          |                                      the alert notification program                                       |
|          `ND_ALERT_RECIPIENT`          |       `alert_recipient`        |       `alert_recipient`        |                                          the alert recipient(s)                                           |
|            `ND_ALERT_VALUE`            |         `alert_value`          |         `alert_value`          |                                          the current alert value                                          |
|          `ND_ALERT_VALUE_OLD`          |       `alert_value_old`        |       `alert_value_old`        |                                         the previous alert value                                          |
|           `ND_ALERT_STATUS`            |         `alert_status`         |         `alert_status`         |                                         the current alert status                                          |
|         `ND_ALERT_STATUS_OLD`          |       `alert_status_old`       |       `alert_status_old`       |                                         the previous alert status                                         |
|            `ND_ALERT_UNITS`            |         `alert_units`          |         `alert_units`          |                                          the units of the alert                                           |
|           `ND_ALERT_SUMMARY`           |        `alert_summary`         |        `alert_summary`         |                                       the summary text of the alert                                       |
|            `ND_ALERT_INFO`             |          `alert_info`          |          `alert_info`          |                                        the info text of the alert                                         |
|          `ND_ALERT_DURATION`           |        `alert_duration`        |        `alert_duration`        |                             the duration the alert was in its previous state                              |
| `ND_ALERT_NOTIFICATION_TIMESTAMP_USEC` | `alert_notification_timestamp` | `alert_notification_timestamp` |                           the timestamp the notification delivery is scheduled                            |
|              `ND_REQUEST`              |           `request`            |           `request`            |                             the full request during which the event happened                              |
|               `MESSAGE`                |             `msg`              |             `msg`              |                                             the event message                                             |

</details>
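
When a source is sent to systemd-journal, the journal field names in the first column can be used directly as `journalctl` matches. The wrapper below is a hypothetical convenience (the name `nd_journal` is not part of Netdata). It assumes the `netdata` journal namespace described in the journalctl section of this document.

```bash
# Hypothetical wrapper: run journalctl against the `netdata` journal namespace,
# passing through any field matches or options.
nd_journal() {
    journalctl --namespace=netdata "$@"
}

# Usage examples (commented out so that sourcing this snippet has no side effects):
#   nd_journal ND_LOG_SOURCE=collector -p warning     # collector logs, warning or more important
#   nd_journal ND_LOG_SOURCE=daemon -o verbose -n 5   # last 5 daemon entries, with all fields
```

Matches on different fields are combined with a logical AND, so each extra `FIELD=value` narrows the result.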

### Message IDs

Netdata assigns specific message IDs to certain events:

- `ed4cdb8f1beb4ad3b57cb3cae2d162fa` when a Netdata child connects to this Netdata
- `6e2e3839067648968b646045dbf28d66` when this Netdata connects to a Netdata parent
- `9ce0cb58ab8b44df82c4bf1ad9ee22de` when alerts change state
- `6db0018e83e34320ae2a659d78019fb7` when notifications are sent

You can view these events with the Netdata systemd-journal.plugin, using the `MESSAGE_ID` filter,
or with `journalctl`:

```bash
# query child connections
journalctl MESSAGE_ID=ed4cdb8f1beb4ad3b57cb3cae2d162fa

# query parent connection
journalctl MESSAGE_ID=6e2e3839067648968b646045dbf28d66

# query alert transitions
journalctl MESSAGE_ID=9ce0cb58ab8b44df82c4bf1ad9ee22de

# query alert notifications
journalctl MESSAGE_ID=6db0018e83e34320ae2a659d78019fb7
```

## Using journalctl to query Netdata logs

The Netdata service's processes execute within the `netdata` journal namespace. To view the Netdata logs, you should
specify the `--namespace=netdata` option.

```bash
# Netdata logs since the last time the service was started
journalctl _SYSTEMD_INVOCATION_ID="$(systemctl show --value --property=InvocationID netdata)" --namespace=netdata

# All netdata logs, the oldest entries are displayed first
journalctl -u netdata --namespace=netdata

# All netdata logs, the newest entries are displayed first
journalctl -u netdata --namespace=netdata -r
```