<!--startmeta
custom_edit_url: "https://github.com/netdata/netdata/edit/master/src/go/collectors/go.d.plugin/modules/nvme/README.md"
meta_yaml: "https://github.com/netdata/netdata/edit/master/src/go/collectors/go.d.plugin/modules/nvme/metadata.yaml"
sidebar_label: "NVMe devices"
learn_status: "Published"
learn_rel_path: "Collecting Metrics/Storage, Mount Points and Filesystems"
most_popular: False
message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE"
endmeta-->

# NVMe devices


<img src="https://netdata.cloud/img/nvme.svg" width="150"/>


Plugin: go.d.plugin
Module: nvme

<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" />

## Overview

This collector monitors the health of NVMe devices. It relies on the [`nvme`](https://github.com/linux-nvme/nvme-cli#nvme-cli) CLI tool but does not execute the binary directly. Instead, it uses `ndsudo`, a Netdata helper designed to run privileged commands securely within the Netdata environment. This removes the need for `sudo`, improving security and simplifying permission management.
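As a quick sanity check, you can run the same kind of queries manually with `nvme-cli`; the collector gathers roughly this information, just via `ndsudo` rather than a shell. The device path `/dev/nvme0` below is only an example, adjust it for your system.

```bash
# List NVMe controllers/namespaces in JSON form (requires root).
sudo nvme list --output-format=json

# Read the SMART / health log of a specific controller, e.g. /dev/nvme0.
sudo nvme smart-log /dev/nvme0 --output-format=json
```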




This collector is supported on all platforms.

This collector supports collecting metrics from multiple instances of this integration, including remote instances.


### Default Behavior

#### Auto-Detection

This integration doesn't support auto-detection.

#### Limits

The default configuration for this integration does not impose any limits on data collection.

#### Performance Impact

The default configuration for this integration is not expected to impose a significant performance impact on the system.


## Metrics

Metrics grouped by *scope*.

The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels.



### Per device

These metrics refer to the NVMe device.

Labels:

| Label      | Description     |
|:-----------|:----------------|
| device | NVMe device name |

Metrics:

| Metric | Dimensions | Unit |
|:------|:----------|:----|
| nvme.device_estimated_endurance_perc | used | % |
| nvme.device_available_spare_perc | spare | % |
| nvme.device_composite_temperature | temperature | celsius |
| nvme.device_io_transferred_count | read, written | bytes |
| nvme.device_power_cycles_count | power | cycles |
| nvme.device_power_on_time | power-on | seconds |
| nvme.device_critical_warnings_state | available_spare, temp_threshold, nvm_subsystem_reliability, read_only, volatile_mem_backup_failed, persistent_memory_read_only | state |
| nvme.device_unsafe_shutdowns_count | unsafe | shutdowns |
| nvme.device_media_errors_rate | media | errors/s |
| nvme.device_error_log_entries_rate | error_log | entries/s |
| nvme.device_warning_composite_temperature_time | wctemp | seconds |
| nvme.device_critical_composite_temperature_time | cctemp | seconds |
| nvme.device_thermal_mgmt_temp1_transitions_rate | temp1 | transitions/s |
| nvme.device_thermal_mgmt_temp2_transitions_rate | temp2 | transitions/s |
| nvme.device_thermal_mgmt_temp1_time | temp1 | seconds |
| nvme.device_thermal_mgmt_temp2_time | temp2 | seconds |

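Once the collector is running, you can check that these charts are populated by querying the Agent's data API. The host and port below assume a default local install; list the available chart ids first and substitute a real one for the `CHART_ID` placeholder.

```bash
# List chart ids exposed by the nvme collector (ids include the device name).
curl -s "http://localhost:19999/api/v1/charts" | grep -o '"nvme\.[^"]*"' | sort -u

# Fetch the latest points for one of the listed charts (replace CHART_ID).
curl -s "http://localhost:19999/api/v1/data?chart=CHART_ID&points=5&format=json"
```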


## Alerts


The following alerts are available:

| Alert name  | On metric | Description |
|:------------|:----------|:------------|
| [nvme_device_critical_warnings_state](https://github.com/netdata/netdata/blob/master/src/health/health.d/nvme.conf) | nvme.device_critical_warnings_state | NVMe device ${label:device} has critical warnings |

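To verify the alert is loaded on your Agent, you can query the alarms API. The local host and port are assumptions for a default install; `python3` is used here only to pretty-print the JSON.

```bash
# List all configured alarms and show entries related to nvme.
curl -s "http://localhost:19999/api/v1/alarms?all" | python3 -m json.tool | grep -i -A 2 nvme
```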

## Setup

### Prerequisites

#### Install nvme-cli

Install `nvme-cli` using your distribution's package manager. See [Distro Support](https://github.com/linux-nvme/nvme-cli#distro-support) for distribution-specific notes.
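For example, on common distributions the package is named `nvme-cli`; the commands below are illustrative, adapt them to your package manager.

```bash
# Debian / Ubuntu
sudo apt-get install nvme-cli

# Fedora / RHEL
sudo dnf install nvme-cli

# Verify the tool is available
nvme version
```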



### Configuration

#### File

The configuration file name for this integration is `go.d/nvme.conf`.


You can edit the configuration file using the `edit-config` script from the
Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/netdata-agent/configuration/README.md#the-netdata-config-directory).

```bash
cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata
sudo ./edit-config go.d/nvme.conf
```

#### Options

The following options can be defined globally: update_every, autodetection_retry.


<details><summary>Config options</summary>

| Name | Description | Default | Required |
|:----|:-----------|:-------|:--------:|
| update_every | Data collection interval, in seconds. | 10 | no |
| autodetection_retry | Recheck interval in seconds. Zero means no recheck will be scheduled. | 0 | no |
| timeout | `nvme` binary execution timeout, in seconds. | 2 | no |

</details>

#### Examples

##### Custom update_every

Allows you to override the default data collection interval.

<details><summary>Config</summary>

```yaml
jobs:
  - name: nvme
    update_every: 5  # Collect NVMe metrics every 5 seconds

```
</details>
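
##### Custom timeout

If `nvme` occasionally needs more than the default 2 seconds to respond, raise the binary execution timeout. This is a sketch based on the `timeout` option from the table above.

<details><summary>Config</summary>

```yaml
jobs:
  - name: nvme
    timeout: 5  # Allow the nvme binary up to 5 seconds to complete

```
</details>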



## Troubleshooting

### Debug Mode

To troubleshoot issues with the `nvme` collector, run the `go.d.plugin` with the debug option enabled. The output
should give you clues as to why the collector isn't working.

- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. If that's not the case on
  your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`.

  ```bash
  cd /usr/libexec/netdata/plugins.d/
  ```

- Switch to the `netdata` user.

  ```bash
  sudo -u netdata -s
  ```

- Run the `go.d.plugin` to debug the collector:

  ```bash
  ./go.d.plugin -d -m nvme
  ```
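
- If the debug run looks fine but metrics still do not appear, check the Agent's own log for collector errors (the `journalctl` unit name assumes a systemd-based install):

  ```bash
  sudo journalctl -u netdata --since "1 hour ago" --no-pager | grep -i nvme
  ```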