src/go/plugin/go.d/modules/intelgpu/integrations/intel_gpu.md from firehol/netdata

src/go/plugin/go.d/modules/intelgpu/integrations/intel_gpu.md
Summary

Maintainability

Test Coverage

Issues
<!--startmeta
custom_edit_url: "https://github.com/netdata/netdata/edit/master/src/go/plugin/go.d/modules/intelgpu/README.md"
meta_yaml: "https://github.com/netdata/netdata/edit/master/src/go/plugin/go.d/modules/intelgpu/metadata.yaml"
sidebar_label: "Intel GPU"
learn_status: "Published"
learn_rel_path: "Collecting Metrics/Hardware Devices and Sensors"
most_popular: False
message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE"
endmeta-->

# Intel GPU


<img src="https://netdata.cloud/img/microchip.svg" width="150"/>


Plugin: go.d.plugin
Module: intelgpu

<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" />

## Overview

This collector gathers performance metrics for Intel integrated GPUs.
It relies on the [`intel_gpu_top`](https://manpages.debian.org/testing/intel-gpu-tools/intel_gpu_top.1.en.html) CLI tool but avoids directly executing the binary.
Instead, it utilizes `ndsudo`, a Netdata helper specifically designed to run privileged commands securely within the Netdata environment.
This approach eliminates the need to grant the CAP_PERFMON capability to `intel_gpu_top`, improving security and potentially simplifying permission management.




This collector is supported on all platforms.

This collector supports collecting metrics from multiple instances of this integration, including remote instances.


### Default Behavior

#### Auto-Detection

This integration doesn't support auto-detection.

#### Limits

The default configuration for this integration does not impose any limits on data collection.

#### Performance Impact

The default configuration for this integration is not expected to impose a significant performance impact on the system.


## Metrics

Metrics grouped by *scope*.

The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels.



### Per Intel GPU instance

These metrics refer to the Intel GPU.

This scope has no labels.

Metrics:

| Metric | Dimensions | Unit |
|:------|:----------|:----|
| intelgpu.frequency | frequency | MHz |
| intelgpu.power | gpu, package | Watts |

### Per engine

These metrics refer to the GPU hardware engine.

Labels:

| Label      | Description     |
|:-----------|:----------------|
| engine_class | Engine class (Render/3D, Blitter, VideoEnhance, Video, Compute). |
| engine_instance | Engine instance (e.g. Render/3D/0, Video/0, Video/1). |

Metrics:

| Metric | Dimensions | Unit |
|:------|:----------|:----|
| intelgpu.engine_busy_perc | busy | percentage |



## Alerts

There are no alerts configured by default for this integration.


## Setup

### Prerequisites

#### Install intel-gpu-tools

Install `intel-gpu-tools` using your distribution's package manager.


### Configuration

#### File

The configuration file name for this integration is `go.d/intelgpu.conf`.


You can edit the configuration file using the `edit-config` script from the
Netdata [config directory](/docs/netdata-agent/configuration/README.md#the-netdata-config-directory).

```bash
cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata
sudo ./edit-config go.d/intelgpu.conf
```
#### Options

The following options can be defined globally: update_every.


<details open><summary>Config options</summary>

| Name | Description | Default | Required |
|:----|:-----------|:-------|:--------:|
| update_every | Data collection frequency. | 1 | no |
| device | Select a specific GPU using [supported filter](https://manpages.debian.org/testing/intel-gpu-tools/intel_gpu_top.1.en.html#DESCRIPTION). |  | no |

</details>

#### Examples

##### Custom update_every

Allows you to override the default data collection interval.

<details open><summary>Config</summary>

```yaml
jobs:
  - name: intelgpu
    update_every: 5  # Collect Intel iGPU metrics every 5 seconds

```
</details>



## Troubleshooting

### Debug Mode

**Important**: Debug mode is not supported for data collection jobs created via the UI using the Dyncfg feature.

To troubleshoot issues with the `intelgpu` collector, run the `go.d.plugin` with the debug option enabled. The output
should give you clues as to why the collector isn't working.

- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. If that's not the case on
  your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`.

  ```bash
  cd /usr/libexec/netdata/plugins.d/
  ```

- Switch to the `netdata` user.

  ```bash
  sudo -u netdata -s
  ```

- Run the `go.d.plugin` to debug the collector:

  ```bash
  ./go.d.plugin -d -m intelgpu
  ```

### Getting Logs

If you're encountering problems with the `intelgpu` collector, follow these steps to retrieve logs and identify potential issues:

- **Run the command** specific to your system (systemd, non-systemd, or Docker container).
- **Examine the output** for any warnings or error messages that might indicate issues.  These messages should provide clues about the root cause of the problem.

#### System with systemd

Use the following command to view logs generated since the last Netdata service restart:

```bash
journalctl _SYSTEMD_INVOCATION_ID="$(systemctl show --value --property=InvocationID netdata)" --namespace=netdata --grep intelgpu
```

#### System without systemd

Locate the collector log file, typically at `/var/log/netdata/collector.log`, and use `grep` to filter for collector's name:

```bash
grep intelgpu /var/log/netdata/collector.log
```

**Note**: This method shows logs from all restarts. Focus on the **latest entries** for troubleshooting current issues.

#### Docker Container

If your Netdata runs in a Docker container named "netdata" (replace if different), use this command:

```bash
docker logs netdata 2>&1 | grep intelgpu
```