<!--startmeta
custom_edit_url: "https://github.com/netdata/netdata/edit/master/src/go/plugin/go.d/modules/vsphere/README.md"
meta_yaml: "https://github.com/netdata/netdata/edit/master/src/go/plugin/go.d/modules/vsphere/metadata.yaml"
sidebar_label: "VMware vCenter Server"
learn_status: "Published"
learn_rel_path: "Collecting Metrics/Containers and VMs"
most_popular: True
message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE"
endmeta-->

# VMware vCenter Server


<img src="https://netdata.cloud/img/vmware.svg" width="150"/>


Plugin: go.d.plugin
Module: vsphere

<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" />

## Overview

This collector monitors the performance statistics of hosts and VMs on `vCenter` servers.

> **Warning**: The `vsphere` collector cannot re-login and continue collecting metrics after a vCenter reboot.
> go.d.plugin needs to be restarted.
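
If that happens, a common way to restart the plugin is to restart the Netdata service, which restarts all of its plugins (shown here for a systemd-based installation):

```bash
sudo systemctl restart netdata
```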




This collector is supported on all platforms.

This collector supports collecting metrics from multiple instances of this integration, including remote instances.


### Default Behavior

#### Auto-Detection

This integration doesn't support auto-detection.

#### Limits

The default configuration for this integration does not impose any limits on data collection.

#### Performance Impact

The default `update_every` is 20 seconds, and there is no point in decreasing it:
**VMware real-time statistics are generated at 20-second intervals**.

For large installations, 20 seconds is likely not enough, so the value should be tuned.

To get a better picture, we recommend running the collector in debug mode and checking how long it takes to collect metrics.

<details>
<summary>Example (unrelated debug lines removed)</summary>

```
[ilyam@pc]$ ./go.d.plugin -d -m vsphere
[ DEBUG ] vsphere[vsphere] discover.go:94 discovering : starting resource discovering process
[ DEBUG ] vsphere[vsphere] discover.go:102 discovering : found 3 dcs, process took 49.329656ms
[ DEBUG ] vsphere[vsphere] discover.go:109 discovering : found 12 folders, process took 49.538688ms
[ DEBUG ] vsphere[vsphere] discover.go:116 discovering : found 3 clusters, process took 47.722692ms
[ DEBUG ] vsphere[vsphere] discover.go:123 discovering : found 2 hosts, process took 52.966995ms
[ DEBUG ] vsphere[vsphere] discover.go:130 discovering : found 2 vms, process took 49.832979ms
[ INFO  ] vsphere[vsphere] discover.go:140 discovering : found 3 dcs, 12 folders, 3 clusters (2 dummy), 2 hosts, 3 vms, process took 249.655993ms
[ DEBUG ] vsphere[vsphere] build.go:12 discovering : building : starting building resources process
[ INFO  ] vsphere[vsphere] build.go:23 discovering : building : built 3/3 dcs, 12/12 folders, 3/3 clusters, 2/2 hosts, 3/3 vms, process took 63.3µs
[ DEBUG ] vsphere[vsphere] hierarchy.go:10 discovering : hierarchy : start setting resources hierarchy process
[ INFO  ] vsphere[vsphere] hierarchy.go:18 discovering : hierarchy : set 3/3 clusters, 2/2 hosts, 3/3 vms, process took 6.522µs
[ DEBUG ] vsphere[vsphere] filter.go:24 discovering : filtering : starting filtering resources process
[ DEBUG ] vsphere[vsphere] filter.go:45 discovering : filtering : removed 0 unmatched hosts
[ DEBUG ] vsphere[vsphere] filter.go:56 discovering : filtering : removed 0 unmatched vms
[ INFO  ] vsphere[vsphere] filter.go:29 discovering : filtering : filtered 0/2 hosts, 0/3 vms, process took 42.973µs
[ DEBUG ] vsphere[vsphere] metric_lists.go:14 discovering : metric lists : starting resources metric lists collection process
[ INFO  ] vsphere[vsphere] metric_lists.go:30 discovering : metric lists : collected metric lists for 2/2 hosts, 3/3 vms, process took 275.60764ms
[ INFO  ] vsphere[vsphere] discover.go:74 discovering : discovered 2/2 hosts, 3/3 vms, the whole process took 525.614041ms
[ INFO  ] vsphere[vsphere] discover.go:11 starting discovery process, will do discovery every 5m0s
[ DEBUG ] vsphere[vsphere] collect.go:11 starting collection process
[ DEBUG ] vsphere[vsphere] scrape.go:48 scraping : scraped metrics for 2/2 hosts, process took 96.257374ms
[ DEBUG ] vsphere[vsphere] scrape.go:60 scraping : scraped metrics for 3/3 vms, process took 57.879697ms
[ DEBUG ] vsphere[vsphere] collect.go:23 metrics collected, process took 154.77997ms
```

</details>

In this example, discovery took `525.614041ms` and metric collection took `154.77997ms`. Discovery runs in a separate thread, so it does not affect collection.
Adjust the `update_every` and `timeout` parameters based on these numbers.
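
For example, a minimal sketch of tuning those two parameters for a larger installation (the values below are illustrative, not recommendations):

```yaml
jobs:
  - name     : vcenter1
    url      : https://203.0.113.1
    username : admin@vsphere.local
    password : somepassword
    update_every: 60    # give the collector more headroom between runs
    timeout: 30         # allow slower scrapes enough time to finish
```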



## Metrics

Metrics grouped by *scope*.

The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels.



### Per virtual machine

These metrics refer to the Virtual Machine.

Labels:

| Label      | Description     |
|:-----------|:----------------|
| datacenter | Datacenter name |
| cluster | Cluster name |
| host | Host name |
| vm | Virtual Machine name |

Metrics:

| Metric | Dimensions | Unit |
|:------|:----------|:----|
| vsphere.vm_cpu_utilization | used | percentage |
| vsphere.vm_mem_utilization | used | percentage |
| vsphere.vm_mem_usage | granted, consumed, active, shared | KiB |
| vsphere.vm_mem_swap_usage | swapped | KiB |
| vsphere.vm_mem_swap_io | in, out | KiB/s |
| vsphere.vm_disk_io | read, write | KiB/s |
| vsphere.vm_disk_max_latency | latency | milliseconds |
| vsphere.vm_net_traffic | received, sent | KiB/s |
| vsphere.vm_net_packets | received, sent | packets |
| vsphere.vm_net_drops | received, sent | packets |
| vsphere.vm_overall_status | green, red, yellow, gray | status |
| vsphere.vm_system_uptime | uptime | seconds |

### Per host

These metrics refer to the ESXi host.

Labels:

| Label      | Description     |
|:-----------|:----------------|
| datacenter | Datacenter name |
| cluster | Cluster name |
| host | Host name |

Metrics:

| Metric | Dimensions | Unit |
|:------|:----------|:----|
| vsphere.host_cpu_utilization | used | percentage |
| vsphere.host_mem_utilization | used | percentage |
| vsphere.host_mem_usage | granted, consumed, active, shared, sharedcommon | KiB |
| vsphere.host_mem_swap_io | in, out | KiB/s |
| vsphere.host_disk_io | read, write | KiB/s |
| vsphere.host_disk_max_latency | latency | milliseconds |
| vsphere.host_net_traffic | received, sent | KiB/s |
| vsphere.host_net_packets | received, sent | packets |
| vsphere.host_net_drops | received, sent | packets |
| vsphere.host_net_errors | received, sent | errors |
| vsphere.host_overall_status | green, red, yellow, gray | status |
| vsphere.host_system_uptime | uptime | seconds |



## Alerts


The following alerts are available:

| Alert name  | On metric | Description |
|:------------|:----------|:------------|
| [ vsphere_vm_cpu_utilization ](https://github.com/netdata/netdata/blob/master/src/health/health.d/vsphere.conf) | vsphere.vm_cpu_utilization | Virtual Machine CPU utilization |
| [ vsphere_vm_mem_usage ](https://github.com/netdata/netdata/blob/master/src/health/health.d/vsphere.conf) | vsphere.vm_mem_utilization | Virtual Machine memory utilization |
| [ vsphere_host_cpu_utilization ](https://github.com/netdata/netdata/blob/master/src/health/health.d/vsphere.conf) | vsphere.host_cpu_utilization | ESXi Host CPU utilization |
| [ vsphere_host_mem_utilization ](https://github.com/netdata/netdata/blob/master/src/health/health.d/vsphere.conf) | vsphere.host_mem_utilization | ESXi Host memory utilization |


## Setup

### Prerequisites

No action required.

### Configuration

#### File

The configuration file name for this integration is `go.d/vsphere.conf`.


You can edit the configuration file using the `edit-config` script from the
Netdata [config directory](/docs/netdata-agent/configuration/README.md#the-netdata-config-directory).

```bash
cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata
sudo ./edit-config go.d/vsphere.conf
```

#### Options

The following options can be defined globally: `update_every`, `autodetection_retry`.
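
A minimal sketch of where these go in `go.d/vsphere.conf`: values set at the top level apply to every job, and a job can override them individually.

```yaml
update_every: 20        # applies to all jobs unless overridden
autodetection_retry: 0

jobs:
  - name     : vcenter1
    url      : https://203.0.113.1
    username : admin@vsphere.local
    password : somepassword
```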


<details open><summary>Config options</summary>

| Name | Description | Default | Required |
|:----|:-----------|:-------|:--------:|
| update_every | Data collection frequency. | 20 | no |
| autodetection_retry | Recheck interval in seconds. Zero means no recheck will be scheduled. | 0 | no |
| url | vCenter server URL. |  | yes |
| host_include | Hosts selector (filter). |  | no |
| vm_include | Virtual machines selector (filter). |  | no |
| discovery_interval | Hosts and VMs discovery interval. | 300 | no |
| timeout | HTTP request timeout. | 20 | no |
| username | Username for basic HTTP authentication. |  | no |
| password | Password for basic HTTP authentication. |  | no |
| proxy_url | Proxy URL. |  | no |
| proxy_username | Username for proxy basic HTTP authentication. |  | no |
| proxy_password | Password for proxy basic HTTP authentication. |  | no |
| not_follow_redirects | Redirect handling policy. Controls whether the client follows redirects. | no | no |
| tls_skip_verify | Server certificate chain and hostname validation policy. Controls whether the client performs this check. | no | no |
| tls_ca | Certification authority that the client uses when verifying the server's certificates. |  | no |
| tls_cert | Client TLS certificate. |  | no |
| tls_key | Client TLS key. |  | no |

##### host_include

Metrics of hosts matching the selector will be collected.

- Include pattern syntax: "/Datacenter pattern/Cluster pattern/Host pattern".
- Match pattern syntax: [simple patterns](/src/libnetdata/simple_pattern/README.md#simple-patterns).
- Syntax:

  ```yaml
  host_include:
    - '/DC1/*'           # select all hosts from datacenter DC1
    - '/DC2/*/!Host2 *'  # select all hosts from datacenter DC2 except Host2
    - '/DC3/Cluster3/*'  # select all hosts from datacenter DC3 cluster Cluster3
  ```


##### vm_include

Metrics of VMs matching the selector will be collected.

- Include pattern syntax: "/Datacenter pattern/Cluster pattern/Host pattern/VM pattern".
- Match pattern syntax: [simple patterns](/src/libnetdata/simple_pattern/README.md#simple-patterns).
- Syntax:

  ```yaml
  vm_include:
    - '/DC1/*'           # select all VMs from datacenter DC1
    - '/DC2/*/*/!VM2 *'  # select all VMs from datacenter DC2 except VM2
    - '/DC3/Cluster3/*'  # select all VMs from datacenter DC3 cluster Cluster3
  ```


</details>

#### Examples

##### Basic

A basic example configuration.

```yaml
jobs:
  - name     : vcenter1
    url      : https://203.0.113.1
    username : admin@vsphere.local
    password : somepassword

```

##### Multi-instance

> **Note**: When you define multiple jobs, their names must be unique.

Collecting metrics from local and remote instances.


<details open><summary>Config</summary>

```yaml
jobs:
  - name     : vcenter1
    url      : https://203.0.113.1
    username : admin@vsphere.local
    password : somepassword

  - name     : vcenter2
    url      : https://203.0.113.10
    username : admin@vsphere.local
    password : somepassword

```
</details>
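
##### With host and VM filtering

A sketch combining the `host_include` and `vm_include` selectors described above in a single job; the datacenter and cluster names are illustrative.

```yaml
jobs:
  - name     : vcenter1
    url      : https://203.0.113.1
    username : admin@vsphere.local
    password : somepassword
    host_include:
      - '/DC1/*'           # all hosts from datacenter DC1
    vm_include:
      - '/DC1/Cluster1/*'  # all VMs from datacenter DC1, cluster Cluster1
```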



## Troubleshooting

### Debug Mode

**Important**: Debug mode is not supported for data collection jobs created via the UI using the Dyncfg feature.

To troubleshoot issues with the `vsphere` collector, run the `go.d.plugin` with the debug option enabled. The output
should give you clues as to why the collector isn't working.

- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. If that's not the case on
  your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`.

  ```bash
  cd /usr/libexec/netdata/plugins.d/
  ```

- Switch to the `netdata` user.

  ```bash
  sudo -u netdata -s
  ```

- Run the `go.d.plugin` to debug the collector:

  ```bash
  ./go.d.plugin -d -m vsphere
  ```

### Getting Logs

If you're encountering problems with the `vsphere` collector, follow these steps to retrieve logs and identify potential issues:

- **Run the command** specific to your system (systemd, non-systemd, or Docker container).
- **Examine the output** for any warnings or error messages that might indicate issues.  These messages should provide clues about the root cause of the problem.

#### System with systemd

Use the following command to view logs generated since the last Netdata service restart:

```bash
journalctl _SYSTEMD_INVOCATION_ID="$(systemctl show --value --property=InvocationID netdata)" --namespace=netdata --grep vsphere
```

#### System without systemd

Locate the collector log file, typically at `/var/log/netdata/collector.log`, and use `grep` to filter for collector's name:

```bash
grep vsphere /var/log/netdata/collector.log
```

**Note**: This method shows logs from all restarts. Focus on the **latest entries** for troubleshooting current issues.

#### Docker Container

If your Netdata runs in a Docker container named "netdata" (replace if different), use this command:

```bash
docker logs netdata 2>&1 | grep vsphere
```