
node_md_* does not show RAID syncing #1874

Open
marcinhlybin opened this issue Oct 21, 2020 · 3 comments

Host operating system: output of uname -a

Linux barman-01 4.15.0-118-generic #119-Ubuntu SMP Tue Sep 8 12:30:01 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

node_exporter version: output of node_exporter --version

node_exporter, version 1.0.0 (branch: HEAD, revision: b9c96706a7425383902b6143d097cf6d7cfd1960)
  build user:       root@3e55cc20ccc0
  build date:       20200526-06:01:48
  go version:       go1.14.3

node_exporter command line flags

Excerpt from the systemd service:

[Service]
Type=simple
User=prometheus
Group=prometheus
ExecStart=/usr/local/bin/node_exporter \
  --web.listen-address=10.10.90.1:9100 \
  --collector.diskstats.ignored-devices='^(ram|loop|fd|(h|s|v|xv)d[a-z]|nvme\\d+n\\d+p)\\d+$' \
  --collector.filesystem.ignored-mount-points='^/(sys|proc|dev|run)($|/)' \
  --collector.netdev.device-blacklist='^lo$' \
  --collector.textfile.directory=/var/lib/prometheus/node_exporter \
  --collector.netstat.fields='(.*)' \
  --collector.vmstat.fields='(.*)' \
  --collector.interrupts \
  --collector.processes \
  --collector.systemd \
  --collector.tcpstat

Are you running node_exporter in Docker?

No.

What did you do that produced an error?

metrics:

root@barman-01 ~ # curl -Ss 10.10.90.1:9100/metrics|grep _md_
# HELP node_md_blocks Total number of blocks on device.
# TYPE node_md_blocks gauge
node_md_blocks{device="md0"} 1.046528e+06
node_md_blocks{device="md1"} 1.9530507264e+10
# HELP node_md_blocks_synced Number of blocks synced on device.
# TYPE node_md_blocks_synced gauge
node_md_blocks_synced{device="md0"} 1.046528e+06
node_md_blocks_synced{device="md1"} 1.9530507264e+10
# HELP node_md_disks Number of active/failed/spare disks of device.
# TYPE node_md_disks gauge
node_md_disks{device="md0",state="active"} 4
node_md_disks{device="md0",state="failed"} 0
node_md_disks{device="md0",state="spare"} 0
node_md_disks{device="md1",state="active"} 4
node_md_disks{device="md1",state="failed"} 0
node_md_disks{device="md1",state="spare"} 0
# HELP node_md_disks_required Total number of disks of device.
# TYPE node_md_disks_required gauge
node_md_disks_required{device="md0"} 4
node_md_disks_required{device="md1"} 4
# HELP node_md_state Indicates the state of md-device.
# TYPE node_md_state gauge
node_md_state{device="md0",state="active"} 1
node_md_state{device="md0",state="inactive"} 0
node_md_state{device="md0",state="recovering"} 0
node_md_state{device="md0",state="resync"} 0
node_md_state{device="md1",state="active"} 1
node_md_state{device="md1",state="inactive"} 0
node_md_state{device="md1",state="recovering"} 0
node_md_state{device="md1",state="resync"} 0

mdstat:

root@barman-01 ~ # cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid10]
md1 : active raid6 sdb2[1] sdc2[2] sdd2[3] sda2[0]
      19530507264 blocks super 1.2 level 6, 512k chunk, algorithm 2 [4/4] [UUUU]
      [==============>......]  check = 73.4% (7173181200/9765253632) finish=273.0min speed=158203K/sec
      bitmap: 2/73 pages [8KB], 65536KB chunk

md0 : active raid1 sdb1[1] sdc1[2] sdd1[3] sda1[0]
      1046528 blocks super 1.2 [4/4] [UUUU]

unused devices: <none>

What did you expect to see?

I expected to see a difference between the node_md_blocks and node_md_blocks_synced values. Currently the values are the same, although /proc/mdstat shows the array is syncing.

node_md_blocks{device="md1"} 1.9530507264e+10
node_md_blocks_synced{device="md1"} 1.9530507264e+10

I also expected the recovering and resync state metrics to be set to 1:

node_md_state{device="md0",state="recovering"} 0
node_md_state{device="md0",state="resync"} 0
marcinhlybin changed the title from "node_md_* does not show recovering data" to "node_md_* does not show RAID syncing" on Oct 21, 2020
@marcinhlybin (Author)

I took a look at the mdadm details and it seems that the array is in a checking state. I think it would be a good idea to add this state to the metrics.

However, when I check sysfs I can see the following syncing information. In a fully operational state this file says none. I think this value should be reflected in the node_md_blocks_synced metric:

root@barman-01 ~ # cat /sys/block/md1/md/sync_completed
15295250000 / 19530507264
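
For reference, a minimal sketch (not the actual node_exporter or procfs code) of how this sync_completed value could be parsed into a synced/total pair; the helper name parseSyncCompleted and the printed output are assumptions for illustration only:

// Sketch only: parse /sys/block/<md>/md/sync_completed, which contains
// either "none" (no sync in progress) or "<completed> / <total>".
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseSyncCompleted returns (synced, total, syncing). When the array is not
// syncing, the file contains "none" and syncing is false.
func parseSyncCompleted(path string) (int64, int64, bool, error) {
	raw, err := os.ReadFile(path)
	if err != nil {
		return 0, 0, false, err
	}
	s := strings.TrimSpace(string(raw))
	if s == "none" {
		return 0, 0, false, nil
	}
	parts := strings.Split(s, "/")
	if len(parts) != 2 {
		return 0, 0, false, fmt.Errorf("unexpected sync_completed format: %q", s)
	}
	synced, err := strconv.ParseInt(strings.TrimSpace(parts[0]), 10, 64)
	if err != nil {
		return 0, 0, false, err
	}
	total, err := strconv.ParseInt(strings.TrimSpace(parts[1]), 10, 64)
	if err != nil {
		return 0, 0, false, err
	}
	return synced, total, true, nil
}

func main() {
	synced, total, syncing, err := parseSyncCompleted("/sys/block/md1/md/sync_completed")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if syncing {
		fmt.Printf("synced %d of %d (%.1f%%)\n", synced, total, 100*float64(synced)/float64(total))
	} else {
		fmt.Println("array is not syncing")
	}
}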

mdadm details:

root@barman-01 /sys/block/md1/md # mdadm --detail /dev/md1
/dev/md1:
           Version : 1.2
     Creation Time : Tue Apr 28 10:56:42 2020
        Raid Level : raid6
        Array Size : 19530507264 (18625.74 GiB 19999.24 GB)
     Used Dev Size : 9765253632 (9312.87 GiB 9999.62 GB)
      Raid Devices : 4
     Total Devices : 4
       Persistence : Superblock is persistent

     Intent Bitmap : Internal

       Update Time : Wed Oct 21 13:43:29 2020
             State : active, checking
    Active Devices : 4
   Working Devices : 4
    Failed Devices : 0
     Spare Devices : 0

            Layout : left-symmetric
        Chunk Size : 512K

Consistency Policy : bitmap

      Check Status : 78% complete

              Name : rescue:1
              UUID : d11ed962:b8848438:41411ae3:2e973bf6
            Events : 110843

    Number   Major   Minor   RaidDevice State
       0       8        2        0      active sync   /dev/sda2
       1       8       18        1      active sync   /dev/sdb2
       2       8       34        2      active sync   /dev/sdc2
       3       8       50        3      active sync   /dev/sdd2

@SuperQ (Member) commented Oct 24, 2020

There's a procfs PR to improve mdadm parsing (prometheus/procfs#329). But, as @dswarbrick mentioned, we should probably add parsing for the new sysfs files.
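
A rough sketch of what such sysfs parsing might look like, reading the kernel's /sys/block/<md>/md/sync_action file to expose the "check" state that the /proc/mdstat-based metrics currently miss; the node_md_sync_action metric name printed below is hypothetical, not an existing node_exporter metric:

// Sketch only: enumerate md devices via sysfs and report the current
// sync action (idle, check, resync, recover) as a per-state gauge.
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

func main() {
	paths, err := filepath.Glob("/sys/block/md*/md/sync_action")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, p := range paths {
		raw, err := os.ReadFile(p)
		if err != nil {
			continue // device may have disappeared between glob and read
		}
		action := strings.TrimSpace(string(raw)) // e.g. "idle", "check", "resync", "recover"
		device := filepath.Base(filepath.Dir(filepath.Dir(p)))
		// Hypothetical per-state gauge, shaped like node_md_state:
		for _, state := range []string{"idle", "check", "resync", "recover"} {
			value := 0
			if action == state {
				value = 1
			}
			fmt.Printf("node_md_sync_action{device=%q,state=%q} %d\n", device, state, value)
		}
	}
}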

pznamensky commented May 19, 2023

Thanks @dswarbrick for implementing the sysfs file parsing!
Is there a chance this change will be adopted in node_exporter in the foreseeable future?
