
node_md_* does not show RAID syncing #1874

Open
marcinhlybin opened this issue Oct 21, 2020 · 3 comments

Host operating system: output of uname -a

Linux barman-01 4.15.0-118-generic #119-Ubuntu SMP Tue Sep 8 12:30:01 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

node_exporter version: output of node_exporter --version

node_exporter, version 1.0.0 (branch: HEAD, revision: b9c96706a7425383902b6143d097cf6d7cfd1960)
  build user:       root@3e55cc20ccc0
  build date:       20200526-06:01:48
  go version:       go1.14.3

node_exporter command line flags

Excerpt from the systemd service:

[Service]
Type=simple
User=prometheus
Group=prometheus
ExecStart=/usr/local/bin/node_exporter \
  --web.listen-address=10.10.90.1:9100 \
  --collector.diskstats.ignored-devices='^(ram|loop|fd|(h|s|v|xv)d[a-z]|nvme\\d+n\\d+p)\\d+$' \
  --collector.filesystem.ignored-mount-points='^/(sys|proc|dev|run)($|/)' \
  --collector.netdev.device-blacklist='^lo$' \
  --collector.textfile.directory=/var/lib/prometheus/node_exporter \
  --collector.netstat.fields='(.*)' \
  --collector.vmstat.fields='(.*)' \
  --collector.interrupts \
  --collector.processes \
  --collector.systemd \
  --collector.tcpstat

Are you running node_exporter in Docker?

No.

What did you do that produced an error?

metrics:

root@barman-01 ~ # curl -Ss 10.10.90.1:9100/metrics|grep _md_
# HELP node_md_blocks Total number of blocks on device.
# TYPE node_md_blocks gauge
node_md_blocks{device="md0"} 1.046528e+06
node_md_blocks{device="md1"} 1.9530507264e+10
# HELP node_md_blocks_synced Number of blocks synced on device.
# TYPE node_md_blocks_synced gauge
node_md_blocks_synced{device="md0"} 1.046528e+06
node_md_blocks_synced{device="md1"} 1.9530507264e+10
# HELP node_md_disks Number of active/failed/spare disks of device.
# TYPE node_md_disks gauge
node_md_disks{device="md0",state="active"} 4
node_md_disks{device="md0",state="failed"} 0
node_md_disks{device="md0",state="spare"} 0
node_md_disks{device="md1",state="active"} 4
node_md_disks{device="md1",state="failed"} 0
node_md_disks{device="md1",state="spare"} 0
# HELP node_md_disks_required Total number of disks of device.
# TYPE node_md_disks_required gauge
node_md_disks_required{device="md0"} 4
node_md_disks_required{device="md1"} 4
# HELP node_md_state Indicates the state of md-device.
# TYPE node_md_state gauge
node_md_state{device="md0",state="active"} 1
node_md_state{device="md0",state="inactive"} 0
node_md_state{device="md0",state="recovering"} 0
node_md_state{device="md0",state="resync"} 0
node_md_state{device="md1",state="active"} 1
node_md_state{device="md1",state="inactive"} 0
node_md_state{device="md1",state="recovering"} 0
node_md_state{device="md1",state="resync"} 0

mdstat:

root@barman-01 ~ # cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid10]
md1 : active raid6 sdb2[1] sdc2[2] sdd2[3] sda2[0]
      19530507264 blocks super 1.2 level 6, 512k chunk, algorithm 2 [4/4] [UUUU]
      [==============>......]  check = 73.4% (7173181200/9765253632) finish=273.0min speed=158203K/sec
      bitmap: 2/73 pages [8KB], 65536KB chunk

md0 : active raid1 sdb1[1] sdc1[2] sdd1[3] sda1[0]
      1046528 blocks super 1.2 [4/4] [UUUU]

unused devices: <none>

What did you expect to see?

I expected to see a difference between the node_md_blocks and node_md_blocks_synced values. Currently the values are the same, although /proc/mdstat shows the array is syncing.

node_md_blocks{device="md1"} 1.9530507264e+10
node_md_blocks_synced{device="md1"} 1.9530507264e+10

I also expected the recovering and resync state metrics to be set to 1:

node_md_state{device="md0",state="recovering"} 0
node_md_state{device="md0",state="resync"} 0
marcinhlybin changed the title from "node_md_* does not show recovering data" to "node_md_* does not show RAID syncing" on Oct 21, 2020
@marcinhlybin (Author)

I took a look at the mdadm details and it seems that the array is in a checking state. I think it would be a good idea to add this state to the metrics.

However, when I check sysfs I can see the following syncing information. In a fully operational state this file says none. I think this value should be reflected in the node_md_blocks_synced metric:

root@barman-01 ~ # cat /sys/block/md1/md/sync_completed
15295250000 / 19530507264
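
For reference, a minimal sketch (not the actual node_exporter or procfs code) of how this sync_completed value could be parsed into a synced/total pair; the helper name parseSyncCompleted and the printed output are assumptions for illustration only:

// Sketch only: parse /sys/block/<md>/md/sync_completed, which contains
// either "none" (no sync in progress) or "<completed> / <total>".
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseSyncCompleted returns (synced, total, syncing). When the array is not
// syncing, the file contains "none" and syncing is false.
func parseSyncCompleted(path string) (int64, int64, bool, error) {
	raw, err := os.ReadFile(path)
	if err != nil {
		return 0, 0, false, err
	}
	s := strings.TrimSpace(string(raw))
	if s == "none" {
		return 0, 0, false, nil
	}
	parts := strings.Split(s, "/")
	if len(parts) != 2 {
		return 0, 0, false, fmt.Errorf("unexpected sync_completed format: %q", s)
	}
	synced, err := strconv.ParseInt(strings.TrimSpace(parts[0]), 10, 64)
	if err != nil {
		return 0, 0, false, err
	}
	total, err := strconv.ParseInt(strings.TrimSpace(parts[1]), 10, 64)
	if err != nil {
		return 0, 0, false, err
	}
	return synced, total, true, nil
}

func main() {
	synced, total, syncing, err := parseSyncCompleted("/sys/block/md1/md/sync_completed")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if syncing {
		fmt.Printf("synced %d of %d (%.1f%%)\n", synced, total, 100*float64(synced)/float64(total))
	} else {
		fmt.Println("array is not syncing")
	}
}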

mdadm details:

root@barman-01 /sys/block/md1/md # mdadm --detail /dev/md1
/dev/md1:
           Version : 1.2
     Creation Time : Tue Apr 28 10:56:42 2020
        Raid Level : raid6
        Array Size : 19530507264 (18625.74 GiB 19999.24 GB)
     Used Dev Size : 9765253632 (9312.87 GiB 9999.62 GB)
      Raid Devices : 4
     Total Devices : 4
       Persistence : Superblock is persistent

     Intent Bitmap : Internal

       Update Time : Wed Oct 21 13:43:29 2020
             State : active, checking
    Active Devices : 4
   Working Devices : 4
    Failed Devices : 0
     Spare Devices : 0

            Layout : left-symmetric
        Chunk Size : 512K

Consistency Policy : bitmap

      Check Status : 78% complete

              Name : rescue:1
              UUID : d11ed962:b8848438:41411ae3:2e973bf6
            Events : 110843

    Number   Major   Minor   RaidDevice State
       0       8        2        0      active sync   /dev/sda2
       1       8       18        1      active sync   /dev/sdb2
       2       8       34        2      active sync   /dev/sdc2
       3       8       50        3      active sync   /dev/sdd2

@SuperQ (Member) commented Oct 24, 2020

There's a procfs PR to improve mdadm parsing (prometheus/procfs#329). But, as @dswarbrick mentioned, we should probably add parsing for the new sysfs files.
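
A rough sketch of what such sysfs parsing might look like, reading the kernel's /sys/block/<md>/md/sync_action file to expose the "check" state that the /proc/mdstat-based metrics currently miss; the node_md_sync_action metric name printed below is hypothetical, not an existing node_exporter metric:

// Sketch only: enumerate md devices via sysfs and report the current
// sync action (idle, check, resync, recover) as a per-state gauge.
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

func main() {
	paths, err := filepath.Glob("/sys/block/md*/md/sync_action")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, p := range paths {
		raw, err := os.ReadFile(p)
		if err != nil {
			continue // device may have disappeared between glob and read
		}
		action := strings.TrimSpace(string(raw)) // e.g. "idle", "check", "resync", "recover"
		device := filepath.Base(filepath.Dir(filepath.Dir(p)))
		// Hypothetical per-state gauge, shaped like node_md_state:
		for _, state := range []string{"idle", "check", "resync", "recover"} {
			value := 0
			if action == state {
				value = 1
			}
			fmt.Printf("node_md_sync_action{device=%q,state=%q} %d\n", device, state, value)
		}
	}
}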

pznamensky commented May 19, 2023

Thanks @dswarbrick for implementing the sysfs file parsing!
Is there a chance this change will be adopted in node_exporter in the foreseeable future?
