Koozali.org: home of the SME Server

atop show LVM and DSK maxed at 100% or more

Offline Mophilly

« on: July 24, 2019, 01:14:00 AM »
I have been chasing a performance problem. Not much joy in resolving it but I am learning a lot (I hope).
atop reports:
LVM |     main-root |  busy    101% |  read       0 |  write    404 |  KiB/r      0 |  KiB/w      3 |  MBr/s    0.0 |  MBw/s    0.2 |  avio 24.8 ms |
MDD |           md1 |  busy      0% |  read       0 |  write    406 |  KiB/r      0 |  KiB/w      3 |  MBr/s    0.0 |  MBw/s    0.2 |  avio 0.00 ms |
DSK |           sdb |  busy    100% |  read       0 |  write     75 |  KiB/r      0 |  KiB/w     21 |  MBr/s    0.0 |  MBw/s    0.2 |  avio  133 ms |
DSK |           sda |  busy      1% |  read       0 |  write     74 |  KiB/r      0 |  KiB/w     21 |  MBr/s    0.0 |  MBw/s    0.2 |  avio 0.92 ms |

I have not encountered this before. Average process CPU is 3%, and I see a lot of processes for httpd, php-cgi, qpsmtpd-forkser, fail2ban and postmaster.

I am beginning to think the SSDs are trying to tell me to replace them. However, the short SMART test looks OK.

What else should I be looking at?
« Last Edit: July 24, 2019, 01:28:20 AM by Mophilly »
- Mark
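
A rough way to read the atop output above: sda and sdb each received about the same number of writes (74 and 75), yet sdb took 133 ms per I/O against 0.92 ms on sda. The sketch below parses those four lines and flags any disk whose avio is far above its peer. It assumes sda and sdb are the two members of md1, which the output suggests but does not state. The names parse_atop_line and slow_disks and the 10x threshold are made up for illustration and are not part of atop.

```python
# Minimal sketch: parse the atop disk lines from the post and compare per-disk
# service time (avio). Helper names and the ratio threshold are hypothetical.

ATOP_SAMPLE = """\
LVM |     main-root |  busy    101% |  read       0 |  write    404 |  KiB/r      0 |  KiB/w      3 |  MBr/s    0.0 |  MBw/s    0.2 |  avio 24.8 ms |
MDD |           md1 |  busy      0% |  read       0 |  write    406 |  KiB/r      0 |  KiB/w      3 |  MBr/s    0.0 |  MBw/s    0.2 |  avio 0.00 ms |
DSK |           sdb |  busy    100% |  read       0 |  write     75 |  KiB/r      0 |  KiB/w     21 |  MBr/s    0.0 |  MBw/s    0.2 |  avio  133 ms |
DSK |           sda |  busy      1% |  read       0 |  write     74 |  KiB/r      0 |  KiB/w     21 |  MBr/s    0.0 |  MBw/s    0.2 |  avio 0.92 ms |
"""


def parse_atop_line(line):
    """Turn one 'KIND | name | key value [unit] | ...' line into a dict."""
    fields = [f.strip() for f in line.strip().strip("|").split("|")]
    row = {"kind": fields[0], "name": fields[1]}
    for field in fields[2:]:
        parts = field.split()
        if len(parts) < 2:
            continue
        # Keep only the number; drop a trailing '%' and ignore units like 'ms'.
        row[parts[0]] = float(parts[1].rstrip("%"))
    return row


def slow_disks(rows, ratio=10.0):
    """Return names of DSK rows whose avio is at least `ratio` times the fastest disk."""
    disks = [r for r in rows if r["kind"] == "DSK"]
    if not disks:
        return []
    fastest = min(d["avio"] for d in disks)
    if fastest <= 0:
        return []
    return [d["name"] for d in disks if d["avio"] / fastest >= ratio]


rows = [parse_atop_line(line) for line in ATOP_SAMPLE.splitlines() if line.strip()]

if __name__ == "__main__":
    for name in slow_disks(rows):
        print(f"{name}: avio is far higher than the other disk, worth a closer look")
```

A large gap like this only narrows the search to one device; it does not prove the drive is failing, and a passing short SMART test does not rule it out either.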

Offline Mophilly

Re: atop show LVM and DSK maxed at 100% or more
« Reply #1 on: July 27, 2019, 01:48:13 AM »
This is a follow-up for anyone who may be interested. After a lot of investigation of the applications and the configuration, and after correcting two problems we found in the process, we noticed a high error rate on one drive in the RAID array. At this point, the working theory is that we need to replace both drives in the array; both are four years old. First, we will replace the one we feel is faulty. After the array is rebuilt and synced, we will replace the remaining one and let it sync.
- Mark
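
The plan above depends on one check: the second drive should only come out once the array is fully rebuilt. A minimal sketch of that check follows, assuming a Linux md array whose state is readable from /proc/mdstat. The two sample texts are invented for illustration (device names, block counts and speeds are not from this server), and the helper names parse_mdstat and safe_to_pull_next_disk are hypothetical.

```python
# Minimal sketch: decide from /proc/mdstat-style text whether every md array
# has all members up and no rebuild in progress. Sample data is made up.
import re

REBUILDING = """\
Personalities : [raid1]
md1 : active raid1 sdb2[2] sda2[0]
      233985024 blocks super 1.1 [2/1] [U_]
      [==>..................]  recovery = 12.6% (29500000/233985024) finish=17.2min speed=197000K/sec

unused devices: <none>
"""

HEALTHY = """\
Personalities : [raid1]
md1 : active raid1 sdb2[2] sda2[0]
      233985024 blocks super 1.1 [2/2] [UU]

unused devices: <none>
"""


def parse_mdstat(text):
    """Collect the member map (e.g. 'UU' or 'U_') and sync state per array."""
    arrays = {}
    current = None
    for line in text.splitlines():
        header = re.match(r"^(md\d+)\s*:\s*(\w+)", line)
        if header:
            current = header.group(1)
            arrays[current] = {"state": header.group(2), "members": None, "syncing": False}
            continue
        if current is None:
            continue
        status = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", line)
        if status:
            arrays[current]["members"] = status.group(3)
        if re.search(r"\b(recovery|resync)\s*=", line):
            arrays[current]["syncing"] = True
    return arrays


def safe_to_pull_next_disk(arrays):
    """True only if at least one array was found and none is degraded or syncing."""
    return bool(arrays) and all(
        bool(a["members"]) and "_" not in a["members"] and not a["syncing"]
        for a in arrays.values()
    )


if __name__ == "__main__":
    print("rebuilding:", safe_to_pull_next_disk(parse_mdstat(REBUILDING)))
    print("healthy:   ", safe_to_pull_next_disk(parse_mdstat(HEALTHY)))
```

On a live system the same functions could be fed the contents of /proc/mdstat instead of the sample strings. The procedure actually to follow for the swap is the one in the SME Server wiki, not this sketch.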

Offline Mophilly

Re: atop show LVM and DSK maxed at 100% or more
« Reply #2 on: July 27, 2019, 07:54:34 AM »
And the final follow-up. We followed the instructions for upgrading the disks at https://wiki.contribs.org/Raid. All went well, and the move from 240GB SSDs to 500GB SSDs was completed in about 2.5 hours, start to finish.
- Mark