Koozali.org: home of the SME Server

Swapped out UPS, server has started randomly shutdown/reboot

Offline BossHog

  • 7
  • +0/-0
Howdy,
My very reliable SME 9.2 server has started rebooting randomly after I swapped a dying UPS for a replacement.
Basic scenario:
1 old Cyberpower UPS was failing
2 replaced it with an APC UPS with new batteries, X900
3 Shut down the server and physically swapped out the UPSes. Did a reconfigure and reboot to allow the server to "see" the new hardware
4 logged into the CLI to verify that the UPS is connected. Check!
5 nut shows proper model, battery status (100%), line voltage, low battery shutdown level (5%) etc.
6 all seems normal!
About 2 days later the admin mail account shows the server shutting down and rebooting at 3am.
Roughly 4 days later the same thing happens at 11am.
One week after this it shuts down and reboots at 2pm.
What I see in the admin email is 3 warnings:
lowbattery
shutdown
power returns
This all happens within 3 seconds and the server restarts

So my only option at the moment was to disable the nut service. This has stopped the server from randomly rebooting, but obviously I am missing the protection of a safe shutdown during a real power outage.

So if anyone can provide some coaching as to what I may have missed after swapping the UPS out, it would be much appreciated.
FYI, my history dates back to e-smith 4 but I am stumped as it should have just worked:)

Thanks,
Joe

Offline mmccarn

  • *
  • 2,626
  • +10/-0
Re: Swapped out UPS, server has started randomly shutdown/reboot
« Reply #1 on: July 12, 2020, 01:19:37 PM »
The default "power sensitivity" level on APC UPSs is "high" - that is, if the AC power gets only a few volts above or below the nominal voltage for your area, the UPS switches to battery power: https://www.apc.com/us/en/faqs/FA165427/

If your wall power is regularly above or below the correct voltage for your location, the batteries will discharge.

Set the UPS for medium or low sensitivity (done on the UPS itself) and see if that solves your problem.

Offline ReetP

  • *
  • 3,722
  • +5/-0
Re: Swapped out UPS, server has started randomly shutdown/reboot
« Reply #2 on: July 12, 2020, 08:29:07 PM »
Hmmm.

If it's sensitivity then you'll hear it kicking in under load, and it should show something like "on load".

I think this is the bit that needs investigating.

Quote
lowbattery

Is that because it is going on battery due to sensitivity?

Otherwise why is the battery getting drained? Faulty UPS?

You could have a look at installing apcupsd, at least for testing (might be a newer version somewhere).

https://wiki.contribs.org/Apcupsd

...
1. Read the Manual
2. Read the Wiki
3. Don't ask for support on Unsupported versions of software
4. I have a job, wife, and kids and do this in my spare time. If you want something fixed, please help.

Bugs are easier than you think: http://wiki.contribs.org/Bugzilla_Help

If you love SME and don't want to lose it, join in: http://wiki.contribs.org/Koozali_Foundation

Offline BossHog

  • 7
  • +0/-0
Re: Swapped out UPS, server has started randomly shutdown/reboot
« Reply #3 on: July 13, 2020, 02:29:24 AM »
Hey guys,
thanks for the help. Correction: the UPS is a Back-UPS XS 900.

The battery is not low. The nut software thinks it is, but it is 100% charged.
I don't use apcupsd and have never had a need to. SME doesn't require apcupsd to function unless it's for the APC Smart UPS. Nut loads usbhid-ups, and according to APC that is also correct for this UPS.
There are no "settings" on the UPS itself, i.e. sensitivity.
My plan is to enable nut again and post the output. I will also "test" the battery; this unit should run SME for 50-60 minutes on a full charge. FYI, the batteries are from APC and were manufactured in Jan. 2020, so I am leaning towards the batteries being unlikely to be defective.

[root@sme]# upsc UPS@localhost

Code:
battery.charge: 100
battery.charge.low: 10
battery.charge.warning: 50
battery.date: 2001/09/25
battery.mfr.date: 2005/06/20
battery.runtime: 0
battery.runtime.low: 120
battery.temperature: 29.2
battery.type: PbAc
battery.voltage: 27.9
battery.voltage.nominal: 24.0
device.mfr: American Power Conversion
device.model: Back-UPS XS 900
device.serial: QB0526331052 
device.type: ups
driver.name: usbhid-ups
driver.parameter.pollfreq: 30
driver.parameter.pollinterval: 2
driver.parameter.port: /var/lib/ups/hiddev0
driver.version: 2.6.5
driver.version.data: APC HID 0.95
driver.version.internal: 0.37
input.sensitivity: medium
input.transfer.high: 139
input.transfer.low: 97
input.transfer.reason: input voltage out of range
input.voltage: 120.0
input.voltage.nominal: 120
output.frequency: 60.0
output.voltage: 120.0
output.voltage.nominal: 120.0
ups.beeper.status: enabled
ups.delay.shutdown: 20
ups.delay.start: 30
ups.firmware: 9.o1 .D
ups.firmware.aux: o1
ups.load: 15.0
ups.mfr: American Power Conversion
ups.mfr.date: 2005/06/20
ups.model: Back-UPS XS 900
ups.productid: 0002
ups.realpower.nominal: 540
ups.serial: QB0526331052 
ups.status: OL LB
ups.test.result: No test initiated
ups.timer.reboot: 0
ups.timer.shutdown: -1
ups.timer.start: 0
ups.vendorid: 051d

Some takeaways here: it reports incorrect dates for the battery and the UPS. Small potatoes, but noticeable.
SME loads usbhid-ups for the communications, which I thought was correct.
Will do a battery test on Tuesday if needed as the office will be slower.
Thanks again,
Joe

Offline ReetP

  • *
  • 3,722
  • +5/-0
Re: Swapped out UPS, server has started randomly shutdown/reboot
« Reply #4 on: July 13, 2020, 06:02:22 AM »
Note I said...

Quote
You could have a look at installing apcupsd, at least for testing

Not saying you should use it permanently.

Just to see what it reports as a comparison, so you can eliminate nut as a potential problem.

(You are in the problem elimination business right now. Take away what it can't be, and whatever is left, no matter how insane, is likely the issue....)

Offline mmccarn

  • *
  • 2,626
  • +10/-0
Re: Swapped out UPS, server has started randomly shutdown/reboot
« Reply #5 on: July 13, 2020, 12:09:07 PM »
Quote
battery.runtime: 0
battery.runtime is the number of seconds the UPS thinks it can run your load on the existing battery charge.

There is some chance that the calculation for "battery.runtime" factors in the apparent age (19 years) of your batteries - you may need to convince the UPS that it is now July 2020, then remove and reinstall the batteries (or something like that). Or maybe you really need to convince the UPS that the current date is a month or two after the "battery.mfr.date".
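To make the runtime point concrete, here is a minimal Python sketch that reads a few of the values from the upsc output above and shows which threshold is being crossed while "LB" is set. The function names are made up for this post, and the comparison rule (reading at or below its ".low" limit) is an assumption for illustration; the LB flag itself is reported by the UPS through the driver, so this is not how NUT decides anything.

```python
def parse_upsc(text):
    """Turn upsc output ("name: value" per line) into a dict of strings."""
    values = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            values[name.strip()] = value.strip()
    return values


def low_battery_reasons(values):
    """List the readings that sit at or below their '.low' limit while LB is set.

    Illustrative rule only: the real LB flag comes from the UPS itself.
    """
    if "LB" not in values.get("ups.status", "").split():
        return []
    reasons = []
    for reading, limit in (("battery.charge", "battery.charge.low"),
                           ("battery.runtime", "battery.runtime.low")):
        if float(values[reading]) <= float(values[limit]):
            reasons.append(f"{reading} ({values[reading]}) <= {limit} ({values[limit]})")
    return reasons


# Values copied from the upsc output posted in this thread.
sample = """\
battery.charge: 100
battery.charge.low: 10
battery.runtime: 0
battery.runtime.low: 120
ups.status: OL LB
"""

for reason in low_battery_reasons(parse_upsc(sample)):
    print(reason)
```

With the posted values the only reading that trips is battery.runtime (0 against a limit of 120), while battery.charge at 100 is fine. That fits the symptom of a "low battery" warning on a fully charged UPS.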

Offline BossHog

  • 7
  • +0/-0
Re: Swapped out UPS, server has started randomly shutdown/reboot
« Reply #6 on: July 18, 2020, 04:16:57 PM »
Howdy,
*Follow Up*
As promised I did a "live" test by unplugging the UPS from line power to see if the batteries would perform as expected.
NUT was disabled before running this test.
Results: SME ran for 47 minutes on battery without a hitch. At that point the UPS was reconnected to line power.
My past experiences with this UPS have given me between 45 and 50 minutes of run time, so this removes a new-but-faulty battery from the equation.
So let me respond to some of the helpful comments so we can continue forward.

@reetp "You could have a look at installing apcupsd, at least for testing "
Yeah, that's not an option; this is my production server. The only version of apcupsd available for the CentOS 6 base is a third-party package. Also, the How-Tos on contribs.org are very, very outdated. This would pollute the testing environment IMHO. My expectation is that NUT should be capable of working with this APC unit in stock form.

@mmccarn "There is some chance that the calculation for "battery.runtime" factors in the apparent age (19 years)"
Interesting, is there documentation that shows how or where we can set battery date??
When I looked through the NUT docs, there wasn't any mention that the date was used with any of the calculations. It appeared to be a reference line?

My thinking is that the NUT config needs to be told how long the UPS can run: 45 minutes, or about 2700 seconds. However, the SME config for nut doesn't appear to have an option for setting battery.runtime?

Again, thanks for helping.

Offline ReetP

  • *
  • 3,722
  • +5/-0

Offline BossHog

  • 7
  • +0/-0
Re: Swapped out UPS, server has started randomly shutdown/reboot
« Reply #8 on: July 19, 2020, 12:59:40 AM »
Howdy,
@reetp thanks for the link. My UPS is not a Smart-UPS, but the link got me thinking that my Back-UPS may need to be configured from a Windows machine. So I rebooted the laptop into the Win10 install, downloaded the PowerChute software and adjusted the config for the UPS from Windows.

This is what I see now:
Code:
upsc UPS@localhost
battery.charge: 100
battery.charge.low: 10
battery.charge.warning: 50
battery.date: 2001/09/25
battery.mfr.date: 2020/07/18
battery.runtime: 2827
battery.runtime.low: 120
battery.temperature: 29.2
battery.type: PbAc
battery.voltage: 27.9
battery.voltage.nominal: 24.0
device.mfr: American Power Conversion
device.model: Back-UPS XS 900
device.serial: QB0526331052 
device.type: ups
driver.name: usbhid-ups
driver.parameter.pollfreq: 30
driver.parameter.pollinterval: 2
driver.parameter.port: /var/lib/ups/hiddev0
driver.version: 2.6.5
driver.version.data: APC HID 0.95
driver.version.internal: 0.37
input.sensitivity: low
input.transfer.high: 139
input.transfer.low: 97
input.voltage: 120.0
input.voltage.nominal: 120
output.frequency: 60.0
output.voltage: 120.0
output.voltage.nominal: 120.0
ups.beeper.status: enabled
ups.delay.shutdown: 20
ups.delay.start: 30
ups.firmware: 9.o1 .D
ups.firmware.aux: o1
ups.load: 16.0
ups.mfr: American Power Conversion
ups.mfr.date: 2005/06/20
ups.model: Back-UPS XS 900
ups.productid: 0002
ups.realpower.nominal: 540
ups.serial: QB0526331052 
ups.status: OL
ups.test.result: No test initiated
ups.timer.reboot: 0
ups.timer.shutdown: -1
ups.timer.start: 0
ups.vendorid: 051d

This is what would be expected for the UPS.
It is, however, a bit of a bummer that the UPS needed a Winbox so the date and sensitivity could be configured.
Nut has been re-enabled on my server, so it's a matter of waiting a couple of days to see if it blips.
If all works, the thread will be retitled to solved.

Thanks for the pointer!
Joe

Offline ReetP

  • *
  • 3,722
  • +5/-0
Re: Swapped out UPS, server has started randomly shutdown/reboot
« Reply #9 on: July 19, 2020, 01:24:53 AM »
Did you look at this?

https://networkupstools.org/docs/user-manual.chunked/ar01s02.html

upsrw
upscmd

I've used them myself to configure bits (but it was why I was thinking of apcupsd so you could inspect/amend settings)

You may well have been able to set it with them. Have a check.
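For anyone who wants to try this from the server instead of a Windows box, here is a rough Python sketch that just builds the command lines and prints them. It runs nothing. The `-s`, `-u` and `-l` options are the documented upsrw/upscmd ones; the user name "upsadmin" is a placeholder, and whether this particular Back-UPS exposes input.sensitivity or battery.mfr.date as writable is an assumption you would need to check by listing the variables first.

```python
import shlex


def list_writable_cmd(ups):
    """upsrw with only the UPS name lists the variables the driver allows you to change."""
    return ["upsrw", ups]


def list_instant_cmds(ups):
    """upscmd -l lists the instant commands the driver supports."""
    return ["upscmd", "-l", ups]


def set_variable_cmd(ups, name, value, user):
    """Build an upsrw call that sets one variable.

    The password option is left out on purpose so upsrw prompts for it
    instead of exposing it in the process list.
    """
    return ["upsrw", "-s", f"{name}={value}", "-u", user, ups]


ups = "UPS@localhost"   # UPS name as used with upsc in this thread
user = "upsadmin"       # placeholder: a user defined in upsd.users

commands = [
    list_writable_cmd(ups),
    list_instant_cmds(ups),
    # Assumption: only works if the first command shows the variable as writable.
    set_variable_cmd(ups, "input.sensitivity", "low", user),
]

for argv in commands:
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Run the listing command first. If a variable does not show up there, the driver will refuse to set it, and a vendor tool may be the only way to change that setting.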

Offline mmccarn

  • *
  • 2,626
  • +10/-0
Re: Swapped out UPS, server has started randomly shutdown/reboot
« Reply #10 on: July 19, 2020, 12:27:02 PM »
Quote
Code:
...
battery.charge: 100
battery.charge.low: 10
battery.charge.warning: 50
...
battery.runtime: 2827
...
I suspect you've solved the problem.

If you don't want to wait for another power hiccup you can test by unplugging the UPS.

The values for 'battery.runtime' and 'battery.charge' should decrease at a linear rate.

You should get a warning (of some sort... somewhere...) when 'battery.charge' gets to 50, and the system should shut down politely when 'battery.charge' gets to 10.
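As a rough guide to what to expect during an unplug test, the thresholds above can be sketched in Python like this. The function names are invented for the example, the 50 and 10 come from battery.charge.warning and battery.charge.low in the output, and the timing assumes the linear drain described above, which a real battery only approximates.

```python
def battery_state(charge, warning=50.0, low=10.0):
    """Classify a battery.charge reading against the two thresholds."""
    if charge <= low:
        return "shutdown"
    if charge <= warning:
        return "warning"
    return "ok"


def seconds_until(level, full_runtime):
    """Seconds on battery before the charge falls to `level` percent.

    Assumes a linear drain starting from a 100% charge.
    """
    return full_runtime * (100.0 - level) / 100.0


full_runtime = 2827  # battery.runtime reported at 100% charge in this thread

for charge in (100, 50, 10):
    state = battery_state(charge)
    elapsed = seconds_until(charge, full_runtime)
    print(f"charge {charge}%: {state}, about {elapsed / 60:.0f} min after unplugging")
```

If the warning or the shutdown arrives much earlier than these estimates, the UPS is probably still reporting a bad runtime figure.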

Offline BossHog

  • 7
  • +0/-0
Re: Swapped out UPS, server has started randomly shutdown/reboot
« Reply #12 on: July 21, 2020, 03:16:14 PM »
Howdy,
this Topic is safe to mark as Resolved/Solved as my UPS is performing as expected.
Nut reports the runtime and discharging properly and the random reboots have stopped.
Quote
Did you look at this?

https://networkupstools.org/docs/user-manual.chunked/ar01s02.html

upsrw
upscmd

I've used them myself to configure bits (but it was why I was thinking of apcupsd so you could inspect/amend settings)

You may well have been able to set it with them. Have a check.
@reetp after reading the info at the link you posted, I also agree that those commands would have done what my foray into the Win10 version of PowerChute accomplished.

Thanks for all the help,
Joe

Offline ReetP

  • *
  • 3,722
  • +5/-0
Re: Swapped out UPS, server has started randomly shutdown/reboot
« Reply #13 on: July 21, 2020, 03:19:38 PM »
Quote
@reetp after reading the info at the link you posted, I also agree that those commands would have done what my foray into the Win10 version of PowerChute accomplished.

Ha. Rule 1. RTFM ;-)

Pleased you got it sorted - interesting one!

Now, please go and test v10. That or you will have nothing to upgrade to when v9 goes EOL in October/Nov......
