Koozali.org: home of the SME Server

workstation backup fails: "Backup terminated: pre-backup failed - status: 256"

Offline julianop

  • *
  • 61
  • +0/-0
For no reason that is clear to me (read: the change in behavior is not coincident with any configuration change on my part) my workstation backups are failing per the message in the title.
Backups are to an nfs share on a WD "My Cloud EX2 Ultra", and had been working for a couple of months - from which I deduce that my configuration was satisfactory.
There is plenty of space on the share, and a verify of the most recent share completes successfully - from which I deduce that the share is still functioning and available.

In order to trigger a backup for this support request I changed the time on the workstation backup configuration, with results in /var/log/messages as follows:

Jul 28 15:31:42 spencer /etc/e-smith/web/panels/manager/cgi-bin/backup[1505]: /home/e-smith/db/configuration: OLD backupwk=service|BackupTime|15:29|Compression|6|CompressionProg|gzip|DaysInSet|7|FullDay|0|IncNum|1|IncOnlyTimeout|yes|Login||Mount|/mnt/smb|Password||Program|dar|SetNum|2|SetsMax|2|SmbHost|192.168.50.38|SmbShare|nfs/spencer-backup|Timeout|8|VFSType|nfs|status|enabled
Jul 28 15:31:42 spencer /etc/e-smith/web/panels/manager/cgi-bin/backup[1505]: /home/e-smith/db/configuration: NEW backupwk=service|BackupTime|15:40|Compression|6|CompressionProg|gzip|DaysInSet|7|FullDay|0|IncNum|1|IncOnlyTimeout|yes|Login||Mount|/mnt/smb|Password||Program|dar|SetNum|2|SetsMax|2|SmbHost|192.168.50.38|SmbShare|nfs/spencer-backup|Timeout|8|VFSType|nfs|status|enabled
Jul 28 15:31:42 spencer esmith::event[1506]: Processing event: conf-backup
Jul 28 15:31:42 spencer esmith::event[1506]: Running event handler: /etc/e-smith/events/actions/generic_template_expand
Jul 28 15:31:42 spencer esmith::event[1506]: expanding /etc/flexbackup.conf
Jul 28 15:31:42 spencer esmith::event[1506]: expanding /etc/crontab
Jul 28 15:31:42 spencer esmith::event[1506]: expanding /etc/dar/CIFScredentials
Jul 28 15:31:42 spencer esmith::event[1506]: generic_template_expand=action|Event|conf-backup|Action|generic_template_expand|Start|1564345902 782541|End|1564345902 919067|Elapsed|0.136526
(END)

Results are as follows:
Jul 28 15:40:01 spencer /sbin/e-smith/do_backupwk[1634]: /home/e-smith/db/backups: OLD 1564346401=(undefined)
Jul 28 15:40:01 spencer /sbin/e-smith/do_backupwk[1634]: /home/e-smith/db/backups: NEW 1564346401=backup_record
Jul 28 15:40:01 spencer /sbin/e-smith/do_backupwk[1634]: /home/e-smith/db/backups: OLD 1564346401=backup_record
Jul 28 15:40:01 spencer /sbin/e-smith/do_backupwk[1634]: /home/e-smith/db/backups: NEW 1564346401=backup_record|StartEpochTime|1564346401
Jul 28 15:40:01 spencer /sbin/e-smith/do_backupwk[1634]: /home/e-smith/db/backups: OLD 1564346401=backup_record|StartEpochTime|1564346401
Jul 28 15:40:01 spencer /sbin/e-smith/do_backupwk[1634]: /home/e-smith/db/backups: NEW 1564346401=backup_record|BackupType|workstation|StartEpochTime|1564346401
Jul 28 15:40:01 spencer esmith::event[1650]: Processing event: pre-backup
Jul 28 15:40:01 spencer esmith::event[1650]: Running event handler: /etc/e-smith/events/actions/generic_template_expand
Jul 28 15:40:01 spencer esmith::event[1650]: expanding /etc/dar/DailyBackup.dcf
Jul 28 15:40:01 spencer esmith::event[1650]: generic_template_expand=action|Event|pre-backup|Action|generic_template_expand|Start|1564346401 636385|End|1564346401 774136|Elapsed|0.137751
Jul 28 15:40:01 spencer esmith::event[1650]: Running event handler: /etc/e-smith/events/pre-backup/S10mysql-delete-dumped-tables
Jul 28 15:40:01 spencer esmith::event[1650]: S10mysql-delete-dumped-tables=action|Event|pre-backup|Action|S10mysql-delete-dumped-tables|Start|1564346401 774402|End|1564346401 777765|Elapsed|0.003363
Jul 28 15:40:01 spencer esmith::event[1650]: Running event handler: /etc/e-smith/events/pre-backup/S20mysql-dump-tables
Jul 28 15:40:02 spencer esmith::event[1650]: S20mysql-dump-tables=action|Event|pre-backup|Action|S20mysql-dump-tables|Start|1564346401 778093|End|1564346402 765561|Elapsed|0.987468
Jul 28 15:40:02 spencer esmith::event[1650]: Running event handler: /etc/e-smith/events/pre-backup/S30ldap-dump
Jul 28 15:40:02 spencer esmith::event[1650]: 5d3e0822 bdb_db_open: database "dc=barnlea,dc=com": unclean shutdown detected; attempting recovery.
Jul 28 15:40:02 spencer esmith::event[1650]: 5d3e0822 bdb_db_open: database "dc=barnlea,dc=com": recovery skipped in read-only mode. Run manual recovery if errors are encountered.
Jul 28 15:40:02 spencer esmith::event[1650]: 5d3e0822 bdb_monitor_db_open: monitoring disabled; configure monitor database to enable
Jul 28 15:40:02 spencer esmith::event[1650]: S30ldap-dump=action|Event|pre-backup|Action|S30ldap-dump|Start|1564346402 765890|End|1564346402 834486|Elapsed|0.068596|Status|11
Jul 28 15:40:02 spencer kernel: slapcat[1673]: segfault at 8 ip 00005570554fdf8a sp 00007fff5146fba0 error 4 in slapd[5570554a7000+200000]
Jul 28 15:40:02 spencer esmith::event[1650]: Running event handler: /etc/e-smith/events/pre-backup/S50rewind-tape
Jul 28 15:40:02 spencer esmith::event[1650]: S50rewind-tape=action|Event|pre-backup|Action|S50rewind-tape|Start|1564346402 834771|End|1564346402 878213|Elapsed|0.043442
Jul 28 15:40:02 spencer /sbin/e-smith/do_backupwk[1634]: /home/e-smith/db/backups: OLD 1564346401=backup_record|BackupType|workstation|StartEpochTime|1564346401
Jul 28 15:40:02 spencer /sbin/e-smith/do_backupwk[1634]: /home/e-smith/db/backups: NEW 1564346401=backup_record|BackupType|workstation|EndEpochTime|1564346402|StartEpochTime|1564346401
Jul 28 15:40:02 spencer /sbin/e-smith/do_backupwk[1634]: /home/e-smith/db/backups: OLD 1564346401=backup_record|BackupType|workstation|EndEpochTime|1564346402|StartEpochTime|1564346401
Jul 28 15:40:02 spencer /sbin/e-smith/do_backupwk[1634]: /home/e-smith/db/backups: NEW 1564346401=backup_record|BackupType|workstation|EndEpochTime|1564346402|Result|pre-backup:256|StartEpochTime|1564346401
(END)

My guess is that the canary is here:
Jul 28 15:40:02 spencer kernel: slapcat[1673]: segfault at 8 ip 00005570554fdf8a sp 00007fff5146fba0 error 4 in slapd[5570554a7000+200000]

But I'm not competent to go any further.
Could someone help, please?

Offline TerryF

  • grumpy old man
  • *
  • 1,826
  • +6/-0
Jul 28 15:40:02 spencer /sbin/e-smith/do_backupwk[1634]: /home/e-smith/db/backups: NEW 1564346401=backup_record|BackupType|workstation|EndEpochTime|1564346402|Result|pre-backup:256|StartEpochTime|1564346401
(END)

Note in above : |Result|pre-backup:256|

See this post re a corrupt db preventing the pre-backup event - https://forums.contribs.org/index.php/topic,53775.0.html

See suggested solution..
--
qui scribit bis legit

Offline julianop

  • *
  • 61
  • +0/-0
Note in above : |Result|pre-backup:256|

See this post re a corrupt db preventing the pre-backup event - https://forums.contribs.org/index.php/topic,53775.0.html

See suggested solution..

Hi TerryF,
Yes, I'd already gone through that thread with much hope before posting, and was disappointed that those steps did nothing for me: mysqlcheck shows no errors. I ran the -r to repair, just in case, but nothing changes.


Offline mmccarn

  • *
  • 2,627
  • +10/-0
From your log extract I think your corrupt database is related to OpenLDAP instead of MySQL:
Quote
Jul 28 15:40:02 spencer esmith::event[1650]: Running event handler: /etc/e-smith/events/pre-backup/S30ldap-dump
Jul 28 15:40:02 spencer esmith::event[1650]: 5d3e0822 bdb_db_open: database "dc=barnlea,dc=com": unclean shutdown detected; attempting recovery.
Jul 28 15:40:02 spencer esmith::event[1650]: 5d3e0822 bdb_db_open: database "dc=barnlea,dc=com": recovery skipped in read-only mode. Run manual recovery if errors are encountered.
Jul 28 15:40:02 spencer esmith::event[1650]: 5d3e0822 bdb_monitor_db_open: monitoring disabled; configure monitor database to enable
Jul 28 15:40:02 spencer esmith::event[1650]: S30ldap-dump=action|Event|pre-backup|Action|S30ldap-dump|Start|1564346402 765890|End|1564346402 834486|Elapsed|0.068596|Status|11
Jul 28 15:40:02 spencer kernel: slapcat[1673]: segfault at 8 ip 00005570554fdf8a sp 00007fff5146fba0 error 4 in slapd[5570554a7000+200000]

Here is the official OpenLDAP page on recovering a damaged database: https://linuxacademy.com/blog/linux/openldap-fixing-or-recovering-a-corrupt-directory/
(Note that the openldap recovery tool is named "db_recover" on SME 9.2, and not "slapd_db_recover" as listed here)

Here is a bug from late 2018 that touches on ldap corruption and backup/restore: https://bugs.contribs.org/show_bug.cgi?id=10371

Offline TerryF

  • grumpy old man
  • *
  • 1,826
  • +6/-0
Yes, was unsure whether to dig about there, but looks the goods
--
qui scribit bis legit

Offline julianop

  • *
  • 61
  • +0/-0
From your log extract I think your corrupt database is related to OpenLDAP instead of MySQL:
Here is the official OpenLDAP page on recovering a damaged database: https://linuxacademy.com/blog/linux/openldap-fixing-or-recovering-a-corrupt-directory/
(Note that the openldap recovery tool is named "db_recover" on SME 9.2, and not "slapd_db_recover" as listed here)

Here is a bug from late 2018 that touches on ldap corruption and backup/restore: https://bugs.contribs.org/show_bug.cgi?id=10371

Sorry for delay - was out of town for a couple of days. Thanks for this, I was vaguely conscious of that disconnect, but didn't know where to go with it. I'll work this path and report.

Offline julianop

  • *
  • 61
  • +0/-0
I fixed the problem using instructions from this page:
https://linuxacademy.com/blog/linux/openldap-fixing-or-recovering-a-corrupt-directory

I had to go all the way down the rabbit hole and use the deepest recovery instructions, but my backups are up and running.

Thanks to all here for support and clues.

Offline janet

  • ****
  • 4,812
  • +0/-0
julianop

Just wondering about the original causative factors.

Do you have a UPS connected to the SME server ?
If so is it configured to shutdown the server gracefully in the event of a prolonged power failure ?

Otherwise were there any instances of someone forcibly shutting down the server using the power switch or removing the power lead or similar ?
Please search before asking, an answer may already exist.
The Search & other links to useful information are at top of Forum.

Offline julianop

  • *
  • 61
  • +0/-0
julianop

Just wondering about the original causative factors.

Do you have a UPS connected to the SME server ?
If so is it configured to shutdown the server gracefully in the event of a prolonged power failure ?

Otherwise were there any instances of someone forcibly shutting down the server using the power switch or removing the power lead or similar ?

Thanks for asking, Janet,

Shame on me, but I had recently moved the server, and hadn't relocated the UPS with it at that time. Regretfully I still don't have a graceful shutdown trigger in place. I guess I deserve everything I get...

That egregious and shamefully irresponsible behavior notwithstanding, I don't believe there was any power interruption at that time: it would have been detected and I would have been told about it because a couple of other devices in the house are actually more sensitive than the server. That is not a definitive response, however.

The only thing of note that occurred that I'm aware of is that one of my sons was making adjustments to his server account directories and Thunderbird email client configuration, and he may not have done everything by the book.  That said, I can't immediately think of anything he could have done that would have induced corruption. I'm going to dig around for any other evidence next weekend, and if I find anything I'll report.

Actually, while I write this, I notice that while the backup appears to have completed successfully, I'm still getting periodic (every five minutes, it seems) bdb errors: "PANIC: fatal region error detected; run recovery" (following the guidance from the provided link, I had restarted ldap in debug mode); I guess I'm not done yet.

On a related subject, i owe you an update regarding my rejected .eu, .xyz domain problem. I have not solved it but have at least localized it to the default Thunderbird profile. I had created a second one with the same account info, and that one does not exhibit the behavior. I'll find that thread and report more there.

Thanks for your engagement.
« Last Edit: August 04, 2019, 01:35:10 AM by julianop »