Koozali.org: home of the SME Server

Partial backup stored on backup workstation

Offline warren

  • *
  • 293
  • +0/-0
Partial backup stored on backup workstation
« on: February 06, 2017, 12:07:33 AM »
Hi All,
the tldr of this issue is :
a) How to Force a full backup to continue running until completed ( following this email from cron )? ;
b) Is this an issue in not understanding how the "Backup with dar " actually works ?
as the situation basically leaves one without a "valid backup"?

This is a long post ( and yes, i have opened a bug ( https://bugs.contribs.org/show_bug.cgi?id=9914 ) but I think its important to understand if this is really an issue.

Quote
Received signal: Quit
Archive delayed termination engaged
Disabling signal handler, the next time this signal is received the program will abort immediately
Partial backup stored on backup workstation.
Session cleanly closed by timeout after 88500 seconds.
Not an error, backup process will continue next night.
Final memory cleanup...
Program has been aborted for the following reason: Thread cancellation requested, aborting as properly as possible
Error while running dar: 4 at /etc/e-smith/events/actions/workstation-backup-dar line 234.
Backup terminated: backup failed - status: 7424

So here's the history of the notes I have made regarding what I could find out re  the above :

Ref Search :
-------------
SME 9 : = https://forums.contribs.org/index.php/topic,52528.msg270030.html#msg270030

SME 8 : = https://forums.contribs.org/index.php/topic,51423.msg261275.html#msg261275

SME 8: = https://forums.contribs.org/index.php/topic,50983.0/all.html     ( More about backup to samba share than usb )

Bug Report on this issue : https://bugs.contribs.org/show_bug.cgi?id=9914

Server info :
Version      : SME 9.1
Server Mode   : servergateway
RAM               : 16GB
CPU model name   : Intel(R) Pentium(R) CPU G840 @ 2.80GHz

Amount of Data to backup : 1TB

Current Backup to Workstation settings: From Server Manager
Quote
Configure Workstation Backup

This panel displays the present workstation backup configuration. You can change it in this panel and the next one.

Backup is Enabled.
Backup is made on local removable disk media/usbBackup2
Number of rotating backup sets is 1
Number of daily backups contained in each set is 31
Compression level (0-9) of backup is 9
Daily backup occurs at 18:30
Each daily backup session is cleanly timed out after 24 hours.except full backups which are cleanly timed out after 24 hours
Full backup session (new backup sets) is allowed only on Friday
Create or modify workstation backup configuration

History / Problem :
Backups done to 2 (4TB ) rotating usb backup drives ( labeled usbBackup ; usbBackup2 ) with one drive removed off site.

The previous set of backups ( to disk label : usbBackup )was set to run 24 incrementals that completed on Thurs. Email below :..
Quote
Backup of obfusicated_servername.yourdomain.com started at Thu Feb  2 18:30:20 2017
Destination //localhost/media/usbBackup2/obfusicated_servername.yourdomain.com/set1
Basename inc-024-20170202183020
Starting the backup with a timeout of 23 hours
--------------------------------------------
 27767 inode(s) saved
   including 20949 hard link(s) treated
 0 inode(s) changed at the moment of the backup and could not be saved properly
 0 byte(s) have been wasted in the archive to resave changing files
 1302178 inode(s) not saved (no inode/file change)
 0 inode(s) failed to be saved (filesystem error)
 324 inode(s) ignored (excluded by filters)
 14398 inode(s) recorded as deleted from reference backup
 --------------------------------------------
 Total number of inode(s) considered: 1344667
 --------------------------------------------
 EA saved for 21 inode(s)
 --------------------------------------------
Destination disk usage 2.0T, 59% full, 1.5T available
Backup successfully terminated at Thu Feb  2 22:56:10 2017

The next usb disk ( usbBackup2 ) was plugged in on Fri day morning ( preparation for a full backup run over weekend ) and is prepared for a full backup, but the number of

increments has been changed to 31;
The Disk has the previous backup manually removed by running  ( rm -f -R /media/usbBackup2/obfusicated_servername.yourdomain.com/ )

log snip...
Quote
Feb  3 12:24:53 netkeeper /etc/e-smith/web/panels/manager/cgi-bin/backup[30931]: /home/e-smith/db/configuration: OLD backupwk=service|BackupTime|18:30|Compres
sion|9|CompressionProg|gzip|DaysInSet|24|FullDay|5|IncNum|24|IncOnlyTimeout|yes|Login||Mount|/mnt/smb|Password||Program|dar|SetNum|1|SetsMax|1|SmbHost|localho
st|SmbShare|media/usbBackup2|Timeout|24|VFSType|usb|status|enabled
Feb  3 12:24:53 netkeeper /etc/e-smith/web/panels/manager/cgi-bin/backup[30931]: /home/e-smith/db/configuration: NEW backupwk=service|BackupTime|18:30|Compres
sion|9|CompressionProg|gzip|DaysInSet|31|FullDay|5|IncNum|24|IncOnlyTimeout|yes|Login||Mount|/mnt/smb|Password||Program|dar|SetNum|1|SetsMax|1|SmbHost|localho
st|SmbShare|media/usbBackup2|Timeout|24|VFSType|usb|status|enabled
Feb  3 12:24:53 netkeeper esmith::event[30932]: Processing event: conf-backup
Feb  3 12:24:53 netkeeper esmith::event[30932]: Running event handler: /etc/e-smith/events/actions/generic_template_expand
Feb  3 12:24:53 netkeeper esmith::event[30932]: expanding /etc/crontab
Feb  3 12:24:53 netkeeper esmith::event[30932]: expanding /etc/flexbackup.conf
Feb  3 12:24:54 netkeeper esmith::event[30932]: expanding /etc/dar/CIFScredentials
Feb  3 12:24:54 netkeeper esmith::event[30932]: generic_template_expand=action|Event|conf-backup|Action|generic_template_expand|Start|1486117493 677392|End|14
86117494 112789|Elapsed|0.435397
...

Code: [Select]
config show backupwk
Quote
backupwk=service
    BackupTime=18:30
    Compression=9
    CompressionProg=gzip
    DaysInSet=31
    FullDay=5
    IncNum=24
    IncOnlyTimeout=yes
    Login=
    Mount=/mnt/smb
    Password=
    Program=dar
    SetNum=1
    SetsMax=1
    SmbHost=localhost
    SmbShare=media/usbBackup2
    Timeout=24
    VFSType=usb
    status=enabled

The Backup starts on Friday night @ 18h30 :
Log snip....

Quote
Feb  3 18:30:01 netkeeper /sbin/e-smith/do_backupwk[23277]: /home/e-smith/db/backups: OLD 1486139401=(undefined)
Feb  3 18:30:01 netkeeper /sbin/e-smith/do_backupwk[23277]: /home/e-smith/db/backups: NEW 1486139401=backup_record
Feb  3 18:30:01 netkeeper /sbin/e-smith/do_backupwk[23277]: /home/e-smith/db/backups: OLD 1486139401=backup_record
Feb  3 18:30:01 netkeeper /sbin/e-smith/do_backupwk[23277]: /home/e-smith/db/backups: NEW 1486139401=backup_record|StartEpochTime|1486139401
Feb  3 18:30:01 netkeeper /sbin/e-smith/do_backupwk[23277]: /home/e-smith/db/backups: OLD 1486139401=backup_record|StartEpochTime|1486139401
Feb  3 18:30:01 netkeeper /sbin/e-smith/do_backupwk[23277]: /home/e-smith/db/backups: NEW 1486139401=backup_record|BackupType|workstation|StartEpochTime|14861
39401
Feb  3 18:30:02 netkeeper esmith::event[23280]: Processing event: pre-backup
Feb  3 18:30:02 netkeeper esmith::event[23280]: Running event handler: /etc/e-smith/events/actions/generic_template_expand
Feb  3 18:30:02 netkeeper esmith::event[23280]: expanding /etc/dar/DailyBackup.dcf
Feb  3 18:30:02 netkeeper esmith::event[23280]: generic_template_expand=action|Event|pre-backup|Action|generic_template_expand|Start|1486139402 98765|End|1486
139402 311115|Elapsed|0.21235
Feb  3 18:30:02 netkeeper esmith::event[23280]: Running event handler: /etc/e-smith/events/pre-backup/S10mysql-delete-dumped-tables
Feb  3 18:30:02 netkeeper esmith::event[23280]: S10mysql-delete-dumped-tables=action|Event|pre-backup|Action|S10mysql-delete-dumped-tables|Start|1486139402 31
1355|End|1486139402 313254|Elapsed|0.001899
Feb  3 18:30:02 netkeeper esmith::event[23280]: Running event handler: /etc/e-smith/events/pre-backup/S20mysql-dump-tables
Feb  3 18:30:20 netkeeper esmith::event[23280]: S20mysql-dump-tables=action|Event|pre-backup|Action|S20mysql-dump-tables|Start|1486139402 313476|End|148613942
0 648390|Elapsed|18.334914
Feb  3 18:30:20 netkeeper esmith::event[23280]: Running event handler: /etc/e-smith/events/pre-backup/S30ldap-dump
Feb  3 18:30:20 netkeeper esmith::event[23280]: 5894b01c bdb_monitor_db_open: monitoring disabled; configure monitor database to enable
Feb  3 18:30:20 netkeeper esmith::event[23280]: S30ldap-dump=action|Event|pre-backup|Action|S30ldap-dump|Start|1486139420 648663|End|1486139420 981283|Elapsed
|0.33262
Feb  3 18:30:20 netkeeper esmith::event[23280]: Running event handler: /etc/e-smith/events/pre-backup/S50rewind-tape
Feb  3 18:30:21 netkeeper esmith::event[23280]: S50rewind-tape=action|Event|pre-backup|Action|S50rewind-tape|Start|1486139420 981514|End|1486139421 26709|Elap
sed|0.045195

...

Saturday Email 1 shows error ( Because full backup was started on Friday and is still running ):
Quote
Received signal: Quit
Archive delayed termination engaged
Disabling signal handler, the next time this signal is received the program will abort immediately
Partial backup stored on backup workstation.
Session cleanly closed by timeout after 88500 seconds.
Not an error, backup process will continue next night.
Final memory cleanup...
Program has been aborted for the following reason: Thread cancellation requested, aborting as properly as possible
Error while running dar: 4 at /etc/e-smith/events/actions/workstation-backup-dar line 234.
Backup terminated: backup failed - status: 7424

Quote
==================================
DAILY BACKUP TO WORKSTATION REPORT
==================================
Backup of obfusicated_servername.yourdomain.com started at Fri Feb  3 18:30:21 2017
Destination //localhost/media/usbBackup2/obfusicated_servername.yourdomain.com/set1
Basename full-20170203183021
Starting the backup with a timeout of 24 hours
*** No backup allowed or error during backup ***
Error while running dar: 4

Saturday Email 2 shows email:....
Quote
==================================
DAILY BACKUP TO WORKSTATION REPORT
==================================
Backup of obfusicated_servername.yourdomain.com started at Fri Feb  3 18:30:21 2017
Destination //localhost/media/usbBackup2/obfusicated_servername.yourdomain.com/set1
Basename full-20170203183021
Starting the backup with a timeout of 24 hours
*** No backup allowed or error during backup ***
Error while running dar: 4

and the log snip ...
Quote
Feb  5 18:30:01 netkeeper /sbin/e-smith/do_backupwk[20343]: /home/e-smith/db/backups: OLD 1486312201=(undefined)
Feb  5 18:30:01 netkeeper /sbin/e-smith/do_backupwk[20343]: /home/e-smith/db/backups: NEW 1486312201=backup_record
Feb  5 18:30:01 netkeeper /sbin/e-smith/do_backupwk[20343]: /home/e-smith/db/backups: OLD 1486312201=backup_record
Feb  5 18:30:01 netkeeper /sbin/e-smith/do_backupwk[20343]: /home/e-smith/db/backups: NEW 1486312201=backup_record|StartEpochTime|1486312201
Feb  5 18:30:01 netkeeper /sbin/e-smith/do_backupwk[20343]: /home/e-smith/db/backups: OLD 1486312201=backup_record|StartEpochTime|1486312201
Feb  5 18:30:01 netkeeper /sbin/e-smith/do_backupwk[20343]: /home/e-smith/db/backups: NEW 1486312201=backup_record|BackupType|workstation|StartEpochTime|14863
12201
Feb  5 18:30:01 netkeeper esmith::event[20347]: Processing event: pre-backup
Feb  5 18:30:01 netkeeper esmith::event[20347]: Running event handler: /etc/e-smith/events/actions/generic_template_expand
Feb  5 18:30:01 netkeeper esmith::event[20347]: expanding /etc/dar/DailyBackup.dcf
Feb  5 18:30:02 netkeeper esmith::event[20347]: generic_template_expand=action|Event|pre-backup|Action|generic_template_expand|Start|1486312201 898833|End|148
6312202 337136|Elapsed|0.438303
Feb  5 18:30:02 netkeeper esmith::event[20347]: Running event handler: /etc/e-smith/events/pre-backup/S10mysql-delete-dumped-tables
Feb  5 18:30:02 netkeeper esmith::event[20347]: S10mysql-delete-dumped-tables=action|Event|pre-backup|Action|S10mysql-delete-dumped-tables|Start|1486312202 33
7370|End|1486312202 379446|Elapsed|0.042076
Feb  5 18:30:02 netkeeper esmith::event[20347]: Running event handler: /etc/e-smith/events/pre-backup/S20mysql-dump-tables
Feb  5 18:30:24 netkeeper esmith::event[20347]: S20mysql-dump-tables=action|Event|pre-backup|Action|S20mysql-dump-tables|Start|1486312202 379685|End|148631222
4 750821|Elapsed|22.371136
Feb  5 18:30:24 netkeeper esmith::event[20347]: Running event handler: /etc/e-smith/events/pre-backup/S30ldap-dump
Feb  5 18:30:25 netkeeper esmith::event[20347]: 58975321 bdb_monitor_db_open: monitoring disabled; configure monitor database to enable
Feb  5 18:30:25 netkeeper esmith::event[20347]: S30ldap-dump=action|Event|pre-backup|Action|S30ldap-dump|Start|1486312224 751087|End|1486312225 749036|Elapsed
|0.997949
Feb  5 18:30:25 netkeeper esmith::event[20347]: Running event handler: /etc/e-smith/events/pre-backup/S50rewind-tape
Feb  5 18:30:25 netkeeper esmith::event[20347]: S50rewind-tape=action|Event|pre-backup|Action|S50rewind-tape|Start|1486312225 749280|End|1486312225 810426|Ela
psed|0.061146
Feb  5 18:30:26 netkeeper kernel: kjournald starting.  Commit interval 5 seconds
Feb  5 18:30:26 netkeeper kernel: EXT3-fs (sde1): using internal journal
Feb  5 18:30:26 netkeeper kernel: EXT3-fs (sde1): mounted filesystem with ordered data mode
Feb  5 18:30:27 netkeeper /sbin/e-smith/do_backupwk[20343]: /home/e-smith/db/backups: OLD 1486312201=backup_record|BackupType|workstation|StartEpochTime|14863
12201
Feb  5 18:30:27 netkeeper /sbin/e-smith/do_backupwk[20343]: /home/e-smith/db/backups: NEW 1486312201=backup_record|BackupType|workstation|EndEpochTime|1486312
227|StartEpochTime|1486312201
Feb  5 18:30:27 netkeeper /sbin/e-smith/do_backupwk[20343]: /home/e-smith/db/backups: OLD 1486312201=backup_record|BackupType|workstation|EndEpochTime|1486312
227|StartEpochTime|1486312201
Feb  5 18:30:27 netkeeper /sbin/e-smith/do_backupwk[20343]: /home/e-smith/db/backups: NEW 1486312201=backup_record|BackupType|workstation|EndEpochTime|1486312
227|Result|backup:7424|StartEpochTime|1486312201

Sunday The backup fails ( we are now left with situatin of not having a valid backup !!!!!!!!!! )
email :......

Quote
Not a permitted day for full backup. Aborting...
Next full backup in 5 days.
Backup terminated: backup failed - status: 7424

So how do we / I correct this ?


 
« Last Edit: February 06, 2017, 12:14:33 AM by warren »

Offline warren

  • *
  • 293
  • +0/-0
Re: Partial backup stored on backup workstation
« Reply #1 on: May 11, 2017, 10:10:04 PM »
Bump,

Any views on this ? or method of running the backup and forcing it to continue running until the initial full backup has completed.

Offline Stefano

  • *
  • 10,839
  • +2/-0
Re: Partial backup stored on backup workstation
« Reply #2 on: May 11, 2017, 10:40:01 PM »
if your dar-catalog is corrupted, you have to remove all backups and start again

or, if you prefer, take a look here: https://bugs.contribs.org/show_bug.cgi?id=9159

Offline TerryF

  • grumpy old man
  • *
  • 1,826
  • +6/-0
Re: Partial backup stored on backup workstation
« Reply #3 on: May 12, 2017, 03:12:33 AM »
Bump,
Any views on this ? or method of running the backup and forcing it to continue running until the initial full backup has completed.

You don't have to delete it, just rename the directory created by the backup, it will start a new one..

Remove full backup only on specific day, reset after full finishes .......

Then from cli # /sbin/e-smith/do_backupwk

Inc cant run until there is a completed Full

Messy, but has worked for me
--
qui scribit bis legit

Offline warren

  • *
  • 293
  • +0/-0
Re: Partial backup stored on backup workstation
« Reply #4 on: May 12, 2017, 10:41:45 AM »
Terry , Stefano Thanks for the inputs.

I've got +- 1.8TB of data to backup going to a 6TB external usb. I think Stefano identified the issue here : https://forums.contribs.org/index.php/topic,51423.msg261279.html#msg261279
Quote
IMO we should change backup behaviour.. I mean.. a full backup with hundreds/thousands GB of data need many hours, days to complete.. backup script should start and continue till the end.. it should create a lockfile and check for the existence of it at start.. if the lockfile exists then another backup instance is running.. abort and send an informative email to root.. when backup is done the lockfile is removed..

As data sets get larger, the time needed to complete a full backup increases, so a full backup that starts on a Friday night is going to need at least 2 full days to complete. The current behaviour of Dar to cleanly timeout after 24hrs and give the .."Partial backup stored on backup workstation" ..email message seems to be problematic.

What i'm going to do ( start from Scratch ) is to prune what been backed up in the "ibays" in order to get the full backup to complete within 24 hrs. Then go and remove the --prunes of the larger ibays once i have a full backup completed and let the incrementals run to add the files from the larger ibays.

Offline Jean-Philippe Pialasse

  • *
  • 2,762
  • +11/-0
  • aka Unnilennium
    • http://smeserver.pialasse.com
Re: Partial backup stored on backup workstation
« Reply #5 on: July 12, 2017, 03:06:48 AM »
please see https://bugs.contribs.org/show_bug.cgi?id=9127

you might be able to give feed back or to help to test.

Offline warren

  • *
  • 293
  • +0/-0
Re: Partial backup stored on backup workstation
« Reply #6 on: July 22, 2017, 09:23:33 PM »
I still don not see that this addresses the issue, which is that there should be a way to ensure that the backup programme  :
Quote
a full backup with hundreds/thousands GB of data need many hours, days to complete.. backup script should start and continue till the end.. it should create a lockfile and check for the existence of it at start.. if the lockfile exists then another backup instance is running.. abort and send an informative email to root.. when backup is done the lockfile is removed..  "

The current system , where if a backup starts and the programme performs " Session cleanly closed by timeout after 88500 seconds." does not work .
Especially where the amount of data that needs to be backed-up starts exceeding 2TB.



There must be a lot of users that have backup sets exceeding  >2TB. ? 

Can the current e-smith backup with dar be run manually so that it runs until the full backup is  / has completed?

Re- open as applicable for SME9.2 systems ?
https://bugs.contribs.org/show_bug.cgi?id=10378
« Last Edit: July 22, 2017, 09:34:41 PM by warren »

Offline janet

  • ****
  • 4,812
  • +0/-0
Re: Partial backup stored on backup workstation
« Reply #7 on: July 23, 2017, 05:56:07 PM »
warren

Maybe these are pertinent:

Compression level (0-9) of backup is 9
Full backup session (new backup sets) is allowed only on Friday
Not a permitted day for full backup. Aborting...
Next full backup in 5 days.

Reduce your compression to 3, this lowers the load on the CPU & time taken to compress your files (& thus total time to complete a backup). Do this at least while testing.
Change allowed backup days to Every day, that way the Full backup can continue to run on Saturday & Sunday nights until it completes & then the Incremental can start on Monday night.

Let us know if this change to Full backup allowed on day xxx improves the situation.

Please search before asking, an answer may already exist.
The Search & other links to useful information are at top of Forum.

Offline warren

  • *
  • 293
  • +0/-0
Re: Partial backup stored on backup workstation
« Reply #8 on: July 24, 2017, 01:46:39 PM »
warren
...
Reduce your compression to 3, this lowers the load on the CPU & time taken to compress your files (& thus total time to complete a backup). ..

Thanks Janet , learnt something new re the compression.
For the last run i did have backups set to "Full backup sessions (new backup set) are allowed everyday".
I'll test this out (lowering compression ) when the current set needs rotating out and report back

What i've done now to get the full backup completed is :
1. Prune the /home/e-smith/files/ibays/IbayOfReallyBigUser1 , IbayOfReallyBigUser2 , IbayOfReallyBigUser3 ;( using proper custom template  --prune home/e-smith/files/ibays/NAMEtoPrune)

2. Rsync these ibays to another server on the lan.

3. Allow the full backup to run( starting friday night and ending Sat night).

4. Then removing the prune list to add in on the incremental run starting Sunday night to add the IbaysOfReallyBigUsers into the backup set.

I still think the primary issue here is that we need a method of starting a backup that keeps running until complete, even if its takes 48 hrs.