Koozali.org: home of the SME Server

Backup design/implementation

Offline DanB35

Backup design/implementation
« on: July 08, 2014, 11:24:41 PM »
I spent some time debating with myself whether to post this, because I'm concerned that it will come across as simply complaining.  I haven't been able to think these issues through to the point of offering concrete suggestions, but I'm hopeful that raising them will get some discussion going toward implementing some improvements.  The necessity of backing up and restoring in order to upgrade to SME 9.0 makes these issues somewhat more timely, I believe.

I've been using SME Server since the days of e-smith 3.something.  Thankfully, in that time, I've never lost any data due to a hardware failure, even without running backups (or RAID for most of that time), but I know I'm on borrowed time in that regard.  Recently, I've had enough storage space available to start doing backups, and after a bit of a rocky start (addressed in other threads), I have daily backups running.  It appears, though, that the backup system in SME Server is an afterthought, at best.  There are several areas where it just doesn't seem to be particularly well thought through.  Here are some that I've noticed:
  • There's an option in the web-based server manager named "workstation backup."  No doubt there's a historical reason that this name made sense, but it's incomplete at best, and misleading at worst, today.  This is the option to choose for any backup to a network server, whether SMB/CIFS or NFS (though NFS is broken in the base SME Server installation).  It's also the option (well, an option) to choose if you want to back up to a local (i.e., to the SME server) hard drive.
  • There's an option to back up to tape.  This apparently backs up a different set of data than the "workstation backup", as the estimated size of that backup on my server is about 20% greater than the estimated size of the workstation backup, disregarding compression.
  • There's an option in the server manager to create a download file to download to a client computer via the web browser.  Since the file size is limited to 2 GB, this isn't going to work for any but the smallest installations.  There also doesn't appear to be any way to restore such a backup.
  • There's an option in the text console to do a backup to a local drive.  It's completely incompatible with the workstation backup to local drive.
  • The requirements for both flavors of backup to local disk, as far as partitioning, filesystem, mount status, etc., are undocumented, and the two flavors appear to have different requirements.  The administrator's manual section on the workstation backup links to a wiki page on USB disks, which describes in great detail the process of partitioning, formatting, and mounting them, but doesn't clearly state what's required for backup use (e.g., "workstation backup to local disk requires that the disk not be mounted before the backup begins, and that it contain only one partition formatted with either the ext2, ext3, or FAT32 filesystem.")
  • The only way to test your workstation backup settings is to wait for a scheduled backup to start, or manually kick one off from the shell.
  • The server manager has no way of monitoring the progress or history of the system backups.
  • During initial system configuration after a fresh installation, one of the first questions asked is whether the user wants to restore from a backup.  Although there are several ways of doing a backup, this can restore from only one, and it doesn't indicate this anywhere on the screen.
  • The workstation verify and restore suffer from serious TMI.  Both list every single file verified/restored.  Presumably they indicate errors in that list as well, but they're hidden among tens, or even hundreds, of thousands of lines of output.  The user has to scroll through all that output to get to the "complete!" line.
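
To illustrate that last point, here is a rough sketch of how a per-file listing could be collapsed into a short summary plus only the lines that matter. The `OK`/`ERROR` prefixes and the sample paths are a made-up log format for illustration; they are not what the current verify/restore panel actually prints.

```python
from collections import Counter


def summarize(lines):
    """Collapse a per-file verify log into counts plus the error lines.

    Assumes a hypothetical format: a status word ("OK" or "ERROR")
    followed by the path that was checked.
    """
    counts = Counter()
    errors = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        status = line.split(maxsplit=1)[0]
        counts[status] += 1
        if status == "ERROR":
            errors.append(line)
    return counts, errors


# Hypothetical sample output, three files checked.
sample = [
    "OK /home/e-smith/files/ibays/docs/a.txt",
    "OK /home/e-smith/files/ibays/docs/b.txt",
    "ERROR /home/e-smith/files/users/dan/c.txt",
]
counts, errors = summarize(sample)
print(f"{counts['OK']} verified, {counts['ERROR']} failed")
for line in errors:
    print(line)
```

The full listing could still be kept in a log file for anyone who wants it; the point is only that the summary and the errors come first.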

There are probably some other issues, but I'm sure I've complained more than enough already.  I understand that there's a historical reason for some of this--like that flexbackup was part of the original system, while dar was a contrib that was later rolled into the distro.  But right now, it results in a mish-mash of confusingly incompatible options with a further-confusing interface, where some of the functions just don't work at all (like NFS).  Here are some suggestions that I think would improve the situation:
  • Pick one method of backing up to an attached hard drive, and use that method in both the web manager and the text console.  Bonus if the corresponding restore feature can automatically identify which of the two current methods was used for a backup, and restore either (as a matter of backward compatibility).
  • Allow restoring from any backup type, including network backups, during the initial system configuration.
  • Replace the "workstation backup" language with something more descriptive--perhaps "back up to network share".
  • On the server-manager page to set up backup to an attached drive, describe the necessary characteristics of that drive.  Bonus if the drive can be prepared within the server-manager, and/or if it can be tested to confirm that it will work.
  • Show backup progress/status and history in the server-manager.
  • I think the whole backup UI needs to be seriously reworked.  Unfortunately, this is a pretty vague suggestion, and I'm having trouble being more specific on this.
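
As a sketch of the "automatically identify which method was used" idea in the first suggestion: a gzip-compressed tarball starts with the two magic bytes 0x1f 0x8b, and dar names its slices `<basename>.<N>.dar`, so a restore routine could guess the type before doing anything else. This assumes the console backup is a gzip-compressed tarball and the workstation backup is a set of dar slices; the function name and the file names below are invented for the example.

```python
import gzip
import re
import tempfile
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"                  # first two bytes of any gzip stream
DAR_SLICE = re.compile(r"\.\d+\.dar$")    # dar slice naming: <basename>.<N>.dar


def guess_backup_kind(path):
    """Return "dar", "tgz" or "unknown" for a backup file (hypothetical helper)."""
    path = Path(path)
    if DAR_SLICE.search(path.name):
        return "dar"
    with path.open("rb") as fh:
        if fh.read(2) == GZIP_MAGIC:
            return "tgz"
    return "unknown"


# Build two placeholder files and classify them.
with tempfile.TemporaryDirectory() as tmp:
    tgz = Path(tmp) / "backup.tgz"
    with gzip.open(tgz, "wb") as fh:
        fh.write(b"placeholder")
    dar = Path(tmp) / "full.1.dar"
    dar.write_bytes(b"placeholder")
    kinds = [guess_backup_kind(tgz), guess_backup_kind(dar)]
print(kinds)
```

A real implementation would want a stronger check than the file name for dar, but even this much would let the first-boot restore say "this is the other kind of backup" instead of failing silently.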

I hope this is helpful, and can lead to some improvements in this area.

Offline ReetP

Re: Backup design/implementation
« Reply #1 on: July 09, 2014, 01:20:30 AM »
Your comments are not necessarily complaints but worthy criticism :-)

Helping to identify problem areas, and where we should be doing better, is very important. Constructive criticism is always a good thing.

Without wishing to teach you how to suck eggs, the old adage applies: if it doesn't work as expected, or as you want it, the first thing to do is look on the bug tracker for anything on backups, both old and new (do an advanced search and include 'closed' bugs etc).

Some of these points have been raised before (I raised a few myself) and you may find your answer, or they may need to be reinvestigated.

Go through them and if you see one of your points either reopen it, or create a new one. Don't try and bunch all of your points together. Each point is a separate bug even though they all come under backup/restore.

If in any doubt, ask on the dev mailing list at http://lists.contribs.org/mailman/listinfo/ for further guidance and someone will help.

At the end of the day, if you join in and give us a hand you are much more likely to get the system that you want :-) And we need all the help we can get.

B. Rgds
John
...
1. Read the Manual
2. Read the Wiki
3. Don't ask for support on Unsupported versions of software
4. I have a job, wife, and kids and do this in my spare time. If you want something fixed, please help.

Bugs are easier than you think: http://wiki.contribs.org/Bugzilla_Help

If you love SME and don't want to lose it, join in: http://wiki.contribs.org/Koozali_Foundation

Offline DanB35

Re: Backup design/implementation
« Reply #2 on: July 09, 2014, 06:31:40 PM »
I have no problem with filing one or more bugs (already done in the case of NFS, which is a clear bug IMO), but I was kind of hoping to spark some discussion to help me and others think through some of these issues first.  I'll need to ponder these some more and see what's already in bugzilla.

Online Stefano

Re: Backup design/implementation
« Reply #3 on: July 09, 2014, 08:09:46 PM »
I agree..

we need to find a way to give the admin user:
- the possibility to restore (in an easy and fast way) a single file
- the possibility to create an offline full backup for disaster recovery
- the possibility to open a backup set using standard tools, in windows/whatever O.S.

the first thought that comes to my mind is that dar is not the right tool.. it has almost no clients for windows, is not "standard".. tar, instead, is widely used and known

we should/could/might think about backuppc.. this is my opinion and maybe this is not the right place to discuss..

Offline janet

Re: Backup design/implementation
« Reply #4 on: July 09, 2014, 08:10:43 PM »
DanB35

Some additional useful info is available under B for Backup ........
See
http://wiki.contribs.org/Category:Howto
Please search before asking, an answer may already exist.
The Search & other links to useful information are at top of Forum.

Offline DanB35

Re: Backup design/implementation
« Reply #5 on: July 09, 2014, 08:49:46 PM »
I don't feel that I know a whole lot about the various backend tools available.  Certainly tar is ubiquitous, but dar seems to support chunking in a fairly straightforward manner--nobody wants to deal with a 300 GB tarball, and many filesystems won't even support one, so the ability to split the backup into manageable-size files is important.  Dar also indexes the backup, unlike tar.  There are clients available for most Unix flavors and Windows, so I'm not sure the choice of dar as a backend is so problematic.
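
For what it's worth, chunking is not out of reach for tar either. The following is a minimal sketch, not how dar or the current SME scripts do it: a small file-like object (`SliceWriter`, a name invented here) spreads whatever Python's `tarfile` writes over fixed-size slice files, and concatenating the slices in order gives back an ordinary tar stream. What it does not provide is dar's index, so finding one file still means reading through the archive.

```python
import io
import tarfile
import tempfile
from pathlib import Path


class SliceWriter:
    """File-like object that spreads written bytes over numbered slice files."""

    def __init__(self, directory, basename, slice_size):
        self.directory = Path(directory)
        self.basename = basename
        self.slice_size = slice_size
        self.index = 0
        self.current = None
        self.written = 0  # bytes written to the current slice

    def _next_slice(self):
        if self.current:
            self.current.close()
        self.index += 1
        name = f"{self.basename}.{self.index:03d}"
        self.current = open(self.directory / name, "wb")
        self.written = 0

    def write(self, data):
        view = memoryview(data)
        while len(view):
            if self.current is None or self.written == self.slice_size:
                self._next_slice()
            room = self.slice_size - self.written
            part = view[:room]
            self.current.write(part)
            self.written += len(part)
            view = view[len(part):]
        return len(data)

    def close(self):
        if self.current:
            self.current.close()
            self.current = None


with tempfile.TemporaryDirectory() as tmp:
    writer = SliceWriter(tmp, "backup.tar", slice_size=4096)
    # "w|" is tarfile's streaming mode: it only ever calls write().
    with tarfile.open(fileobj=writer, mode="w|") as tar:
        payload = b"x" * 10000
        info = tarfile.TarInfo("data.bin")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    writer.close()

    slices = sorted(Path(tmp).glob("backup.tar.*"))
    sizes = [p.stat().st_size for p in slices]
    # Joining the slices in order restores a normal tar stream.
    joined = b"".join(p.read_bytes() for p in slices)
    with tarfile.open(fileobj=io.BytesIO(joined)) as tar:
        names = tar.getnames()

print(len(slices), "slices, members:", names)
```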

Obviously, the critical capability of a backup is the ability to restore the data, completely, reliably, and easily--otherwise we might as well write the backup to /dev/null.  As near as I can tell, we have that ability today.  Selective file/directory restore is, IMO, a step below this in importance, but it's a small step, and the current system allows for this, albeit with a cumbersome UI (enter a regex for what you want to restore?).  I hadn't tried this before, which is why I didn't address it in my post.
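
To make the regex point concrete, here is a small sketch of selecting paths from an archive listing, first with a raw regular expression and then with a shell-style wildcard converted by Python's `fnmatch.translate`. The function names and the sample paths are invented for the example, and the real restore panel may match differently; the idea is only that a friendlier UI could accept the wildcard form and do the conversion itself.

```python
import fnmatch
import re


def select_by_regex(paths, pattern):
    """Return the paths that the regular expression matches anywhere."""
    rx = re.compile(pattern)
    return [p for p in paths if rx.search(p)]


def select_by_glob(paths, glob):
    """Same idea, but the user types a shell-style wildcard instead."""
    rx = re.compile(fnmatch.translate(glob))
    return [p for p in paths if rx.match(p)]


# Hypothetical listing of what is inside a backup.
archive_listing = [
    "home/e-smith/files/ibays/docs/files/report.odt",
    "home/e-smith/files/ibays/docs/files/budget.ods",
    "home/e-smith/files/users/dan/home/notes.txt",
]

by_regex = select_by_regex(archive_listing, r"ibays/docs/.*\.od[st]$")
by_glob = select_by_glob(archive_listing, "*/users/dan/*")
print(by_regex)
print(by_glob)
```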

Disaster recovery would be very nice to have.  Early versions of e-smith had the option to create a reinstall floppy that would set the basic configuration and then (IIRC) restore a backup.  At least, that was the idea--I don't believe it ever worked well, if at all.  It would seem like the basics of this idea are present already--you can do a basic installation, reboot, and then restore from a backup.  The problem here is that this only works if you did the right kind of backup.  Otherwise, you have to go through a fair bit of server setup and configuration before you restore your backup (that replaces all that configuration).

One good thing to come out of this is that by the time I have SME 9.0 up and running, I will have validated my backup/restore process.

Offline DanB35

Re: Backup design/implementation
« Reply #6 on: July 09, 2014, 10:19:10 PM »
Bugs 8490 and 8491 submitted.

Offline brianr

Brian j Read
(retired, for a second time, still got 2 installations though)
The instrument I am playing is my favourite Melodeon.
.........

Offline CharlieBrady

Re: Backup design/implementation
« Reply #8 on: July 10, 2014, 07:10:38 PM »
Quote
Disaster recovery would be very nice to have.  Early versions of e-smith had the option to create a reinstall floppy that would set the basic configuration and then (IIRC) restore a backup.  At least, that was the idea--I don't believe it ever worked well, if at all.

Really? I don't recall many (any?) bug reports. As you know, we do take restore from backup bug reports seriously.

Quote
It would seem like the basics of this idea are present already--you can do a basic installation, reboot, and then restore from a backup.  The problem here is that this only works if you did the right kind of backup.

It was designed (a long time ago) specifically to work with tape backup and backup to desktop. Those were the only options available at the time.

Offline DanB35

Re: Backup design/implementation
« Reply #9 on: July 10, 2014, 07:40:09 PM »
I just happened to be browsing some old devinfo emails the other day and saw some discussion from 2002 involving you, and mentioning continuing problems with reinstall floppies.  It was probably an overstatement to say "I don't believe it ever worked well, if at all"--I do recall having had trouble with it, but it's been long enough ago that I don't remember any details, which are moot in any event at this point.

I lived through enough of the history to understand at least the broad strokes of how we got where we are.  I remember Darrell May working on the "workstation backup" using dar, though I have to admit I don't clearly recall its being integrated into the base install.  But however reasonable or understandable the road that got us here, "here" is a state where we have two mutually-incompatible backup systems in place.  This is confusing to the end user, and I don't see that it's helpful to the developers' workloads either, since both systems have to be maintained.

Unfortunately, I can't code in any of the relevant languages, so I don't believe I can be of any help in the implementation.  I'll do whatever else I can to assist, though.

Offline CharlieBrady

Re: Backup design/implementation
« Reply #10 on: July 10, 2014, 08:44:58 PM »
Quote
... though I have to admit I don't clearly recall its being integrated into the base install.

I don't either. I don't think it was ever adequate quality, and the issues of incompatibility with the other systems should have been addressed before it was "integrated".

How different would your suggestions be if that "contrib" were removed?

Offline janet

Re: Backup design/implementation
« Reply #11 on: July 10, 2014, 09:22:56 PM »
DanB35

Quote
I remember Darrell May working on the "workstation backup" using dar, though I have to admit I don't clearly recall its being integrated into the base install.....

Darrell May developed the DAR2 contrib.

I believe this thread http://forums.contribs.org/index.php?topic=37922.0 covers the early days of the subsequent backup with dar contrib development by jpl, & subsequent integration into sme server as a replacement for e-smith-backup

Quote
But however reasonable or understandable the road that got us here, "here" is a state where we have two mutually-incompatible backup systems in place.  This is confusing to the end user, and I don't see that it's helpful to the developers' workloads either, since both systems have to be maintained.

Is it really that hard to understand? The Manual clearly states which backup to use when & where etc, despite your comments about documentation issues.

The different backup methods are there due to historical reasons, but so is most of sme server, in the form it is currently in.

If you create a backup using the admin console (tgz), then that is the backup you use when you are asked on first boot after installing the OS if you want to restore.

If you create a backup in server manager backup or restore panel (ie the dar workstation backup), then you need to install the OS, configure it for your network, configure backup in server manager, then restore using the server manager backup or restore panel.

I agree that the terminology could be improved, but backup & restore does work, & you have 2 choices (in a default install).
If you do not like either of those, then you have numerous other contribs available.

While it may take developer effort to maintain the two backup methods, it will probably take a lot more developer effort to create a new backup system, or otherwise integrate backup into one system etc.
Your ideas are good, but motivated developers with spare time are needed to do the work, & typically if something works OK in sme server, then it is usually left as is, unless someone is motivated enough to do something about it.

Now that sme9 is released, perhaps attention can be given to tidying up some aspects.
Ian Wells has already started to address issues with backups & make some improvements, so hopefully this can continue, with suitable assistance.

Offline DanB35

Re: Backup design/implementation
« Reply #12 on: July 10, 2014, 10:23:16 PM »
Charlie, interesting hypothetical.  I'll probably be a little "stream of consciousness" in my reply at the moment.  Removing the "workstation backup" would, of course, remove the issue of incompatible USB disk backups.  It would also, if I'm not mistaken, remove the only "stock" facility to run regular scheduled backups, as well as the only way to back up to a network share, both of which are (IMO) pretty important capabilities.

I don't think the cosmetic cleanup suggestion is really affected, but that's also the least specific of my suggestions (and there's already a long-standing bug, #4710, on the subject).  I also don't think my suggestion of removing the "backup to web browser" is really affected--I just don't believe that's viable any more except for the smallest of installations, and the ability to restore from such a backup has long since been removed.

If that specific module were removed, I think its capabilities would need to be replaced--scheduled backups, in particular, are just too important to not have in the base installation.  Network backups might be less critical, but I'd still rate them as pretty important.  Of course, my network backup destination is a redundant ZFS pool, which I'd consider much more reliable than a single USB drive, so that's going to be a factor in my opinion.

As I see it, there are several independent pieces involved in how SME does backups.  There is the web manager panel, the server console, the scripts that run the actual jobs, the backend tools that package and write the data, and the devices on which the data is stored.  In the typical Unix-y way of things, any of these pieces can be replaced with something else, without great effect on the others.  If we decide that we're going to get rid of the workstation backup module, that leaves the question of what backend tool set we're going to use--tar, dar, or something else.  No doubt there are lots of technical pros and cons to the various tools available; the only one I know of (which may be a matter of implementation rather than inherent to the tool) is file chunking.
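
A rough sketch of what "any of these pieces can be replaced" might look like in practice: the job script talks to the backend through a small interface, so swapping tar for something else touches only one class. Everything here (`Backend`, `TarBackend`, `run_backup`, the in-memory `store`) is hypothetical and kept in memory for brevity; it is not how the existing SME scripts are organised.

```python
import io
import tarfile
from abc import ABC, abstractmethod


class Backend(ABC):
    """The only thing the job script needs to know about a packaging tool."""

    @abstractmethod
    def pack(self, files):
        """Turn {name: bytes} into a single archive blob."""

    @abstractmethod
    def unpack(self, blob):
        """Turn an archive blob back into {name: bytes}."""


class TarBackend(Backend):
    def pack(self, files):
        buf = io.BytesIO()
        with tarfile.open(fileobj=buf, mode="w") as tar:
            for name, data in files.items():
                info = tarfile.TarInfo(name)
                info.size = len(data)
                tar.addfile(info, io.BytesIO(data))
        return buf.getvalue()

    def unpack(self, blob):
        with tarfile.open(fileobj=io.BytesIO(blob)) as tar:
            return {
                m.name: tar.extractfile(m).read()
                for m in tar.getmembers()
                if m.isfile()
            }


def run_backup(backend, files, store):
    """Job script: does not care which backend or storage it is given."""
    store["latest"] = backend.pack(files)


def run_restore(backend, store):
    return backend.unpack(store["latest"])


store = {}  # stands in for the USB disk or network share
files = {"etc/hosts": b"127.0.0.1 localhost\n", "home/readme.txt": b"hello\n"}
run_backup(TarBackend(), files, store)
restored = run_restore(TarBackend(), store)
print(sorted(restored))
```

A dar-based backend would implement the same two methods, and the web panel and console could both call `run_backup` without knowing which one is in use.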