On Thu, Jul 24, 2003 at 04:10:00PM -0500, Adam Maloney wrote:
> I believe Carl is "the-man" on this subject, but I'll put in $.02

since I heard someone taking my name in vain, I suppose I ought to throw in
my opinion. 

> The cost of your back-up solution should be reflective of the monetary
> value of the data.

first, most important rule, right there.
 
there are times that it's worth building a whole replicated datacenter
connected via private fiber and fiber-channel repeaters. some of the
companies who had offices in the World Trade Center are probably glad they
had something like that.

a whole lot of them sure wish they did.

needless to say, if all you're backing up is your blog on your co-lo'ed
webserver, something less drastic is in order. :)

I don't know what all kinds of data you're talking about, but keep in mind
that a lot of services are easily replicable: DNS and SMTP have failover
built into the protocol, and it's advantageous to have a DNS and a mail
server somewhere offsite.
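To illustrate the SMTP side: a sending mail server looks up the domain's MX records and tries them in order of preference (lowest value first), so an offsite backup MX picks up mail whenever the primary is unreachable. A minimal Python sketch; the hostnames and the `is_up` check are hypothetical stand-ins for real DNS lookups and connection attempts.

```python
import random

# Hypothetical MX records for an example domain: (preference, host).
# Lower preference value means "try this host first".
MX_RECORDS = [
    (20, "backup-mx.offsite.example"),
    (10, "mail.example"),
]

def delivery_order(mx_records, rng=random):
    """Return hosts in the order a sending MTA would try them.

    Records are sorted by preference; hosts sharing a preference are
    shuffled so load is spread among them.
    """
    by_pref = {}
    for pref, host in mx_records:
        by_pref.setdefault(pref, []).append(host)
    order = []
    for pref in sorted(by_pref):
        hosts = list(by_pref[pref])
        rng.shuffle(hosts)
        order.extend(hosts)
    return order

def deliver(mx_records, is_up):
    """Try each MX in turn; return the first host that accepts the mail."""
    for host in delivery_order(mx_records):
        if is_up(host):
            return host
    return None  # every MX is down: a real MTA would queue and retry later
```

If the primary at the office is down, `deliver(MX_RECORDS, ...)` falls through to the offsite host, which is exactly the failover the protocol gives you for free.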

I believe AFS has replication/failover built into it, but I could be wrong.
(Amy?) In any case, AFS is more trouble than most people want to deal
with. :)

> 70Gb burned to CD?  Ick.

I once looked at the economics of an automated backup solution using a CD or
DVD autoloader. aside from the cost of the burner itself (a few $K), the
cost of media ends up making it more expensive than tape before long. Tape
is fast and reusable; CD-Rs are not. CD-RWs are reusable but even slower.
another problem is the *huge* stacks of CDs that you'll need to back up
your data, and storing those things costs you money too. DVDs hold more
data, but they are marginally more expensive per byte.

70GB / 4.7GB per DVD = 15 discs (14.9, rounded up).
looks like DVDs are down to less than $1/disc
(http://store.yahoo.com/blankcdcdr/dvdr-media-dvd-r.html), so I guess the
economics have changed a bit since I last looked. even so, spending $15
(plus the amortized cost of a $3000 DVD autoloader) per backup is not
something you'd want to do every night.
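The per-backup arithmetic is easy to redo as prices change. A rough Python sketch using the figures above (70GB, 4.7GB discs at about $1, a $3000 autoloader) and the DLT numbers quoted below (35GB native, $60-$70 per tape, under $1,000 per drive). Amortizing hardware over three years of nightly backups and reusing each tape 100 times are assumptions, not measured figures.

```python
import math

# Assumption: amortize hardware over three years of nightly backups.
BACKUPS_AMORTIZED = 365 * 3

def media_count(data_gb, media_gb):
    """Number of discs or tapes needed to hold one full backup."""
    return math.ceil(data_gb / media_gb)

def dvd_cost_per_backup(data_gb, disc_gb=4.7, disc_price=1.00,
                        loader_price=3000.0):
    # Write-once discs: every backup consumes fresh media.
    media = media_count(data_gb, disc_gb) * disc_price
    return media + loader_price / BACKUPS_AMORTIZED

def tape_cost_per_backup(data_gb, tape_gb=35, tape_price=65.0,
                         reuses=100, drive_price=1000.0):
    # Tapes are rewritten, so each backup only "uses up" 1/reuses of a tape.
    # reuses=100 is a hypothetical rotation figure, not a vendor spec.
    media = media_count(data_gb, tape_gb) * tape_price / reuses
    return media + drive_price / BACKUPS_AMORTIZED

if __name__ == "__main__":
    print(f"DVD:  {media_count(70, 4.7)} discs, "
          f"${dvd_cost_per_backup(70):.2f} per backup")
    print(f"tape: {media_count(70, 35)} tapes, "
          f"${tape_cost_per_backup(70):.2f} per backup")
```

Under those assumptions this prints about $17.74 per backup for DVD against about $2.21 for tape; the gap comes almost entirely from the DVDs not being reusable.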

I don't know how long it would take to burn those 15 DVDs either; but I'm
sure good tape drives would be notably faster.

it's not a bad idea for occasional, long-term permanent storage tho. (look
at www.mondorescue.com).

> Also, transferring 70Gb to your off-site location might take awhile.  
> Over a T-1 it will take more than 100 hours (70,000MByte = 560,000 MBit /
> 1.5 MBit = 373,333 sec = 103h).

this is why some sort of differential backup is worthwhile. I've built
workable systems with rsync scripts, which only require one full transfer
of the data to the backup server (much like Nate described in his post);
after that (at least in theory) they only need to transfer the files that
changed that night.
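Redoing the T-1 arithmetic from the quote shows why this matters: the full 70GB only has to cross the wire once. The sketch also shows the kind of "quick check" rsync does by default (skip files whose size and modification time haven't changed). The 500MB nightly change figure and the manifest format are assumptions for illustration, and this is not rsync's actual delta-transfer algorithm.

```python
def transfer_hours(megabytes, link_mbit_per_s=1.5):
    """Hours to move `megabytes` over a link of the given speed (no overhead)."""
    return megabytes * 8 / link_mbit_per_s / 3600

def changed_files(current, previous):
    """Return paths whose (size, mtime) signature differs from last night.

    `current` and `previous` map path -> (size_bytes, mtime); new files
    have no previous entry, so they are always included.
    """
    return sorted(p for p, sig in current.items() if previous.get(p) != sig)

if __name__ == "__main__":
    print(f"full 70GB over a T-1:  {transfer_hours(70_000):.1f} hours")
    # Hypothetical: 500MB of files change on a typical night.
    print(f"500MB nightly changes: {transfer_hours(500) * 60:.0f} minutes")
```

The full copy takes about 103.7 hours, while a hypothetical 500MB of nightly changes fits in roughly three quarters of an hour.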

there are a couple of good pre-built systems that do this better than what
I've cobbled together.

I took a good look at this one:
http://www.stearns.org/rsync-backup/
and found it's pretty good. it's client-side-initiated, so it would be very
good for backing up laptops and other occasionally-connected devices. it
makes a nice live filesystem that you can browse, and you can even browse
previous days' backups as a live filesystem (it uses hardlinks to avoid
storing duplicate copies of identical files).
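The hardlink trick is what keeps the daily snapshots cheap: a file that hasn't changed since yesterday's snapshot is linked to yesterday's copy instead of being stored again. A minimal Python sketch of the idea (rsync's `--link-dest` option does the same thing in practice); the directory layout and the size/mtime test for "unchanged" are assumptions, and this is not rsync-backup's own code.

```python
import os
import shutil

def _unchanged(src, old):
    """Treat a file as unchanged if size and whole-second mtime match."""
    if not os.path.isfile(old):
        return False
    a, b = os.stat(src), os.stat(old)
    return a.st_size == b.st_size and int(a.st_mtime) == int(b.st_mtime)

def snapshot(source, prev_snap, new_snap):
    """Copy `source` into `new_snap`, hardlinking files unchanged since `prev_snap`.

    `prev_snap` may be None for the very first (full) snapshot.
    """
    for root, _dirs, files in os.walk(source):
        rel = os.path.relpath(root, source)
        dest_dir = os.path.normpath(os.path.join(new_snap, rel))
        os.makedirs(dest_dir, exist_ok=True)
        for name in files:
            src = os.path.join(root, name)
            dst = os.path.join(dest_dir, name)
            old = os.path.join(prev_snap, rel, name) if prev_snap else None
            if old and _unchanged(src, old):
                os.link(old, dst)       # same inode: costs no extra space
            else:
                shutil.copy2(src, dst)  # copy2 keeps mtime for the next check
```

Each day's directory looks like a complete copy and can be browsed as an ordinary filesystem, but identical files share one inode on disk.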

some people didn't like it because they believed that allowing the clients
to initiate the backups made the security weaker. it uses a chroot'ed jail
for each client's backup process tho, and in a lot of ways I'd rather expose
the backup server to a limited number of clients than try to secure
remote-initiation access to a large number of clients.

I haven't tried these yet:
http://rdiff-backup.stanford.edu/
http://stitch.bentlogic.net/ 
but they look pretty good. I've heard good things about rdiff-backup.

> DLT4 can do 35Gb raw/70Gb compressed on 1 tape.  Tapes are about $60-$70
> each (last I bought them anyways).  I think you can get DLT4 drives for
> under $1,000 now.

don't buy DLT. buy AIT. 
AIT is *amazingly* fast to search, because it keeps an index of filemarks in
an NVRAM chip in the tape cartridge. this is OS-independent, and makes your restores
blazing fast. (which is handy when the CEO deletes his spreadsheet by
accident and wants it back 5 minutes ago, instead of 5 hours from now).

also, AIT uses spinning (helical-scan) read/write heads, so the tape doesn't
have to move as fast, which makes 'backhitching' or 'shoeshining' less of a
problem and causes less wear on the tape.

last I knew, cost was comparable to DLT, but that might have changed.

> > 1. copy some  files nightly to a central server (that is out of the 
> > datacenter, but in the same building :) ) and burn them to cd every now 
> > and then. Its about 70 gigs of data right now.

this is something like what I've done for one client in the past. it's a
good and workable scheme. just keep in mind (and I think you have it) that
you need *historical* backups as well as replication. you can keep
differential historical backups on disk (like rsync-backup does), but if you
want to take them offsite, something more durable than a disk is desirable.
that's what tape is still good for (it's still the cheapest alternative for
short-term reliable offsite backup).
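One way to keep *historical* backups on disk without keeping every night forever is a rotation policy. A hypothetical Python sketch: keep the last 7 nightly snapshots plus Sunday snapshots for the last 4 weeks. The specific numbers and the choice of Sunday are assumptions, not anything rsync-backup prescribes.

```python
from datetime import date, timedelta

SUNDAY = 6  # date.weekday(): Monday is 0, Sunday is 6

def snapshots_to_keep(snapshot_dates, today, daily=7, weekly=4):
    """Return the snapshot dates to retain; everything else can be pruned."""
    keep = []
    for d in sorted(snapshot_dates):
        age = (today - d).days
        if 0 <= age < daily:
            keep.append(d)                     # recent nightly snapshots
        elif d.weekday() == SUNDAY and 0 <= age < weekly * 7:
            keep.append(d)                     # older weekly snapshots
    return keep

if __name__ == "__main__":
    today = date(2003, 7, 24)
    nightly = [today - timedelta(days=i) for i in range(60)]
    for d in snapshots_to_keep(nightly, today):
        print(d.isoformat())
```

With 60 nights of snapshots on hand, this keeps the last week plus three older Sundays, so a month of history costs ten snapshots instead of thirty.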

then again, if you only do offsite backups once a week, and want them for
archival purposes, it may be worthwhile to get a DVD autoloader and just
burn yourself a stack of DVDs.

> > 2. Put tapes on each machine, get lots of tapes.

this is really expensive, considering how much tape drives cost, relative to
the price of a computer now. it's very convenient tho. possibly worthwhile
for centralized servers at remote (netwise) locations.

> > 
> > 3. Get a nicer tapedrive that can backup several machines on one tape

given how fast disk drives are growing (which makes people sloppy about
what they put on disk, which means the drives fill up), this is becoming
less and less viable.
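A quick way to see the squeeze is to count how many tapes one night's backup of several machines needs. A rough Python sketch using the DLT figures quoted earlier (35GB native, 70GB compressed); the per-machine disk usage numbers are made up for illustration, and real compression ratios vary with the data.

```python
import math

def tapes_per_backup(machine_usage_gb, tape_native_gb=35, compression_ratio=1.0):
    """Tapes needed to fit every machine's data in one backup run."""
    total_gb = sum(machine_usage_gb)
    return math.ceil(total_gb / (tape_native_gb * compression_ratio))

if __name__ == "__main__":
    # Hypothetical servers: modest disks vs. the bigger disks they grow into.
    print(tapes_per_backup([10, 12, 8]))                          # 1
    print(tapes_per_backup([120, 80, 200]))                       # 12
    print(tapes_per_backup([120, 80, 200], compression_ratio=2))  # 6
```

Once the total outgrows a single cartridge, "several machines on one tape" turns into a tape changer or a person swapping tapes at night.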

> > 
> > are there other options that we should look at?

I think rewriteable optical media will be the future of backups, but I don't
know if the big backup tool vendors are adding that capability to their
systems. I think we'll need the next generation of media (50-90GB discs)
before it becomes really viable for smaller operations. certainly Plasmon is
doing it right now, but their solutions are very expensive (albeit very
fast and reliable, and with write-once media, largely tamperproof, which has
its advantages in some businesses).

Carl Soderstrom.
-- 
Systems Administrator (and sometimes backup administrator)
Real-Time Enterprises
www.real-time.com

_______________________________________________
TCLUG Mailing List - Minneapolis/St. Paul, Minnesota
http://www.mn-linux.org tclug-list at mn-linux.org
https://mailman.real-time.com/mailman/listinfo/tclug-list