On Thu, 22 Jan 2009, Shawn Fertch wrote:

> A little clarification is in order here...
>
> While rsync does a comparison, it only copies files that have changed. 
> Tar/rcp/scp/find-cpio/etc will typically copy the entire contents 
> depending upon the parameters specified.
>
> If rsync is possible, I would highly recommend using that instead.  It 
> will preserve file permissions, ownership, date/time, etc.  scp/sftp 
> will not unless you tar the directory's contents up and move the tarball 
> over.
>
> While it's true that rsync does chew up time doing the comparison, it's 
> been my experience that rsync (even with the comparison) is most times 
> faster than other methods**.  Given that it keeps permissions so that I 
> don't have to reset anything, it works out even faster.  Also, if this is 
> going to be an ongoing transfer of files within the directory, it is much 
> faster still, since it only copies the files/directories which have changed.


If every file has to be moved, the comparison is wasted time, but if the 
files are large and most do not have to be moved, the comparison can save 
an enormous amount of time, especially if the network is slow.
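
For anyone who hasn't used it, a typical rsync invocation for this kind of 
job looks something like the following (the paths are hypothetical, and it 
assumes rsync and ssh are available on both ends):

rsync -avz /path/to/directory/ user@target.machine:/path/to/directory/

Here -a preserves permissions, timestamps, and (rights permitting) 
ownership, -z compresses the data over the wire, and -v lists each file as 
it goes.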


It happens that I started to write the info below a couple of months ago 
to share with this list and did not finish it, but I'm finishing it now. 
My problem was to copy many files from one machine to another, but none of 
the files already existed on the target machine, so rsync's comparison had 
nothing to offer.  I really just wanted to make a gzipped tar file (.tgz) 
and send it over.  I didn't have 
much free disk space on the source machine so I had to do a little work to 
figure out the tricks.  Read on:


I want to move files from one GNU/Linux box to another.  The disks are 
nearly full on the box that currently holds the files, so I can't write a 
.tgz file on the source machine and then send it.  The data are about 
13GB uncompressed and about 3.7GB in .tgz format.  This is how I get the 
latter number:

tar zpcf - directory | wc -c

That sends the zipped tar to stdout where the bytes are counted by wc.  I 
have about 210,000 files and directories.
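
If you are wondering where a count like that comes from, something along 
these lines will do it -- it lists every file and directory under 
"directory", one per line, and counts the lines:

find directory | wc -l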

There are some good suggestions here on how to proceed:

http://happygiraffe.net/copy-net

I wanted to have the .tgz file on the other side instead of having tar 
unpack it automatically, so I found that I could do this on the old 
machine to send files to the new machine...

tar zpcf - directory | ssh user@target.machine "cat > backup.tgz"

...and it packs "directory" from the old machine into the backup.tgz file
on the new machine.  Nice.
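
For the record, the variant I decided against -- having tar unpack 
everything on the target right away instead of keeping a .tgz -- would look 
something like this, with "/destination" standing in for wherever the files 
should land:

tar zpcf - directory | ssh user@target.machine "cd /destination && tar zpxf -"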

One small problem:  I didn't have a way to be sure that there were no 
errors in file transmission.  First, some things that did not work:

tar zpcf - directory | md5sum

Testing that on a small directory gave me, to my surprise, different 
results every time.  What was changing?  I didn't get it.  I could tell 
that it was probably caused by gzip because...

$ echo "x" | gzip - > test1.gz

$ echo "x" | gzip - > test2.gz

$ md5sum test?.gz
358cc3d6fe5d929cacd00ae4c2912bf2  test1.gz
601a8e99e56741d5d8bf42250efa7d26  test2.gz
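
A quick check points at the cause: the gzip file format keeps a 
modification timestamp in its header.  Assuming GNU gzip, the -n flag tells 
it not to store a name or timestamp, and repeated runs then come out 
byte-for-byte identical:

$ echo "x" | gzip -n > test3.gz
$ echo "x" | gzip -n > test4.gz
$ cmp test3.gz test4.gz

cmp prints nothing here, meaning the two files are identical.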

So gzip isn't using a random seed; it is stamping a modification time into 
the file header, and that is what keeps changing.  Then I realized that I 
just had to use this method of checking md5sums instead...

On the source machine:
tar pcf - directory | md5sum

Then do this to transfer the data:
tar zpcf - directory | ssh user@target.machine "cat > backup.tgz"

After transferring, do this on the target machine:
gunzip -c backup.tgz | md5sum

The two md5sums are computed without creating any new files on either 
side, and they will match if there were no errors.  I moved about 30GB of compressed 
data this way in three large .tgz files and found no errors -- the md5sums 
always matched.
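
One note on the procedure: it reads the source data twice, once for the 
checksum and once for the transfer.  If that is a problem, bash process 
substitution can compute the checksum during the transfer itself.  This is 
just a sketch (it assumes bash on the source machine, and /tmp/source.md5 
is an arbitrary place to put the sum):

tar pcf - directory | tee >(md5sum > /tmp/source.md5) | gzip | ssh user@target.machine "cat > backup.tgz"

Afterwards, compare /tmp/source.md5 against "gunzip -c backup.tgz | md5sum" 
run on the target, exactly as above.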

Mike