transfer the files
rsync -e ssh -aogvz /sourcedir/ targetmachine:/targetdir      # Second time will validate
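
Note that by default that second pass only compares file size and mtime.
If you want rsync to compare actual file contents, add -c (--checksum);
slower, but a stronger check.  Something like:

rsync -e ssh -aogvzc /sourcedir/ targetmachine:/targetdir     # -c forces checksum comparison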

On the target machine, so you have an initial archive file:
tar zcvf /targetdir/tarball.tgz /tgzsourcedir
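
If you want a quick sanity check that the tarball reads cleanly end to
end, listing it and throwing away the output will do it; tar/gzip exit
nonzero on corruption.  A minimal sketch:

tar tzf /targetdir/tarball.tgz > /dev/null && echo "archive reads cleanly"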

You can md5 it if you'd like.  I haven't seen a need for it unless it's
for security or auditing purposes.
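
If you do want that audit trail, one way (just a sketch; the .md5
filename is whatever you like) is to record the sum next to the tarball
and check it later with md5sum -c:

md5sum /targetdir/tarball.tgz > /targetdir/tarball.tgz.md5    # record now
md5sum -c /targetdir/tarball.tgz.md5                          # verify later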

I would prefer to transfer many smaller files than one large file.  IMO,
there are too many things that can go wrong with transferring large
files.  What if there's a problem and you have to do the transfer again?
It's going to take twice as long, provided it goes through the second
time; if not, you get the picture.  If the network pukes on an rsync of
many files and dirs, restart it and it'll pick up where it left off,
overwriting only the files it needs to on the target.
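
If you want that restart to happen by itself, a simple retry loop does it
(a sketch, assuming the same source and target as above):

until rsync -e ssh -aogvz /sourcedir/ targetmachine:/targetdir; do
    echo "rsync interrupted; retrying in 60 seconds..." >&2
    sleep 60
done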

There are many ways to do things.  I'm not saying your way is wrong, just
not the way I would have done it.  Of course, if there's a better, more
efficient way to do it than the one I'm using now, I'm always open to
changing my processes.


-- 
-Shawn
