On Tue, 13 Jul 2010, Robert Nesius wrote:

> On Tue, Jul 13, 2010 at 5:19 PM, Mike Miller <mbmiller+l at gmail.com> wrote:
>>
>> Thus, this...
>>
>> External HDD #1  -->  remote machine  -->  External HDD #2
>>
>> ...was about twice as fast as this...
>>
>> External HDD #1  -->  External HDD #2
>>
>> There's something very wrong with a system that works that way.  If I had
>> enough space on my internal HDD, I'd do this and probably get even better
>> results:
>>
>> External HDD #1  -->  Internal HDD  -->  External HDD #2
>>
>>
>> Another crazy thing is that it must have been really killing my CPU 
>> because I could hardly do anything else while the drive-to-drive USB 
>> transfer was active, but programs like "ps aux" and "top" (both of 
>> which literally took minutes to launch) seemed to show that almost 
>> nothing was happening.  Why is that?
>
> I think this is likely a case of bus contention, especially if the 
> reads and writes were going through the same bus/controller.  I've 
> had similar issues when doing things with USB devices.

Maybe I would have better luck if I used a different pair of USB ports, 
but I kinda doubt it because the big problem seems to be with writes. 
Combining reading from one drive with writing to the other is definitely 
worse, but the major impact on system performance is coming from the 
writes.  Maybe the slowness of the file transfers comes from an 
interaction of the two.
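The staging route from the quoted message (External HDD #1 --> Internal HDD --> External HDD #2) can be sketched with Python 3's standard library as two separate passes, so that reading one USB drive never overlaps with writing the other. This is a minimal sketch under stated assumptions, not a measured speedup: staged_copy is a hypothetical helper, and the staging directory is assumed to sit on the internal disk with room for the whole tree.

```python
import shutil
import tempfile
from pathlib import Path


def staged_copy(src: Path, dst: Path, staging_root: Path) -> None:
    """Copy the tree at src to dst through a temporary staging directory.

    Pass 1 reads the source drive and writes the staging disk; pass 2 reads
    the staging disk and writes the destination drive, so the two USB drives
    are never busy at the same time.
    """
    staging_root.mkdir(parents=True, exist_ok=True)
    # The staging copy is deleted automatically when the with-block exits
    with tempfile.TemporaryDirectory(dir=staging_root) as tmp:
        stage = Path(tmp) / "tree"
        # copy2 keeps permission bits and timestamps (roughly like cp -p);
        # symlinks=True copies symlinks as links instead of following them.
        # copytree raises FileExistsError if the target already exists.
        shutil.copytree(src, stage, symlinks=True, copy_function=shutil.copy2)
        shutil.copytree(stage, dst, symlinks=True, copy_function=shutil.copy2)
```

Usage, with hypothetical mount points: staged_copy(Path("/media/usb1/data"), Path("/media/usb2/data"), Path("/var/tmp/stage")). Refusing to overwrite an existing destination is a blunt stand-in for the prompting that cp's -i gives.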


> Also were you doing your copies at the file system level (cp or 
> drag-and-drop) or at the block/device (dd) level?

I used "cp -irp dirs", and there were many subdirs and lots of little 
files.
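To put numbers on "many subdirs and lots of little files", here is a minimal sketch, assuming Python 3, that walks a tree and reports how many subdirectories and regular files it holds and their mean size. Per-file filesystem overhead grows with the number of files, so a large count with a small mean size hints that this overhead may be significant. TreeStats and tree_stats are hypothetical names used only for illustration.

```python
import os
import stat
from pathlib import Path
from typing import NamedTuple


class TreeStats(NamedTuple):
    dirs: int
    files: int
    total_bytes: int

    @property
    def mean_file_bytes(self) -> float:
        # Average size of a regular file; 0.0 for a tree with no files
        return self.total_bytes / self.files if self.files else 0.0


def tree_stats(root: Path) -> TreeStats:
    """Count subdirectories, regular files, and their total size under root."""
    dirs = files = total = 0
    # dirnames also lists symlinks to directories, which os.walk
    # does not descend into by default
    for dirpath, dirnames, filenames in os.walk(root):
        dirs += len(dirnames)
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            # Only regular files count; symlinks and special files are skipped
            if stat.S_ISREG(st.st_mode):
                files += 1
                total += st.st_size
    return TreeStats(dirs, files, total)
```

Usage, with a hypothetical mount point: print(tree_stats(Path("/media/usb1/data"))).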


> If at the file system level, you're incurring overhead in allocating 
> space within the file system and updating the file system structures. 
> Depending on how many files you have, that can add up to a significant 
> amount of overhead.

That sounds like part of the problem.  Is there a better way to copy a 
collection of files and directories from one external USB HDD to another? 
I don't know how to do that with dd -- isn't that just for cloning?

By the way, the data are raw intensity data for genotyping 660,000 
markers (SNPs and CNVs) in almost 5000 samples.

Mike