Why it violates the unix philosophy, in my mind, is that apparent size
has nothing to do with the primary function of du, which is to display disk
usage. And the unix philosophy is to do one thing and do it well.

The apparent-size flag for du is trying to get du to do things that other
utilities (such as stat) already do.






On Sat, Apr 5, 2014 at 12:50 PM, Mike Miller <mbmiller+l at gmail.com> wrote:

> Thanks, David.  I thought that was the issue -- that apparent size would
> not include overhead, so I was not able to understand why I was getting
> apparent size that was smaller than ondisk size.  After they moved my data
> to a different array, that difference reversed direction.  This was
> explained to me last night:
>
> "on the old project spaces, zfs did some compression on the data so the
> apparent-size was larger than the ondisk size."
>
> So, compression is also an issue, and I wouldn't have thought of that.
>
> Now that there is no compression, I see that ondisk usage is 20GB more
> than apparent size:
>
> $ \du -sB GB --apparent-size miller
> 146GB   miller
>
> $ \du -sB GB miller
> 166GB   miller
>
> $ find miller | wc -l
> 9908
>
> So there are about 2 million bytes of overhead per file, which seems like
> a lot, to me.  I would think that implies disk blocks of multiple
> megabytes, which seems unlikely.  There must be more that I don't
> understand.
>
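The per-file overhead arithmetic above can be scripted. This is a minimal sketch that assumes GNU du and find. The helper name avg_gap is hypothetical. It divides the difference between allocated bytes and apparent bytes by the number of entries that find reports, and directories are counted just as they are in "find miller | wc -l". On a compressing filesystem, such as the old ZFS project space, the result can be negative.

```shell
# avg_gap DIR: average (on-disk bytes - apparent bytes) per find entry.
# Hypothetical helper; assumes GNU du (--block-size=1, --apparent-size).
avg_gap() {
  local dir=$1 disk apparent count
  disk=$(du -s --block-size=1 "$dir" | cut -f1)                      # allocated bytes
  apparent=$(du -s --apparent-size --block-size=1 "$dir" | cut -f1)  # same as du -sb
  count=$(find "$dir" | wc -l)                                        # files + directories
  # Bash integer division truncates toward zero; a negative result means
  # compression (or sparse holes) outweighs block overhead.
  echo $(( (disk - apparent) / count ))
}
```

Usage: avg_gap miller. With Mike's figures (166 GB on disk, 146 GB apparent, 9908 entries), this works out to roughly 2 million bytes per entry.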
> Regarding your idea (David)...
>
>> As an aside, imho, the 'apparent size' option is really a terrible option
>> to include in 'du' and is a violation of the unix philosophy because it has
>> explicitly NOTHING to do with disk management. But that's neither here nor
>> there.
>>
>> A better way to get the byte count of a file is
>>
>> stat --format=%s FILE
>>
>
> ...I guess you mean that we should do something like this to get the
> totals for a directory and contents:
>
> $ find miller -print0 | xargs -0 stat --format=%s | awk '{sum+=$1}END{print sum}'
> 145159848954
>
> OK, that does work, but how horrible is it that I can get exactly the same
> answer like so:
>
> $ du -sb miller
> 145159848954    miller
>
> Of course it's worse if you want to do multiple directories at once.
>
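The multiple-directory case can be handled with a small shell function. The name apparent_totals is hypothetical. It runs the find/stat/awk pipeline once for each argument and prints "bytes<TAB>name", which is the same layout du -sb uses. One difference is worth knowing: GNU du counts a hard-linked file only once, but this pipeline counts every link, so the two totals can diverge on trees that contain hard links.

```shell
# apparent_totals DIR...: per-directory sum of stat sizes, printed like du -sb.
apparent_totals() {
  local d
  for d in "$@"; do
    # -print0 / -0 keep names containing spaces or newlines intact
    find "$d" -print0 | xargs -0 stat --format=%s |
      awk -v name="$d" '{ sum += $1 } END { printf "%.0f\t%s\n", sum, name }'
  done
}
```

The printf "%.0f" is used instead of "%d" because some awk implementations clamp "%d" to 32-bit integers, and that would break totals in the 100 GB range.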
> That's a violation of unix philosophy?  It isn't true that it has nothing
> to do with disk management.  For example, when moving files between
> systems, it might help a lot to know the actual size.  What if I want to
> make a .tar file from a directory?  How large will that file be?  How much
> space will the files take up on tape?  If I'm using tar for tape backup, I
> think the size will be given by --apparent-size, not by ondisk size.
>
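For the tar question, an archive tracks the apparent size more closely than the on-disk size, but the two are not equal. Below is a rough estimator. It assumes GNU tar's default format, short member names (so no long-name extension headers), no hard links, and the default 10240-byte record. Each entry gets a 512-byte header. File data is padded to a multiple of 512 bytes. Two zero blocks end the archive, and the total is padded to a whole record. The name tar_size_estimate is hypothetical, and find -printf is a GNU find extension.

```shell
# tar_size_estimate DIR: predicted byte size of `tar -cf - DIR`.
# Assumptions: GNU find (-printf), short names, no hard links, 10240-byte records.
tar_size_estimate() {
  # %y = entry type (f = regular file), %s = apparent size in bytes
  find "$1" -printf '%y %s\n' |
    awk '
      { total += 512                                    # one header block per entry
        if ($1 == "f")                                  # only regular files carry data
          total += int(($2 + 511) / 512) * 512 }        # data padded to 512 bytes
      END {
        total += 1024                                   # two zero blocks end the archive
        rec = 10240                                     # default record: 20 x 512 bytes
        printf "%.0f\n", int((total + rec - 1) / rec) * rec
      }'
}
```

To compare the estimate with a real archive, run: tar -cf - dir | wc -c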
> Mike
>
>
>
> On Fri, 4 Apr 2014, David Wagle wrote:
>
>> "Apparent size" is the "ls -l" size of the file.
>>
>> Which is the "right" size for you to use depends on what you're trying
>> to do.
>>
>> Apparent size is nearly useless for managing disks, which is usually
>> what you use du for.
>>
>> Say my disk has blocks that are 1KB. If I have a file with nothing but
>> the letter 'A' in it, that will have an apparent size of 1 byte. But
>> because the smallest block size on my disk is 1KB, that 1 byte file will
>> USE 1 KB of disk space no matter what, because the physical data has to be
>> recorded in a block and that block will then be marked 'used.'
>>
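David's one-byte example is easy to reproduce. This is a minimal sketch that assumes GNU stat and du, and show_sizes is a hypothetical name. The allocated figure depends on the filesystem. It is commonly one 4 KiB block, but filesystems that compress data or store small files inline may report less, so only the apparent size is predictable.

```shell
# show_sizes FILE: apparent size (what ls -l shows) vs. space actually allocated.
show_sizes() {
  # %s = apparent size in bytes; %b = allocated blocks; %B = bytes per %b block
  stat --format='%n: apparent=%s bytes, allocated=%b blocks of %B bytes' "$1"
  echo "du --apparent-size: $(du --apparent-size --block-size=1 "$1" | cut -f1)"
  echo "du (disk usage):    $(du --block-size=1 "$1" | cut -f1)"
}
```

Usage: printf 'A' > one-byte; show_sizes one-byte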
>> As an aside, imho, the 'apparent size' option is really a terrible option
>> to include in 'du' and is a violation of the unix philosophy because it
>> has explicitly NOTHING to do with disk management. But that's neither
>> here nor there.
>>
>>
>> On Fri, Apr 4, 2014 at 2:29 PM, Mike Miller <mbmiller+l at gmail.com> wrote:
>>
>>> On Tue, 1 Apr 2014, Mike Miller wrote:
>>>
>>>> On Tue, 1 Apr 2014, Ben wrote:
>>>>
>>>>> -h will always be different from the actual disk usage, you might also
>>>>> want to play around with -B option too.
>>>>
>>>> I've done that.  Using --si -sB GB gives the same result as --si -sh.
>>>> Did you think that they would be different?
>>>>
>>>>
>>> Thanks for the suggestions.  Now I have answers (below).
>>>
>>> I was misusing the --si option there.  It should be used *instead* of -h,
>>> not in conjunction with it.  These two commands should do the same thing
>>> when the volume in "dir" is in the multi-gigabyte range...
>>>
>>> du -s --si dir
>>> du -sB GB dir
>>>
>>> ...and so should these two commands:
>>>
>>> du -sh dir
>>> du -sB G dir
>>>
>>> The first pair will report sizes in units of 1000*1000*1000 bytes and
>>> the second in units of 1024*1024*1024 bytes.
>>>
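You can check the unit arithmetic without a large directory by using a sparse file, because its apparent size is set without writing any data. This sketch assumes GNU coreutils (truncate, du) and a filesystem that supports sparse files. The name show_units is hypothetical. The --apparent-size option is used because a sparse file occupies almost no disk space.

```shell
# show_units PATH: one apparent size printed in four unit conventions.
show_units() {
  du --apparent-size --si  "$1"   # autoscaled, powers of 1000 (like -B GB)
  du --apparent-size -B GB "$1"   # fixed unit of 1000*1000*1000 bytes
  du --apparent-size -h    "$1"   # autoscaled, powers of 1024 (like -B G)
  du --apparent-size -B G  "$1"   # fixed unit of 1024*1024*1024 bytes
}
```

Usage: truncate -s 3G demo.img; show_units demo.img. The file is 3 GiB, which is 3,221,225,472 bytes. In a C locale the four lines should read 3.3G, 4GB, 3.0G and 3G. The 4GB line shows that du rounds up to a whole unit when the unit is fixed.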
>>>
>>>
>>>>> What happens when you use the --apparent-size option?
>>>>>
>>>>> --apparent-size
>>>>>   print apparent sizes, rather than disk usage; although the
>>>>>   apparent size is usually smaller, it may be larger due to holes
>>>>>   in ('sparse') files, internal fragmentation, indirect blocks,
>>>>>   and the like
>>>>>
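You can see the "holes in sparse files" case from the man page directly. This sketch assumes GNU truncate and du and a filesystem that supports holes. The file below reports an apparent size of 1 GiB while allocating little or nothing on disk. The name sparse_demo is hypothetical.

```shell
# sparse_demo PATH: create a 1 GiB file that is one big hole, then compare sizes.
sparse_demo() {
  truncate -s 1G "$1"        # sets the length without writing any data blocks
  echo "apparent: $(du --apparent-size --block-size=1 "$1" | cut -f1)"
  echo "on disk:  $(du --block-size=1 "$1" | cut -f1)"
}
```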
>>>> I want to try that, but I'm having this problem right now:
>>>>
>>>> $ ls /project/guanwh
>>>> ls: cannot access /project/guanwh: Stale file handle
>>>>
>>>>
>>> Yep, you nailed it.  That was the issue.  If I use --apparent-size, the
>>> results are consistent.  According to supercomputing staff:
>>>
>>> "it is not a bug: -b implies --apparent-size, so to compare its output
>>> to -sm/-sh you have to include --apparent-size with -sm/-sh as well.
>>>
>>> "when the apparent size is different from the reported size it is not a
>>> bug in du but rather a feature of the filesystem :)"
>>>
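You can check the staff's point directly. GNU du documents -b as equivalent to --apparent-size --block-size=1, so the two measurements below should always agree. Plain du -s --block-size=1, by contrast, reports allocated space. The helper name check_b is hypothetical.

```shell
# check_b DIR: confirm that du -sb equals du -s --apparent-size --block-size=1.
check_b() {
  local b explicit
  b=$(du -sb "$1" | cut -f1)
  explicit=$(du -s --apparent-size --block-size=1 "$1" | cut -f1)
  if [ "$b" = "$explicit" ]; then
    echo "same: $b bytes"
  else
    echo "differ: -b=$b explicit=$explicit"
  fi
}
```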
>>> Now I just have to figure out which is the right size for me: apparent
>>> or reported.  I guess apparent sizes are the real file sizes.  In this
>>> example "dir" has about 10,000 files in it, with about half being 5 KB
>>> and half about 29 MB:
>>>
>>> $ du -s --si dir
>>> 162G    dir
>>>
>>> $ du -s --si --apparent-size dir
>>> 143G    dir
>>>
>>> $ du -sb dir
>>> 142038799951    dir
>>>
>>> $ wc -c dir/* | tail -1
>>> 142037349967 total
>>>
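The two byte counts above differ by 1,449,984 bytes. One plausible reading, offered as an assumption rather than a diagnosis, is that du -sb also adds the apparent size of every directory (and symlink) in the tree, whereas wc -c dir/* only reads the regular files directly under dir. The sketch below separates the two contributions. It assumes GNU find's -printf, and split_totals is a hypothetical name. As with any per-entry sum, hard links are counted once per link.

```shell
# split_totals DIR: apparent bytes in regular files vs. everything else.
split_totals() {
  # %y = type letter (f, d, l, ...), %s = apparent size in bytes
  find "$1" -printf '%y %s\n' |
    awk '$1 == "f" { files += $2; next }   # regular files (what wc -c counts)
         { other += $2 }                   # directories, symlinks, etc.
         END { printf "files=%.0f other=%.0f total=%.0f\n", files, other, files + other }'
}
```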
>>>
>>> One thing to note:  It seems that du always rounds up.  So if 1.1 GB are
>>> used, du -sB GB will report 2GB.
>>>
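You can reproduce the rounding-up observation with a sparse file whose apparent size is exactly 1.1 GB (1,100,000,000 bytes). This sketch assumes GNU coreutils, and rounding_demo is a hypothetical name. With a fixed unit (-B GB) the size is rounded up to a whole unit. With --si it is rounded up only at the precision that is displayed.

```shell
# rounding_demo PATH: a 1,100,000,000-byte sparse file shown two ways.
rounding_demo() {
  truncate -s 1100000000 "$1"          # apparent size only; no data written
  du --apparent-size -B GB "$1"        # fixed 10^9-byte unit: rounds up to a whole unit
  du --apparent-size --si  "$1"        # autoscaled: rounds up at the displayed precision
}
```

In a C locale the two lines should read 2GB and 1.1G.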
>>>
>>> Mike
>>> _______________________________________________
>>> TCLUG Mailing List - Minneapolis/St. Paul, Minnesota
>>> tclug-list at mn-linux.org
>>> http://mailman.mn-linux.org/mailman/listinfo/tclug-list
>>>
>>>