On Tue, 16 Nov 2010, Mike Miller wrote:

> The interesting finding was that -f, -F and -Ff all did the same thing 
> in my application but -Ff was more than 30 times faster than -f.

A guy named Robert Citek on the Missouri Linux Users Group came up with some 
more helpful info for me (see below).  It turns out that grep -F was doing 
*nothing* useful in one of the ways I was running it, because grep -F without 
-f works nothing like grep -f (huge mistake on my part).  I also found out that I knew 
nothing about file caching in RAM, which is extremely important here.  I 
still don't get why grep -Ff would be 30 times faster than grep -f.

Mike


> I suspect that your system has enough RAM to cache the datafile's 
> contents.  For example, here are the results of reading a 200 MB file 
> twice on my system right after flushing the cache:
>
> $ sync ; echo 3 | sudo tee /proc/sys/vm/drop_caches >& /dev/null
>
> $ time -p cat zeros.2M >& /dev/null
> real 4.16
> user 0.02
> sys 0.65
>
> $ time -p cat zeros.2M >& /dev/null
> real 0.22
> user 0.00
> sys 0.20
>
> The second time is about 20x faster because the contents are in cache.

Very interesting.  Yes, there is enough RAM on the system.  I had no idea 
that the file was held in RAM like that.  That explains a lot.  If 
different users are accessing the file repeatedly, will one copy stay in 
RAM and be accessible to all of the users, or would multiple users have to 
cache it multiple times?
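Robert's cold/warm comparison can also be repeated for a single file without root, which is handy on a shared machine. This is a minimal bash sketch, assuming GNU coreutils `dd` with `iflag=nocache` (a flag from coreutils releases newer than this thread; on tmpfs or with other `dd` implementations the eviction may fail or do nothing). The function name `time_cold_warm` is hypothetical. It asks the kernel to evict one file from the page cache, then times two reads in a row:

```bash
# time_cold_warm FILE  (hypothetical helper name)
# Prints the real time of a read right after evicting FILE from the page
# cache, then of a second read that should be served from RAM.
time_cold_warm() {
  local file=$1
  local TIMEFORMAT='%R'   # bash's `time` keyword then prints only real seconds

  # Ask the kernel to drop this file's cached pages (count=0 = whole file).
  # Unlike writing to /proc/sys/vm/drop_caches, this needs no root and
  # leaves every other file's cached pages alone.
  dd if="$file" iflag=nocache count=0 2>/dev/null \
    || echo "warning: could not evict $file from the page cache" >&2

  printf 'cold read: '
  { time cat "$file" > /dev/null; } 2>&1   # from disk, if eviction worked
  printf 'warm read: '
  { time cat "$file" > /dev/null; } 2>&1   # now from the page cache
}
```

Usage: `time_cold_warm data_file`. If the cold number is not clearly larger than the warm one, the eviction probably did not take effect.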


>> time -p echo -e "A\nB" | grep -Ff - data_file | grep -E "(A|B).*(A|B)"
>>
>> real 6.45
>> user 6.22
>> sys 0.59
>>
>> ...but either of these takes forever:
>>
>> time -p echo -e "A\nB" | grep -f - data_file | grep -E "(A|B).*(A|B)"
>>
>> real 213.69
>> user 213.40
>> sys 0.63
>>
>> time -p echo -e "A\nB" | grep -F - data_file | grep -E "(A|B).*(A|B)"
>>
>> real 165.59
>> user 177.51
>> sys 2.48
>>
>>
>> All three give exactly the same output.
>
> I suspect that the intermediate output will be different.  How do
> these three compare?
>
> time -p echo -e "A\nB" | grep -F -f - data_file | wc -l
> time -p echo -e "A\nB" | grep -f - data_file | wc -l
> time -p echo -e "A\nB" | grep -F - data_file | wc -l


OK.  Now I get it -- the grep -F command wasn't doing *anything*:


time -p echo -e "A\nB" | grep -Ff - data_file | wc -l
38015
real 7.54
user 5.88
sys 1.41


time -p echo -e "A\nB" | grep -f - data_file | wc -l
38015
real 254.83
user 254.23
sys 0.58


time -p echo -e "A\nB" | grep -F - data_file | wc -l
28921815
real 9.04
user 9.02
sys 2.48

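The third result comes from how grep parses its arguments: without -f, grep -F takes the lone `-` as the search pattern itself, so it prints every line of data_file that contains a hyphen and never reads the A/B list from stdin. A small bash sketch on made-up lines (the function `demo_grep_modes` and its sample data are hypothetical stand-ins for data_file) shows the three forms side by side. It uses printf rather than `echo -e`, since some /bin/sh implementations (dash, for example) print the `-e` literally:

```bash
# demo_grep_modes  (hypothetical helper; sample data is made up)
demo_grep_modes() {
  local sample
  sample=$(mktemp)
  # Stand-in for data_file: two of the four lines contain a hyphen.
  printf '%s\n' 'A-1' 'B2' 'C-3' 'AB' > "$sample"

  echo '# -f - : patterns A and B read from stdin (as regexes)'
  printf 'A\nB\n' | grep -f - "$sample"     # A-1, B2, AB
  echo '# -Ff - : the same patterns, as fixed strings'
  printf 'A\nB\n' | grep -Ff - "$sample"    # A-1, B2, AB
  echo '# -F - : no -f, so "-" itself is the pattern; stdin is ignored'
  printf 'A\nB\n' | grep -F - "$sample"     # A-1, C-3

  rm -f "$sample"
}
```

The first two forms agree, which matches the identical 38015 counts above; only the form without -f changes meaning.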

That makes sense, but I still wonder why use of -Ff is more than 30 times 
faster than -f.
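One way to narrow down the -Ff vs -f gap is to time both on the same warm file, plus -f under the C locale. This is an assumption worth testing, not something established in this thread: GNU grep releases of this era were reported to be much slower at regex matching in multibyte (UTF-8) locales, while -F uses a separate fixed-string search. If `LC_ALL=C grep -f` comes close to `grep -Ff`, the locale is a likely culprit. A bash sketch (`compare_grep_modes` is a hypothetical name; the A/B patterns mirror the thread):

```bash
# compare_grep_modes FILE  (hypothetical helper name)
# Times three ways of searching FILE for the patterns A and B.
compare_grep_modes() {
  local file=$1
  local TIMEFORMAT='%R'   # bash's `time` keyword then prints only real seconds

  # Read the file once so every timed run starts with a warm page cache.
  cat "$file" > /dev/null

  printf '%-34s' 'grep -f (regex, current locale):'
  { time printf 'A\nB\n' | grep -f - "$file" > /dev/null; } 2>&1

  printf '%-34s' 'grep -Ff (fixed strings):'
  { time printf 'A\nB\n' | grep -Ff - "$file" > /dev/null; } 2>&1

  printf '%-34s' 'grep -f with LC_ALL=C:'
  { time printf 'A\nB\n' | LC_ALL=C grep -f - "$file" > /dev/null; } 2>&1
}
```

Usage: `compare_grep_modes data_file`. Timings of a few milliseconds on a small file say little; the comparison is only informative on something the size of data_file.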

Thanks a lot, Robert.  I've been learning a lot today.

Mike