[tclug-list] grepping tricks

Tue Nov 16 14:34:17 CST 2010

On Tue, 16 Nov 2010, Mike Miller wrote:

>> time fgrep -e A -e B data_file | grep -E '(A|B).*(A|B)'
>> 
>> real    0m6.411s
>> user    0m6.205s
>> sys     0m0.621s
>
>
> To be fair to Justin Krejci:  He recommended grep -F, but it was my idea 
> to use it in the way I did, which did not work well, but fgrep is the 
> same as grep -F, so this is identical to the fgrep line above:
>
> time grep -F -e A -e B data_file | grep -E '(A|B).*(A|B)'

The way grep works with -F and/or -f is pretty strange.  I'm using grep 
(GNU grep) 2.5.1.  Here are some results for -Ff, -f and -F:

This is super-fast, about the same as the fgrep line quoted above...

time -p echo -e "A\nB" | grep -Ff - data_file | grep -E "(A|B).*(A|B)"

real 6.45
user 6.22
sys 0.59

...but either of these takes forever:

time -p echo -e "A\nB" | grep -f - data_file | grep -E "(A|B).*(A|B)"

real 213.69
user 213.40
sys 0.63

time -p echo -e "A\nB" | grep -F - data_file | grep -E "(A|B).*(A|B)"

real 165.59
user 177.51
sys 2.48
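
A note on how the three first-stage commands are parsed, as a minimal 
sketch.  In GNU grep, -f takes an argument and -F does not, so in 
"grep -F - data_file" the "-" is the fixed-string pattern itself and 
stdin is never read.  That command can only agree with the other two if 
every line that passes the second grep also contains a hyphen.  The 
sample lines and the strings 12345 and 67890 below are made up for 
illustration; the real data_file is not shown in this post.

```shell
# Hypothetical sample data standing in for data_file.
tmp=$(mktemp)
printf '%s\n' '12345 67890' '12345-00000' 'no match here' > "$tmp"

# -Ff - : "-" is the argument of -f, so the patterns are read from stdin
# and matched as fixed strings.
ff_out=$(printf '12345\n67890\n' | grep -Ff - "$tmp")

# -f - : same pattern source, but each pattern line is a basic regex.
f_out=$(printf '12345\n67890\n' | grep -f - "$tmp")

# -F - : -F takes no argument, so "-" is the pattern (a literal hyphen)
# and whatever arrives on stdin is ignored.
F_out=$(printf '12345\n67890\n' | grep -F - "$tmp")

printf -- '-Ff -:\n%s\n\n' "$ff_out"
printf -- '-f -:\n%s\n\n' "$f_out"
printf -- '-F -:\n%s\n' "$F_out"

rm -f "$tmp"
```

On this sample the first two commands select the same two lines, while 
the third selects only the line containing a hyphen.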

All three give exactly the same output, yet the slower runs take roughly 
25 to 33 times as long as the -Ff run.  I have no idea what grep is doing 
differently in the slower cases that would justify all that extra time.  
By the way, A and B are both 5-digit strings, nothing long or weird.
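
For anyone who wants to check the "same output" claim on their own 
machine, here is a rough sketch that runs the -Ff and -f pipelines on a 
generated file and compares the results with cmp.  The file contents, 
its size, and the values of A and B are assumptions for illustration 
only; they are not the data used for the timings above.

```shell
# Synthetic stand-in for data_file; A and B are hypothetical 5-digit strings.
A=12345
B=67890
data=$(mktemp); out_Ff=$(mktemp); out_f=$(mktemp)

# 5000 lines: every 4th line has both strings, the line after it has only A,
# and the rest have neither.
awk -v a="$A" -v b="$B" 'BEGIN {
    for (i = 1; i <= 5000; i++) {
        if (i % 4 == 0)      print i, a, b
        else if (i % 4 == 1) print i, a
        else                 print i, "x"
    }
}' > "$data"

# Same two-stage pipelines as in the post, with the output saved to files.
printf '%s\n' "$A" "$B" | grep -Ff - "$data" | grep -E "($A|$B).*($A|$B)" > "$out_Ff"
printf '%s\n' "$A" "$B" | grep -f - "$data"  | grep -E "($A|$B).*($A|$B)" > "$out_f"

# cmp -s is silent and reports only through its exit status.
if cmp -s "$out_Ff" "$out_f"; then same=yes; else same=no; fi
count=$(grep -c '' "$out_Ff")

echo "outputs identical: $same"
echo "matching lines: $count"

rm -f "$data" "$out_Ff" "$out_f"
```

Pointing the two pipelines at the real file and prefixing each with 
"time -p" in bash should give timings comparable to the ones listed 
above.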

Mike