On Tue, 16 Nov 2010, Mike Miller wrote:

> The interesting finding was that -f, -F and -Ff all did the same thing
> in my application but -Ff was more than 30 times faster than -f.

A guy named Robert Citek on Missouri Linux Users Group came up with some
more helpful info for me (see below).  It turns out that grep -F was doing
*nothing* with one of the ways I was running it because it works nothing
like grep -f (huge mistake on my part).  I also found out that I knew
nothing about file caching in RAM, which is extremely important here.

I still don't get why grep -Ff would be 30 times faster than grep -f.

Mike


> I suspect that your system has enough RAM to cache the datafile's
> contents.  For example, here are the results of reading a 200 MB file
> twice on my system right after flushing the cache:
>
> $ sync ; echo 3 | sudo tee /proc/sys/vm/drop_caches >& /dev/null
>
> $ time -p cat zeros.2M >& /dev/null
> real 4.16
> user 0.02
> sys 0.65
>
> $ time -p cat zeros.2M >& /dev/null
> real 0.22
> user 0.00
> sys 0.20
>
> The second time is about 20x faster because the contents are in cache.

Very interesting.  Yes, there is enough RAM on the system.  I had no idea
that the file was held in RAM like that.  That explains a lot.  If
different users are accessing the file repeatedly, will one copy stay in
RAM and be accessible to all of the users, or would multiple users have to
cache it multiple times?


>> time -p echo -e "A\nB" | grep -Ff - data_file | grep -E "(A|B).*(A|B)"
>>
>> real 6.45
>> user 6.22
>> sys 0.59
>>
>> ...but either of these takes forever:
>>
>> time -p echo -e "A\nB" | grep -f - data_file | grep -E "(A|B).*(A|B)"
>>
>> real 213.69
>> user 213.40
>> sys 0.63
>>
>> time -p echo -e "A\nB" | grep -F - data_file | grep -E "(A|B).*(A|B)"
>>
>> real 165.59
>> user 177.51
>> sys 2.48
>>
>>
>> All three give exactly the same output.
>
> I suspect that the intermediate output will be different.  How do
> these three compare?
>
> time -p echo -e "A\nB" | grep -F -f - data_file | wc -l
> time -p echo -e "A\nB" | grep -f - data_file | wc -l
> time -p echo -e "A\nB" | grep -F - data_file | wc -l

OK.  Now I get it -- the grep -F command wasn't doing *anything*:

time -p echo -e "A\nB" | grep -Ff - data_file | wc -l
38015
real 7.54
user 5.88
sys 1.41

time -p echo -e "A\nB" | grep -f - data_file | wc -l
38015
real 254.83
user 254.23
sys 0.58

time -p echo -e "A\nB" | grep -F - data_file | wc -l
28921815
real 9.04
user 9.02
sys 2.48

That makes sense, but I still wonder why use of -Ff is more than 30 times
faster than -f.

Thanks a lot, Robert.  I've been learning a lot today.

Mike
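
P.S. For anyone reading the thread later, the difference between the three
invocations can be reproduced on a tiny file.  This is only a sketch under
a few assumptions: the sample lines and the temporary file are made up for
illustration, printf stands in for echo -e (which is not portable), and
grep -c stands in for piping to wc -l.

```shell
#!/bin/sh
# Hypothetical sample data, invented for this illustration only.
tmp=$(mktemp)
printf '%s\n' 'A x B' 'C-D' 'E F' 'B only' > "$tmp"

# -F -f - : read the patterns (A and B) from stdin, as fixed strings.
n_Ff=$(printf 'A\nB\n' | grep -c -F -f - "$tmp")

# -f - : read the same patterns from stdin, as basic regular expressions.
n_f=$(printf 'A\nB\n' | grep -c -f - "$tmp")

# -F - : here the lone "-" is the PATTERN (a literal hyphen).
# Nothing is read from stdin, so A and B are never used.
n_F=$(printf 'A\nB\n' | grep -c -F - "$tmp")

echo "Ff=$n_Ff f=$n_f F=$n_F"
rm -f "$tmp"
```

On these four lines the -Ff and -f counts agree (the lines containing A or
B), while -F - counts only the line containing a hyphen.  That matches what
happened above: the first grep was matching on "-" rather than on A or B,
and the second grep in the pipeline did all of the real filtering.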