So, not really a Linux question per se, but I'll probably be trying to
implement this with Perl under Linux.  :)

Say I have a very large file of random characters, like 8GB, and another,
smaller file of random data.  I want to take the largest chunks I can from
the small file and find out where in the large file they match.
Statistically, how likely is a match for strings of length 50 characters?
100?  200?
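
For a sanity check on the statistics part, here's the back-of-the-envelope
arithmetic as I understand it, assuming both files really are uniformly
random bytes (a 256-symbol alphabet; the 1MB small-file size is just a
placeholder).  The expected number of length-L matches is roughly
big_len * small_len / 256^L, which drops below 1 somewhere around L = 6
or 7, so with truly random data a 50-character match should basically
never happen; long matches only show up if the files share structure:

#!/usr/bin/perl
# Back-of-the-envelope sketch: expected number of length-L matches
# between two files of uniformly random bytes.  The 256-symbol
# alphabet and both file sizes are assumptions; plug in your own.
use strict;
use warnings;

my $alpha     = 256;   # alphabet size (assumed: arbitrary bytes)
my $big_len   = 8e9;   # ~8GB large file
my $small_len = 1e6;   # hypothetical 1MB small file

for my $L (4, 6, 8, 10, 50, 100) {
    # roughly $big_len * $small_len candidate alignments, each one
    # matching with probability 256**-L; this underflows to 0 for
    # very large L, which is the honest answer anyway
    my $expected = $big_len * $small_len * $alpha ** (-$L);
    printf "L = %3d: expected matches ~ %g\n", $L, $expected;
}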

Also, what kind of search algorithm would be best for this?  Say I'm trying
to match a 50-character string against the larger file and it matches, but
if I had started that string 20 characters earlier, I would have matched 70
characters instead of just 50.  I want a sort of fuzzy search algorithm
that finds the largest matching pieces first and then matches the smaller
leftovers; one sketch of that idea follows.
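
One idea is the rsync/LZ77-style trick: seed candidate matches with a
k-gram hash index, extend each seed as far as it will go, then keep the
largest non-overlapping pieces first.  Below is a rough, untested,
memory-only sketch (find_matches, $k, and the toy strings are all mine,
not any standard API; a real 8GB file would need mmap or a block-based
index instead).  Because every offset of the small file is tried as a
seed, the "should have started 20 characters sooner" case is covered:
the longer match gets seeded at its true start and wins the sort.

#!/usr/bin/perl
# Demo-scale sketch of "largest pieces first" matching via a k-gram
# seed index.  Everything is held in memory, so this is illustrative
# only.
use strict;
use warnings;

my $k = 8;   # seed length: matches shorter than this are invisible

sub find_matches {
    my ($big, $small) = @_;

    # Index every k-gram of the big string: substring -> [positions]
    my %index;
    for my $i (0 .. length($big) - $k) {
        push @{ $index{ substr($big, $i, $k) } }, $i;
    }

    # For each position in the small string, extend the best seed as
    # far right as it will go, keeping the longest extension.
    my @matches;   # [ small_pos, big_pos, length ]
    for my $i (0 .. length($small) - $k) {
        my $seed = substr($small, $i, $k);
        next unless $index{$seed};
        my ($best, $best_at) = (0, -1);
        for my $j (@{ $index{$seed} }) {
            my $len = $k;
            $len++ while $i + $len < length($small)
                      && $j + $len < length($big)
                      && substr($small, $i + $len, 1) eq
                         substr($big,   $j + $len, 1);
            ($best, $best_at) = ($len, $j) if $len > $best;
        }
        push @matches, [ $i, $best_at, $best ];
    }

    # Largest pieces first: sort by length, then greedily keep matches
    # that don't overlap (in the small file) anything already taken.
    my (@kept, %taken);
    for my $m (sort { $b->[2] <=> $a->[2] } @matches) {
        my ($si, $bi, $len) = @$m;
        next if grep { $taken{$_} } ($si .. $si + $len - 1);
        $taken{$_} = 1 for $si .. $si + $len - 1;
        push @kept, $m;
    }
    return @kept;
}

# Toy usage: finds "quick brown " as one 12-character piece.
my @hits = find_matches("the quick brown fox jumps",
                        "quick brown dog jumps");
printf "small pos %d -> big pos %d, length %d\n", @$_ for @hits;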

Any ideas? :)

Jay
