IIRC, string matching algorithms for VERY LONG strings (e.g., DNA sequences) are relatively specialized beasts. They tend to use vector operations and be parallelized. Here's a pointer into the Citeseer database that might get you started:

http://citeseer.nj.nec.com/298209.html

As far as the probabilities... hmmmm. As a first approximation, you might try just looking at a geometric distribution. It wouldn't be accurate, because if you got a mismatch at point n, after matching at positions 1...n-1, rolling back to position 2 wouldn't be an independent trial: you've already seen some of those characters. But it would give you an upper bound quickly.

R

_______________________________________________
TCLUG Mailing List - Minneapolis/St. Paul, Minnesota
http://www.mn-linux.org tclug-list at mn-linux.org
https://mailman.real-time.com/mailman/listinfo/tclug-list
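For what it's worth, here is a rough Python sketch of the geometric approximation mentioned above. The setup is my guess, not something taken from the original question: I'm assuming characters are uniform and independent over an alphabet of size k, the pattern has length m, and every alignment (start position) counts as an independent trial that succeeds with probability q = (1/k)^m. The function names are made up for illustration. As noted, the independence assumption is exactly the part that isn't true for overlapping alignments.

```python
def match_probability(pattern_length, alphabet_size):
    """Chance that a single alignment matches the whole pattern.

    Assumes (hypothetically) uniform, independent characters.
    """
    return (1.0 / alphabet_size) ** pattern_length


def geometric_pmf(q, t):
    """P(first success happens on trial t), for independent trials
    that each succeed with probability q (t counts from 1)."""
    return (1.0 - q) ** (t - 1) * q


def prob_match_within(q, n_alignments):
    """P(at least one success in the first n_alignments trials),
    again treating the trials as independent."""
    return 1.0 - (1.0 - q) ** n_alignments


# Example: a 3-letter pattern over a 4-letter alphabet (e.g. A, C, G, T).
q = match_probability(pattern_length=3, alphabet_size=4)

# Mean of a geometric distribution: expected number of alignments
# tried up to and including the first match.
expected_alignments = 1.0 / q  # 64.0 under these assumptions
```

To get a feel for how far off the independence assumption is, you could compare prob_match_within() against a brute-force count over random strings for a small pattern.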