>>>>> "DS" == Dave Sherohman <esper at sherohman.org> writes:

    DS> Robert P. Goldman said:
    >> Kevin, I think there is an *in principle* reason why this
    >> should not be possible.
    >>
    >> Parsing HTML is a context-free parsing problem (since the tags
    >> can embed and you have to have a stack to track the things you
    >> want to match), not a regular expression parsing problem
    >> (there's no fixed bound of memory you need to do this job).

    DS> I disagree.  Unless there's more going on here than the
    DS> original question stated, Kevin doesn't sound like he's
    DS> interested in the structure of the HTML tags or whether they
    DS> match up.  He just wants to create a list of 'approved' tags
    DS> and make everything else go away.

I agree about the above, I think.  But see below, where I think you
give me my point...

    DS> At worst, he might need to walk through the (surviving) tags
    DS> with a set of flags for whether, e.g., <I> is turned on and
    DS> append a </I> to the document if the submitter forgot to close
    DS> it.

But notice that this is enough to make my point!  Detecting balanced
delimiters is the paradigm case of context-free versus regular
expression parsing: to match parentheses, you need a stack to push the
openers onto and pop them off of when you find the match.  That's a
pushdown automaton, not a finite state machine.

Best,
R1
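
To make the distinction concrete, here is a rough Python sketch of the two jobs discussed above. It is only an illustration under assumptions of my own: the ALLOWED whitelist, the TAG pattern, and both function names are hypothetical, and the pattern ignores comments, attribute values containing ">", and void tags such as <br>. The first function is the "make everything else go away" step, which looks at one tag at a time; the second is the "append a </I>" step, which has to remember how many openers are still pending.

```python
import re

# Hypothetical whitelist; the thread does not say which tags Kevin approves.
ALLOWED = {"b", "i", "em"}

# One opening or closing tag: group 1 is "/" for a closer, group 2 is the name.
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>")


def strip_unapproved(html):
    """Drop every tag not in ALLOWED; needs no memory of earlier tags."""
    def keep(match):
        return match.group(0) if match.group(2).lower() in ALLOWED else ""
    return TAG.sub(keep, html)


def close_open_tags(html):
    """Append closers for tags left open, using a stack of pending openers."""
    stack = []
    for match in TAG.finditer(html):
        closing, name = match.group(1), match.group(2).lower()
        if not closing:
            stack.append(name)
        elif name in stack:
            # Pop back to the matching opener, discarding anything opened inside it.
            while stack.pop() != name:
                pass
    # Close whatever is still open, innermost first.
    return html + "".join("</%s>" % name for name in reversed(stack))
```

With these definitions, close_open_tags("<i><i>x</i>") has to emit one more </i>, which a single on/off flag for <I> would not know to do; that count of pending openers is the unbounded memory the argument is about.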