On Tue, 18 Jan 2005 09:01:51 -0600
"John J. Trammell" <trammell+tclug at el-swifto.com> wrote:

> On Mon, Jan 17, 2005 at 01:50:53PM -0600, Josh Trutwin wrote:
> > Sorry for the cross post - wasn't sure what the best list to ask
> > this was.
> > 
> > I need a Perl RE to replace a specific HTML comment, something
> > like:
> > 
> > <!-- blahblahblah Josh blahblahblah -->
> > 
> > I want to remove all comments with the word Josh in it.  "Josh"
> > could be anywhere inside the comment tags.
> 
> Handling HTML with regexen is wrong, but this might get you by:
> 
>    s/<!--.*?Josh.*?-->//sg;

Actually it's XML - and I'd certainly use an XML parser module for
some of this, but the Perl parsers I've tried reject much of the XML
as invalid (duplicate attributes, etc.) when I try to load these
docs.  Unfortunately I don't control the XML content either...

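Regardless, the above regex matches too much, despite the .*? - the
match starts at the first <!-- and the lazy .*? is free to run past
an earlier --> on its way to Josh.  For example, it matches the
entire string: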

<!-- Comment 1 -->Content<!-- Comment which includes Josh -->

I'm having a little luck with lookahead assertions, but still don't
quite have it right.
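
For comparison, here is a minimal sketch of one lookahead form, not a
tested fix for the real documents.  It assumes comments are never
nested and that "-->" only ever appears as a comment terminator.  The
sub names strip_lazy and strip_tempered are hypothetical, made up for
this example.  Each "." is guarded by (?!-->) so a match cannot cross
the end of a comment:

```perl
use strict;
use warnings;

# Hypothetical helper: the substitution quoted above, unchanged.
sub strip_lazy {
    my ($text) = @_;
    $text =~ s/<!--.*?Josh.*?-->//sg;
    return $text;
}

# Hypothetical helper: same idea, but every "." is only allowed to
# consume a character when "-->" does not start at that position,
# so the match has to stay inside a single comment.
sub strip_tempered {
    my ($text) = @_;
    $text =~ s/<!--(?:(?!-->).)*?Josh(?:(?!-->).)*?-->//sg;
    return $text;
}

my $xml = '<!-- Comment 1 -->Content<!-- Comment which includes Josh -->';

print '[', strip_lazy($xml),     "]\n";   # prints "[]"
print '[', strip_tempered($xml), "]\n";   # prints "[<!-- Comment 1 -->Content]"
```

Note that this matches "Josh" as a substring, so a comment containing
"Joshua" would be removed too; \bJosh\b would restrict it to the whole
word if that matters.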

Josh