On Sat, Mar 5, 2011 at 11:46 PM, Mike Miller <mbmiller+l at gmail.com> wrote:

> On Sat, 5 Mar 2011, Adam Morris wrote:
>
>> Try \x{8a0} instead.  I think that \x normally accepts only two following
>> characters, so you have to use \x{} for long hexadecimal numbers.
>>
>
> You top posted, so I have to ignore you.
>
> Just kidding.  I did try that and that didn't work either.  Then I did
> this...
>
> perl -pe 's/[[:ascii:]]//g ; s/(.)/$1\n/g' file.txt | sort | uniq -c >|
> bad_chars.txt
>
> ...and when I looked at the resulting bad_chars.txt file in emacs again,
> the characters looked different.  Before they were appearing as purple
> rectangles, but now they appeared as a pair of characters that looked like
> this: \302\240
>
> I could represent them exactly that way in perl and delete them.  I don't
> really get what was happening there.
>

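I'm guessing you were looking at multibyte (variable-length) Unicode
characters, and your perl filter, which treats input as bytes unless told
otherwise, split each one into separate octets.  \302\240 is octal for the
bytes 0xC2 0xA0, which is how UTF-8 encodes U+00A0 (no-break space).  Once
the two bytes sat on separate lines, emacs could no longer decode them as
one character and showed the raw escapes.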

-Rob