On Wed, 28 Oct 2015, Wakefield, Thad M. wrote:

> ____________________________________
> From: tclug-list-bounces at mn-linux.org [tclug-list-bounces at mn-linux.org] on behalf of Mike Miller [mbmiller+l at gmail.com]
> Sent: Wednesday, October 28, 2015 3:07 AM
> To: TCLUG Mailing List
> Subject: Re: [tclug-list] Escaped unicode conversion
>
> On Tue, 27 Oct 2015, Wakefield, Thad M. wrote:
>
>>> This seems like it should be easy. So I'm suspecting my internet search skills are deficient.
>>>
>>> I have a text file with escaped Unicode that I want to convert to plain text.
>>>
>>> From: Why We\u2019re in a New Gilded Age
>>> To: Why We're in a New Gilded Age
>>
>> Tell us if this works for you:
>>
>> perl -pe 's/\\u([0-9A-Fa-f]{4})/chr(hex $1)/ge'
>>
>> It assumes there are always four hexadecimal digits following the "\u".
>> It will give warnings to stderr about "Wide character in print".
>>
>> Your example shows conversion to an ordinary apostrophe, like this:
>>
>> We're
>>
>> But my code will give you the UTF-8 character U+2019, like this:
>>
>> We’re
>>
>> And that is probably what you want.
>>
>> Mike
>
> This converted the text file with escaped Unicode to an UTF8 file which
> I was able to convert to an ASCII text file with Notepad++. I was unable
> to get iconv to do the conversion.

Cool. But how did iconv deal with characters like U+2019? When I try it, it fails on that character:

$ echo "Why We\u2019re in a New Gilded Age" | perl -pe 's/\\u([0-9A-Fa-f]{4})/chr(hex $1)/ge' | iconv --from-code=UTF-8 --to-code=ISO-8859-1
Wide character in print, <> line 1.
Why Weiconv: illegal input sequence at position 6

Maybe you used a different output encoding. If you use the -c option, it deletes the U+2019 character.

Thanks.

Mike