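Make sure your locale is set appropriately. A UTF-8 ñ is a different
byte sequence than it is under ISO-8859-1 (two bytes, 0xC3 0xB1, versus
the single byte 0xF1), so grep and sed running under a UTF-8 locale
may not match an ñ that is stored as ISO-8859-1.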
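
To make that concrete, here is a small Python sketch (an illustration
only, nothing taken from the site in question) showing the bytes behind
ñ in each encoding and what happens when they are read under the wrong
one. The variable names are made up for the example.

```python
# Sketch: one character, two encodings, two different byte sequences.
text = "\u00f1"  # the character ñ

utf8_bytes = text.encode("utf-8")         # two bytes: 0xC3 0xB1
latin1_bytes = text.encode("iso-8859-1")  # one byte:  0xF1

print(utf8_bytes.hex(), latin1_bytes.hex())

# UTF-8 bytes misread as ISO-8859-1 become two unrelated characters.
misread = utf8_bytes.decode("iso-8859-1")
print(ascii(misread))

# A lone 0xF1 is not valid UTF-8, so a strict decoder rejects it.
try:
    latin1_bytes.decode("utf-8")
    valid_as_utf8 = True
except UnicodeDecodeError:
    valid_as_utf8 = False
print(valid_as_utf8)
```

A tool that assumes UTF-8 and meets that lone 0xF1 byte has nothing
sensible to display, which is one plausible way to end up with a "?"
where the accented letter should be.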

I've personally found that vim does really well editing files like
this, as it does many of the conversions automatically. I would imagine
that emacs does likewise.
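
If you would rather script the find and replace than do it by hand in
an editor, one approach is to work on raw bytes so the locale never
gets a say, and to turn every non-ASCII byte into an HTML entity, which
is plain ASCII and displays the same whatever encoding the browser
guesses. A minimal sketch, assuming the captured pages really are
ISO-8859-1 as their meta tag says; the function name and the sample
string are hypothetical:

```python
import html.entities

def latin1_to_entities(raw: bytes) -> bytes:
    """Replace each byte above 0x7F with an HTML entity.

    Assumes the input is ISO-8859-1, where the byte value equals the
    Unicode code point. Bytes without a named entity fall back to a
    numeric reference.
    """
    out = []
    for byte in raw:
        if byte < 0x80:
            out.append(chr(byte))  # plain ASCII passes through untouched
        else:
            name = html.entities.codepoint2name.get(byte)
            out.append(f"&{name};" if name else f"&#{byte};")
    return "".join(out).encode("ascii")

sample = b"Misi\xf3n, a\xf1o"  # made-up sample with raw ISO-8859-1 bytes
print(latin1_to_entities(sample).decode("ascii"))
# prints: Misi&oacute;n, a&ntilde;o
```

To use it on real files you would read each one with open(path, "rb")
and write the result back with "wb", keeping a backup copy first. If a
file turns out to be UTF-8 instead, this sketch will mangle it, so check
a sample before running it over the whole tree.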

greg wm wrote:
> hi folks,
> 
> feels rather like i've ventured into uncharted territory, but somebody
> out there somewhere must know the way..
> 
> i used wget to copy the entire http://nonviolentpeaceforce.org site to
> http://nvpf.org/np.  the former is asp pages, the latter captured as html.
> 
> for example, http://nonviolentpeaceforce.org/spanish/welcome.asp was
> captured to http://nvpf.org/np/spanish/welcome.asp.html
> 
> as you can see, the capture is mostly fine, including spanish characters
> in the text (eg año), however the spanish characters in the menus didn't
> do quite so well (eg Misi?n)
> 
> in the file año appears as a&ntilde;o which is apparently "good", but
> Misi?n appears as Misión, which is apparently "bad".
> 
> first question:  why is that bad?
> 
> if i tell galeon, instead of automatic encoding, use western iso-8859-1,
> or any of many others, presto, the page appears nicely.  but i don't
> have to do that to see the original, nor do i have to do that for
> anybody else's pages, and of course i can't expect our audience to go
> and fiddle with that in their browsers.
> 
> but really now, why isn't an ó an ó?  right after the title the file
> says <meta http-equiv="Content-Type" content="text/html;
> charset=iso-8859-1">.  why isn't that good enough?  do i need to change
> some directive or setting in apache?
> 
> second question:  it looks like wget was inconsistent!  why?
> 
> likely hint:  the menus are rendered out of some .asp database or
> whatever, differently than the rest of the text of the page.
> 
> but, so what?  why didn't wget capture something identical to what my
> browser shows?  the command i ran was
> wget -ENKkrl19 -nH -w2 -owget.log http://nonviolentpeaceforce.org
> 
> so anyway i sez hey no problem, i'll just find and replace.  well ha.
> couldn't get either egrep or sed to find an ñ that was right under
> their noses.
> 
> third question:  what's the trick to find and replace these buggers? vim
> can find them, in interactive mode, so.. should i be trying to figger
> out how to use vim as a grep replacement.. uhh.. ..?
> 
> fourth question:  where should i be asking these questions, or, where do
> i look for the mysterical solution, and will i recognize it when i see it?
> 
> tia,
> greg
> 
> Greg Whitley Mott
> IT Coordinator
> NonviolentPeaceforce.org

--
Daniel Taylor
random at argle.org
Forget diamonds, Copyright is forever.