Wed Jun 14 01:06:23 CEST 2006

UTF-8 and the world

I think I've been tiptoeing around the subject for long enough: I'm just going to have to look at UTF-8 and Unicode. We've just bought several boxes of painkillers, so now's the time. (You've got to be prepared for these things in France, as you cannot buy painkillers anywhere other than at a pharmacy. The other day, when we mentioned to the pharmacist that painkillers were available in any supermarket in Britain, she looked horrified and mentioned the dangers of combining them or taking them with alcohol. Darn, is there a wave of aspirin-related deaths in Britain I haven't heard of?)

So the first step is to get support in the terminal. Apparently my xterm already supports it if I just start it with -en UTF8, or select it in the xterm menu (ctrl-rightclick on it), or if I have the proper locale. I'm a little scared of changing locales: when I do, everything usually goes wrong.

So really all we need is fonts.

apt-get install xfonts-efont-unicode xfonts-efont-unicode-ib
The -ib is for italics and bold. With this I now get accents and Japanese characters, which looks really cool. Never mind that I don't quite speak Japanese just yet.
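
A quick way to check that the terminal and the new fonts really cooperate is to print a few non-ASCII characters as raw UTF-8 bytes. This is only a minimal sketch: the function name utf8_sample is made up for the illustration, and the characters chosen (é, the euro sign and a kanji) are arbitrary.

```shell
# utf8_sample: print three non-ASCII characters as raw UTF-8 bytes.
# (Hypothetical helper name; octal escapes are used because they are
# portable across printf implementations.)
utf8_sample() {
    # \303\251         = U+00E9  e with acute accent
    # \342\202\254     = U+20AC  euro sign
    # \346\227\245     = U+65E5  kanji for "day/sun"
    printf '\303\251 \342\202\254 \346\227\245\n'
}

utf8_sample
```

If the terminal is in UTF-8 mode and the fonts have the glyphs, this should show three readable characters; garbage such as two or three accented letters per character usually means the terminal is still decoding the bytes as Latin-1.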

Then let's do the locale: we'll generate a new locale that supports UTF-8 using

dpkg-reconfigure locales
and add the chosen locale in .bashrc, in my case:
    export LC_ALL=fr_FR.UTF8@euro
Right, that doesn't work. I'll just use LC_ALL=fr_FR@euro and start xterm with -en utf-8 for now. There seems to be a bug in the Debian xterm (#318923) that will prevent me from changing the default in /etc/X11/app-defaults/XTerm-color.
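
When a locale setting silently fails like this, it helps to look at what the shell actually ended up with. The sketch below only inspects the locale name in the environment, following the usual precedence (LC_ALL, then LC_CTYPE, then LANG); it does not prove that the locale was generated. The function name is_utf8_locale is an assumption made for the example.

```shell
# is_utf8_locale: succeed if the effective locale *name* mentions UTF-8.
# (Hypothetical helper; it checks the name only, not whether the locale
# has actually been generated on this system.)
is_utf8_locale() {
    # LC_ALL overrides LC_CTYPE, which overrides LANG.
    case "${LC_ALL:-${LC_CTYPE:-${LANG:-}}}" in
        *.[Uu][Tt][Ff]-8*|*.[Uu][Tt][Ff]8*) return 0 ;;
        *) return 1 ;;
    esac
}

if is_utf8_locale; then
    echo "locale name looks like UTF-8"
else
    echo "locale name does not mention UTF-8"
fi
```

For the real answer, `locale -a` lists the locales that have been generated and `locale charmap` prints the character set of the active one; comparing the exact spelling from `locale -a` with the value exported in .bashrc is probably the first thing to try.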

We'll then need to adapt some apps to the changed environment. Mutt needs set charset="utf-8" in its .muttrc. Vim needs :set encoding=utf8.
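
To make those two settings stick without adding them twice, something like the following could be used. It is a rough sketch, not a recommendation: ensure_line is a made-up helper, and the assumption is that the settings live in ~/.muttrc and ~/.vimrc (inside .vimrc the leading colon is dropped).

```shell
# ensure_line FILE LINE: append LINE to FILE unless an identical line
# is already present. (Hypothetical helper for this example.)
ensure_line() {
    file=$1
    line=$2
    touch "$file"
    # -F fixed string, -x whole-line match, -q quiet;
    # "--" protects lines that start with a dash.
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Assumed usage; back up the dotfiles before trying it:
# ensure_line "$HOME/.muttrc" 'set charset="utf-8"'
# ensure_line "$HOME/.vimrc"  'set encoding=utf-8'
```

Running it a second time changes nothing, so it is safe to keep in a setup script.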