Major Update to pine 4.64:

PINE UTF-8 and charset conversion FAQ

Q: Where can I get the latest patch?

Q: What are the prerequisites for using this patch?

  1. You need a recent POSIX-compliant operating system that either has built-in support for the POSIX iconv API or has a separately installed iconv implementation. GNU libiconv works on Linux and Windows at least. If you are not running Linux, you will need to make some adjustments to link against your iconv library. I can help with that, and I would like to get mail (at bk on suse.de) from you when you have tried to install on a different operating system, so that I can add the changes to the patch and mention them here.
  2. If your iconv library is not integrated into your standard C library on your system (may also happen with Linux distributions which use a separate libiconv), you have to call the pine build command with EXTRALDFLAGS=-liconv.
  3. On Linux, you need to have the glibc locale package installed; otherwise your iconv will not have the necessary data and routines for actually converting between all the charsets. Normally it is installed by default and is needed for all other locale support too, but if something does not work, that is the first place to look. You can test it with the iconv command, which is usually also installed and which uses the iconv library to convert between character encodings.
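The iconv test mentioned in the last item can be scripted. The following is only a minimal sketch: the function name `iconv_converts` and the sample text are made up for illustration, and it assumes an `iconv` command that accepts the POSIX `-f` and `-t` options and reads from standard input.

```python
import shutil
import subprocess


def iconv_converts(text="Gr\u00fc\u00dfe", src="UTF-8", dst="ISO-8859-1"):
    """Return True if the iconv command converts text from src to dst,
    False if the conversion fails, and None if no iconv command is found."""
    iconv = shutil.which("iconv")
    if iconv is None:
        return None
    result = subprocess.run(
        [iconv, "-f", src, "-t", dst],
        input=text.encode(src),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    # Compare iconv's output with Python's own conversion of the same text.
    return result.returncode == 0 and result.stdout == text.encode(dst)
```

A result of False for a charset pair that you expect to work would suggest that the conversion data is missing, which is the case described above.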

Q: What are the effects of this patch?

  1. The "Terminal garbled" problem, which may happen when you receive a mail that cannot be displayed on your terminal (e.g. some spam), should be gone: with the patch, pine tries to output only characters which your terminal can actually display (if the necessary fonts are available to it). If your terminal is in UTF-8 mode, the problem is definitely gone.
  2. It allows you to convert mail that uses character encodings which pine does not support (its set of supported character sets is limited), and it improves charset conversion, because conversion is supplied through specialized charset conversion libraries. Right now, the iconv API is used for that, but more conversion engines could be added if needed on certain platforms. Some implementations of iconv (like the iconv implementation on Linux) support transliterating characters if no equivalent character is available in the destination charset.

    This can improve the display and export of mail in your specific environment even if you do not use any Unicode in your environment.

  3. It allows you to use a terminal emulator which uses Unicode (encoded as UTF-8) as the display terminal for pine, thereby giving you the ability to view and work on correctly encoded mail from anywhere in the world - on the condition that the mails which you receive are properly standards-conformant and your terminal uses a character set setup and font in which these characters can be displayed.

    Example: Using enhanced terminals like mlterm for X11, you can even use Hebrew in right-to-left mode and quickly switch to left-to-right display of other, completely different languages without restarting pine.

    So this patch allows you to switch your display terminal to the UTF-8 charset, and it allows pine to translate all the character encodings used around the world to your display. Even if you can't read all languages, you are at least able to view the mail correctly. This also applies to e.g. curly apostrophes and the like, which are often used in Windows charsets and which have no good corresponding characters e.g. in ISO-8859-1. But with UTF-8, such symbols will be shown correctly.
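The curly-apostrophe case can be illustrated with a short sketch. This is not the patch's code and it does not call iconv: it uses Python's built-in codecs, and the small substitution table is a hand-made stand-in for the transliteration that some iconv implementations offer. The function name `convert` and the sample bytes are assumptions made for the example.

```python
# A Windows-1252 mail body with curly quotes (bytes 0x93, 0x92, 0x94).
SAMPLE = b"\x93It\x92s here\x94"

# Hand-made stand-in for transliteration: curly quotes to plain ASCII quotes.
FALLBACK = {0x2018: "'", 0x2019: "'", 0x201C: '"', 0x201D: '"'}


def convert(raw, src, dst):
    """Recode raw bytes from charset src to charset dst.

    Characters without an equivalent in dst are first replaced through the
    FALLBACK table; anything still unrepresentable becomes '?'.
    """
    text = raw.decode(src)
    try:
        return text.encode(dst)
    except UnicodeEncodeError:
        return text.translate(FALLBACK).encode(dst, errors="replace")


# UTF-8 can represent the curly quotes, so nothing is lost.
utf8_result = convert(SAMPLE, "cp1252", "utf-8")
# ISO-8859-1 cannot, so the fallback table substitutes plain quotes.
latin1_result = convert(SAMPLE, "cp1252", "iso-8859-1")
```

With a UTF-8 display the original symbols survive the conversion, while an ISO-8859-1 display only gets an approximation.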

Q: Viewing mails works for some mails but not for others, why?

There are several reasons why this can happen:

Q: What is the status of UTF-8 support in the stock pine?

Pine as provided by the University of Washington uses only its internal code tables for character set conversion, which limits the number of supported charsets to about a dozen or so. With pine4.61, UTF-8 was added as a character set from which pine can recode, but pine cannot recode from other charsets to UTF-8 for displaying mail or exporting to a file or printer.

If you want to see some support for UTF-8 in pine, ask the pine maintainers on the pine-info list to improve support for it, to show them the demand. Something can be done even without the iconv function, to which one of the pine developers (possibly the main developer) objects. Even little progress would be good.

Q: Does composing messages work also on UTF-8 terminals?

Yes. You can use pine's default editor pico (which is built into pine and perfectly integrated), which now works with UTF-8 as well. If you want to use a different editor, you can define one using the pine configuration option alternate-editor.

There are several editors for Unix which support UTF-8 as well. For instance, Vim 6.x is an excellent text editor with solid UTF-8 support. Emacs also supports UTF-8 and has interfaces to CJK input methods.

Mike Fabian has a page about UTF-8 Internationalisation in GNU Emacs and XEmacs in his document on CJK (Chinese, Japanese, Korean) Support in SUSE Linux. Using emacs -nw as the alternate editor will work.

Q: Is the charset conversion applied during printing?

Yes, translation is done to the character-set of the display device configured for pine.

Report from a tester:

I have had time to test printing, and it works quite well! character set conversion to UTF-8 happens. This is good in my case -- I am doing "attached print", and my emulator handles the UTF-8 codes in the print stream.

However, I can imagine that it would be a problem for other people who were writing directly to a printer. There are not many (any?) UTF-8 printers around. If the user had a printer that worked in Big5 (Chinese), and they usually got Big5 emails and printed them, then installing your patch would break things. I think the best solution is to create an additional configuration variable for "printer character set"; or maybe a flag associated with each printer definition.

(I do one other change for printing. In cmd_print, where it calls format_message, I add a flag FM_NOWRAP, so it doesn't break my lines based on the number of screen columns.)

If you are not printing to a printer attached to the terminal, it should be possible to use a character set filter as the personal print command in Setup/Printers. For instance, you can use Juliusz Chroboczek's cedilla, which has good WGL4 coverage, sufficient for Latin, Greek and Cyrillic text. Another good printer filter is uniprint, included in Gaspar Sinai's Yudit.

 Personally selected print command
      The text to be printed will be piped into the command given here. The
      command is in the 2nd column, the printer name is in the first column. Some
      examples are: "prt", "lpr", "lp", or "enscript". The command may be given
      with options, for example "enscript -2 -r" or "lpr -Plpacc170". The
      commands and options on your system may be different from these examples.
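For a printer that expects a legacy charset, a character set filter can be very small. The sketch below is only an illustration under stated assumptions: the function name `recode_stream` is made up, the input is assumed to arrive as UTF-8, and Big5 is used as the printer charset only because it is the example from the tester's report above. Unlike cedilla or uniprint, it does no rendering; it only recodes bytes.

```python
import io


def recode_stream(src, dst, in_charset="utf-8", out_charset="big5"):
    """Copy src to dst line by line, recoding in_charset to out_charset.

    Splitting on newline bytes is safe for UTF-8 input because a newline
    byte never occurs inside a multi-byte UTF-8 sequence. Characters the
    printer charset cannot represent are written as '?'.
    """
    for line in src:
        text = line.decode(in_charset, errors="replace")
        dst.write(text.encode(out_charset, errors="replace"))


# Small in-memory demonstration with two CJK characters.
demo_in = io.BytesIO("\u4e2d\u6587\n".encode("utf-8"))
demo_out = io.BytesIO()
recode_stream(demo_in, demo_out)
```

To use something like this as a print command, the function would be called with sys.stdin.buffer and sys.stdout.buffer in a script, and the output piped to the real print command such as lpr.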

Q: Are message headers such as Subject converted?

Yes. Message headers that are properly encoded according to RFC 2047, as required to indicate their charset, are converted where possible. But lots of spam is sent without proper encoding. Characters in such headers are assumed to be in assumed-charset and are converted to character-set. In some cases pine attempts to guess the charset of such message parts from other parts of the message, but converting an improperly encoded message will never work completely.
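The two cases described above, properly labelled RFC 2047 headers and unlabelled 8-bit headers, can be sketched as follows. This is not pine's code: it uses decode_header from Python's standard email package, and the function name `decode_header_value` and its `assumed_charset` parameter are made up to mirror the assumed-charset setting. Headers that mix encoded words with unlabelled 8-bit text are not handled by this sketch.

```python
from email.header import decode_header


def decode_header_value(raw, assumed_charset="iso-8859-1"):
    """Decode raw header bytes into text.

    Headers without RFC 2047 encoded words carry no charset label, so
    they are interpreted in assumed_charset.
    """
    if b"=?" not in raw:
        return raw.decode(assumed_charset, errors="replace")
    parts = []
    for chunk, charset in decode_header(raw.decode("ascii", errors="replace")):
        if isinstance(chunk, bytes):
            try:
                chunk = chunk.decode(charset or "ascii", errors="replace")
            except LookupError:
                # Unknown charset label: fall back to the assumed charset.
                chunk = chunk.decode(assumed_charset, errors="replace")
        parts.append(chunk)
    return "".join(parts)


labelled = decode_header_value(b"=?UTF-8?B?R3LDvMOfZQ==?=")
unlabelled = decode_header_value(b"Gr\xfc\xdfe")
```

Both calls yield the same text, but only the first header actually says which charset it uses; the second decodes correctly only because the assumption happens to be right.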

Q: What issues are known with specific terminal emulators?

An issue has been reported with the PuTTY emulator: while it was not configured for UTF-8 (which must be done somewhere very deep in the menus), starting pine in UTF-8 mode resulted in pine receiving an endless series of "c" characters from the terminal. This made the emulator unusable with pine in UTF-8 settings until it was configured to use UTF-8.

Generally, if the terminal's configuration does not match your locale environment or pine's character-set= setting, you will get some strange-looking characters or your screen may be messed up. That does not happen if the configurations of pine and the terminal match, and it is independent of whether you use this patch.
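One half of that match, locale environment versus pine's character-set= value, can be checked with a few lines. This is a rough sketch using Python's locale and codecs modules; the function names are made up, and the value passed in stands for whatever you have configured in pine. It cannot see how the terminal emulator itself is configured, so that part still has to be checked by hand.

```python
import codecs
import locale


def same_charset(a, b):
    """True if both names resolve to the same codec known to Python."""
    try:
        return codecs.lookup(a).name == codecs.lookup(b).name
    except LookupError:
        return False


def locale_matches(pine_character_set):
    """Compare the charset of the current locale with pine's setting."""
    try:
        # Pick up LANG / LC_ALL / LC_CTYPE from the environment.
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        pass  # unsupported locale: keep whatever is active
    return same_charset(locale.getpreferredencoding(False), pine_character_set)
```

Comparing through codec lookup rather than by string means that spellings such as UTF-8 and utf8 count as the same charset.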

Q: Is any charset conversion applied when reading and writing local files?

When you use the Export command, the same conversion is applied as is applied to the message display.

When pine reads local text files, e.g. for attaching them to an outgoing message, no conversion is applied, and the files are assumed to use the character encoding to which pine's config setting character-set= is set.
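The difference between the two paths matters when a local file is not in the configured charset. The sketch below only models the behaviour described above with Python's codecs; the function names `export_text` and `attach_text` are hypothetical and are not taken from pine.

```python
def export_text(body, declared_charset, character_set):
    """Export path: recode from the message's charset to character-set."""
    text = body.decode(declared_charset, errors="replace")
    return text.encode(character_set, errors="replace")


def attach_text(file_bytes, character_set):
    """Attach path: bytes are left unchanged and only labelled."""
    return file_bytes, character_set


# Export: an ISO-8859-1 message body ends up as valid UTF-8.
exported = export_text(b"Gr\xfc\xdfe", "iso-8859-1", "utf-8")

# Attach: a UTF-8 file read while character-set is ISO-8859-1 gets the
# wrong label, so a recipient decoding by that label sees garbage.
data, label = attach_text("Gr\u00fc\u00dfe".encode("utf-8"), "iso-8859-1")
seen_by_recipient = data.decode(label)
```

In other words, local files should already be in the charset that character-set= names before they are read or attached.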

Q: Who developed the PINE UTF-8 Patch?

Jungshik Shin developed the initial patch with the header conversion, charset and iconv aliases, and generic locale fixes. For the message body conversion, he used display-filters.

Bernhard Kaindl updated the patch to newer pine releases and added message body conversion inside pine which speeds up the message viewer.

Jungshik Shin fixed a number of bugs and implemented the new config option called send-charset.

Bob Rasmussen (President of Rasmussen Software) provided an excellent UTF-8 parsing function which made it possible to calculate UTF-8 column widths and identify the beginning and end of UTF-8 byte sequences at various places in pine, so his code was extremely helpful.

His company provides Anzio Lite and AnzioWin, Windows-based terminal emulators (telnet, SSH) that can display and print UTF-8.

Eduardo Chappa's patches web site provided example code to learn from.

I, Bernhard Kaindl, thank them very much for their contributions of code, their help, and the permission to use and distribute their code and to submit it to the pine team. Jungshik Shin especially deserves a really big thanks.

Q: What are the next goals?

Fix the remaining bugs:

Q: What about including it in the mainstream Pine?

It's a long answer, because it is so difficult. The short answer is that, besides a short mail which is linked below and other similar responses (no more detailed), there have been no other technical responses to submissions, and with so few responses one can only try to read between the lines. The best option for you (if you are not going to engage further with it) may be to just use the patch and maybe report your experience with it to me and/or to the pine-info list (you can give me permission to forward it without giving the source/email if you see no reason to post yourself). I'm happy to get mail about the patch.

All the people who participated in the creation of this patch expressed their wish that it should be integrated into the stock pine, but Mark Crispin has given several reasons why the makers of pine cannot (or do not want to) add support for iconv. The sentence was: "There would also have to be an excellent technical reason offered as to why we should abandon the existing c-client support, which we started and used long before iconv() existed.". That is a misunderstanding: the patch was (at least by me) never intended or designed to abandon c-client support, but to give a choice of using a more sophisticated API, which was needed to implement UTF-8, a charset which many users seem to need; otherwise it would not have been made the default on the most popular Linux distributions.

I don't know the definition of a technical reason, but fulfilling a user need with something which is up to the job is one for me.

Not all of the UTF-8 patch is related to iconv. Large parts of the patch are independent of it and simply take care of UTF-8 character handling on input and output and in formatting, including line wrapping and editing. No objections have been raised to any of these parts of the patch so far, but none of them have been merged either. I received no questions and no feedback from pine developers except the comment from Mark Crispin linked above.

Fixing these places (regardless of how) is the first step in implementing UTF-8 support, and some people who get all or most of their non-US-ASCII mail in UTF-8 (or convert it to UTF-8 before delivery to their folders) could even use that alone.

But I received no questions or feedback regarding these issues which reside purely inside pine. I could feed the pine developers slowly and piece by piece with the places where they would need to do something and provide them with the solutions which I have taken to address these places.

c-client could always be improved to provide a better charset conversion API, but for some people (all Linux users) it will be the second-best choice, and I doubt that its implementation will ever be as good as other specialized and generally used standard conversion libraries. There have been no improvements in this area since the release of pine4.61, where these routines finally got extended to be able to convert UTF-8 into other charsets. But still, it has no single, unified interface for converting any charset to any other charset. That could certainly be done, but the missing piece needed for it only appeared long after the same thing could already be done using iconv. The new interface needs to be usable at a single place, as iconv is now in this patch, so that the best conversion library available on the respective platform can be selected.

What I ask for is an excellent technical reason why more powerful and simpler APIs should not be usable from pine in order to allow full support for the current default character set of most Linux distributions. Strangely, in the same mail, Mark Crispin also called GNU iconv "not free"; probably the GNU Library General Public License is not free enough. It is the same license as the one under which the GNU C Library is distributed, the standard Linux C library against which Pine is linked under Linux. Probably he understands "free" as being compatible with the Pine license, but then he could indeed adapt c-client to have a better, uniform interface and use that instead of an external iconv library, and all his issues would be solved.

Pine uses SSL and LDAP, which are also external packages that provide features, need to be installed separately, and take extra effort to use; the same could be done for iconv. It is an optional add-on. The only plan compatible with the wishes of the pine developer would be to implement everything inside pine, meaning to re-implement a generic, sufficiently powerful API like the iconv API inside pine (using pine's existing charset translation routines (c-client)). However, it is not certain that this would be of interest to the developer(s), since they also declined to merge many other patches for pine (e.g. maildir support, more rules support - very handy once you need it, fancy threads). My only understanding of this is that they would have to support all this functionality because pine is the email client of their large University, and they probably just don't like to do it.

Background Information


Last Updated 2006-02-14