<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html lang="en">
<head>
   <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
   <title>Pine UTF-8 FAQ</title>
<style type="text/css">
body{font-size:100%;font-family:sans-serif;}
h1 { text-align: center;}
.new {font-size: small;}
.top { text-align: center;}
.report {font-size: small;
         background-color: #eeeeee;
	 color: #113399;
	 }
.pinehelp {font-size: smaller;}
.ack {text-align: center; font-style: italic;}
.cmd {font-weight: bold; font-family: monospace;}
code {font-size:140%; color: brown; background-color: #eeeeee;}
.cmd:first-letter {color: green; background-color: #eee;}
</style>
</head><body><div class="new">Major Update to pine 4.64:<ul><li>
New feature: UTF-8 support has been added to pine's prompt editor, which
is used in many places, e.g. for password prompts, filename prompts and
search string prompts. It works well: tester feedback on the first version
released to testers has been great, and that version is included in this
patch. I expect no further changes in that area.
</li><li>This set of patches is integrated into a complete RPM package
which also has some patches specifically for SUSE Linux, for example one that
changes the default mail directory from ~/mail to ~/Mail. You can work around
that with a symbolic link or by not applying that patch. Alternatively, you can
simply patch your pine using <code>bigpatch.diff</code>, which is the easiest and
most generic way to build a recent pine with UTF-8 support.
It includes everything which the SUSE package includes, except the
SUSE-specific changes.</li>
<li>Latest bug which I fixed: When forwarding a MIME message of type
multipart/alternative, the mail body was not converted to your terminal's
character set for editing and sending, and the sent mail was
not properly encoded. This was a general bug in pine, but it was only
visible if the mail actually had to be converted to a different
encoding while forwarding. Because this content type appears to be
rather rare, the bug hid quite well.
</li>

</ul></div><h1>PINE
<a href="http://en.wikipedia.org/wiki/UTF-8">UTF-8</a>
and charset conversion FAQ</h1>
<h3>Q: Where can I get the latest patch?</h3>
<ul>
<li>Everything is below <a
href="http://www.suse.de/~bk/pine/4.64">http://www.suse.de/~bk/pine/4.64</a>
</li><li>
The latest patch is <a href="http://www.suse.de/~bk/pine/4.64/2006-02-23/bigpatch.diff">
http://www.suse.de/~bk/pine/4.64/2006-02-14/bigpatch.diff</a>.<br>It includes a version of 
<a href="http://www.math.washington.edu/~chappa/pine/">
Eduardo Chappa's Patches for Pine</a>.<br>
It's nearly up-to-date with his latest patch collection.<br>
It is an updated and improved version of what was part of SUSE Linux 10.0, and this patch is going
to be shipped with SUSE Linux 10.1.
</li><li>An <a href="http://www.suse.de/~bk/pine/4.64/2006-02-14/10.0-i386/">
RPM package</a> for users of <a href="http://opensuse.org">OpenSUSE 10.0</a>
and
<a href="http://www.novell.com/products/suselinux/">
SUSE Linux 10.0 (i386)</a> also exists.
</li></ul><h3>Q: What are the prerequisites for using this patch?</h3><ol><li>
You need a recent POSIX-compliant operating system which either has
built-in support for the <a href="http://en.wikipedia.org/wiki/iconv">POSIX iconv API</a>,
or on which you can install an iconv implementation. GNU libiconv works on
Linux and Windows at least. If you are not running Linux, you will need
to make some adjustments to link to your iconv library. I can help
with that, and I would like to get mail (at bk on suse.de) from you when you have tried
to install on a different operating system, so that I can add the changes to the
patch and mention it here.
</li><li>
If your iconv library is not integrated into your standard C library on
your system (may also happen with Linux distributions which use a separate
libiconv), you have to call the pine build command with EXTRALDFLAGS=-liconv.
</li><li>
On Linux, you need to have the <u>glibc locale package</u> installed, otherwise your
iconv will not have the necessary data and routines for actually converting
between all the charsets. Normally it is installed by default, and it is needed for
all other locale support too, but if something does not work, that is the first place
to look. You can test it by running the iconv command, which is usually also
installed and which uses the iconv library to convert between character encodings.
</li></ol><h3>Q: What are the effects of this patch?</h3><ol><li>
The "Terminal garbled" problem which may happen when receiving a mail which
cannot be displayed on your terminal (e.g. some spam) should be gone, because
with the patch pine tries to output only characters which your terminal
can actually display (if the necessary fonts are available to it).
If your terminal is in UTF-8 mode, the problem is definitely gone.
</li><li>
It allows you to convert mail that uses character encodings which pine itself
does not support (its set of supported character sets is limited), and it
improves charset conversion because the conversion is supplied through specialized
charset conversion libraries. Right now, the <a
href="http://en.wikipedia.org/wiki/iconv">iconv API</a> is used for that,
but more conversion engines could be added if needed on certain platforms.
Some implementations of iconv (like the iconv implementation on Linux)
support transliterating characters if no equivalent character
is available in the destination charset.<p>
This can improve the display and export of mail in your specific environment
even if you do not use any Unicode in your environment.
</li><li>
It allows you to use a
<a href="http://en.wikipedia.org/wiki/Terminal_emulator">Terminal emulator</a>
which is using
<a href="http://en.wikipedia.org/wiki/Unicode">Unicode</a> (encoded as UTF-8)
as the display terminal for pine, thereby giving you the ability to view
and work on correctly encoded mail from anywhere in the world, on the
condition that the mails which you receive are properly standards-conformant
and your terminal uses a character set setup and font in which these characters
can be displayed.
<p>Example: Using enhanced terminals like mlterm for X11, you can
even use Hebrew in right-to-left mode and quickly switch to left-to-right
display of other, completely different languages without restarting pine.<p>
So this patch allows you to switch your display terminal to the UTF-8
charset and it allows pine to translate all the character encodings used
around the world to your display. Even if you can't read all languages,
you are at least able to view the mail correctly. This also applies
to e.g. curly apostrophes and the like, which are often used in Windows
charsets and which have no good corresponding characters e.g. in
ISO-8859-1. With UTF-8, such symbols are shown correctly.
</li>
</ol>
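<p>The curly-apostrophe case mentioned above can be checked with a few lines
of Python. This is only a sketch which uses Python's built-in codecs in place
of pine and iconv, and the sample byte string is made up for illustration.</p>

```python
# Illustration only: Python's built-in codecs stand in for the conversion
# library; the sample bytes are a made-up example, not taken from a real mail.
raw = b"It\x92s here"          # 0x92 is the curly apostrophe in windows-1252

text = raw.decode("windows-1252")
print(repr(text))

# ISO-8859-1 has no curly apostrophe, so a strict conversion fails ...
try:
    text.encode("iso-8859-1")
    fits_latin1 = True
except UnicodeEncodeError:
    fits_latin1 = False

# ... while UTF-8 can represent it (as a three-byte sequence).
utf8_bytes = text.encode("utf-8")
print(fits_latin1, utf8_bytes)
```

<p>A converter with transliteration support might substitute a plain ASCII
apostrophe at this point instead of failing. Python's codecs do not do that,
which is why the strict conversion above simply raises an error.</p>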

<h3>Q: Viewing mails works for some mails but not for others, why?</h3>
<p>
There are several reasons why this can happen:</p>
<ul>
<li>
The mail doesn't contain an
<a href="http://en.wikipedia.org/wiki/Request_for_Comments">RFC</a>-compliant charset
in the part of the mail which you are viewing, and the result is that
pine has no information about which conversion needs to be done.<br>
If this happens, you have to tell pine which charset it should assume
instead.<br>The charset which is most useful for this is usually the encoding
in which you receive the most mail. For Western Europe, <code>windows-1252</code>
would likely be the most useful value, since it is compatible with the other
charset common in this region (iso-8859-1) but also contains some characters used
by Windows. In other regions other encodings would be useful. The config
setting to use is <code>assumed-charset</code>.<p>
This often only works for the mail headers (Subject, From, To, Cc, Attachments)
and very often not for the mail body. It is recommended to use a special
charset alias from <code>US-ASCII</code> to your assumed charset,
as explained in the following item.
</li><li>
The mail possibly contains an
<a href="http://en.wikipedia.org/wiki/Request_for_Comments">RFC</a>-compliant charset tag
but it specifies a wrong or unknown charset value.<p>
To view such messages, you tell pine to assume that a specific charset given
in mails actually means a different charset. This is then applied every time
a mail with such a charset label is read.
For example, you can try a <code>charset-aliases</code> setting like this (but usually
you'll only need one alias):</p></li><div align=center><table><tr><td><pre>
charset-aliases = US-ASCII:windows-1252
                  iso-8859-1:windows-1252
                  iso-8859-2:windows-1252
                  iso-8859-12:windows-1250
                  iso-8859-15:windows-1252
                  iso-8859-11:windows-874
                  tis-620:windows-874
                  gb2312:gb18030
                  gbk:gb18030
                  euc-cn:gb18030
                  ks_c_5601-1987:x-windows-949
                  5601:x-windows-949
iconv-aliases   = x-windows-949:mscp949
                  euc-kr:mscp949
</pre></td></tr></table></div>
<code>iconv-aliases</code> is another level of translation which is applied after
charset-aliases. This allows you, for example, to first translate a series
of charset tags to one new charset tag, and then have a single
entry in the iconv-aliases setting where you can quickly change the
charset to which they are all finally translated.</ul></p>
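<p>The two-level lookup described above can be sketched as follows. This is a
hypothetical illustration in Python, not code from the patch: the function
name, the dictionaries and the case-insensitive matching are all assumptions,
and only a few of the aliases from the example setting are included.</p>

```python
# Hypothetical sketch of the two-level lookup; names and data structures are
# assumptions for illustration and are not taken from the patch's source code.
CHARSET_ALIASES = {
    "us-ascii": "windows-1252",
    "ks_c_5601-1987": "x-windows-949",
    "5601": "x-windows-949",
}
ICONV_ALIASES = {
    "x-windows-949": "mscp949",
    "euc-kr": "mscp949",
}

def resolve_charset(label):
    """Map a charset label from a mail to the name handed to the converter."""
    name = label.lower()                      # assumed: labels compare case-insensitively
    name = CHARSET_ALIASES.get(name, name)    # first level: charset-aliases
    name = ICONV_ALIASES.get(name, name)      # second level: iconv-aliases
    return name

for label in ("US-ASCII", "ks_c_5601-1987", "euc-kr", "utf-8"):
    print(label, "->", resolve_charset(label))
```

<p>With this layout, changing the single <code>x-windows-949</code> entry in the
second table redirects every label that was aliased to it in the first.</p>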

<h3>Q: What is the status of UTF-8 support in the stock pine?</h3>
<p>
Pine as provided by the University of Washington only uses its internal
code tables for character set conversion, which limits the number of charsets
supported to about a dozen or so. With pine4.61, UTF-8 has been added as
a character set which pine can recode from, but it can't recode from
other charsets to UTF-8 for displaying mail or exporting to a file or printer.
<p>
If you want to see some support for UTF-8 in pine, ask the pine maintainers
on the pine-info list to improve support for it, to show them the demand
(something can even be done without the iconv function, to which one of the
pine developers, possibly the main developer, objects). Even little progress
would be good.

<h3>Q: Does composing messages work also on UTF-8 terminals?</h3>
<p>
Yes. You can use pine's default editor pico (which is built into pine
and perfectly integrated), which now works with UTF-8 as well. Or,
if you want to use a different editor, you can define one using the
pine configuration option <code>alternate-editor</code>.<br>
<p>
There are several editors for Unix which support UTF-8 as well.
For instance, <a href="http://www.vim.org">Vim 6.x</a> is an excellent
text editor with solid UTF-8 support. Emacs also supports UTF-8 and has
interfaces to CJK input methods.<p>Mike Fabian has a page about
<a href="http://www.suse.de/~mfabian/suse-cjk/emacs-and-xemacs.html">
UTF-8 Internationalisation in GNU Emacs and XEmacs</a> in his <a
href="http://www.suse.de/~mfabian/">
document on CJK (Chinese, Japanese, Korean) Support in SUSE Linux</a>.
Emacs will work as the alternate editor when invoked as <code>emacs -nw</code>.
<p>
<h3>Q: Is the charset conversion applied during printing?</h3>
<p>
Yes, translation is done to the character-set of the display device
configured for pine.

<div class="report">
Report from a tester:
<p>
I have had time to test printing, and it works quite well! character
set conversion to UTF-8 happens. This is good in my case -- I am doing
"attached print", and my emulator handles the UTF-8 codes in the print
stream.
<p>
However, I can imagine that it would be a problem for other people who
were writing directly to a printer. There are not many (any?) UTF-8
printers around. If the user had a printer that worked in Big5 (Chinese),
and they usually got Big5 emails and printed them, then installing your
patch would break things. I think the best solution is to create an
additional configuration variable for "printer character set"; or maybe a
flag associated with each printer definition.
<p>
(I do one other change for printing. In cmd_print, where it calls
format_message, I add a flag FM_NOWRAP, so it doesn't break my lines based
on the number of screen columns.)
</div>

<p>
If you are not printing to a printer attached to the terminal, it should be
possible to use a character set filter as the personal print command in
Setup/Printers. For instance, you can use
Juliusz Chroboczek's <a href="http://www.pps.jussieu.fr/~jch/software/cedilla/">cedilla</a>, which has good WGL4 coverage, sufficient for
Latin, Greek and Cyrillic text. Another good printer filter is
uniprint, included in Gaspar Sinai's <a href="http://www.yudit.org">Yudit</a>.

<pre class="pinehelp">
 Personally selected print command
      The text to be printed will be piped into the command given here. The
      command is in the 2nd column, the printer name is in the first column. Some
      examples are: "prt", "lpr", "lp", or "enscript". The command may be given
      with options, for example "enscript -2 -r" or "lpr -Plpacc170". The
      commands and options on your system may be different from these examples.
</pre>
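<p>If no ready-made filter fits your printer, the idea of a character set
filter can be sketched in a few lines of Python. This is a hypothetical
example, not part of the patch: the function names are made up, and Big5 as
the printer charset is only an assumption taken from the tester's scenario
above.</p>

```python
# Hypothetical print filter: the function names and the Big5 target are
# assumptions for illustration; use whatever charset your printer expects.
import io

def recode(data, source="utf-8", target="big5"):
    """Convert bytes from the display charset to the printer charset.

    Characters the target charset cannot represent become '?'.
    """
    return data.decode(source, errors="replace").encode(target, errors="replace")

def filter_stream(inp, out, source="utf-8", target="big5"):
    """Read everything from inp, recode it and write the result to out."""
    out.write(recode(inp.read(), source, target))

# Demonstration with in-memory streams standing in for the print pipe.
sample = "\u4e2d\u6587 mail".encode("utf-8")
printed = io.BytesIO()
filter_stream(io.BytesIO(sample), printed)
print(printed.getvalue())
```

<p>Used as a real print command, such a script would call
<code>filter_stream(sys.stdin.buffer, sys.stdout.buffer)</code> and pipe its
output on to the print spooler.</p>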

<h3>Q: Are message headers such as Subject converted?</h3>
<p>
Yes, message headers which are properly encoded according to
<a href="http://www.faqs.org/rfcs/rfc2047.html">RFC 2047</a>, as required
to indicate their charset, are converted if possible.
But lots of spam is sent without proper encoding.
Such characters are assumed to be in <code>assumed-charset</code> and
converted to <code>character-set</code>.
In some cases pine attempts to guess the charset of such message parts
from other parts of the message, but attempting to convert an improperly
encoded message will never work completely.
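<p>To show what an RFC 2047 encoded header looks like and how it is decoded,
here is a small sketch using Python's standard <code>email.header</code>
module. It is not pine's code; the function name and the fallback parameter
are assumptions which only mimic the role of <code>assumed-charset</code>.</p>

```python
# Sketch using Python's standard library, not pine's implementation.
# decode_header_value and assumed_charset are hypothetical names.
from email.header import decode_header

def decode_header_value(value, assumed_charset="windows-1252"):
    """Decode RFC 2047 encoded words; unlabelled parts use assumed_charset."""
    parts = []
    for data, charset in decode_header(value):
        if isinstance(data, bytes):
            # charset is None for parts which carry no charset label
            parts.append(data.decode(charset or assumed_charset, errors="replace"))
        else:
            parts.append(data)   # header without any encoded words
    return "".join(parts)

print(decode_header_value("=?ISO-8859-1?Q?Gr=FC=DFe?="))
```

<p>An unknown charset label would raise a <code>LookupError</code> in this
sketch, which is the kind of case the alias settings are meant to handle.</p>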

<h3>Q: What issues are known with specific terminal emulators?</h3>
<p>
An issue has been reported with the PuTTY emulator: while it was not configured
for UTF-8 (which must be done somewhere very deep in the menus), starting pine in
UTF-8 mode resulted in pine receiving an endless series of "c" characters from
the terminal, and this made the emulator unusable with pine in UTF-8 settings
until it was configured to use UTF-8.<p>Generally, if the terminal's configuration
does not match your locale environment or pine's <code>character-set=</code>
setting, you'll get some strange-looking characters or your screen may be
messed up. That does not happen if the configurations of pine and the
terminal match, and it is independent of whether this patch is used.
</p>
<h3>Q: Is any charset conversion applied when reading and
writing local files?</h3>
<p>
When you use the <span class="cmd">Export</span> command, the same conversion
is applied as is applied to the message display.
<p>
For reading local text files, e.g. for attaching them to an
outgoing message, no conversion is applied and they are assumed
to use the character encoding to which pine's config setting
<code>character-set=</code> is set.

<h3>Q: Who developed the PINE UTF-8 Patch?</h3>
<p>
Jungshik Shin developed <a href="http://mail.nl.linux.org/linux-utf8/2002-07/msg00012.html">the initial patch with the header conversion, charset
and iconv aliases, and generic locale fixes</a>.
For the message body conversion, he used display-filters.
<p>
Bernhard Kaindl updated the patch to newer pine releases
and added message body conversion inside pine which speeds up the message viewer.
<p>
Jungshik Shin fixed a number of bugs and implemented the
new config option called <code>send-charset</code>.
<p>
Bob Rasmussen (President of <a href="http://anzio.com/">Rasmussen Software</a>) provided an excellent UTF-8 parsing function which made it possible
to calculate UTF-8 column widths and identify the beginning and end of UTF-8
byte sequences at various places in pine, so his code was extremely helpful.<P>
His company provides Anzio Lite and AnzioWin,
<a href="http://anzio.com/">Windows-based terminal emulators</a>
(telnet, SSH) that can display and print UTF-8.
<p>
Eduardo Chappa's <a href="http://www.math.washington.edu/~chappa/pine/">
patches web site</a> provided example code to learn from.<p>
I, Bernhard Kaindl, thank them very much for their contributions
of code, their help, and the permission to use and distribute their code
and to submit it to the pine team. Jungshik Shin especially
deserves a really big thanks.
<h3>Q: What are the next goals?</h3>
<p>Fix the remaining bugs:</p>
<ul>
<li>The message header editor wraps long lines into multiple lines,
but the function which does this (FormatLines) does not know where
it may split the line and where it may not.
This may also be fixed in the course of reimplementing the Unicode
support in the composer in the same way as it's done in pico and
in the status line prompt function, but that may not even be
necessary. However, there have been no bug reports about this
at all, since it's not really a big problem.</li>
<li>Automatically turn off the receive side of the iso-2022-jp conversion
hack, which is a standard feature in pine, when iconv is used. iconv
does the same conversion without a special hack for reading iso-2022-jp,
and doing the conversion twice breaks the reading of iso-2022-jp mail.
The workaround is to check the checkbox for
<code>disable-2022-jp-conversions</code> in the pine config.

</li>
</ul>
<h3>Q: What about including it in the mainstream Pine?</h3>
<p>It's a long answer, because it is so difficult. The short answer
is that, besides a short mail which is linked below and other
similar responses (no more detailed), there have been no other
technical responses to submissions, and with so few
responses one can only try to read between the lines. The best option
for you (if you are not going to engage further in it)
may be to just use the patch and maybe report your experience with it
to me and/or to the pine-info list (you can give me permission
to forward it without giving the source/email if you see no
reason to post yourself). I'm happy to get mail about the patch.
<p>
All the people who participated in the creation of this patch
expressed their wish that it should be integrated into the stock
pine, but several <a
href="http://mailman1.u.washington.edu/pipermail/pine-info/2004-August/040541.html">reasons</a>
have been given by Mark Crispin why the makers of pine cannot
(or do not want to) add support for iconv. The sentence was:
<i>"There would also have to be an excellent technical reason offered as
to why we should abandon the existing c-client support, which we
started and used long before iconv() existed."</i>. That's a
misunderstanding: the patch was (at least by me) never intended
or designed to abandon c-client support, but to give a choice of
using a more sophisticated API which was needed to implement UTF-8,
a charset which many users seem to need; otherwise it would not
have been made the default on the most popular Linux distributions.
<p>
I don't know the definition of a technical reason, but fulfilling
a user need with something which is up to the job is one for me.
<p>
Not all of the UTF-8 patch is related to iconv. Large parts of the
patch are independent of it and simply take care of UTF-8 character handling
on input and output and in formatting, including line wrapping and editing.
None of these parts of the patch have been objected to so far, but they have
also never been merged. I received no questions and no feedback from pine
developers except the comment from Mark Crispin linked above.<p>
Fixing these places (independent of how) is the first step in implementing
UTF-8 support, and some people who get all or most of their non-US-ASCII
mail in UTF-8 (or convert it to UTF-8 before delivery to their folders)
could even use that alone.<P>
But I received no questions or feedback regarding these issues, which
reside purely inside pine.
I could feed the pine developers slowly, piece by piece, with the
places where they would need to do something and provide them with
the solutions which I have used to address these places.
<p>
c-client could always be improved to provide a better charset conversion API,
but for some people (all Linux users) it will be the second-best choice,
and I doubt that its implementation will ever be
as good as other specialized and generally used standard conversion libraries.
There have been no improvements in this area since the release of pine4.61,
where these routines finally got extended to be able to convert UTF-8
into other charsets. Still, it has no single, unified interface for
converting any charset to any other charset. That could certainly
be done, but the missing piece to do this only appeared long after
this could already be done using iconv. The new interface needs to
be usable at a single place, like iconv is now in this
patch, so that the best conversion library which is available
on the respective platform can be selected.<p>
What I ask for is an excellent technical reason why pine should
not allow more powerful and simpler APIs to be usable
from pine in order to allow full support for the current default
character set of most Linux distributions. Strangely, in the same
mail, Mark Crispin also called GNU iconv "not free"; probably
the <a href="http://en.wikipedia.org/wiki/LGPL">GNU Library General
Public License</a> is not free enough. It's the same license as the one the
<a href="http://en.wikipedia.org/wiki/GNU_C_Library">GNU C Library</a>
is distributed under, the standard Linux C library against
which Pine is linked under Linux. Probably he understands being "free"
as being compatible with the Pine license, but then he could indeed
adapt the c-client to have a better, uniform interface, use that
instead of an external iconv library and all his issues would be
solved.

<p>
Pine uses SSL and LDAP, which are also external packages that provide
features, need to be installed separately and require
extra effort to use; the same could be
done for iconv. It is an optional add-on.

<p>
The only plan which would satisfy the wishes of the
pine developer would be to implement everything inside pine,
meaning to re-implement a generic, sufficiently powerful API like the iconv
API inside pine (using the existing charset translation routines (c-client)
of pine). However, it is not certain that this would be of interest to the
developer(s), since they also rejected merging many other patches (e.g. maildir
support, more rules support, which is very handy once you need it, and fancy threads)
into pine. My only understanding of this is that they would have
to support all this functionality, because pine is the email client of their
large university, and they probably just don't want to do that.<br>
</p>
<h3>Background Information</h3>
<ul>
<li><a href="http://www.cl.cam.ac.uk/~mgk25/unicode.html">
UTF-8 and Unicode FAQ for Unix/Linux by Markus Kuhn</a></li><li>
The definition of UTF-8 (UCS/Unicode Transformation Format 8) is found in
<a href="http://www.unicode.org">Unicode</a> and
<a href="http://anubis.dkuug.dk/JTC1/SC2/WG2/">ISO 10646</a>.</li>
<li><a href="http://eyegene.ophthy.med.umich.edu/unicode/">
A Quick Primer On Unicode and Software Internationalization
Under Linux and UNIX</a> is an excellent resource (with many screenshots)
if you are looking for UTF-8 terminal emulators, editors, conversion
and printing utilities, and fonts!</li></ul><p>
<hr><p><a href="http://validator.w3.org/check?uri=referer"><img border="0"
src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"
height="31" width="88"></a> - Last Updated 2006-02-14</p> </body></html>
