bigpatch.diff, which is the easiest and
most generic way to build a recent pine with UTF-8 support for Unicode.
It includes everything that the SUSE package includes, except the
SUSE-specific changes.
This can improve the display and export of mail in your specific environment even if you do not use any Unicode yourself.
Example: Using enhanced terminals such as mlterm for X11, you can even use Hebrew in right-to-left mode and quickly switch to left-to-right display of other, completely different languages without restarting pine.
So this patch allows you to switch your display terminal to the UTF-8 charset, and it allows pine to translate all the character encodings used around the world to your display. Even if you can't read all languages, you are at least able to view the mail correctly. This also applies to characters such as curly apostrophes, which are often used in Windows charsets and have no good corresponding characters in, for example, ISO-8859-1. With UTF-8, such symbols are shown correctly.
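
The curly-apostrophe case can be illustrated with a small Python sketch. It is not part of the patch; the sample bytes and the helper name `fits_latin1` are made up for illustration. It shows that byte 0x92 is a curly apostrophe in windows-1252, that ISO-8859-1 has no printable character for it, and that UTF-8 can carry it.

```python
# Byte 0x92 is the curly apostrophe in windows-1252, but ISO-8859-1
# maps the same byte to an invisible C1 control character.
raw = b"It\x92s here"  # hypothetical sample text

as_cp1252 = raw.decode("windows-1252")  # U+2019 RIGHT SINGLE QUOTATION MARK
as_latin1 = raw.decode("iso-8859-1")    # U+0092, a control character

# UTF-8 can represent the apostrophe as a three-byte sequence.
utf8_bytes = as_cp1252.encode("utf-8")


def fits_latin1(text):
    """Return True if every character of text exists in ISO-8859-1."""
    try:
        text.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False


print(repr(as_cp1252), repr(as_latin1), fits_latin1(as_cp1252))
```
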
There are several reasons why this can happen:
windows-1252 would likely be the most useful value, since it is compatible with the other
charset in this region (iso-8859-1) but also contains some characters used
by Windows. In other regions, other encodings would be useful. The config
setting to use is assumed-charset.
This often only works for the mail headers (Subject, From, To, Cc, Attachments)
and very often not for the mail body. It is recommended to use a special
charset alias from US-ASCII to your assumed charset,
as explained in the following paragraph:
To view such messages, you tell pine to assume that a specific charset given
in mails actually means a different charset. This is then applied every time
a mail with such a charset label is read.
For example, you can try a charset-aliases setting like this (but usually
you'll only need one alias):
charset-aliases = US-ASCII:windows-1252 iso-8859-1:windows-1252 iso-8859-2:windows-1252 iso-8859-12:windows-1250 iso-8859-15:windows-1252 iso-8859-11:windows-874 tis-620:windows-874 gb2312:gb18030 gbk:gb18030 euc-cn:gb18030 ks_c_5601-1987:x-windows-949 5601:x-windows-949
iconv-aliases = x-windows-949:mscp949 euc-kr:mscp949
iconv-aliases is another level of translation which is applied after
charset-aliases. This allows you, for example, to first translate a series
of charset tags to a new charset tag and then have a single
entry in the iconv-aliases setting where you can quickly change the
charset tag to which they are all finally translated.
Pine as provided by the University of Washington only uses its internal code tables for character set conversion, which limits the number of supported charsets to about a dozen. With pine4.61, UTF-8 has been added as a character set from which pine can recode, but it can't recode from other charsets to UTF-8 for displaying mail or exporting to a file or printer.
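
The two-stage lookup (charset-aliases first, then iconv-aliases) can be sketched in Python as follows. This is only an illustration of the idea, not the patch's code: the function names, the whitespace-separated parsing, and the case-insensitive matching are assumptions, and the alias values are a subset of the example settings above.

```python
def parse_aliases(setting):
    """Turn 'from:to from:to ...' into a dict keyed by lowercase source tag."""
    table = {}
    for pair in setting.split():
        source, _, target = pair.partition(":")
        table[source.lower()] = target
    return table


# Subset of the example settings shown earlier.
CHARSET_ALIASES = parse_aliases(
    "US-ASCII:windows-1252 ks_c_5601-1987:x-windows-949 5601:x-windows-949"
)
ICONV_ALIASES = parse_aliases("x-windows-949:mscp949 euc-kr:mscp949")


def resolve(label):
    """Apply charset-aliases first, then iconv-aliases, to a charset label."""
    stage1 = CHARSET_ALIASES.get(label.lower(), label)
    return ICONV_ALIASES.get(stage1.lower(), stage1)


for tag in ("ks_c_5601-1987", "5601", "euc-kr", "US-ASCII", "utf-8"):
    print(tag, "->", resolve(tag))
```

Because every Korean tag is funnelled through x-windows-949 or mapped directly in the second table, changing the single mscp949 target in the second stage would redirect all of them at once.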
If you want to see support for UTF-8 in pine, ask the pine maintainers on the pine-info list to improve it, to show them the demand. Something can be done even without the iconv function, to which one of the pine developers, possibly the main developer, objects. Even little progress would be good.
Yes. You can use pine's default editor pico (which is built into pine
and perfectly integrated), which now works with UTF-8 as well, or,
if you want to use a different editor, you can define one using the
pine configuration option alternate-editor.
There are several editors for Unix which support UTF-8 as well. For instance, Vim 6.x is an excellent text editor with solid UTF-8 support. Emacs also supports UTF-8 and has interfaces to CJK input methods.
Mike Fabian has a page about
UTF-8 Internationalisation in GNU Emacs and XEmacs in his
document on CJK (Chinese, Japanese, Korean) Support in SUSE Linux.
For Emacs, using emacs -nw as the alternate editor will work.
Yes, translation is done to the character-set of the display device configured for pine.
I have had time to test printing, and it works quite well! Character set conversion to UTF-8 happens. This is good in my case -- I am doing "attached print", and my emulator handles the UTF-8 codes in the print stream.
However, I can imagine that it would be a problem for other people who were writing directly to a printer. There are not many (any?) UTF-8 printers around. If the user had a printer that worked in Big5 (Chinese), and they usually got Big5 emails and printed them, then installing your patch would break things. I think the best solution is to create an additional configuration variable for "printer character set"; or maybe a flag associated with each printer definition.
(I do one other change for printing. In cmd_print, where it calls format_message, I add a flag FM_NOWRAP, so it doesn't break my lines based on the number of screen columns.)
If you are not printing to a printer attached to the terminal, it should be possible to use a character set filter as the personal print command in Setup/Printers. For instance, you can use Juliusz Chroboczek's cedilla, which has good WGL4 coverage, sufficient for Latin, Greek and Cyrillic text. Another good printer filter is uniprint, included in Gaspar Sinai's Yudit.
Personally selected print command: The text to be printed will be piped into the command given here. The command is in the 2nd column, the printer name is in the first column. Some examples are: "prt", "lpr", "lp", or "enscript". The command may be given with options, for example "enscript -2 -r" or "lpr -Plpacc170". The commands and options on your system may be different from these examples.
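
As a rough idea of what such a character set filter does, here is a minimal Python sketch that recodes a UTF-8 print stream for a printer expecting another charset. It is not one of the filters named above. The function names and the Big5 default are assumptions, chosen to match the Big5 printer example mentioned earlier.

```python
import sys


def recode(data, source="utf-8", target="big5"):
    """Recode bytes from source to target charset.

    Characters the target charset cannot represent become '?'.
    """
    text = data.decode(source, errors="replace")
    return text.encode(target, errors="replace")


def main(argv=None, stdin=None, stdout=None):
    """Filter entry point: read stdin, write recoded bytes to stdout."""
    argv = sys.argv if argv is None else argv
    stdin = sys.stdin.buffer if stdin is None else stdin
    stdout = sys.stdout.buffer if stdout is None else stdout
    target = argv[1] if len(argv) > 1 else "big5"
    stdout.write(recode(stdin.read(), target=target))
```

Saved as a script that ends with a call to `main()`, this could be placed in front of the real print command in a pipeline, with the script name and printer command depending on your system.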
Yes, message headers that are properly encoded in compliance with
RFC 2047, as required to indicate their charset, are converted if possible.
But lots of spam is sent without proper encoding.
Those characters are assumed to be in assumed-charset and are
converted to character-set.
In some cases pine attempts to guess the charset of such message parts
from other parts of the message, but attempting to convert an improperly
encoded message will never work completely.
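
The header handling described above can be approximated with Python's standard `email.header.decode_header`. This is a sketch of the idea only, not how pine implements it. The function name `decode_subject` and the windows-1252 default for the assumed charset are illustrative choices.

```python
from email.header import decode_header


def decode_subject(raw, assumed_charset="windows-1252"):
    """Decode RFC 2047 encoded-words; fall back to an assumed charset."""
    parts = []
    for chunk, charset in decode_header(raw):
        if isinstance(chunk, str):
            # No encoded-word was found; the text is returned unchanged.
            parts.append(chunk)
            continue
        try:
            parts.append(chunk.decode(charset or assumed_charset,
                                      errors="replace"))
        except LookupError:
            # Unknown charset label: treat it like an unlabeled part.
            parts.append(chunk.decode(assumed_charset, errors="replace"))
    return "".join(parts)


print(decode_subject("=?iso-8859-1?q?Andr=E9?="))
print(decode_subject("=?UTF-8?B?5pel5pys6Kqe?="))
```
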
An issue has been reported with the putty emulator: while it was not configured for UTF-8 (this must be done somewhere very deep in the menus), starting pine in UTF-8 mode resulted in pine receiving an endless series of "c" characters from the terminal. This made the emulator unusable with pine in UTF-8 settings until it was configured to use UTF-8.
Generally, if the terminal's configuration
does not match your locale environment or pine's character-set= setting,
you'll get some strange-looking characters or your screen may be
messed up. That does not happen if the configuration of pine and the
terminal matches, and it's independent of whether you use this patch or not.
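
The strange-looking characters from such a mismatch can be reproduced in a few lines of Python. The sample word is arbitrary, and the sketch only imitates what a terminal does when it interprets bytes in the wrong charset.

```python
text = "caf\u00e9"  # "café"

# pine sends UTF-8, but the terminal interprets the bytes as ISO-8859-1:
utf8_bytes = text.encode("utf-8")
seen_on_latin1_terminal = utf8_bytes.decode("iso-8859-1")

# pine sends ISO-8859-1, but the terminal expects UTF-8:
latin1_bytes = text.encode("iso-8859-1")
seen_on_utf8_terminal = latin1_bytes.decode("utf-8", errors="replace")

# Matching settings on both sides display the text as intended:
seen_when_matching = utf8_bytes.decode("utf-8")

print(repr(seen_on_latin1_terminal))
print(repr(seen_on_utf8_terminal))
print(repr(seen_when_matching))
```
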
When you use the Export command, the same conversion is applied as is applied to the message display.
When local text files are read, e.g. for attaching them to an
outgoing message, no conversion is applied and they are assumed
to use the character encoding to which pine's config setting
character-set= is set.
Jungshik Shin developed the initial patch with the header conversion, charset and iconv aliases, and generic locale fixes. For the message body conversion, he used display-filters.
Bernhard Kaindl updated the patch to newer pine releases and added message body conversion inside pine which speeds up the message viewer.
Jungshik Shin fixed a number of bugs and implemented the
new config option called send-charset.
Bob Rasmussen (President of Rasmussen Software) provided an excellent UTF-8 parsing function which made it possible to calculate UTF-8 column widths and identify the beginning and end of UTF-8 byte sequences at various places in pine, so his code was extremely helpful.
His company provides Anzio Lite and AnzioWin, Windows-based terminal emulators (telnet, SSH) that can display and print UTF-8.
Eduardo Chappa's patches web site provided example code to learn from.
I, Bernhard Kaindl, thank them very much for their contributions of code, their help, and the permission to use and distribute their code and to submit it to the pine team. Jungshik Shin especially deserves a really big thanks.
Fix remaining bugs:
disable-2022-jp-conversions
in the pine config.
It's a long answer, because it is so difficult. The short answer is that, besides a short mail which is linked below and other similar responses (no more detailed), there have been no other technical responses to submissions, and with so few responses one can only try to read between the lines. The best option for you (if you are not going to engage further with it) may be to just use the patch and maybe report your experience with it to me and/or to the pine-info list (you can give me permission to forward it without giving the source/email if you see no reason to post yourself). I'm happy to get mail about the patch.
All the people who participated in the creation of this patch expressed their wish that it should be integrated into the stock pine, but several reasons have been given by Mark Crispin why the makers of pine cannot (or do not want to) add support for iconv. The sentence was: "There would also have to be an excellent technical reason offered as to why we should abandon the existing c-client support, which we started and used long before iconv() existed.". That's a misunderstanding: the patch was (at least by me) never intended or designed to abandon c-client support, but to give a choice of using a more sophisticated API which was needed to implement UTF-8, a charset which many users seem to need; otherwise it would not have been made the default on the most popular Linux distributions.
I don't know the definition of a technical reason, but fulfilling a user need with something which is up to the job is one for me.
Not all of the UTF-8 patch is related to iconv. Large parts of the patch are independent of it and simply take care of UTF-8 character handling on input and output and on formatting, including line wrapping and editing. None of these parts of the patch have been objected to so far, but none have been merged either. I received no questions and no feedback from pine developers except the comment from Mark Crispin linked above.
Fixing these places (regardless of how) is the first step in implementing UTF-8 support, and some people who get all or most of their non-US-ASCII mail in UTF-8 (or convert it to UTF-8 before delivery to their folders) could even use that alone.
But I received no questions or feedback regarding these issues which reside purely inside pine. I could feed the pine developers slowly and piece by piece with the places where they would need to do something and provide them with the solutions which I have taken to address these places.
c-client could always be improved to provide a better charset conversion API, but for some people (all Linux users) it will be the second-best choice, and I doubt that its implementation will ever be as good as other specialized and generally used standard conversion libraries. There have been no improvements in this area since the release of pine4.61, where these routines finally got extended to be able to convert UTF-8 into other charsets. But still, it has no single, unified interface for converting any charset to any other charset. That could certainly be done, but the missing piece needed for it only appeared long after this could already be done using iconv. The new interface needs to be usable at a single place, like iconv is now in this patch, so that the best conversion library available on the respective platform can be selected.
What I ask for is an excellent technical reason why
more powerful and simpler APIs should not be usable
from pine in order to allow full support for the current default
character set of most Linux distributions. Strangely, in the same
mail, Mark Crispin also called GNU iconv "not free"; probably
the GNU Library General
Public License is not free enough. It's the same license as the one the
GNU C Library
is distributed under, the standard Linux C library against
which Pine is linked under Linux. Probably he understands "free"
as being compatible with the Pine license, but then he could indeed
adapt the c-client to have a better, uniform interface, use that
instead of an external iconv library, and all his issues would be
solved.
pine uses SSL and LDAP, which are also external packages that provide
features, which need to be installed separately, and where one
needs to take extra effort to use them. The same could be
done for iconv. It is an optional add-on.
The only plan which would be possible while following the wishes of the
pine developer would be to implement everything inside pine,
meaning to re-implement a generic, sufficiently powerful API like the iconv
API inside pine (using the existing charset translation routines (c-client)
of pine). However, it is not certain whether this would be of interest to the
developer(s), since they have also rejected many other patches (e.g. maildir
support, more rules support - very handy once you need it, fancy threads)
submitted for merging into pine. My only understanding of this is that they would have
to support all this functionality, because pine is the email client of their
large University, and they probably just don't like to do it.