Revised Tk Text Widget

Additional Wrap Mode 'codepoint'

Command-Line Name: -wrap

Database Name: wrap

Database Class: Wrap

Specifies how to handle lines in the text that are too long to be displayed in a single line of the text's window. The value must be none, char, word, or codepoint. A wrap mode of none means that each line of text appears as exactly one line on the screen; extra characters that do not fit on the screen are not displayed. In the other modes each line of text is broken up into several screen lines if necessary to keep all the characters visible. In char mode a screen line break may occur after any character; in word mode a line break is only made at word boundaries. The more elaborate codepoint mode is based on Unicode code points and conforms fully (with libunibreak) or largely (with the internal algorithm) to the recommendations of the Unicode Consortium (Unicode Standard Annex #14, the Unicode Line Breaking Algorithm). If option -useunibreak is set, the library libunibreak (by Wu Yongwei) is used for the line break computation; this library also supports language-dependent line breaking (see option -lang for language support). Otherwise, if -useunibreak is not set or libunibreak is not available, a simpler internal algorithm is used. It is also based on the recommendations of the Unicode Consortium, but it is restricted to Latin-1 and most language-independent characters, ignores combining marks, and has no language support.
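To make the difference between the simpler modes concrete, here is a minimal Python sketch (not the widget's C implementation) that wraps one logical line into screen lines. It assumes every character is one column wide, whereas the real widget measures the pixel widths of the displayed font. The name wrap_line is hypothetical, and falling back to a character break when no space fits is a choice of this sketch, not necessarily the widget's behavior.

```python
def wrap_line(line, width, mode):
    """Split one logical line into screen lines of at most `width` columns.

    Sketch only: every character is assumed to be one column wide.
    """
    if width < 1:
        raise ValueError("width must be at least 1")
    if mode == "none":
        # Exactly one screen line; characters past the edge are not shown.
        return [line[:width]]
    lines = []
    rest = line
    while len(rest) > width:
        if mode == "word":
            # Last space at or before column `width`; a space just past the
            # edge is acceptable because the breaking space is dropped.
            space = rest.rfind(" ", 0, width + 1)
            if space > 0:
                lines.append(rest[:space])  # the breaking space is dropped
                rest = rest[space + 1:]
                continue
        # char mode, and this sketch's fallback for words longer than the
        # line: a break may follow any character.
        lines.append(rest[:width])
        rest = rest[width:]
    lines.append(rest)
    return lines


for mode in ("none", "char", "word"):
    print(mode, wrap_line("a well-known term", 10, mode))
```

With a width of 10, word mode yields "a", "well-known", "term": it leaves "a" alone on the first line because it has no break opportunity inside "well-known", while char mode splits the word at an arbitrary character.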

Command-Line Name: -useunibreak

Database Name: useUniBreak

Database Class: UseUniBreak

If this option is enabled, wrap mode codepoint (see option -wrap) uses the external library libunibreak (by Wu Yongwei) to compute line breaks, provided that this library is available (currently only on UNIX). This library also supports language-dependent line breaking. If this option is not enabled, or if libunibreak is not available, a simpler internal algorithm is used: it also follows the recommendations of the Unicode Consortium, but it is restricted to Latin-1 and most language-independent characters, ignores combining marks, and has no language support. By default this option is not enabled.

For a simple test of whether libunibreak is available, see the widget command brks.

The word mode is very primitive and sometimes gives ugly results. For example, it cannot break between two words joined by a hyphen, because the meaning of a hyphen depends on context. This is why I've added the codepoint mode. The fast internal algorithm gives quite good results; libunibreak, normally pre-installed under UNIX, should give very good results. With the library, language support is also provided, namely for zh (Chinese), ko (Korean), and ja (Japanese).

Last but not least, the user may want to inspect the results of the line break algorithm, so the widget provides the command brks. At first glance this functionality might not seem very useful, but its primary purpose is to let the user check whether libunibreak is available on the system. Moreover, this command can be used to verify the quality of the internal algorithm.

pathName brks string ?lang?

This command expects a string of characters and returns a list of integers. The list has exactly as many elements as the string has characters: the value 1 denotes a potential line break at this character position, the value 2 denotes a mandatory line break, and the value 0 (zero) denotes no line break at this character position. If the optional language code is specified (see widget option -lang for language codes), the external library libunibreak (by Wu Yongwei) is used for the line break computation; an error is thrown if this library is not available. If no language code is specified, the internal wrap algorithm is used: it is also based on the recommendations of the Unicode Consortium, but it is restricted to Latin-1 and most language-independent characters, ignores combining marks, and has no language support.
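The result format can be illustrated with a small Python sketch. It is not the widget's internal algorithm: it knows only three simplified rules (a mandatory break after a newline, an allowed break after the last space of a run of spaces, and an allowed break after a hyphen between two letters, roughly as Unicode Standard Annex #14 treats "well-known"). It also assumes that the value at index i describes a break after character i, which is how libunibreak reports breaks; the name brks_sketch is hypothetical.

```python
NO_BREAK, ALLOW_BREAK, MUST_BREAK = 0, 1, 2


def brks_sketch(text):
    """Return one break value per character of `text` (hypothetical helper).

    Assumption: result[i] describes a possible break *after* text[i].
    Only three simplified rules are implemented.
    """
    result = []
    for i, ch in enumerate(text):
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch == "\n":
            result.append(MUST_BREAK)      # hard line end
        elif ch == " " and nxt not in ("", " ", "\n"):
            result.append(ALLOW_BREAK)     # after the last space of a run
        elif ch == "-" and i > 0 and text[i - 1].isalpha() and nxt.isalpha():
            result.append(ALLOW_BREAK)     # hyphen between two letters
        else:
            result.append(NO_BREAK)
    return result


print(brks_sketch("a well-known\nterm"))
```

For "a well-known\nterm" the sketch returns 17 values: an allowed break (1) after the space at index 1 and after the hyphen at index 6, a mandatory break (2) after the newline at index 12, and 0 everywhere else.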

This command has three purposes:

It provides a simple check of whether libunibreak is available (any string, even an empty one, together with any language code such as "xx" works for this check): an error is thrown if a language code is specified but libunibreak is not available.
It allows the line break algorithm to be checked, especially the internal algorithm.
It is also conceivable to use this command to compute line breaks for other widgets.
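As an illustration of reusing such a result outside the text widget, the following Python sketch wraps text greedily with a brks-style list as input. It assumes that list entry i describes a break after character i and that every character is one column wide. The name wrap_with_brks is hypothetical, and leaving a line overlong when it contains no allowed break is a choice of this sketch.

```python
def wrap_with_brks(text, brks, width):
    """Greedy wrapping driven by a brks-style list (hypothetical helper).

    Assumptions: brks[i] is 0 (no break), 1 (allowed) or 2 (mandatory) for a
    break after text[i]; every character is one column wide. A line without
    any allowed break is left overlong.
    """
    if len(brks) != len(text):
        raise ValueError("brks must have one entry per character")
    lines = []
    start = 0       # index of the first character on the current line
    last_ok = -1    # last allowed break on the current line, -1 if none
    for i in range(len(text)):
        if brks[i] == 2:
            # Mandatory break: end the line here (without the newline itself).
            lines.append(text[start:i + 1].rstrip("\n"))
            start, last_ok = i + 1, -1
            continue
        if i - start + 1 > width and last_ok >= start:
            # Character i does not fit: break at the last allowed position.
            lines.append(text[start:last_ok + 1])
            start, last_ok = last_ok + 1, -1
        if brks[i] == 1:
            last_ok = i
    if start < len(text):
        lines.append(text[start:])
    return lines


allowed = [0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
print(wrap_with_brks("a well-known term", allowed, 10))
```

With allowed breaks after the space at index 1, the hyphen at index 6, and the space at index 12, a width of 10 gives "a well-" and "known term", a split that a whitespace-only rule cannot produce.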

A note about the popular ICU library: in the file tkTextLineBreak.c I've included the following comment:

* The alternative is the use of the ICU library (http://site.icu-project.org/)
* instead of libunibreak, but this would require supporting a very
* complex interface of a dynamic load library; in other words, we
* would need dozens of function pointers. This is not really a drawback,
* and probably the ICU library is the better choice, but I think that a
* change to the ICU library is reasonable only if the Tcl/Tk developer team
* decides to use this library also for complete Unicode support (character
* conversion, for instance).