Part of bzrlib.utextwrap
Extends TextWrapper for Unicode.
This text wrapper handles East Asian double-width characters, and it
splits a word that contains double-width characters even when
break_long_words is false.
:param ambiguous_width: (keyword argument) width for character when
unicodedata.east_asian_width(c) == 'A'
(default: 1)
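The categories that ambiguous_width refers to can be inspected directly with the standard library. The following is a minimal sketch, not taken from bzrlib; the sample characters and the names `samples` and `categories` are chosen here for illustration only.

```python
import unicodedata

# Sample characters picked for this illustration (hypothetical, not from bzrlib).
samples = {
    "LATIN CAPITAL A": "A",
    "HIRAGANA A": "\u3042",
    "FULLWIDTH A": "\uff21",
    "GREEK SMALL ALPHA": "\u03b1",
}

# Map each sample to its East Asian Width category.
categories = {name: unicodedata.east_asian_width(ch) for name, ch in samples.items()}

for name, category in categories.items():
    print(name, category)
```

Characters in category 'A' (ambiguous), such as the Greek letter here, are the ones whose width ambiguous_width decides; 'W' and 'F' characters are always double width.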
Limitations:
* expand_tabs is not fixed. It uses len() to calculate the width
of the string to the left of a TAB.
* Handles one code unit as a single character of width 1 or 2.
This is not correct when there are surrogate pairs, combining
characters or zero-width characters.
* Treats all Asian characters as line-breakable. This is not
true, because line breaking is prohibited around some characters.
(For example, breaking before a punctuation mark is prohibited.)
See UAX #14, "Unicode Line Breaking Algorithm".
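The first two limitations can be reproduced with the standard library alone. This is a rough sketch under stated assumptions: `naive_width` is a hypothetical helper that counts 2 columns for 'F'/'W' characters and 1 for everything else, which mirrors the per-code-unit approach described above but is not the bzrlib code.

```python
import unicodedata

def naive_width(s):
    """Per-code-point width: 2 for 'F'/'W' characters, otherwise 1."""
    return sum(2 if unicodedata.east_asian_width(c) in ("F", "W") else 1 for c in s)

# Tabs: str.expandtabs() counts characters, not display columns.
expanded = "\u65e5\u672c\tx".expandtabs(8)   # two wide characters, a TAB, then "x"
x_index = expanded.index("x")               # character index of "x"
x_column = naive_width(expanded[:x_index])  # display column where "x" starts

# Combining characters: "e" + COMBINING ACUTE ACCENT renders in one column,
# but a per-code-point count treats it as two.
accented = "e\u0301"
is_combining = unicodedata.combining("\u0301") != 0
accented_width = naive_width(accented)
```

Here "x" sits at character index 8 but at display column 10, so the tab stop is missed by two columns, and the accented letter is counted as two columns wide although it occupies one.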
| Method | __init__ | Undocumented |
| Method | wrap | Undocumented |
| Method | _unicode_char_width | Return width of character uc. |
| Method | _width | Returns width for s. |
| Method | _cut | Returns head and rest of s. (head+rest == s) |
| Method | _fix_sentence_endings | _fix_sentence_endings(chunks : [string]) |
| Method | _handle_long_word | Undocumented |
| Method | _wrap_chunks | Undocumented |
| Method | _split | Undocumented |
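_unicode_char_width
Return width of character uc.
| Parameters | uc | Single unicode character. |
_width
Returns width for s. When s is unicode, East Asian width is taken into account. When s is bytes, every byte is treated as a single-width character.
_cut
Returns head and rest of s (head + rest == s). head is the longest prefix of s for which _width(head) <= width.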
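The width and cut behaviour described above can be sketched as follows. This is an illustrative reimplementation written for Python 3, not the bzrlib source: the function names `char_width`, `text_width` and `cut` and their signatures are assumptions made for this example.

```python
import unicodedata

def char_width(ch, ambiguous_width=1):
    """Width of one character: 2 for 'F'/'W', ambiguous_width for 'A', else 1."""
    category = unicodedata.east_asian_width(ch)
    if category in ("F", "W"):
        return 2
    if category == "A":
        return ambiguous_width
    return 1

def text_width(s, ambiguous_width=1):
    """Width of s; every byte of a bytes object counts as one column."""
    if isinstance(s, bytes):
        return len(s)
    return sum(char_width(c, ambiguous_width) for c in s)

def cut(s, width, ambiguous_width=1):
    """Split s into (head, rest) with head the longest prefix that fits in width."""
    total = 0
    for pos, ch in enumerate(s):
        total += char_width(ch, ambiguous_width)
        if total > width:
            return s[:pos], s[pos:]
    return s, ""

head, rest = cut("\u65e5\u672c\u8a9eabc", 5)  # three wide characters, then "abc"
```

With a width of 5, only the first two wide characters fit (4 columns), so the third one starts the remainder; head + rest always equals the input.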
_fix_sentence_endings(chunks : [string])
Correct for sentence endings buried in 'chunks'. E.g. when the
original text contains "... foo.\nBar ...", munge_whitespace()
and split() will convert that to [..., "foo.", " ", "Bar", ...]
which has one too few spaces; this method simply changes the one
space to two.
Note: This function is copied from textwrap.TextWrapper and modified
to always use unicode.
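The chunk rewriting can be sketched in a few lines. This is a standalone illustration rather than the bzrlib method: `fix_sentence_endings` and `SENTENCE_END` are hypothetical names, and the pattern is modelled on the standard library's notion of a sentence end (a lowercase letter, then '.', '!' or '?', then an optional closing quote).

```python
import re
import textwrap

# Assumed pattern: lowercase letter, sentence punctuation, optional quote, end of chunk.
SENTENCE_END = re.compile(r'[a-z][.!?]["\']?\Z')

def fix_sentence_endings(chunks):
    """Widen a single space that follows a sentence-ending chunk to two spaces."""
    i = 0
    while i < len(chunks) - 1:
        if chunks[i + 1] == " " and SENTENCE_END.search(chunks[i]):
            chunks[i + 1] = "  "
            i += 2
        else:
            i += 1
    return chunks

fixed = fix_sentence_endings(["foo.", " ", "Bar"])

# The standard library wrapper shows the same effect end to end.
filled = textwrap.fill("foo.\nBar", width=70, fix_sentence_endings=True)
```

The list is modified in place and also returned for convenience; the in-place update matches how the chunk list is passed along to the wrapping step.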