Part of bzrlib.utextwrap
Extend TextWrapper for Unicode.

This text wrapper handles East Asian double-width characters, and splits a word even when break_long_words is false if the word contains double-width characters.

:param ambiguous_width: (keyword argument) width of a character for which unicodedata.east_asian_width(c) == 'A' (default: 1)

Limitations:
* expand_tabs is not fixed. It uses len() to calculate the width of the string to the left of a TAB.
* Treats each code unit as a single character of width 1 or 2. This is incorrect when the text contains surrogate pairs, combining characters, or zero-width characters.
* Treats all Asian characters as line-breakable. This is not always true, because line breaking is prohibited around some characters (for example, breaking before a punctuation mark is prohibited). See UAX #14, "Unicode Line Breaking Algorithm".
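To illustrate why a width-aware wrapper is useful, the sketch below uses only the standard library's textwrap and unicodedata modules (not bzrlib). The helper display_width is a hypothetical name introduced here, and it assumes that characters classified as 'F' or 'W' occupy two terminal columns and everything else occupies one.

```python
import textwrap
import unicodedata


def display_width(s):
    # Hypothetical helper: count Fullwidth/Wide characters as 2 columns,
    # everything else as 1 (ambiguous-width characters are counted as 1 here).
    return sum(2 if unicodedata.east_asian_width(c) in ("F", "W") else 1
               for c in s)


text = "あいうえおかきくけこさしすせそ"  # 15 hiragana characters, no spaces

# The standard wrapper measures lines with len(), i.e. in code points.
lines = textwrap.wrap(text, width=10)

for line in lines:
    print(len(line), display_width(line), line)
```

The first line holds 10 characters, which satisfies the standard wrapper's limit, but under the assumption above it would occupy 20 columns on a terminal.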
Method | __init__ | Undocumented |
Method | wrap | Undocumented |
Method | _unicode_char_width | Return width of character uc. |
Method | _width | Returns width for s. |
Method | _cut | Returns head and rest of s. (head+rest == s) |
Method | _fix_sentence_endings | _fix_sentence_endings(chunks : [string]) |
Method | _handle_long_word | Undocumented |
Method | _wrap_chunks | Undocumented |
Method | _split | Undocumented |
_unicode_char_width: Return width of character uc.
Parameters | uc | Single unicode character. |
_width: When s is unicode, East Asian width is taken into account. When s is bytes, every byte is treated as a single-width character.
_cut: head is as long as possible while _width(head) <= width.
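A minimal sketch of the cutting rule, assuming a simple per-character width function: cut and width below are hypothetical names, not the bzrlib methods, and ambiguous-width characters are counted as 1.

```python
import unicodedata


def width(s):
    # Assumed width rule: Fullwidth/Wide characters take 2 columns, others 1.
    return sum(2 if unicodedata.east_asian_width(c) in ("F", "W") else 1
               for c in s)


def cut(s, max_width):
    """Return (head, rest) with head + rest == s and width(head) <= max_width.

    head is the longest prefix of s that still fits in max_width columns.
    """
    used = 0
    for pos, ch in enumerate(s):
        used += width(ch)
        if used > max_width:
            # ch would overflow the limit, so it starts the remainder.
            return s[:pos], s[pos:]
    return s, ""
```

Note that head can be empty when even the first character is wider than max_width, for instance when cutting a double-width character at width 1.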
_fix_sentence_endings(chunks : [string])

Correct for sentence endings buried in 'chunks'. E.g. when the original text contains "... foo. Bar ...", munge_whitespace() and split() will convert that to [..., "foo.", " ", "Bar", ...], which has one too few spaces; this method simply changes the one space to two.

Note: This function is copied from textwrap.TextWrapper and modified to always use unicode.
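The behaviour described above can be sketched as a standalone function. This is an illustration rather than the bzrlib method: fix_sentence_endings and SENTENCE_END are hypothetical names, the pattern is an assumption modeled on the sentence-end heuristic used by the standard library's textwrap, and returning the list is a convenience added here.

```python
import re

# Assumed heuristic: a lowercase letter, then ., ! or ?, then an optional
# closing quote, at the very end of the chunk.
SENTENCE_END = re.compile(r"[a-z][.!?][\"']?\Z")


def fix_sentence_endings(chunks):
    """Widen the single space after a sentence-ending chunk to two spaces.

    Modifies chunks in place and also returns it.
    """
    i = 0
    while i < len(chunks) - 1:
        if chunks[i + 1] == " " and SENTENCE_END.search(chunks[i]):
            chunks[i + 1] = "  "
            i += 2  # skip the space that was just widened
        else:
            i += 1
    return chunks
```

With the input ["foo.", " ", "Bar"], the middle chunk becomes two spaces; chunks that do not end a sentence are left unchanged.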