Part of bzrlib
Class | Serializer_v8 | This serialiser adds rich roots. |
Function | _unescaper | Undocumented |
Function | _unescape_xml | Unescape predefined XML entities in a string of data. |
Function | _ensure_utf8_re | Make sure the _utf8_re and _unicode_re regexes have been compiled. |
Function | _unicode_escape_replace | Replace a string of non-ascii, non XML safe characters with their escape |
Function | _utf8_escape_replace | Escape utf8 characters into XML safe ones. |
Function | _encode_and_escape | Encode the string into utf8, and escape invalid XML characters |
Function | _get_utf8_or_ascii | Return a cached version of the string. |
Function | _clear_cache | Clean out the unicode => escaped map |
This escapes both the standard XML characters, such as <>"', and non-ASCII characters, because ElementTree did. This keeps us compatible with older versions of bzr. We may change this policy in the future, though.
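The escaping described above can be sketched roughly as follows. This is a minimal Python 3 illustration, not the bzrlib implementation: the names `_XML_ESCAPES`, `_UNSAFE_RE`, `_replace` and `escape_text` are hypothetical, and the use of decimal character references for non-ASCII characters is an assumption about the output format.

```python
import re

# Hypothetical map of the predefined XML entities (not bzrlib's own table).
_XML_ESCAPES = {
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&apos;",
}

# Match either one XML special character or a run of non-ASCII characters.
_UNSAFE_RE = re.compile("[&<>'\"]|[\u0080-\U0010ffff]+")


def _replace(match):
    """Return the escaped form of one regex match."""
    text = match.group()
    if text in _XML_ESCAPES:
        return _XML_ESCAPES[text]
    # Non-ASCII run: emit one numeric character reference per character
    # (assumed format; the real module may differ).
    return "".join("&#%d;" % ord(ch) for ch in text)


def escape_text(text):
    """Escape XML special characters and non-ASCII characters in text."""
    return _UNSAFE_RE.sub(_replace, text)
```

For instance, `escape_text("a<b")` yields `a&lt;b`, and any non-ASCII character comes back as a pure ASCII reference, so the result can be written out without further encoding concerns.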
This uses two tricks. It either escapes "standard" characters, such as "&<>, or handles characters with the high bit set. For ASCII characters, we just look up the replacement in the dictionary. For everything else, we decode back into Unicode and then use the XML escape code.
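A sketch of the two-branch approach described above, operating directly on UTF-8 bytes. It is written for Python 3 and uses hypothetical names (`_XML_ESCAPES_UTF8`, `_UTF8_RE`, `_utf8_replace`, `escape_utf8`); the numeric reference format is an assumption, and this is not the module's actual code.

```python
import re

# Hypothetical lookup table for the ASCII special characters.
_XML_ESCAPES_UTF8 = {
    b"&": b"&amp;",
    b"<": b"&lt;",
    b">": b"&gt;",
    b'"': b"&quot;",
    b"'": b"&apos;",
}

# One XML special byte, or a run of bytes with the high bit set.
# In valid UTF-8 every byte of a multi-byte sequence has the high bit set,
# so such a run always contains whole characters.
_UTF8_RE = re.compile(b"[&<>'\"]|[\x80-\xff]+")


def _utf8_replace(match):
    """Escape one match: dictionary lookup for ASCII, decode for the rest."""
    chunk = match.group()
    try:
        # Trick 1: ASCII special characters are a plain dictionary lookup.
        return _XML_ESCAPES_UTF8[chunk]
    except KeyError:
        # Trick 2: high-bit bytes are decoded back into Unicode and then
        # escaped as numeric character references (assumed format).
        decoded = chunk.decode("utf-8")
        return "".join("&#%d;" % ord(ch) for ch in decoded).encode("ascii")


def escape_utf8(data):
    """Escape a UTF-8 byte string into XML-safe ASCII bytes."""
    return _UTF8_RE.sub(_utf8_replace, data)
```

Working on the encoded bytes avoids decoding the whole string when it is mostly ASCII; only the high-bit runs pay for a decode.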
cElementTree returns a plain string if the XML is plain ASCII; it only returns Unicode when it needs to. We want to work in UTF-8 strings, so if cElementTree returns a plain string, we can just return the cached version. If it is Unicode, we need to encode it.
Parameters | a_str | An 8-bit string or Unicode as returned by cElementTree.Element.get() |
Returns | A utf-8 encoded 8-bit string. |