b.xml8 : module documentation

Part of bzrlib

No module docstring
Class Serializer_v8 This serialiser adds rich roots.
Function _unescaper Undocumented
Function _unescape_xml Unescape predefined XML entities in a string of data.
Function _ensure_utf8_re Make sure the _utf8_re and _unicode_re regexes have been compiled.
Function _unicode_escape_replace Replace a string of non-ascii, non XML safe characters with their escape
Function _utf8_escape_replace Escape utf8 characters into XML safe ones.
Function _encode_and_escape Encode the string into utf8, and escape invalid XML characters
Function _get_utf8_or_ascii Return a cached version of the string.
Function _clear_cache Clean out the unicode => escaped map
def _unescaper(match, _map=_xml_unescape_map):
Undocumented
def _unescape_xml(data):
Unescape predefined XML entities in a string of data.
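As an illustration only, unescaping the predefined XML entities with a regex substitution and a lookup map could be sketched as follows. The names `_XML_UNESCAPE_MAP`, `_ENTITY_RE`, `unescape_entity` and `unescape_xml` are hypothetical, and the map contents are an assumption (the five entities predefined by XML), not copied from bzrlib.

```python
import re

# Assumed map: the five entities predefined by the XML specification.
_XML_UNESCAPE_MAP = {
    "apos": "'",
    "quot": '"',
    "amp": "&",
    "lt": "<",
    "gt": ">",
}

# Matches only the predefined entities, capturing the entity name.
_ENTITY_RE = re.compile(r"&(apos|quot|amp|lt|gt);")


def unescape_entity(match, _map=_XML_UNESCAPE_MAP):
    # Look up the captured entity name and return the literal character.
    return _map[match.group(1)]


def unescape_xml(data):
    # A single pass: replaced text is not rescanned, so "&amp;lt;"
    # becomes "&lt;" rather than "<".
    return _ENTITY_RE.sub(unescape_entity, data)
```

Binding the map as a default argument mirrors the `_map=` pattern in the signatures on this page; it avoids a global lookup on each call.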
def _ensure_utf8_re():
Make sure the _utf8_re and _unicode_re regexes have been compiled.
def _unicode_escape_replace(match, _map=_xml_escape_map):
Replace a string of non-ascii, non XML safe characters with their escape

This escapes both the standard XML special characters, such as <>"', and non-ascii characters, because ElementTree did. This helps us remain compatible with older versions of bzr. We may change our policy in the future, though.
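A minimal sketch of this kind of replacement callback, written for Python 3 text strings. The regex, the map contents and the helper names are assumptions made for illustration, not the module's actual definitions. Non-ascii runs are converted with the standard `xmlcharrefreplace` codec error handler.

```python
import re

# Assumed replacement table for the standard XML special characters.
_XML_ESCAPE_MAP = {
    "&": "&amp;",
    "'": "&apos;",
    '"': "&quot;",
    "<": "&lt;",
    ">": "&gt;",
}

# One special ascii character, or a run of non-ascii characters.
_UNICODE_RE = re.compile(r"""[&<>'"]|[^\x00-\x7f]+""")


def unicode_escape_replace(match, _map=_XML_ESCAPE_MAP):
    text = match.group()
    try:
        return _map[text]
    except KeyError:
        # Not a standard escape, so this is a run of non-ascii characters:
        # emit numeric character references such as "&#233;".
        return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


def escape_text(text):
    # Hypothetical driver showing how the callback would be applied.
    return _UNICODE_RE.sub(unicode_escape_replace, text)
```

The output is pure ascii, so it stays valid whatever encoding the surrounding XML file declares.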

def _utf8_escape_replace(match, _map=_xml_escape_map):
Escape utf8 characters into XML safe ones.

This handles two cases. It is either escaping "standard" characters, like "&<>, or it is handling characters with the high bit set. For ascii characters, we just look up the replacement in the dictionary. For everything else, we decode back into Unicode, and then use the XML escape code.
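The two cases described above can be sketched on Python 3 `bytes` as follows. This is an assumption-laden illustration: the pattern, the map and the names `utf8_escape_replace` and `escape_utf8` are hypothetical and are not taken from bzrlib.

```python
import re

# Assumed replacement table, keyed by single ascii bytes.
_XML_ESCAPE_MAP = {
    b"&": b"&amp;",
    b"'": b"&apos;",
    b'"': b"&quot;",
    b"<": b"&lt;",
    b">": b"&gt;",
}

# One special ascii byte, or a run of bytes with the high bit set.
_UTF8_RE = re.compile(rb"""[&<>'"]|[\x80-\xff]+""")


def utf8_escape_replace(match, _map=_XML_ESCAPE_MAP):
    chunk = match.group()
    try:
        # Case 1: a "standard" character, found directly in the dictionary.
        return _map[chunk]
    except KeyError:
        # Case 2: high-bit bytes. In valid utf-8 every byte of a multi-byte
        # character is >= 0x80, so the run decodes as whole characters.
        return chunk.decode("utf-8").encode("ascii", "xmlcharrefreplace")


def escape_utf8(data):
    # Hypothetical driver applying the callback to a utf-8 byte string.
    return _UTF8_RE.sub(utf8_escape_replace, data)
```

Only the high-bit runs pay the cost of a decode; plain ascii input passes through the regex untouched.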

def _encode_and_escape(unicode_or_utf8_str, _map=_to_escaped_map):
Encode the string into utf8, and escape invalid XML characters
def _get_utf8_or_ascii(a_str, _encode_utf8=cache_utf8.encode, _get_cached_ascii=cache_utf8.get_cached_ascii):
Return a cached version of the string.

cElementTree will return a plain string if the XML is plain ascii. It only returns Unicode when it needs to. We want to work in utf-8 strings. So if cElementTree returns a plain string, we can just return the cached version. If it is Unicode, then we need to encode it.

Parameters: a_str: An 8-bit string or Unicode as returned by cElementTree.Element.get()
Returns: A utf-8 encoded 8-bit string.
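A rough Python 3 analogue of the behaviour described above, where `bytes` plays the role of the 8-bit string and `str` the role of Unicode. The internals of `cache_utf8` are not shown on this page, so the plain dictionary `_utf8_cache` below is a hypothetical stand-in, as is the function name.

```python
# Hypothetical stand-in for the cache_utf8 module: maps each utf-8 byte
# string to one shared instance of itself.
_utf8_cache = {}


def get_utf8_or_ascii(a_str):
    if isinstance(a_str, bytes):
        # Already an 8-bit string: return the single cached copy.
        return _utf8_cache.setdefault(a_str, a_str)
    # Text needs encoding first; the encoded result is cached the same way.
    encoded = a_str.encode("utf-8")
    return _utf8_cache.setdefault(encoded, encoded)
```

Returning one shared object for equal strings reduces memory use when the same value appears many times in a parsed document.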
def _clear_cache():
Clean out the unicode => escaped map
API Documentation for Bazaar, generated by pydoctor at 2022-06-16 00:25:16.