bzrlib.xml8 : API documentation

Class	Serializer_v8	This serialiser adds rich roots.
Function	_unescaper	Undocumented
Function	_unescape_xml	Unescape predefined XML entities in a string of data.
Function	_ensure_utf8_re	Make sure the _utf8_re and _unicode_re regexes have been compiled.
Function	_unicode_escape_replace	Replace a string of non-ascii, non XML safe characters with their escape
Function	_utf8_escape_replace	Escape utf8 characters into XML safe ones.
Function	_encode_and_escape	Encode the string into utf8, and escape invalid XML characters
Function	_get_utf8_or_ascii	Return a cached version of the string.
Function	_clear_cache	Clean out the unicode => escaped map

def _unicode_escape_replace(match, _map=_xml_escape_map):

Replace a string of non-ascii, non XML safe characters with their escape

This escapes both the standard XML special characters, such as <>"', and non-ASCII characters, because ElementTree did. Doing so keeps us compatible with older versions of bzr. We may change this policy in the future, though.
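
The behaviour described above can be sketched roughly as follows. This is an illustrative Python 3 approximation, not the bzrlib source: the names XML_ESCAPE_MAP, UNICODE_RE and escape_unicode are hypothetical, and the exact character range used by the real regex is an assumption.

```python
import re

# Hypothetical stand-in for the module's _xml_escape_map.
XML_ESCAPE_MAP = {
    "&": "&amp;",
    "'": "&apos;",
    '"': "&quot;",
    "<": "&lt;",
    ">": "&gt;",
}

# Assumed pattern: one XML-special character or one non-ASCII character
# (limited here to the Basic Multilingual Plane) per match.
UNICODE_RE = re.compile("[&<>'\"\u0080-\uffff]")


def unicode_escape_replace(match, _map=XML_ESCAPE_MAP):
    """Return the XML-safe replacement for a single matched character."""
    char = match.group()
    try:
        # Standard XML escapes come straight from the lookup table.
        return _map[char]
    except KeyError:
        # Anything else is non-ASCII: emit a numeric character reference.
        return "&#%d;" % ord(char)


def escape_unicode(text):
    """Escape every special or non-ASCII character in text."""
    return UNICODE_RE.sub(unicode_escape_replace, text)
```

Passing the function as the replacement argument of `re.sub` keeps the per-character decision (table lookup versus numeric reference) in one place.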

def _utf8_escape_replace(match, _map=_xml_escape_map):

Escape utf8 characters into XML safe ones.

This handles two cases. The match is either a "standard" character, like "&<>, or characters with the high bit set. For ASCII characters, we just look up the replacement in the dictionary. For everything else, we decode back into Unicode and then use the XML escape code.
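
A minimal Python 3 sketch of the same idea on byte strings is shown here. It is not the bzrlib implementation: UTF8_RE, XML_ESCAPE_MAP and escape_utf8 are hypothetical names, and matching high-bit bytes as a contiguous run is an assumption made so that each run decodes as complete UTF-8.

```python
import re

# Hypothetical byte-string version of the escape table.
XML_ESCAPE_MAP = {
    b"&": b"&amp;",
    b"'": b"&apos;",
    b'"': b"&quot;",
    b"<": b"&lt;",
    b">": b"&gt;",
}

# Assumed pattern: a single XML-special byte, or a run of high-bit bytes.
UTF8_RE = re.compile(b"[&<>'\"]|[\x80-\xff]+")


def utf8_escape_replace(match, _map=XML_ESCAPE_MAP):
    """Return the XML-safe replacement for one match in UTF-8 data."""
    chunk = match.group()
    try:
        # ASCII special characters: plain dictionary lookup.
        return _map[chunk]
    except KeyError:
        # High-bit bytes: decode to Unicode, then emit one numeric
        # character reference per decoded character.
        refs = "".join("&#%d;" % ord(ch) for ch in chunk.decode("utf8"))
        return refs.encode("ascii")


def escape_utf8(data):
    """Escape special and non-ASCII characters in a UTF-8 byte string."""
    return UTF8_RE.sub(utf8_escape_replace, data)
```

The output contains only ASCII bytes, so it is valid in both ASCII and UTF-8 contexts.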

def _encode_and_escape(unicode_or_utf8_str, _map=_to_escaped_map):

Encode the string into utf8, and escape invalid XML characters

def _get_utf8_or_ascii(a_str, _encode_utf8=cache_utf8.encode, _get_cached_ascii=cache_utf8.get_cached_ascii):

Return a cached version of the string.

cElementTree will return a plain string if the XML is plain ascii. It only returns Unicode when it needs to. We want to work in utf-8 strings. So if cElementTree returns a plain string, we can just return the cached version. If it is Unicode, then we need to encode it.

Parameters	a_str	An 8-bit string or Unicode as returned by cElementTree.Element.get()
Returns	A utf-8 encoded 8-bit string.
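
As a rough illustration of the branching described above, here is a Python 3 sketch. It is an analogue rather than the bzrlib code: in Python 3, ElementTree always returns `str`, so the `bytes` branch stands in for the "plain string" case, and the dictionary `_utf8_cache` is a hypothetical replacement for the `cache_utf8` helpers named in the signature.

```python
# Hypothetical cache of UTF-8 byte strings, keyed by their own value so
# that equal strings share one object.
_utf8_cache = {}


def get_utf8_or_ascii(a_str, _cache=_utf8_cache):
    """Return a cached UTF-8 encoded byte string for a_str."""
    if isinstance(a_str, bytes):
        # Already an 8-bit string: hand back the shared cached copy.
        return _cache.setdefault(a_str, a_str)
    # Unicode input: encode first, then share the encoded form.
    encoded = a_str.encode("utf8")
    return _cache.setdefault(encoded, encoded)
```

Returning the cached object means repeated values, such as file ids or revision ids, are stored once rather than once per occurrence.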
