lp.translations.utilities.gettext_po_parser.POParser : API documentation

def __init__(self, plural_formula=None):

Undocumented

def _emitSyntaxWarning(self, message):

Undocumented

def _decode(self):

Undocumented

def _getHeaderLine(self):

Undocumented

def parse(self, content_text):

Parse string as a PO file.

def _storeCurrentMessage(self):

Undocumented

def _parseHeader(self, header_text, header_comment):

Undocumented

def _unescapeNumericCharSequence(self, string):

Unescape leading sequence of escaped numeric character codes.

This is for characters given in hexadecimal or octal escape notation.

Returns a tuple: first, the unescaped form of any leading numeric escape sequences in string (empty if string does not start with one); second, the remainder of string after those sequences.
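The contract above can be illustrated with a small standalone Python 3 sketch. It is not the Launchpad implementation. The function name `unescape_numeric_prefix` is hypothetical. It returns raw bytes rather than a string, because the escapes stand for bytes in the file's charset. It also assumes that a hex escape covers at most two digits and an octal escape at most three.

```python
from __future__ import annotations

import re

# One numeric escape (assumed forms): \x plus 1-2 hex digits, or \ plus 1-3 octal digits.
_NUMERIC_ESCAPE = re.compile(r"\\(?:x([0-9A-Fa-f]{1,2})|([0-7]{1,3}))")


def unescape_numeric_prefix(text: str) -> tuple[bytes, str]:
    """Hypothetical helper: split off and unescape leading numeric escapes.

    Returns the raw bytes of the leading escape run (b"" if there is none)
    and the unparsed remainder of ``text``.
    """
    raw = bytearray()
    pos = 0
    while True:
        match = _NUMERIC_ESCAPE.match(text, pos)
        if match is None:
            break
        hex_digits, oct_digits = match.groups()
        if hex_digits is not None:
            value = int(hex_digits, 16)
        else:
            value = int(oct_digits, 8)
        if value > 0xFF:
            # Octal escapes such as \400 do not fit in a single byte.
            raise ValueError(f"escape out of byte range: {match.group(0)}")
        raw.append(value)
        pos = match.end()
    return bytes(raw), text[pos:]
```

Returning bytes keeps decoding as a separate step. The caller can then decode the whole run in one call using the header charset, which matters for multi-byte sequences such as `\302\253` in UTF-8.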

def _parseQuotedString(self, string):

Parse a quoted string, interpreting escape sequences.

>>> parser = POParser()
>>> parser._parseQuotedString(u'\"abc\"')
u'abc'
>>> parser._parseQuotedString(u'\"abc\\ndef\"')
u'abc\ndef'
>>> parser._parseQuotedString(u'\"ab\x63\"')
u'abc'
>>> parser._parseQuotedString(u'\"ab\143\"')
u'abc'

Even after the string has been converted to Unicode, backslash-escaped numeric sequences still represent bytes in the encoding that the charset header specifies. This method decodes such sequences to Unicode.

When the encoding of the escaped characters is unknown, they cannot simply be recoded as Unicode, so a TranslationFormatInvalidInputError is raised:

>>> utf8_string = u'"view \\302\\253${version_title}\\302\\273"'
>>> parser._parseQuotedString(utf8_string)
Traceback (most recent call last):
...
TranslationFormatInvalidInputError: Could not decode escaped string: (\302\253)

Once the original encoding is recorded in the header, the escaped sequences decode to the correct Unicode string:

>>> class FakeHeader:
...     charset = 'UTF-8'
>>> parser._translation_file = TranslationFileData()
>>> parser._translation_file.header = FakeHeader()
>>> parser._parseQuotedString(utf8_string)
u'view \xab${version_title}\xbb'

A TranslationFormatInvalidInputError is also raised when an escaped character is not valid in the declared encoding of the original string:

>>> iso8859_1_string = u'"foo \\xf9"'
>>> parser._parseQuotedString(iso8859_1_string)
Traceback (most recent call last):
...
TranslationFormatInvalidInputError: Could not decode escaped string as UTF-8: (\xf9)
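The charset-dependent behaviour in these examples can be sketched as a small decoding step. This is a hypothetical helper, not the parser's own code. `InvalidInputError` stands in for TranslationFormatInvalidInputError. Falling back to ASCII when no charset is known is an assumption, chosen to agree with the `\x63` and `\143` examples above.

```python
from typing import Optional


class InvalidInputError(ValueError):
    """Hypothetical stand-in for TranslationFormatInvalidInputError."""


def decode_escaped_bytes(raw: bytes, escaped_text: str,
                         charset: Optional[str]) -> str:
    """Decode bytes produced from numeric escapes using the header charset.

    ``escaped_text`` is the escape sequence as written in the PO source;
    it is only used to build the error message.
    """
    if charset is None:
        # Assumption: without a declared charset, only plain ASCII is
        # accepted, since non-ASCII bytes cannot be interpreted safely.
        try:
            return raw.decode("ascii")
        except UnicodeDecodeError:
            raise InvalidInputError(
                f"Could not decode escaped string: ({escaped_text})") from None
    try:
        return raw.decode(charset)
    except UnicodeDecodeError:
        raise InvalidInputError(
            f"Could not decode escaped string as {charset}: ({escaped_text})"
        ) from None
```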

A TranslationFormatSyntaxError is raised if the string is not properly enclosed in quotes:

>>> parser._parseQuotedString(u'abc')
Traceback (most recent call last):
  ...
TranslationFormatSyntaxError: String is not quoted
>>> parser._parseQuotedString(u'\"ab')
Traceback (most recent call last):
  ...
TranslationFormatSyntaxError: String not terminated
>>> parser._parseQuotedString(u'\"ab\"x')
Traceback (most recent call last):
  ...
TranslationFormatSyntaxError: Extra content found after string: (x)
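The quoting rules checked above can be sketched as a small scanner. This illustrates the technique under stated assumptions and is not the real method. `QuotingError` is a hypothetical stand-in for TranslationFormatSyntaxError. Only the escapes `\n`, `\t`, `\r`, `\\` and `\"` are handled, and numeric escapes are rejected. The caller is assumed to have already stripped surrounding whitespace.

```python
class QuotingError(ValueError):
    """Hypothetical stand-in for TranslationFormatSyntaxError."""


# Assumed subset of C-style single-character escapes.
_SIMPLE_ESCAPES = {"n": "\n", "t": "\t", "r": "\r", "\\": "\\", '"': '"'}


def parse_quoted(text: str) -> str:
    """Parse one PO-style quoted string, e.g. '"abc\\ndef"' -> 'abc\ndef'."""
    if not text.startswith('"'):
        raise QuotingError("String is not quoted")
    out = []
    pos = 1
    while pos < len(text):
        char = text[pos]
        if char == '"':
            # Unescaped closing quote: nothing may follow it.
            rest = text[pos + 1:]
            if rest:
                raise QuotingError(f"Extra content found after string: ({rest})")
            return "".join(out)
        if char == "\\" and pos + 1 < len(text):
            escaped = text[pos + 1]
            if escaped not in _SIMPLE_ESCAPES:
                # Numeric escapes are deliberately left out of this sketch.
                raise QuotingError(f"Unsupported escape: \\{escaped}")
            out.append(_SIMPLE_ESCAPES[escaped])
            pos += 2
            continue
        out.append(char)
        pos += 1
    # Ran off the end without seeing a closing quote.
    raise QuotingError("String not terminated")
```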

def _dumpCurrentSection(self):

Dump current parsed content inside the translation message.

def _parseFreshLine(self, line, original_line):

Parse a new line (not a continuation after escaped newline).

Parameters	line	Remaining part of input line.
	original_line	Line as it originally was on input.
Returns	If there is one, the first line of a quoted string belonging to the line's section. Otherwise, None.
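To illustrate what a fresh PO line can contain, the following sketch splits a line into its keyword and the first quoted string. The name `split_fresh_line` and its tuple return value are hypothetical. The keyword set it assumes is msgctxt, msgid, msgid_plural and msgstr/msgstr[N]. It treats every `#` line, including obsolete `#~` entries, as a comment, and it does not handle escaped newlines.

```python
import re
from typing import Optional, Tuple

# Keywords assumed to start a new section of a PO entry.
_KEYWORD_LINE = re.compile(
    r'^(msgctxt|msgid_plural|msgid|msgstr(?:\[\d+\])?)\s+(".*)$')


def split_fresh_line(line: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (keyword, first quoted string) for one PO source line."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        # Blank lines and comments carry no quoted string.
        return None, None
    if stripped.startswith('"'):
        # Continuation string for the section already in progress.
        return None, stripped
    match = _KEYWORD_LINE.match(stripped)
    if match is None:
        raise ValueError(f"Unrecognised PO line: {line!r}")
    return match.group(1), match.group(2)
```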

def _parseLine(self, original_line):

Undocumented

Method	__init__	Undocumented
Method	parse	Parse string as a PO file.
Method	_emitSyntaxWarning	Undocumented
Method	_decode	Undocumented
Method	_getHeaderLine	Undocumented
Method	_storeCurrentMessage	Undocumented
Method	_parseHeader	Undocumented
Method	_unescapeNumericCharSequence	Unescape leading sequence of escaped numeric character codes.
Method	_parseQuotedString	Parse a quoted string, interpreting escape sequences.
Method	_dumpCurrentSection	Dump current parsed content inside the translation message.
Method	_parseFreshLine	Parse a new line (not a continuation after escaped newline).
Method	_parseLine	Undocumented
