Part of lp.translations.utilities.gettext_po_parser
Method | __init__ | Undocumented |
Method | parse | Parse string as a PO file. |
Method | _emitSyntaxWarning | Undocumented |
Method | _decode | Undocumented |
Method | _getHeaderLine | Undocumented |
Method | _storeCurrentMessage | Undocumented |
Method | _parseHeader | Undocumented |
Method | _unescapeNumericCharSequence | Unescape leading sequence of escaped numeric character codes. |
Method | _parseQuotedString | Parse a quoted string, interpreting escape sequences. |
Method | _dumpCurrentSection | Dump current parsed content inside the translation message. |
Method | _parseFreshLine | Parse a new line (not a continuation after escaped newline). |
Method | _parseLine | Undocumented |
This is for characters given in hexadecimal or octal escape notation.
Returns | A tuple of two strings: first, the leading part of string in unescaped form (empty if string did not start with a numeric escape sequence), and second, the remainder of string after the leading numeric escape sequences have been parsed. |
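The return contract described above can be illustrated with a minimal sketch. This is not Launchpad's implementation: the function name `unescape_leading_numeric`, the regular expression, and the `charset` parameter are all assumptions made for illustration, and the sketch only handles two-digit hexadecimal and up to three-digit octal escapes.

```python
import re

# Hypothetical pattern (an assumption, not taken from the parser):
# one \xHH hexadecimal escape or one \OOO octal escape.
_NUMERIC_ESCAPE = re.compile(r'\\x([0-9a-fA-F]{2})|\\([0-7]{1,3})')


def unescape_leading_numeric(string, charset='UTF-8'):
    """Return (decoded leading numeric escapes, remainder of string)."""
    raw = bytearray()
    pos = 0
    while True:
        match = _NUMERIC_ESCAPE.match(string, pos)
        if match is None:
            break
        hex_digits, octal_digits = match.groups()
        if hex_digits is not None:
            raw.append(int(hex_digits, 16))
        else:
            # Keep only the low 8 bits so the value fits in one byte.
            raw.append(int(octal_digits, 8) & 0xFF)
        pos = match.end()
    # The collected bytes are decoded together, so a multi-byte
    # character split across several escapes is handled as one unit.
    return bytes(raw).decode(charset), string[pos:]
```

Called with `r'\302\253rest'`, the sketch collects the bytes 0xC2 0xAB, decodes them as a single UTF-8 character, and returns that character together with the remainder `'rest'`. A string with no leading numeric escape yields an empty first element.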
Parse a quoted string, interpreting escape sequences.
>>> parser = POParser()
>>> parser._parseQuotedString(u'\"abc\"')
u'abc'
>>> parser._parseQuotedString(u'\"abc\\ndef\"')
u'abc\ndef'
>>> parser._parseQuotedString(u'\"ab\x63\"')
u'abc'
>>> parser._parseQuotedString(u'\"ab\143\"')
u'abc'
After the string has been converted to unicode, the backslash escaped sequences are still in the encoding that the charset header specifies. Such quoted sequences will be converted to unicode by this method.
If we don't know the encoding of the escaped characters, they cannot simply be recoded as Unicode, so the parser raises a TranslationFormatInvalidInputError:
>>> utf8_string = u'"view \302\253${version_title}\302\273"'
>>> parser._parseQuotedString(utf8_string)
Traceback (most recent call last):
...
TranslationFormatInvalidInputError: Could not decode escaped string: (302253)
Now, we note the original encoding so we get the right Unicode string.
>>> class FakeHeader:
...     charset = 'UTF-8'
>>> parser._translation_file = TranslationFileData()
>>> parser._translation_file.header = FakeHeader()
>>> parser._parseQuotedString(utf8_string)
u'view \xab${version_title}\xbb'
Let's see that we raise a TranslationFormatInvalidInputError exception when we have an escaped char that is not valid in the declared encoding of the original string:
>>> iso8859_1_string = u'"foo \\xf9"'
>>> parser._parseQuotedString(iso8859_1_string)
Traceback (most recent call last):
...
TranslationFormatInvalidInputError: Could not decode escaped string as UTF-8: (\xf9)
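The reason this escape is rejected can be reproduced with plain Python, independently of the parser. The snippet below is only an illustration using the standard `bytes.decode` method; the variable names are made up for the example.

```python
# The escape \xf9 stands for the single byte 0xF9.
escaped_bytes = b'\xf9'

# 0xF9 can never start a valid UTF-8 sequence, so decoding fails.
try:
    escaped_bytes.decode('UTF-8')
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# The same byte is a valid character (u with grave accent) in ISO-8859-1.
latin1_text = escaped_bytes.decode('ISO-8859-1')
```

The byte is only meaningful under the encoding it was written in, which is why the declared charset of the header decides whether the escaped string can be decoded.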
An error will be raised if the entire string isn't contained in quotes properly:
>>> parser._parseQuotedString(u'abc')
Traceback (most recent call last):
...
TranslationFormatSyntaxError: String is not quoted
>>> parser._parseQuotedString(u'\"ab')
Traceback (most recent call last):
...
TranslationFormatSyntaxError: String not terminated
>>> parser._parseQuotedString(u'\"ab\"x')
Traceback (most recent call last):
...
TranslationFormatSyntaxError: Extra content found after string: (x)
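The three quoting errors above correspond to three structural checks. The following is a rough sketch of such checks, not the parser's actual code: `SyntaxSketchError` is a hypothetical stand-in for TranslationFormatSyntaxError, `split_quoted` is an invented name, and escape sequences are skipped rather than interpreted.

```python
class SyntaxSketchError(Exception):
    """Hypothetical stand-in for TranslationFormatSyntaxError."""


def split_quoted(string):
    """Return the raw text between the quotes, or raise on bad quoting."""
    if not string.startswith('"'):
        raise SyntaxSketchError("String is not quoted")
    pos = 1
    while pos < len(string):
        char = string[pos]
        if char == '\\':
            # Skip the backslash and the character it escapes, so an
            # escaped quote does not end the string.
            pos += 2
            continue
        if char == '"':
            extra = string[pos + 1:]
            if extra:
                raise SyntaxSketchError(
                    "Extra content found after string: (%s)" % extra)
            return string[1:pos]
        pos += 1
    # Ran off the end without seeing an unescaped closing quote.
    raise SyntaxSketchError("String not terminated")
```

The sketch returns the body with its escape sequences still in place; interpreting them is a separate step.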
Parameters | line | Remaining part of input line. |
| original_line | Line as it originally was on input. |
Returns | If there is one, the first line of a quoted string belonging to the line's section. Otherwise, None. |