b.t.GzipFile(gzip.GzipFile) : class documentation

Part of bzrlib.tuned_gzip

Knit tuned version of GzipFile.

This is based on the following lsprof stats
(columns: call count, recursive calls, total ms, inline ms, function):

Python 2.4 stock GzipFile write:
58971      0   5644.3090   2721.4730   gzip:193(write)
+58971     0   1159.5530   1159.5530   +<built-in method compress>
+176913    0    987.0320    987.0320   +<len>
+58971     0    423.1450    423.1450   +<zlib.crc32>
+58971     0    353.1060    353.1060   +<method 'write' of 'cStringIO.StringO' objects>
tuned GzipFile write:
58971      0   4477.2590   2103.1120   bzrlib.knit:1250(write)
+58971     0   1297.7620   1297.7620   +<built-in method compress>
+58971     0    406.2160    406.2160   +<zlib.crc32>
+58971     0    341.9020    341.9020   +<method 'write' of 'cStringIO.StringO' objects>
+58971     0    328.2670    328.2670   +<len>


Yes, it's only 1.6 seconds, but they add up.
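
The stats above show the stock write making three len calls per write (176913 for 58971 writes) and the tuned write making one. A minimal sketch of that idea follows; the class SingleLenWriter and its attributes are hypothetical names for illustration, not the bzrlib implementation.

```python
import zlib


class SingleLenWriter:
    """Hypothetical gzip-style writer bookkeeping (not bzrlib code)."""

    def __init__(self, sink):
        self.sink = sink    # list collecting compressed chunks
        self.size = 0       # total uncompressed bytes written
        self.crc = 0        # running CRC-32 of the uncompressed data
        # Raw deflate stream (negative wbits), as used inside a gzip member.
        self.compress = zlib.compressobj(9, zlib.DEFLATED, -zlib.MAX_WBITS)

    def write(self, data):
        data_len = len(data)    # the only len() call per write
        if data_len > 0:
            self.size += data_len
            self.crc = zlib.crc32(data, self.crc)
            self.sink.append(self.compress.compress(data))

    def finish(self):
        # Flush the compressor and return everything written so far.
        self.sink.append(self.compress.flush())
        return b"".join(self.sink)
```

The length is computed once and reused for the size bookkeeping, instead of being recomputed at each use.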
Method __init__ Undocumented
Method readline Tuned to remove buffer length calls in _unread and...
Method readlines Undocumented
Method write Undocumented
Method writelines Undocumented
Method close Undocumented
Method _add_read_data Undocumented
Method _write_gzip_header A tuned version of gzip._write_gzip_header
Method _read Undocumented
Method _read_eof tuned to reduce function calls and eliminate file seeking:
Method _read_gzip_header Supply bytes if the minimum header size is already read.
Method _unread tuned to remove unneeded len calls.
def __init__(self, *args, **kwargs):
Undocumented
def _add_read_data(self, data):
Undocumented
def _write_gzip_header(self):
A tuned version of gzip._write_gzip_header

We have some extra constraints that plain Gzip does not.
1) We want to write the whole blob at once, rather than making
   multiple calls to fileobj.write().
2) We never have a filename.
3) We don't care about the time.
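
The three constraints above can be sketched as a fixed 10-byte header emitted in a single write. The field values follow the gzip file format (RFC 1952); the exact bytes the tuned method emits, and the helper name write_gzip_member, are assumptions made for illustration.

```python
import gzip
import io
import struct
import zlib

# Fixed gzip header: no flags (so no filename field) and a zero mtime.
GZIP_HEADER = (
    b"\x1f\x8b"          # magic number
    b"\x08"              # compression method: deflate
    b"\x00"              # flags: none set, so no filename or comment follows
    b"\x00\x00\x00\x00"  # mtime: zero, meaning no timestamp
    b"\x02"              # extra flags: maximum compression (assumed value)
    b"\xff"              # operating system: unknown (assumed value)
)


def write_gzip_member(fileobj, payload):
    """Write one gzip member; the header goes out in a single write() call."""
    fileobj.write(GZIP_HEADER)
    compressor = zlib.compressobj(9, zlib.DEFLATED, -zlib.MAX_WBITS)
    fileobj.write(compressor.compress(payload) + compressor.flush())
    # Trailer: CRC-32 and length modulo 2**32, both little-endian unsigned.
    fileobj.write(struct.pack("<LL", zlib.crc32(payload) & 0xFFFFFFFF,
                              len(payload) & 0xFFFFFFFF))
```

A stream written this way should still be readable by the standard library, for example with gzip.decompress(buf.getvalue()) on an io.BytesIO buffer.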
def _read(self, size=1024):
Undocumented
def _read_eof(self):
tuned to reduce function calls and eliminate file seeking:
pass 1:
reduces lsprof count from 800 to 288
4168 in 296
avoid U32 call by using struct format L
4168 in 200
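
As an illustration of the "struct format L" note, the 8-byte gzip trailer can be decoded as two unsigned 32-bit values in one struct.unpack call, with no separate unsigned-conversion helper. The function name check_gzip_trailer and the error messages are hypothetical; this is a sketch, not the bzrlib method.

```python
import struct
import zlib


def check_gzip_trailer(trailer, crc, size):
    """Validate an 8-byte gzip trailer against a running CRC and size."""
    # One unpack call decodes both little-endian unsigned 32-bit fields.
    crc32, isize = struct.unpack("<LL", trailer)
    if crc32 != (crc & 0xFFFFFFFF):
        raise IOError("CRC check failed")
    if isize != (size & 0xFFFFFFFF):
        raise IOError("Incorrect length of data produced")
```

Because the caller passes the trailer bytes in directly, the sketch also needs no seek on the underlying file.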
def _read_gzip_header(self, bytes=None):
Supply bytes if the minimum header size is already read.
Parameters
    bytes: 10 bytes of header data.
def readline(self, size=-1):
Tuned to remove buffer length calls in _unread and...

also removes multiple len(c) calls, inlines _unread,
total savings - lsprof 5800 to 5300
phase 2:
4168 calls in 2233
8176 calls to read() in 1684
changing the min chunk size to 200 halved all the cache misses
leading to a drop to:
4168 calls in 1977
4168 calls to read() in 1646
- i.e. just reduced the function call overhead. May be worth
  keeping.
def readlines(self, sizehint=0):
Undocumented
def _unread(self, buf, len_buf=None):
tuned to remove unneeded len calls.

Because this is such an inner routine in readline, and readline is in many inner loops, this has been inlined into readline().

The len_buf parameter combined with the reduction in len calls dropped the lsprof ms count for this routine on my test data from 800 to 200 - a 75% saving.
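
A minimal sketch of the len_buf idea, assuming a simple push-back buffer: a caller that already knows the buffer length passes it in, so the routine does not call len again. The class PushbackBuffer is hypothetical; the attribute names are chosen to resemble gzip's read buffer but are assumptions here.

```python
class PushbackBuffer:
    """Hypothetical read buffer illustrating len_buf (not bzrlib code)."""

    def __init__(self):
        self.extrabuf = b""   # bytes pushed back, waiting to be re-read
        self.extrasize = 0    # cached length of extrabuf
        self.offset = 0       # current position in the uncompressed stream

    def _unread(self, buf, len_buf=None):
        # Callers that already know the length pass it in and skip len().
        if len_buf is None:
            len_buf = len(buf)
        if len_buf:
            self.extrabuf = buf + self.extrabuf
            self.extrasize += len_buf
            self.offset -= len_buf
```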

def write(self, data):
Undocumented
def writelines(self, lines):
Undocumented
def close(self):
Undocumented
API Documentation for Bazaar, generated by pydoctor at 2022-06-16 00:25:16.