bzrlib.btree_index.BTreeGraphIndex(object) : class documentation

Part of bzrlib.btree_index

Access to nodes via the standard GraphIndex interface for B+Trees.

Individual nodes are held in an LRU cache. This holds the root node in memory except when very large walks are done.

Method __init__ Create a B+Tree index object on the index name.
Method __eq__ Equal when self and other were created with the same parameters.
Method __ne__ Undocumented
Method clear_cache Clear out any cached/memoized values.
Method external_references Undocumented
Method iter_all_entries Iterate over all keys within the index.
Method iter_entries Iterate over keys within the index.
Method iter_entries_prefix Iterate over keys within the index using prefix matching.
Method key_count Return an estimate of the number of keys in this index.
Method validate Validate that everything in the index can be accessed.
Method _get_and_cache_nodes Read nodes and cache them in the lru.
Method _compute_recommended_pages Convert transport's recommended_page_size into btree pages.
Method _compute_total_pages_in_index How many pages are in the index.
Method _expand_offsets Find extra pages to download.
Method _expand_to_neighbors Expand requests to neighbors until we have enough pages.
Method _find_layer_first_and_end Find the start/stop nodes for the layer corresponding to offset.
Method _get_offsets_to_cached_pages Determine what nodes we already have cached.
Method _get_root_node Undocumented
Method _get_nodes Undocumented
Method _get_internal_nodes Get internal nodes, from cache or disk.
Method _cache_leaf_values Cache directly from key => value, skipping the btree.
Method _get_leaf_nodes Get a bunch of nodes, from cache or disk.
Static Method _multi_bisect_right Find the positions where each 'in_key' would fit in fixed_keys.
Method _walk_through_internal_nodes Take the given set of keys, and find the corresponding LeafNodes.
Method _find_ancestors Find the parent_map information for the set of keys.
Method _compute_row_offsets Fill out the _row_offsets attribute based on _row_lengths.
Method _parse_header_from_bytes Parse the header from a region of bytes.
Method _read_nodes Read some nodes from disk into the LRU cache.
Method _signature The file signature for this index type.
def __init__(self, transport, name, size, unlimited_cache=False, offset=0):
Create a B+Tree index object on the index name.
Parameters:
    transport: The transport to read data for the index from.
    name: The file name of the index on transport.
    size: Optional size of the index in bytes. This allows compatibility with the GraphIndex API, as well as ensuring that the initial read (to read the root node header) can be done without over-reading even on empty indices; on small indices it allows the entire index to be read in a single IO.
    unlimited_cache: If set to True, then instead of using an LRUCache with size _NODE_CACHE_SIZE, we will use a dict and always cache all leaf nodes.
    offset: The start of the btree index data is not byte 0 of the file; it begins at some later offset.
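
For orientation, a minimal construction sketch in Python; the transport URL, the file name 'example.cix', and the use of transport.stat() to obtain the size are illustrative assumptions, not part of this API's documentation:

    from bzrlib import transport as _mod_transport
    from bzrlib.btree_index import BTreeGraphIndex

    # Open the index via a transport; '.' and the file name are illustrative.
    t = _mod_transport.get_transport('.')
    name = 'example.cix'
    # Passing the exact size lets small indices be read in a single IO.
    size = t.stat(name).st_size
    index = BTreeGraphIndex(t, name, size)
    print(index.key_count())
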
def __eq__(self, other):
Equal when self and other were created with the same parameters.
def __ne__(self, other):
Undocumented
def _get_and_cache_nodes(self, nodes):
Read nodes and cache them in the lru.

The nodes list supplied is sorted and then read from disk, each node being inserted into the _node_cache.

Note: Asking for more nodes than the _node_cache can contain will result in some of the results being immediately discarded; to prevent this, an assertion is raised if more nodes are requested than are cachable.

Returns: A dict of {node_pos: node}.
def _compute_recommended_pages(self):
Convert transport's recommended_page_size into btree pages.

recommended_page_size is in bytes; we want to know how many _PAGE_SIZE pages fit in that length.
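
As an illustration (not the bzrlib source), the conversion amounts to a rounded-up division; the 4096-byte _PAGE_SIZE below is assumed:

    _PAGE_SIZE = 4096   # assumed btree page size

    def pages_for(recommended_page_size):
        # Round up, and never ask for fewer than one page.
        return max(1, (recommended_page_size + _PAGE_SIZE - 1) // _PAGE_SIZE)

    pages_for(64 * 1024)   # a 64kB transport hint (e.g. http) -> 16 pages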

def _compute_total_pages_in_index(self):
How many pages are in the index.

If we have read the header we will use the value stored there. Otherwise it will be computed based on the length of the index.

def _expand_offsets(self, offsets):
Find extra pages to download.

The idea is that we always want to make big-enough requests (such as 64kB for http), so that we don't waste round trips. Given the pages we already have cached and the new pages being requested, figure out what other pages we might want to read.

See also doc/developers/btree_index_prefetch.txt for more details.

Parameters:
    offsets: The offsets to be read.
Returns: A list of offsets to download.
def _expand_to_neighbors(self, offsets, cached_offsets, total_pages):
Expand requests to neighbors until we have enough pages.

This is called from _expand_offsets after policy has determined that we want to expand. We only want to expand requests within a given layer. We cheat a little bit and assume all requests will be in the same layer. This is true given the current design, but if it changes this algorithm may perform oddly.

Parameters:
    offsets: The requested offsets.
    cached_offsets: Offsets for pages we currently have cached.
Returns: A set() of offsets after expansion.
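
A simplified, non-authoritative sketch of the neighbour-expansion idea: walk outwards from the requested offsets, stay inside the layer's [first, end) bounds, skip pages already cached, and stop once roughly num_pages offsets have been gathered. The function name and parameters below are illustrative:

    def expand_to_neighbors(offsets, cached_offsets, layer_first, layer_end,
                            num_pages):
        # offsets: the requested page offsets; cached_offsets: pages we
        # already hold; [layer_first, layer_end) bounds the current layer.
        final = set(offsets)
        candidates = sorted(final)
        while len(final) < num_pages and candidates:
            next_round = []
            for offset in candidates:
                for neighbor in (offset - 1, offset + 1):
                    if neighbor < layer_first or neighbor >= layer_end:
                        continue   # never expand across layer boundaries
                    if neighbor in cached_offsets or neighbor in final:
                        continue   # already cached or already chosen
                    final.add(neighbor)
                    next_round.append(neighbor)
            candidates = next_round
        return final
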
def clear_cache(self):
Clear out any cached/memoized values.

This can be called at any time, but generally it is used when we have extracted some information but do not expect to request any more from this index.
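
For example (illustrative only: 'index' is the object from the construction sketch above, the keys are hypothetical, and the index is assumed to have no reference lists):

    wanted_keys = [('rev-1',), ('rev-2',)]     # hypothetical keys
    found = dict((key, value)
                 for _, key, value in index.iter_entries(wanted_keys))
    index.clear_cache()   # no further queries expected against this index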

def external_references(self, ref_list_num):
Undocumented
def _find_layer_first_and_end(self, offset):
Find the start/stop nodes for the layer corresponding to offset.
Returns: (first, end), where first is the first node in this layer and end is the first node of the next layer.
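
An illustrative sketch of the lookup against _row_offsets (the cumulative node counts per layer; the values below are hypothetical):

    # Suppose _row_offsets were [0, 1, 11, 111]: one root node, ten
    # internal nodes, one hundred leaf nodes.
    def find_layer_first_and_end(row_offsets, offset):
        for first, end in zip(row_offsets[:-1], row_offsets[1:]):
            if first <= offset < end:
                return first, end
        raise IndexError('offset %d is outside the index' % offset)

    find_layer_first_and_end([0, 1, 11, 111], 42)   # -> (11, 111): a leaf page
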
def _get_offsets_to_cached_pages(self):
Determine what nodes we already have cached.
def _get_root_node(self):
Undocumented
def _get_nodes(self, cache, node_indexes):
Undocumented
def _get_internal_nodes(self, node_indexes):
Get internal nodes, from cache or disk.

After being read, the nodes are cached.

def _cache_leaf_values(self, nodes):
Cache directly from key => value, skipping the btree.
def _get_leaf_nodes(self, node_indexes):
Get a bunch of nodes, from cache or disk.
def iter_all_entries(self):
Iterate over all keys within the index.
Returns: An iterable of (index, key, value) or (index, key, value, reference_lists). The former tuple is used when there are no reference lists in the index, making the API compatible with simple key:value index types. There is no defined order for the result iteration; it will be in the most efficient order for the index.
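
A usage sketch that copes with both tuple shapes, reusing the hypothetical 'index' from the construction sketch above:

    for entry in index.iter_all_entries():
        if len(entry) == 3:
            source, key, value = entry
            refs = ()
        else:
            source, key, value, refs = entry
        # key is a tuple of bytestrings; refs holds the reference lists
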
@staticmethod
def _multi_bisect_right(in_keys, fixed_keys):
Find the positions where each 'in_key' would fit in fixed_keys.

This is equivalent to doing "bisect_right" on each in_key into fixed_keys

Parameters:
    in_keys: A sorted list of keys to match with fixed_keys.
    fixed_keys: A sorted list of keys to match against.
Returns: A list of (integer position, [key list]) tuples.
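
Behaviourally it can be pictured as the following self-contained sketch (not the tuned bzrlib implementation), which groups consecutive in_keys sharing an insertion point:

    from bisect import bisect_right

    def multi_bisect_right(in_keys, fixed_keys):
        # Both inputs are sorted; consecutive in_keys that share an
        # insertion point are grouped together.
        result = []
        for key in in_keys:
            pos = bisect_right(fixed_keys, key)
            if result and result[-1][0] == pos:
                result[-1][1].append(key)
            else:
                result.append((pos, [key]))
        return result

    multi_bisect_right(['b', 'c', 'z'], ['a', 'm'])
    # -> [(1, ['b', 'c']), (2, ['z'])]
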
def _walk_through_internal_nodes(self, keys):
Take the given set of keys, and find the corresponding LeafNodes.
Parameters:
    keys: An unsorted iterable of keys to search for.
Returns: (nodes, keys_at_index), where nodes is a dict mapping {index: LeafNode} and keys_at_index is a list of (index, [keys for Leaf]) tuples.
def iter_entries(self, keys):
Iterate over keys within the index.
Parameters:
    keys: An iterable providing the keys to be retrieved.
Returns: An iterable as per iter_all_entries, but restricted to the keys supplied. No additional keys will be returned, and every key supplied that is in the index will be returned.
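
For example (hypothetical keys, reusing the 'index' object assumed above), this is a common way to split supplied keys into present and missing:

    wanted = set([('rev-1',), ('rev-404',)])   # hypothetical keys
    present = set()
    for entry in index.iter_entries(wanted):
        present.add(entry[1])                  # entry[1] is always the key
    missing = wanted - present
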
def _find_ancestors(self, keys, ref_list_num, parent_map, missing_keys):
Find the parent_map information for the set of keys.

This populates the parent_map dict and missing_keys set based on the queried keys. It also can fill out an arbitrary number of parents that it finds while searching for the supplied keys.

It is unlikely that you want to call this directly. See "CombinedGraphIndex.find_ancestry()" for a more appropriate API.

Parameters:
    keys: The keys whose ancestry we want to return. Every key will end up in either 'parent_map' or 'missing_keys'.
    ref_list_num: The index into the reference lists that holds the parents we care about.
    parent_map: {key: parent_keys} for keys that are present in this index. This may contain more entries than were in 'keys': reachable ancestors of the requested keys are added as they are found.
    missing_keys: Keys which are known to be missing from this index. This may include parents that were not directly requested, but which we were able to determine are not present in this index.
Returns: search_keys: parents that were found (as references) but not queried, so it is not yet known whether they are present or missing. Callers can re-query this index for those keys; they will then be placed into parent_map or missing_keys.
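
An illustrative driver loop; note the docstring above recommends CombinedGraphIndex.find_ancestry() instead, and the use of reference list 0 as the parent list here is an assumption:

    parent_map = {}
    missing_keys = set()
    pending = set([('rev-tip',)])              # hypothetical starting key
    while pending:
        # Each pass fills parent_map/missing_keys and reports parents whose
        # presence is still unknown; feed those back in until none remain.
        pending = index._find_ancestors(pending, 0, parent_map, missing_keys)
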
def iter_entries_prefix(self, keys):
Iterate over keys within the index using prefix matching.

Prefix matching is applied within the tuple of a key, not within the bytestring of each key element. E.g. if you have the keys ('foo', 'bar') and ('foobar', 'gam') and do a prefix search for ('foo', None), then only the former key is returned.

WARNING: Note that this method currently causes a full index parse unconditionally (which is reasonably appropriate as it is a means for thunking many small indices into one larger one and still supplies iter_all_entries at the thunk layer).

Parameters:
    keys: An iterable providing the key prefixes to be retrieved. Each key prefix takes the form of a tuple the length of a key, but with the last N elements 'None' rather than a regular bytestring. The first element cannot be 'None'.
Returns: An iterable as per iter_all_entries, but restricted to the keys with a matching prefix to those supplied. No additional keys will be returned, and every match that is in the index will be returned.
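
A small illustration of the per-element matching described above, assuming a hypothetical index containing two-element keys:

    # With ('foo', 'bar') and ('foobar', 'gam') in the index:
    list(index.iter_entries_prefix([('foo', None)]))
    # -> only the ('foo', 'bar') entry is returned; ('foobar', 'gam') does
    #    not match, because prefixing applies per tuple element, not to the
    #    bytestring contents of an element.
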
def key_count(self):
Return an estimate of the number of keys in this index.

For BTreeGraphIndex the estimate is exact as it is contained in the header.

def _compute_row_offsets(self):
Fill out the _row_offsets attribute based on _row_lengths.
def _parse_header_from_bytes(self, bytes):
Parse the header from a region of bytes.
Parameters:
    bytes: The data to parse.
Returns: An (offset, data) tuple, such as readv yields, for the unparsed data (which may be of length 0).
def _read_nodes(self, nodes):
Read some nodes from disk into the LRU cache.

This performs a readv to get the node data into memory, parses each node, and then yields it to the caller. The nodes are requested in the supplied order; if possible, sorting the list before requesting a read may improve performance.

Parameters:
    nodes: The nodes to read: 0 is the first node, 1 the second node, etc.
Returns: None.
def _signature(self):
The file signature for this index type.
def validate(self):
Validate that everything in the index can be accessed.