bzrlib.btree_index.BTreeGraphIndex(object) : class documentation

Part of bzrlib.btree_index

Access to nodes via the standard GraphIndex interface for B+Trees.

Individual nodes are held in an LRU cache. This holds the root node in memory except when very large walks are done.

Method __init__ Create a B+Tree index object on the index name.
Method __eq__ Equal when self and other were created with the same parameters.
Method __ne__ Undocumented
Method clear_cache Clear out any cached/memoized values.
Method external_references Undocumented
Method iter_all_entries Iterate over all keys within the index.
Method iter_entries Iterate over keys within the index.
Method iter_entries_prefix Iterate over keys within the index using prefix matching.
Method key_count Return an estimate of the number of keys in this index.
Method validate Validate that everything in the index can be accessed.
Method _get_and_cache_nodes Read nodes and cache them in the lru.
Method _compute_recommended_pages Convert transport's recommended_page_size into btree pages.
Method _compute_total_pages_in_index How many pages are in the index.
Method _expand_offsets Find extra pages to download.
Method _expand_to_neighbors Expand requests to neighbors until we have enough pages.
Method _find_layer_first_and_end Find the start/stop nodes for the layer corresponding to offset.
Method _get_offsets_to_cached_pages Determine what nodes we already have cached.
Method _get_root_node Undocumented
Method _get_nodes Undocumented
Method _get_internal_nodes Get internal nodes, from cache or disk.
Method _cache_leaf_values Cache directly from key => value, skipping the btree.
Method _get_leaf_nodes Get a bunch of nodes, from cache or disk.
Static Method _multi_bisect_right Find the positions where each 'in_key' would fit in fixed_keys.
Method _walk_through_internal_nodes Take the given set of keys, and find the corresponding LeafNodes.
Method _find_ancestors Find the parent_map information for the set of keys.
Method _compute_row_offsets Fill out the _row_offsets attribute based on _row_lengths.
Method _parse_header_from_bytes Parse the header from a region of bytes.
Method _read_nodes Read some nodes from disk into the LRU cache.
Method _signature The file signature for this index type.
def __init__(self, transport, name, size, unlimited_cache=False, offset=0):
Create a B+Tree index object on the index name.
Parameters:
    transport: The transport to read data for the index from.
    name: The file name of the index on transport.
    size: Optional size of the index in bytes. This allows compatibility with the GraphIndex API, as well as ensuring that the initial read (to read the root node header) can be done without over-reading even on empty indices; on small indices it allows the entire index to be read in a single IO.
    unlimited_cache: If set to True, then instead of using an LRUCache with size _NODE_CACHE_SIZE, we will use a dict and always cache all leaf nodes.
    offset: The start of the btree index data is not byte 0 of the file; it begins at some later offset.
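
For orientation, a minimal construction sketch in Python; the transport URL, the file name 'example.cix', and the use of transport.stat() to obtain the size are illustrative assumptions, not part of this API's documentation:

    from bzrlib import transport as _mod_transport
    from bzrlib.btree_index import BTreeGraphIndex

    # Open the index via a transport; '.' and the file name are illustrative.
    t = _mod_transport.get_transport('.')
    name = 'example.cix'
    # Passing the exact size lets small indices be read in a single IO.
    size = t.stat(name).st_size
    index = BTreeGraphIndex(t, name, size)
    print(index.key_count())
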
def __eq__(self, other):
Equal when self and other were created with the same parameters.
def __ne__(self, other):
Undocumented
def _get_and_cache_nodes(self, nodes):
Read nodes and cache them in the lru.

The nodes list supplied is sorted and then read from disk, each node being inserted into the _node_cache.

Note: Asking for more nodes than the _node_cache can contain will result in some of the results being immediately discarded; to prevent this, an assertion is raised if more nodes are requested than are cachable.

Returns: A dict of {node_pos: node}.
def _compute_recommended_pages(self):
Convert transport's recommended_page_size into btree pages.

recommended_page_size is in bytes; we want to know how many _PAGE_SIZE pages fit in that length.
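
As an illustration (not the bzrlib source), the conversion amounts to a rounded-up division; the 4096-byte _PAGE_SIZE below is assumed:

    _PAGE_SIZE = 4096   # assumed btree page size

    def pages_for(recommended_page_size):
        # Round up, and never ask for fewer than one page.
        return max(1, (recommended_page_size + _PAGE_SIZE - 1) // _PAGE_SIZE)

    pages_for(64 * 1024)   # a 64kB transport hint (e.g. http) -> 16 pages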

def _compute_total_pages_in_index(self):
How many pages are in the index.

If we have read the header we will use the value stored there. Otherwise it will be computed based on the length of the index.

def _expand_offsets(self, offsets):
Find extra pages to download.

The idea is that we always want to make big-enough requests (such as 64kB for http), so that we don't waste round trips. Given the pages we already have cached and the new pages being requested, figure out what other pages we might want to read.

See also doc/developers/btree_index_prefetch.txt for more details.

Parameters:
    offsets: The offsets to be read.
Returns: A list of offsets to download.
def _expand_to_neighbors(self, offsets, cached_offsets, total_pages):
Expand requests to neighbors until we have enough pages.

This is called from _expand_offsets after policy has determined that we want to expand. We only want to expand requests within a given layer. We cheat a little bit and assume all requests will be in the same layer. This is true given the current design, but if it changes this algorithm may perform oddly.

Parameters:
    offsets: The requested offsets.
    cached_offsets: Offsets for pages we currently have cached.
Returns: A set() of offsets after expansion.
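
A simplified, non-authoritative sketch of the neighbour-expansion idea: walk outwards from the requested offsets, stay inside the layer's [first, end) bounds, skip pages already cached, and stop once roughly num_pages offsets have been gathered. The function name and parameters below are illustrative:

    def expand_to_neighbors(offsets, cached_offsets, layer_first, layer_end,
                            num_pages):
        # offsets: the requested page offsets; cached_offsets: pages we
        # already hold; [layer_first, layer_end) bounds the current layer.
        final = set(offsets)
        candidates = sorted(final)
        while len(final) < num_pages and candidates:
            next_round = []
            for offset in candidates:
                for neighbor in (offset - 1, offset + 1):
                    if neighbor < layer_first or neighbor >= layer_end:
                        continue   # never expand across layer boundaries
                    if neighbor in cached_offsets or neighbor in final:
                        continue   # already cached or already chosen
                    final.add(neighbor)
                    next_round.append(neighbor)
            candidates = next_round
        return final
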
def clear_cache(self):
Clear out any cached/memoized values.

This can be called at any time, but generally it is used when we have extracted some information but do not expect to request any more from this index.
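
For example (illustrative only: 'index' is the object from the construction sketch above, the keys are hypothetical, and the index is assumed to have no reference lists):

    wanted_keys = [('rev-1',), ('rev-2',)]     # hypothetical keys
    found = dict((key, value)
                 for _, key, value in index.iter_entries(wanted_keys))
    index.clear_cache()   # no further queries expected against this index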

def external_references(self, ref_list_num):
Undocumented
def _find_layer_first_and_end(self, offset):
Find the start/stop nodes for the layer corresponding to offset.
Returns: (first, end), where first is the first node in this layer and end is the first node of the next layer.
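
An illustrative sketch of the lookup against _row_offsets (the cumulative node counts per layer; the values below are hypothetical):

    # Suppose _row_offsets were [0, 1, 11, 111]: one root node, ten
    # internal nodes, one hundred leaf nodes.
    def find_layer_first_and_end(row_offsets, offset):
        for first, end in zip(row_offsets[:-1], row_offsets[1:]):
            if first <= offset < end:
                return first, end
        raise IndexError('offset %d is outside the index' % offset)

    find_layer_first_and_end([0, 1, 11, 111], 42)   # -> (11, 111): a leaf page
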
def _get_offsets_to_cached_pages(self):
Determine what nodes we already have cached.
def _get_root_node(self):
Undocumented
def _get_nodes(self, cache, node_indexes):
Undocumented
def _get_internal_nodes(self, node_indexes):
Get internal nodes, from cache or disk.

After being read, the nodes are cached.

def _cache_leaf_values(self, nodes):
Cache directly from key => value, skipping the btree.
def _get_leaf_nodes(self, node_indexes):
Get a bunch of nodes, from cache or disk.
def iter_all_entries(self):
Iterate over all keys within the index.
Returns: An iterable of (index, key, value) or (index, key, value, reference_lists). The former tuple is used when there are no reference lists in the index, making the API compatible with simple key:value index types. There is no defined order for the result iteration; it will be in the most efficient order for the index.
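
A usage sketch that copes with both tuple shapes, reusing the hypothetical 'index' from the construction sketch above:

    for entry in index.iter_all_entries():
        if len(entry) == 3:
            source, key, value = entry
            refs = ()
        else:
            source, key, value, refs = entry
        # key is a tuple of bytestrings; refs holds the reference lists
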
@staticmethod
def _multi_bisect_right(in_keys, fixed_keys):
Find the positions where each 'in_key' would fit in fixed_keys.

This is equivalent to doing "bisect_right" on each in_key into fixed_keys

Parameters:
    in_keys: A sorted list of keys to match with fixed_keys.
    fixed_keys: A sorted list of keys to match against.
Returns: A list of (integer position, [key list]) tuples.
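
Behaviourally it can be pictured as the following self-contained sketch (not the tuned bzrlib implementation), which groups consecutive in_keys sharing an insertion point:

    from bisect import bisect_right

    def multi_bisect_right(in_keys, fixed_keys):
        # Both inputs are sorted; consecutive in_keys that share an
        # insertion point are grouped together.
        result = []
        for key in in_keys:
            pos = bisect_right(fixed_keys, key)
            if result and result[-1][0] == pos:
                result[-1][1].append(key)
            else:
                result.append((pos, [key]))
        return result

    multi_bisect_right(['b', 'c', 'z'], ['a', 'm'])
    # -> [(1, ['b', 'c']), (2, ['z'])]
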
def _walk_through_internal_nodes(self, keys):
Take the given set of keys, and find the corresponding LeafNodes.
Parameters:
    keys: An unsorted iterable of keys to search for.
Returns: (nodes, keys_at_index), where nodes is a dict mapping {index: LeafNode} and keys_at_index is a list of (index, [keys for Leaf]) tuples.
def iter_entries(self, keys):
Iterate over keys within the index.
Parameters:
    keys: An iterable providing the keys to be retrieved.
Returns: An iterable as per iter_all_entries, but restricted to the keys supplied. No additional keys will be returned, and every key supplied that is in the index will be returned.
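
For example (hypothetical keys, reusing the 'index' object assumed above), this is a common way to split supplied keys into present and missing:

    wanted = set([('rev-1',), ('rev-404',)])   # hypothetical keys
    present = set()
    for entry in index.iter_entries(wanted):
        present.add(entry[1])                  # entry[1] is always the key
    missing = wanted - present
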
def _find_ancestors(self, keys, ref_list_num, parent_map, missing_keys):
Find the parent_map information for the set of keys.

This populates the parent_map dict and missing_keys set based on the queried keys. It also can fill out an arbitrary number of parents that it finds while searching for the supplied keys.

It is unlikely that you want to call this directly. See "CombinedGraphIndex.find_ancestry()" for a more appropriate API.

Parameters:
    keys: The keys whose ancestry we want to return. Every key will end up in either 'parent_map' or 'missing_keys'.
    ref_list_num: The index into the reference lists that holds the parents we care about.
    parent_map: {key: parent_keys} for keys that are present in this index. This may contain more entries than were in 'keys': reachable ancestors of the requested keys are added as they are found.
    missing_keys: Keys which are known to be missing from this index. This may include parents that were not directly requested, but which we were able to determine are not present in this index.
Returns: search_keys: parents that were found (as references) but not queried, so it is not yet known whether they are present or missing. Callers can re-query this index for those keys; they will then be placed into parent_map or missing_keys.
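
An illustrative driver loop; note the docstring above recommends CombinedGraphIndex.find_ancestry() instead, and the use of reference list 0 as the parent list here is an assumption:

    parent_map = {}
    missing_keys = set()
    pending = set([('rev-tip',)])              # hypothetical starting key
    while pending:
        # Each pass fills parent_map/missing_keys and reports parents whose
        # presence is still unknown; feed those back in until none remain.
        pending = index._find_ancestors(pending, 0, parent_map, missing_keys)
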
def iter_entries_prefix(self, keys):
Iterate over keys within the index using prefix matching.

Prefix matching is applied within the tuple of a key, not within the bytestring of each key element. E.g. if you have the keys ('foo', 'bar') and ('foobar', 'gam') and do a prefix search for ('foo', None), then only the former key is returned.

WARNING: Note that this method currently causes a full index parse unconditionally (which is reasonably appropriate as it is a means for thunking many small indices into one larger one and still supplies iter_all_entries at the thunk layer).

Parameters:
    keys: An iterable providing the key prefixes to be retrieved. Each key prefix takes the form of a tuple the length of a key, but with the last N elements 'None' rather than a regular bytestring. The first element cannot be 'None'.
Returns: An iterable as per iter_all_entries, but restricted to the keys with a matching prefix to those supplied. No additional keys will be returned, and every match that is in the index will be returned.
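
A small illustration of the per-element matching described above, assuming a hypothetical index containing two-element keys:

    # With ('foo', 'bar') and ('foobar', 'gam') in the index:
    list(index.iter_entries_prefix([('foo', None)]))
    # -> only the ('foo', 'bar') entry is returned; ('foobar', 'gam') does
    #    not match, because prefixing applies per tuple element, not to the
    #    bytestring contents of an element.
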
def key_count(self):
Return an estimate of the number of keys in this index.

For BTreeGraphIndex the estimate is exact as it is contained in the header.

def _compute_row_offsets(self):
Fill out the _row_offsets attribute based on _row_lengths.
def _parse_header_from_bytes(self, bytes):
Parse the header from a region of bytes.
Parameters:
    bytes: The data to parse.
Returns: An (offset, data) tuple, such as readv yields, for the unparsed data (which may be of length 0).
def _read_nodes(self, nodes):
Read some nodes from disk into the LRU cache.

This performs a readv to get the node data into memory, parses each node, and then yields it to the caller. The nodes are requested in the supplied order; if possible, sorting the list before requesting a read may improve performance.

Parameters:
    nodes: The nodes to read: 0 is the first node, 1 the second node, etc.
Returns: None.
def _signature(self):
The file signature for this index type.
def validate(self):
Validate that everything in the index can be accessed.