Part of bzrlib.btree_index
Individual nodes are held in an LRU cache. This keeps the root node in memory except when very large walks are done.
Method | __init__ | Create a B+Tree index object on the index name. |
Method | __eq__ | Equal when self and other were created with the same parameters. |
Method | __ne__ | Undocumented |
Method | clear_cache | Clear out any cached/memoized values. |
Method | external_references | Undocumented |
Method | iter_all_entries | Iterate over all keys within the index. |
Method | iter_entries | Iterate over keys within the index. |
Method | iter_entries_prefix | Iterate over keys within the index using prefix matching. |
Method | key_count | Return an estimate of the number of keys in this index. |
Method | validate | Validate that everything in the index can be accessed. |
Method | _get_and_cache_nodes | Read nodes and cache them in the lru. |
Method | _compute_recommended_pages | Convert transport's recommended_page_size into btree pages. |
Method | _compute_total_pages_in_index | How many pages are in the index. |
Method | _expand_offsets | Find extra pages to download. |
Method | _expand_to_neighbors | Expand requests to neighbors until we have enough pages. |
Method | _find_layer_first_and_end | Find the start/stop nodes for the layer corresponding to offset. |
Method | _get_offsets_to_cached_pages | Determine what nodes we already have cached. |
Method | _get_root_node | Undocumented |
Method | _get_nodes | Undocumented |
Method | _get_internal_nodes | Get a node, from cache or disk. |
Method | _cache_leaf_values | Cache directly from key => value, skipping the btree. |
Method | _get_leaf_nodes | Get a bunch of nodes, from cache or disk. |
Static Method | _multi_bisect_right | Find the positions where each 'in_key' would fit in fixed_keys. |
Method | _walk_through_internal_nodes | Take the given set of keys, and find the corresponding LeafNodes. |
Method | _find_ancestors | Find the parent_map information for the set of keys. |
Method | _compute_row_offsets | Fill out the _row_offsets attribute based on _row_lengths. |
Method | _parse_header_from_bytes | Parse the header from a region of bytes. |
Method | _read_nodes | Read some nodes from disk into the LRU cache. |
Method | _signature | The file signature for this index type. |
Parameters | transport | The transport to read data for the index from. |
name | The file name of the index on transport. | |
size | Optional size of the index in bytes. This allows compatibility with the GraphIndex API, ensures that the initial read (of the root node header) can be done without over-reading even on empty indices, and on small indices allows the entire index to be read in a single IO. | |
unlimited_cache | If set to True, then instead of using an LRUCache with size _NODE_CACHE_SIZE, we will use a dict and always cache all leaf nodes. | |
offset | The offset at which the btree index data starts within the file, when it does not start at byte 0. |
The nodes list supplied is sorted and then read from disk, each node being inserted into the _node_cache.
Note: Asking for more nodes than the _node_cache can contain will result in some of the results being immediately discarded; to prevent this, an assertion is raised if more nodes are requested than are cachable.
Returns | A dict of {node_pos: node} |
recommended_page_size is in bytes, we want to know how many _PAGE_SIZE pages fit in that length.
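The byte-to-page conversion described above can be sketched as a ceiling division. A minimal illustration; `_PAGE_SIZE = 4096` is an assumption standing in for the real module constant:

```python
# Hypothetical stand-in for btree_index._PAGE_SIZE.
_PAGE_SIZE = 4096

def compute_recommended_pages(recommended_page_size):
    """Round a byte count up to a whole number of btree pages.

    Ceiling division: a partial page still costs a full page read.
    """
    return (recommended_page_size + _PAGE_SIZE - 1) // _PAGE_SIZE
```

So a transport recommending 64kB reads would translate to 16 btree pages per request.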
If we have read the header we will use the value stored there. Otherwise it will be computed based on the length of the index.
The idea is that we always want to make big-enough requests (like 64kB for http), so that we don't waste round trips. So, given the entries that we already have cached and the new pages being downloaded, figure out what other pages we might want to read.
See also doc/developers/btree_index_prefetch.txt for more details.
Parameters | offsets | The offsets to be read |
Returns | A list of offsets to download |
This is called from _expand_offsets after policy has determined that we want to expand. We only want to expand requests within a given layer. We cheat a little bit and assume all requests will be in the same layer. This is true given the current design, but if it changes this algorithm may perform oddly.
Parameters | offsets | requested offsets |
cached_offsets | offsets for pages we currently have cached | |
Returns | A set() of offsets after expansion |
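The neighbor expansion described above can be sketched as follows. This is a simplified illustration, not the bzrlib implementation: it grows the request set by adjacent offsets within one layer's `[first, end)` bounds, skipping already-cached pages, until a target count is reached. All names are hypothetical:

```python
def expand_to_neighbors(offsets, cached_offsets, first, end, total_wanted):
    """Grow a set of requested page offsets by adjacent pages.

    first/end bound the layer; cached offsets are never re-requested.
    Simplified sketch of the expansion policy, for illustration only.
    """
    final = set(offsets)
    to_check = sorted(offsets)
    while to_check and len(final) < total_wanted:
        offset = to_check.pop(0)
        for neighbor in (offset - 1, offset + 1):
            if (first <= neighbor < end
                    and neighbor not in final
                    and neighbor not in cached_offsets):
                final.add(neighbor)
                to_check.append(neighbor)
    return final
```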
This can be called at any time, but generally it is used when we have extracted some information, but don't expect to be requesting any more from this index.
Returns | (first, end); first is the first node in this layer, end is the first node of the next layer |
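The layer boundaries can be found from the cumulative row offsets. A hedged sketch, assuming `row_offsets` is the cumulative form of `_row_lengths` (as `_compute_row_offsets` would produce), e.g. row lengths `[1, 4, 20]` give offsets `[0, 1, 5, 25]`:

```python
import bisect

def find_layer_first_and_end(row_offsets, offset):
    """Return (first, end) for the layer containing the page at offset.

    bisect_right finds the first boundary strictly greater than offset;
    the layer starts at the boundary before it.
    """
    layer = bisect.bisect_right(row_offsets, offset) - 1
    return row_offsets[layer], row_offsets[layer + 1]
```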
After getting it, the node will be cached.
Returns | An iterable of (index, key, value) or (index, key, value, reference_lists). The former tuple is used when there are no reference lists in the index, making the API compatible with simple key:value index types. There is no defined order for the result iteration - it will be in the most efficient order for the index. |
This is equivalent to doing "bisect_right" on each in_key into fixed_keys.
Parameters | in_keys | A sorted list of keys to match with fixed_keys |
fixed_keys | A sorted list of keys to match against | |
Returns | A list of (integer position, [key list]) tuples. |
Parameters | keys | An unsorted iterable of keys to search for |
Returns | (nodes, keys_at_index); nodes is a dict mapping {index: LeafNode}, keys_at_index is a list of (index, [keys for Leaf]) tuples |
Parameters | keys | An iterable providing the keys to be retrieved. |
Returns | An iterable as per iter_all_entries, but restricted to the keys supplied. No additional keys will be returned, and every key supplied that is in the index will be returned. |
This populates the parent_map dict and missing_keys set based on the queried keys. It also can fill out an arbitrary number of parents that it finds while searching for the supplied keys.
It is unlikely that you want to call this directly. See "CombinedGraphIndex.find_ancestry()" for a more appropriate API.
Parameters | keys | An iterable of keys whose ancestry we want to return. Every key will end up in either 'parent_map' or 'missing_keys'. |
ref_list_num | This index in the ref_lists is the parents we care about. | |
parent_map | {key: parent_keys} for keys that are present in this index. This may contain more entries than were in 'keys', that are reachable ancestors of the keys requested. | |
missing_keys | keys which are known to be missing in this index. This may include parents that were not directly requested, but we were able to determine that they are not present in this index. | |
Returns | search_keys: parents that were found but not queried, so it is not known whether they are missing or present. Callers can re-query this index for those keys, and they will be placed into parent_map or missing_keys |
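The contract above suggests a fixed-point loop in the caller: keep re-querying until no unclassified parents remain. A sketch of the calling pattern only, not the real CombinedGraphIndex.find_ancestry() implementation; `index` is assumed to expose `_find_ancestors` with the signature documented here:

```python
def find_ancestry(index, keys, ref_list_num):
    """Repeatedly call index._find_ancestors until every reachable
    parent is classified into parent_map or missing_keys.

    Illustrative caller-side loop; see
    CombinedGraphIndex.find_ancestry() for the real API.
    """
    parent_map = {}
    missing_keys = set()
    search_keys = set(keys)
    while search_keys:
        # Each call classifies search_keys into parent_map or
        # missing_keys, and returns parents not yet classified.
        search_keys = index._find_ancestors(
            search_keys, ref_list_num, parent_map, missing_keys)
    return parent_map, missing_keys
```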
Prefix matching is applied within the tuple of a key, not within the bytestring of each key element. E.g. if you have the keys ('foo', 'bar') and ('foobar', 'gam') and do a prefix search for ('foo', None), then only the former key is returned.
WARNING: Note that this method currently causes a full index parse unconditionally (which is reasonably appropriate as it is a means for thunking many small indices into one larger one and still supplies iter_all_entries at the thunk layer).
Parameters | keys | An iterable providing the key prefixes to be retrieved. Each key prefix takes the form of a tuple the length of a key, but with the last N elements 'None' rather than a regular bytestring. The first element cannot be 'None'. |
Returns | An iterable as per iter_all_entries, but restricted to the keys with a matching prefix to those supplied. No additional keys will be returned, and every match that is in the index will be returned. |
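The tuple-level prefix semantics described above can be illustrated with a small predicate (a simplified sketch of the matching rule, not the index's lookup path; the helper name is hypothetical):

```python
def key_matches_prefix(key, prefix):
    """Element-wise prefix match within a key tuple.

    The prefix is a tuple of the same length as the key, with trailing
    None elements acting as wildcards; non-None elements must match
    the key element exactly (no substring matching).
    """
    if len(key) != len(prefix):
        return False
    return all(p is None or p == k for k, p in zip(key, prefix))
```

This reproduces the example above: ('foo', None) matches ('foo', 'bar') but not ('foobar', 'gam').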
For BTreeGraphIndex the estimate is exact as it is contained in the header.
Parameters | bytes | The data to parse. |
Returns | An (offset, data) tuple, such as readv yields, for the unparsed data (which may be of length 0). |
This performs a readv to get the node data into memory, parses each node, and then yields it to the caller. The nodes are requested in the supplied order; if possible, sort() the list before requesting a read, as this may improve performance.
Parameters | nodes | The nodes to read. 0 - first node, 1 - second node etc. |
Returns | None |