b.k.KnitVersionedFiles(VersionedFilesWithFallbacks) : class documentation

Part of bzrlib.knit View In Hierarchy

Storage for many versioned files using knit compression.

Backend storage is managed by indices and data objects.

Instance Variables
  _index: A _KnitGraphIndex or similar that can describe the parents, graph, compression and data location of entries in this KnitVersionedFiles. Note that this is only the index for this vfs; if there are fallbacks they must be queried separately.
Method __init__ Create a KnitVersionedFiles with index and data_access.
Method __repr__ Undocumented
Method without_fallbacks Return a clone of this object without any fallbacks configured.
Method add_fallback_versioned_files Add a source of texts for texts not present in this knit.
Method add_lines See VersionedFiles.add_lines().
Method annotate See VersionedFiles.annotate.
Method get_annotator Undocumented
Method check See VersionedFiles.check().
Method get_parent_map Get a map of the graph parents of keys.
Method get_record_stream Get a stream of records for keys.
Method get_sha1s See VersionedFiles.get_sha1s().
Method insert_record_stream Insert a record stream into this container.
Method get_missing_compression_parent_keys Return an iterable of keys of missing compression parents.
Method iter_lines_added_or_present_in_keys Iterate over the lines in the versioned files from keys.
Method keys See VersionedFiles.keys.
Method _add_text See VersionedFiles._add_text().
Method _add Add a set of lines on top of version specified by parents.
Method _logical_check Undocumented
Method _check_add Check that version_id and lines are safe to add.
Method _check_header Undocumented
Method _check_header_version Checks the header version on original format knit records.
Method _check_should_delta Iterate back through the parent listing, looking for a fulltext.
Method _build_details_to_components Convert a build_details tuple to a position tuple.
Method _get_components_positions Produce a map of position data for the components of keys.
Method _get_content Returns a content object that makes up the specified version.
Method _get_parent_map_with_sources Get a map of the parents of keys.
Method _get_record_map Produce a dictionary of knit records.
Method _raw_map_to_record_map Parse the contents of _get_record_map_unparsed.
Method _get_record_map_unparsed Get the raw data for reconstructing keys without parsing it.
Class Method _split_by_prefix For the given keys, split them up based on their prefix.
Method _group_keys_for_io For the given keys, group them into 'best-sized' requests.
Method _get_remaining_record_stream This function is the 'retry' portion for get_record_stream.
Method _make_line_delta Generate a line delta from delta_seq and new_content.
Method _merge_annotations Merge annotations for content and generate deltas.
Method _parse_record Parse an original format knit record.
Method _parse_record_header Parse a record header for consistency.
Method _parse_record_unchecked Undocumented
Method _read_records_iter Read text records from data file and yield result.
Method _read_records_iter_raw Read text records from data file and yield raw data.
Method _read_records_iter_unchecked Read text records from data file and yield raw data.
Method _record_to_data Convert key, digest, lines into a raw data block.
Method _split_header Undocumented

Inherited from VersionedFilesWithFallbacks:

Method get_known_graph_ancestry Get a KnownGraph instance with the ancestry of keys.

Inherited from VersionedFiles (via VersionedFilesWithFallbacks):

Method add_mpdiffs Add mpdiffs to this VersionedFile.
Static Method check_not_reserved_id Undocumented
Method clear_cache Clear whatever caches this VersionedFile holds.
Method make_mpdiffs Create multiparent diffs for specified keys.
Method _check_lines_not_unicode Check that lines being added to a versioned file are not unicode.
Method _check_lines_are_lines Check that the lines really are full lines without inline EOL.
Method _extract_blocks Undocumented
Method _transitive_fallbacks Return the whole stack of fallback versionedfiles.
def __init__(self, index, data_access, max_delta_chain=200, annotated=False, reload_func=None):
Create a KnitVersionedFiles with index and data_access.
Parameters
  index: The index for the knit data.
  data_access: The access object to store and retrieve knit records.
  max_delta_chain: The maximum number of deltas to permit during insertion. Set to 0 to prohibit the use of deltas.
  annotated: Set to True to cause annotations to be calculated and stored during insertion.
  reload_func: A function that can be called if we think we need to reload the pack listing and try again. See 'bzrlib.repofmt.pack_repo.AggregateIndex' for the signature.
def __repr__(self):
Undocumented
def without_fallbacks(self):
Return a clone of this object without any fallbacks configured.
def add_fallback_versioned_files(self, a_versioned_files):
Add a source of texts for texts not present in this knit.
Parameters
  a_versioned_files: A VersionedFiles object.
def add_lines(self, key, parents, lines, parent_texts=None, left_matching_blocks=None, nostore_sha=None, random_id=False, check_content=True):
See VersionedFiles.add_lines().
def _add_text(self, key, parents, text, nostore_sha=None, random_id=False):
See VersionedFiles._add_text().
def _add(self, key, lines, parents, parent_texts, left_matching_blocks, nostore_sha, random_id, line_bytes):
Add a set of lines on top of version specified by parents.

Any versions not present will be converted into ghosts.

We pass both lines and line_bytes because different routes bring the values to this function, and for memory efficiency we don't want to have to split/join on demand.

Parameters
  lines: A list of strings where each one is a single line (has a single newline at the end of the string). This is now optional (callers can pass None); it is kept in this position for backwards compatibility. When supplied, ''.join(lines) must equal line_bytes.
  line_bytes: A single string containing the content.
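The invariant above — when both forms are supplied, the joined lines must equal line_bytes — can be sketched with a small standalone helper. This is illustrative only (normalise_content is not a bzrlib function); it shows how either form can be derived from the other.

```python
def normalise_content(lines=None, line_bytes=None):
    """Return (lines, line_bytes), deriving whichever form is missing.

    Hypothetical helper illustrating the lines/line_bytes contract:
    callers may pass either form, and when both are present they must
    agree.
    """
    if lines is None and line_bytes is None:
        raise ValueError("need lines or line_bytes")
    if lines is None:
        # splitlines(True) keeps the trailing newline on each line.
        lines = line_bytes.splitlines(True)
    if line_bytes is None:
        line_bytes = ''.join(lines)
    # The documented invariant: both forms carry identical content.
    assert ''.join(lines) == line_bytes
    return lines, line_bytes
```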
def annotate(self, key):
See VersionedFiles.annotate.
def get_annotator(self):
Undocumented
def check(self, progress_bar=None, keys=None):
See VersionedFiles.check().
def _logical_check(self):
Undocumented
def _check_add(self, key, lines, random_id, check_content):
Check that version_id and lines are safe to add.
def _check_header(self, key, line):
Undocumented
def _check_header_version(self, rec, version_id):
Checks the header version on original format knit records.

These have the last component of the key embedded in the record.

def _check_should_delta(self, parent):
Iterate back through the parent listing, looking for a fulltext.

This is used when we want to decide whether to add a delta or a new fulltext. It searches for _max_delta_chain parents. When it finds a fulltext parent, it sees if the total size of the deltas leading up to it is large enough to indicate that we want a new full text anyway.

Return True if we should create a new delta, False if we should use a full text.
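The heuristic described above can be sketched in a standalone form. This is not the exact bzrlib arithmetic — the names, the chain representation, and the size comparison are illustrative assumptions — but it captures the decision: walk back at most max_delta_chain records, and when a fulltext is reached, store a delta only if the accumulated delta sizes are still smaller than a fresh fulltext would be.

```python
def should_delta(chain, fulltext_size, max_delta_chain=200):
    """Return True to store a delta, False to store a new fulltext.

    chain: (kind, size) pairs for stored records, walking back from
    the immediate parent; the walk stops at the first fulltext.
    fulltext_size: the size a new fulltext of this version would have.
    Illustrative sketch only, not the bzrlib implementation.
    """
    if max_delta_chain == 0:
        return False                # deltas prohibited entirely
    delta_total = 0
    for i, (kind, size) in enumerate(chain):
        if i >= max_delta_chain:
            return False            # chain too long: start a new fulltext
        if kind == 'fulltext':
            # If the deltas leading up to the fulltext already cost as
            # much as a fresh fulltext, a delta buys nothing.
            return delta_total < fulltext_size
        delta_total += size
    return False                    # no fulltext found (e.g. ghost chain)
```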

def _build_details_to_components(self, build_details):
Convert a build_details tuple to a position tuple.
def _get_components_positions(self, keys, allow_missing=False):

Produce a map of position data for the components of keys.

This data is intended to be used for retrieving the knit records.

A dict of key to (record_details, index_memo, next, parents) is returned.

  • method is the way referenced data should be applied.
  • index_memo is the handle to pass to the data access to actually get the data
  • next is the build-parent of the version, or None for fulltexts.
  • parents is the version_ids of the parents of this version
Parameters
  allow_missing: If True do not raise an error on a missing component, just ignore it.
def _get_content(self, key, parent_texts={}):
Returns a content object that makes up the specified version.
def get_parent_map(self, keys):
Get a map of the graph parents of keys.
Parameters
  keys: The keys to look up parents for.
Returns
  A mapping from keys to parents. Absent keys are absent from the mapping.
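The absent-key contract — missing keys are simply left out of the result, not raised as errors — can be shown with a plain dict standing in for the index (the graph shape here is invented for illustration):

```python
def get_parent_map_sketch(graph, keys):
    """Sketch of the get_parent_map contract over a plain-dict graph:
    requested keys absent from the store are omitted from the result."""
    return {key: graph[key] for key in keys if key in graph}

# Keys in this vfs are tuples; a ghost revision is simply not mentioned.
graph = {('rev-2',): (('rev-1',),), ('rev-1',): ()}
result = get_parent_map_sketch(graph, [('rev-2',), ('ghost',)])
```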
def _get_parent_map_with_sources(self, keys):
Get a map of the parents of keys.
Parameters
  keys: The keys to look up parents for.
Returns
  A tuple. The first element is a mapping from keys to parents; absent keys are absent from the mapping. The second element is a list with the locations each key was found in: the first element is the in-this-knit parents, the second the first fallback source, and so on.
def _get_record_map(self, keys, allow_missing=False):
Produce a dictionary of knit records.
Parameters
  keys: The keys to build a map for.
  allow_missing: If some records are missing, rather than error, just return the data that could be generated.
Returns

{key:(record, record_details, digest, next)}

  • record: data returned from read_records (a KnitContent object)

  • record_details: opaque information to pass to parse_record

  • digest: SHA1 digest of the full text after all steps are done

  • next: build-parent of the version, i.e. the leftmost ancestor.

    Will be None if the record is not a delta.

def _raw_map_to_record_map(self, raw_map):
Parse the contents of _get_record_map_unparsed.
Returns
  See _get_record_map.
def _get_record_map_unparsed(self, keys, allow_missing=False):
Get the raw data for reconstructing keys without parsing it.
Returns
  A dict suitable for parsing via _raw_map_to_record_map: key -> (raw_bytes, (method, noeol), compression_parent).
@classmethod
def _split_by_prefix(cls, keys):
For the given keys, split them up based on their prefix.

To keep memory pressure somewhat under control, split the requests back into per-file-id requests, otherwise "bzr co" extracts the full tree into memory before writing it to disk. This should be revisited if _get_content_maps() can ever cross file-id boundaries.

The keys for a given file_id are kept in the same relative order. Ordering between file_ids is not, though prefix_order will return the order that the key was first seen.

Parameters
  keys: An iterable of key tuples.
Returns
  (split_map, prefix_order)
    split_map: A dictionary mapping prefix => keys.
    prefix_order: The order that we saw the various prefixes.
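The split described above is straightforward to sketch. Assuming keys are tuples whose leading elements form the prefix (the file-id part), this keeps the relative order within each prefix and records first-seen prefix order; it is an illustration, not the bzrlib implementation.

```python
def split_by_prefix_sketch(keys):
    """Group key tuples by prefix, preserving relative order within a
    prefix and recording the order prefixes were first seen."""
    split_map = {}
    prefix_order = []
    for key in keys:
        prefix = key[:-1]           # everything but the trailing component
        if prefix not in split_map:
            split_map[prefix] = []
            prefix_order.append(prefix)
        split_map[prefix].append(key)
    return split_map, prefix_order
```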
def _group_keys_for_io(self, keys, non_local_keys, positions, _min_buffer_size=_STREAM_MIN_BUFFER_SIZE):
For the given keys, group them into 'best-sized' requests.

The idea is to avoid making 1 request per file, but to never try to unpack an entire 1.5GB source tree in a single pass. Also when possible, we should try to group requests to the same pack file together.

Returns
  A list of (keys, non_local) tuples that indicate what keys should be fetched next.
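The batching idea — neither one request per key nor one giant read — can be sketched by greedily packing keys until an estimated byte count passes a minimum buffer size. This is a simplification: the real method also groups by pack file and tracks non-local keys, and the size map here is an invented stand-in for the position data.

```python
def group_keys_for_io_sketch(keys, sizes, min_buffer_size=5 * 1024 * 1024):
    """Greedily pack keys into groups of at least min_buffer_size
    estimated bytes, so requests are neither tiny nor unbounded.
    sizes: key -> estimated bytes to read for that key (illustrative)."""
    groups, current, current_size = [], [], 0
    for key in keys:
        current.append(key)
        current_size += sizes[key]
        if current_size >= min_buffer_size:
            groups.append(current)
            current, current_size = [], 0
    if current:
        groups.append(current)      # flush the final partial group
    return groups
```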
def get_record_stream(self, keys, ordering, include_delta_closure):
Get a stream of records for keys.
Parameters
  keys: The keys to include.
  ordering: Either 'unordered' or 'topological'. A topologically sorted stream has compression parents strictly before their children.
  include_delta_closure: If True then the closure across any compression parents will be included (in the opaque data).
Returns
  An iterator of ContentFactory objects, each of which is only valid until the iterator is advanced.
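The 'topological' guarantee above — compression parents strictly before their children — amounts to a topological sort over the compression-parent graph. A minimal depth-first sketch (the map shape and names are illustrative, not bzrlib API):

```python
def topo_order_sketch(compression_parent):
    """Order keys so each compression parent precedes its children.

    compression_parent: key -> its compression parent, or None for a
    fulltext. Depth-first walk; fine for small illustrative graphs.
    """
    order, seen = [], set()

    def visit(key):
        if key in seen:
            return
        seen.add(key)
        parent = compression_parent.get(key)
        if parent is not None and parent in compression_parent:
            visit(parent)           # emit the parent first
        order.append(key)

    for key in compression_parent:
        visit(key)
    return order
```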
def _get_remaining_record_stream(self, keys, ordering, include_delta_closure):
This function is the 'retry' portion for get_record_stream.
def get_sha1s(self, keys):
See VersionedFiles.get_sha1s().
def insert_record_stream(self, stream):
Insert a record stream into this container.
Parameters
  stream: A stream of records to insert.
Returns
  None
def get_missing_compression_parent_keys(self):
Return an iterable of keys of missing compression parents.

Check this after calling insert_record_stream to find out if there are any missing compression parents. If there are, the records that depend on them are not able to be inserted safely. For atomic KnitVersionedFiles built on packs, the transaction should be aborted or suspended - commit will fail at this point. Nonatomic knits will error earlier because they have no staging area to put pending entries into.

def iter_lines_added_or_present_in_keys(self, keys, pb=None):

Iterate over the lines in the versioned files from keys.

This may return lines from other keys. Each item the returned iterator yields is a tuple of a line and the text version that line is present in (not introduced in).

Ordering of results is in whatever order is most suitable for the underlying storage format.

If a progress bar is supplied, it may be used to indicate progress. The caller is responsible for cleaning up progress bars (because this is an iterator).

NOTES:
  • Lines are normalised by the underlying store: they will all have \n terminators.
  • Lines are returned in arbitrary order.
  • If a requested key did not change any lines (or didn't have any lines), it may not be mentioned at all in the result.
Parameters
  pb: Progress bar supplied by caller.
Returns
  An iterator over (line, key).
def _make_line_delta(self, delta_seq, new_content):
Generate a line delta from delta_seq and new_content.
def _merge_annotations(self, content, parents, parent_texts={}, delta=None, annotated=None, left_matching_blocks=None):
Merge annotations for content and generate deltas.

This is done by comparing the annotations based on changes to the text and generating a delta on the resulting full texts. If annotations are not being created then a simple delta is created.

def _parse_record(self, version_id, data):
Parse an original format knit record.

These have the last element of the key only present in the stored data.

def _parse_record_header(self, key, raw_data):
Parse a record header for consistency.
Returns
  The header and the decompressor stream, as (stream, header_record).
def _parse_record_unchecked(self, data):
Undocumented
def _read_records_iter(self, records):
Read text records from data file and yield result.

The result will be returned in whatever order is fastest to read, not the order requested. Also, multiple requests for the same record will only yield 1 response.

Parameters
  records: A list of (key, access_memo) entries.
Returns
  Yields (key, contents, digest) in the order read, not the order requested.
def _read_records_iter_raw(self, records):

Read text records from data file and yield raw data.

This unpacks enough of the text record to validate the id is as expected, but that's all.

Each item the iterator yields is (key, bytes, expected_sha1_of_full_text).
def _read_records_iter_unchecked(self, records):
Read text records from data file and yield raw data.

No validation is done.

Yields tuples of (key, data).

def _record_to_data(self, key, digest, lines, dense_lines=None):
Convert key, digest, lines into a raw data block.
Parameters
  key: The key of the record. Currently keys are always serialised using just the trailing component.
  dense_lines: The bytes of lines but in a denser form. For instance, if lines is a list of 1000 bytestrings each ending in \n, dense_lines may be a list with one line in it, containing all 1000 lines and their \n's. Using dense_lines when it is already known is a win because the string join to create bytes in this function spends less time resizing the final string.
Returns
  (len, a StringIO instance with the raw data ready to read).
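The dense_lines optimisation above can be illustrated in isolation: joining one pre-concatenated chunk is cheaper than joining many small lines, and both forms must produce identical bytes. The helper name is hypothetical, not bzrlib API.

```python
def record_bytes(lines, dense_lines=None):
    """Join the record content into one string, preferring the denser
    form when the caller already has it (sketch of the dense_lines idea)."""
    return ''.join(dense_lines or lines)

lines = ['line %d\n' % i for i in range(1000)]
dense = [''.join(lines)]            # same content, pre-joined into one chunk
# The denser form must be byte-for-byte identical to joining the lines.
assert record_bytes(lines) == record_bytes(lines, dense)
```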
def _split_header(self, line):
Undocumented
def keys(self):
See VersionedFiles.keys.
API Documentation for Bazaar, generated by pydoctor at 2019-07-21 00:34:56.