l.s.d.nl_search : module documentation

Part of lp.services.database

Helpers for doing natural language phrase search using the full text index.
Function nl_term_candidates: Return an array of the candidate search terms from the phrase.
Function nl_phrase_search: Return the tsearch2 query that should be used to do a phrase search.
Function _nl_phrase_search: Perform a very simple pruning of the phrase, letting fti do ranking.
Function _slow_nl_phrase_search: Return the tsearch2 query that should be used to do a phrase search.
def nl_term_candidates(phrase):
Return an array of the candidate search terms from the phrase. Stop words are removed from the phrase and every term is normalized according to the full text rules (lowercased and stemmed).
Parameters:
phrase: A search phrase.
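The normalization described above can be illustrated with a small stand-alone sketch. It is not the Launchpad implementation: the stop-word list, the crude suffix stripping and the names toy_stem and toy_term_candidates are all hypothetical stand-ins for the rules that the full text index applies.

```python
import re

# Hypothetical stop-word list, for illustration only. The real list
# belongs to the full text search configuration, not to this sketch.
STOP_WORDS = {"a", "an", "the", "is", "in", "of", "on", "to", "and"}


def toy_stem(word):
    """Very crude suffix stripping, standing in for a real stemmer."""
    for suffix in ("ing", "es", "s"):
        # Only strip when at least three characters of stem remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def toy_term_candidates(phrase):
    """Lowercase the phrase, drop stop words and stem what is left."""
    terms = []
    for word in re.findall(r"[a-z0-9]+", phrase.lower()):
        if word in STOP_WORDS:
            continue
        terms.append(toy_stem(word))
    return terms


print(toy_term_candidates("The Crashes in the installer"))
```

The point of the sketch is only the order of operations: tokenize, lowercase, filter stop words, then stem each surviving term.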
def nl_phrase_search(phrase, table, constraints='', extra_constraints_tables=None, fast_enabled=True):
Return the tsearch2 query that should be used to do a phrase search.

The precise heuristics applied by this function will vary as we tune the system.

It is the interface by which a user query should be turned into a backend search language query.

Caveat: The model class must define an 'fti' column, which is then used for full text searching.

Parameters:
phrase: A search phrase.
table: This should be the SQLBase class representing the base type.
constraints: Additional SQL clause that limits the rows to a subset of the table.
extra_constraints_tables: A list of additional table names that are needed by the constraints clause.
fast_enabled: If true, use a fast, but less precise, code path. When feature flags are available this will be converted to a feature flag.
Returns: A tsearch2 query string.
def _nl_phrase_search(terms, table, constraints, extra_constraints_tables):
Perform a very simple pruning of the phrase, letting fti do ranking.

This function groups all the terms with an & clause, and creates an additional & grouping for each subset of terms obtained by discarding one term.

See nl_phrase_search for the contract of this function.
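A minimal sketch of the grouping described above, under stated assumptions: the groups are joined with | so that a row matching any group is returned, and a single-term phrase produces only the full group. Neither detail is stated in this documentation, and the name and_groups_query is hypothetical.

```python
def and_groups_query(terms):
    """Build one & group of all terms plus one group per dropped term."""
    terms = list(terms)
    groups = [terms]
    # Assumption: with a single term there is nothing useful to drop.
    if len(terms) > 1:
        for skip in range(len(terms)):
            groups.append(terms[:skip] + terms[skip + 1:])
    clauses = ["(" + "&".join(group) + ")" for group in groups]
    # Assumption: the & groups are combined with |.
    return "|".join(clauses)


print(and_groups_query(["crash", "install", "usb"]))
```

For three terms this yields four groups: the full conjunction and the three two-term conjunctions. Rows matching all terms also match every smaller group, which is what lets the full text ranking sort them first.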

def _slow_nl_phrase_search(terms, table, constraints, extra_constraints_tables):
Return the tsearch2 query that should be used to do a phrase search.

This function implements an algorithm similar to the one used by MySQL
natural language search (as documented at
http://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html).

It eliminates stop words from the phrase and normalizes each term
according to the full text indexing rules (lowercasing and stemming).

Each term that is present in more than 50% of the candidate rows is also
eliminated from the query. That term elimination is only done when there
are 5 candidate rows or more.

The remaining terms are then ORed together. Use the ts_rank() or
ts_rank_cd() function to order the results of that query. Rows that
use more of the terms, and in which the terms are found closer
together in the text, then appear at the top of the list, while rows
that use only some of the terms are still returned.

:terms: Some candidate search terms.

:table: This should be the SQLBase class representing the base type.

:constraints: Additional SQL clause that limits the rows to a
subset of the table.

:extra_constraints_tables: A list of additional table names that are
needed by the constraints clause.

Caveat: The model class must define a 'fti' column which is then used
for full text searching.
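
The pruning rule described above can be sketched without a database. This is an illustration only: the function name prune_and_or_terms and its inputs (a precomputed mapping from each term to the number of candidate rows containing it) are hypothetical, and how the real function obtains those counts is not shown here.

```python
def prune_and_or_terms(term_counts, total_rows, min_rows=5):
    """OR together the terms that survive the 50% frequency cut.

    term_counts maps each term to the number of candidate rows that
    contain it (hypothetical input, precomputed by the caller).
    """
    terms = list(term_counts)
    # Pruning only applies when there are 5 candidate rows or more.
    if total_rows >= min_rows:
        # Drop terms present in more than 50% of the candidate rows.
        terms = [t for t in terms if term_counts[t] * 2 <= total_rows]
    # If every term was pruned, this returns an empty string.
    return "|".join(terms)


counts = {"bug": 9, "crash": 3, "install": 5}
print(prune_and_or_terms(counts, total_rows=10))
```

With 10 candidate rows, "bug" appears in 90% of them and is dropped, while "install" at exactly 50% is kept because the rule only removes terms present in more than half of the rows.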
API Documentation for Launchpad, generated by pydoctor at 2022-06-16 00:00:12.