l.r.s.p.w.HTTPWalker(WalkerBase) : class documentation

Part of lp.registry.scripts.productreleasefinder.walker View In Hierarchy

HTTP URL scheme walker.

This class implements a walker for the HTTP and HTTPS URL schemes. It works by assuming any URL ending with a / is a directory, and every other URL a file. URLs are tested using HEAD to see whether they cause a redirect to one ending with a /.

HTML Directory pages are parsed to find all links within them that lead to deeper URLs; this way it isn't tied to the Apache directory listing format and can actually walk arbitrary trees.

Method open Open the HTTP connection.
Method close Close the HTTP connection.
Method request Make an HTTP request.
Method isDirectory Return whether the path is a directory.
Method list Download the HTML index at subdir and scrape for URLs.

Inherited from WalkerBase:

Method __init__ Undocumented
Method walk Walk through the URL.
def open(self):
Open the HTTP connection.
def close(self):
Close the HTTP connection.
def request(self, method, path):
Make an HTTP request.

Returns the Response object.

def isDirectory(self, path):
Return whether the path is a directory.

Assumes any path ending in a slash is a directory, and any that redirects to a location ending in a slash is also a directory.

def list(self, dirname):
Download the HTML index at subdir and scrape for URLs.

Returns a list of directory names (links ending with /, or that result in redirects to themselves ending in /) and filenames (everything else) that reside underneath the path.

API Documentation for Launchpad, generated by pydoctor at 2022-06-16 00:00:12.