woob.browser.url

exception UrlNotResolvable

Bases: Exception

Raised when trying to navigate to a URL instance whose URL pattern cannot be resolved into a real URL.

class URL(*args, base='BASEURL')

Bases: object

A description of a URL on the website handled by a PagesBrowser.

It takes one or several regexps to match URLs, and an optional Page class that PagesBrowser.open instantiates when the page URL matches one of the regexps.
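As an illustration of how a URL pairs regexps with a page class, here is a minimal stand-alone sketch. MiniURL, Page and IndexPage are hypothetical names, not woob's API; the sketch assumes a full-string match (re.fullmatch) and simply passes the captured named groups to the page, which simplifies what PagesBrowser actually does.

```python
import re
from dataclasses import dataclass


class Page:
    """Hypothetical stand-in for a woob Page class."""

    def __init__(self, url, params):
        self.url = url
        self.params = params  # named groups captured from the URL


class IndexPage(Page):
    pass


@dataclass
class MiniURL:
    """Hypothetical analogue of URL: regexps plus an optional page class."""

    patterns: list
    klass: type = None

    def handle(self, url):
        if self.klass is None:
            return None  # no page class: nothing to instantiate
        for pattern in self.patterns:
            # Assumption of this sketch: the whole URL must match.
            m = re.fullmatch(pattern, url)
            if m:
                return self.klass(url, m.groupdict())
        return None


index = MiniURL([r'https://example\.org/(?P<pagename>\w+)\.html'], IndexPage)
page = index.handle('https://example.org/index.html')
print(type(page).__name__, page.params)  # IndexPage {'pagename': 'index'}
print(index.handle('https://example.org/other/path'))  # None
```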

Parameters

base – The name of the browser’s property containing the base URL.

is_here(**kwargs)

Returns True if the browser's current page matches this URL. If keyword arguments are provided, and only then, they are checked against the arguments that were used to build the current page URL.
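The rule that keyword arguments are only compared when they are given can be sketched as a plain function. This is_here is a hypothetical helper that receives the patterns, the current URL and the arguments used to build it; it is not the woob method itself.

```python
import re


def is_here(patterns, current_url, current_params, **kwargs):
    """Hypothetical sketch of the is_here rule.

    current_params stands in for the arguments that were used to build
    the current page URL.
    """
    if not any(re.fullmatch(p, current_url) for p in patterns):
        return False
    # With no kwargs, all() over an empty iterable is True: only the URL counts.
    return all(current_params.get(k) == v for k, v in kwargs.items())


patterns = [r'https://example\.org/(?P<pagename>\w+)\.html']
current = 'https://example.org/index.html'
built_with = {'pagename': 'index'}

print(is_here(patterns, current, built_with))                    # True
print(is_here(patterns, current, built_with, pagename='index'))  # True
print(is_here(patterns, current, built_with, pagename='about'))  # False
```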

stay_or_go(params=None, data=None, json=None, method=None, headers=None, **kwargs)

Go to this URL only if the browser is not already on it.

Keyword arguments are used as parameters to build the URL.

>>> url = URL(r'http://example\.org/(?P<pagename>\w+)\.html')
>>> url.stay_or_go(pagename='index')
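A minimal sketch of the "go only if not already here" behaviour, assuming a hypothetical FakeBrowser that just counts navigations and a hypothetical str.format template standing in for woob's regexp-based URL building:

```python
import re


class FakeBrowser:
    """Hypothetical browser that only records where it navigates."""

    def __init__(self):
        self.url = None
        self.requests = 0

    def location(self, url):
        self.requests += 1
        self.url = url


def stay_or_go(browser, pattern, template, **kwargs):
    """Hypothetical sketch: skip the request when already on the target page."""
    m = re.fullmatch(pattern, browser.url or '')
    if m and all(m.groupdict().get(k) == v for k, v in kwargs.items()):
        return browser.url  # already here: no request is sent
    # Assumption: a str.format template replaces woob's regexp-based build().
    browser.location(template.format(**kwargs))
    return browser.url


browser = FakeBrowser()
pattern = r'https://example\.org/(?P<pagename>\w+)\.html'
template = 'https://example.org/{pagename}.html'

stay_or_go(browser, pattern, template, pagename='index')
stay_or_go(browser, pattern, template, pagename='index')  # no second request
print(browser.requests)  # 1
stay_or_go(browser, pattern, template, pagename='about')
print(browser.url, browser.requests)  # https://example.org/about.html 2
```

The second call with the same argument does not navigate again, which is the whole point of stay_or_go compared with go.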

go(*, params=None, data=None, json=None, method=None, headers=None, **kwargs)

Go to this URL.

Keyword arguments are used as parameters to build the URL.

>>> url = URL(r'http://example\.org/(?P<pagename>\w+)\.html')
>>> url.go(pagename='index')

open(*, params=None, data=None, json=None, method=None, headers=None, is_async=False, callback=<function URL.<lambda>>, **kwargs)

Send a request to this URL.

Keyword arguments are used as parameters to build the URL.

Parameters

data – POST data

>>> url = URL(r'http://example\.org/(?P<pagename>\w+)\.html')
>>> url.open(pagename='index')

get_base_url(browser=None, for_pattern=None)

Get the browser’s base URL for the instance.

build(**kwargs)

Build a URL from this object's regexps, using the given arguments.

Parameters

params – Query string parameters

Return type

str

Raises

UrlNotResolvable if no valid URL can be resolved from the given arguments.
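One way to picture build(): substitute each named group of the regexp with the matching argument, then check that the result still matches the pattern. This is a simplified, hypothetical sketch, not woob's implementation: it assumes groups contain no nested parentheses and that everything outside the groups is literal text with backslash escapes, and it defines a local UrlNotResolvable instead of importing woob's.

```python
import re


class UrlNotResolvable(Exception):
    """Local stand-in for woob's exception of the same name."""


# Hypothetical simplification: a named group without nested parentheses.
GROUP = re.compile(r'\(\?P<(\w+)>[^()]*\)')


def build(pattern, **kwargs):
    """Hypothetical sketch of building a URL from a regexp with named groups."""

    def fill(m):
        name = m.group(1)
        if name not in kwargs:
            raise UrlNotResolvable(f'missing argument {name!r}')
        # Escape the value so the unescaping step below restores it verbatim.
        return re.escape(str(kwargs[name]))

    filled = GROUP.sub(fill, pattern)
    # Turn the remaining regexp escapes (e.g. "\.") back into literal characters.
    url = re.sub(r'\\(.)', r'\1', filled)
    # The built URL must still match the pattern, otherwise it is not resolvable.
    if not re.fullmatch(pattern, url):
        raise UrlNotResolvable(f'{url!r} does not match the pattern')
    return url


pattern = r'https://example\.org/(?P<pagename>\w+)\.html'
print(build(pattern, pagename='index'))  # https://example.org/index.html
try:
    build(pattern, pagename='not/valid')
except UrlNotResolvable as exc:
    print('not resolvable:', exc)
```

Validating the result against the original pattern is what lets a bad argument (here one containing a slash) surface as UrlNotResolvable rather than as a silently wrong URL.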

match(url, base=None)

Check whether the given URL matches this object.
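Patterns are often written relative to the browser's base URL. The following hypothetical sketch shows one plausible way a match function could resolve a relative pattern against a base before matching; the "no scheme means relative" test and the prefix matching with re.match are assumptions of the sketch, not a copy of woob's code.

```python
import re


def match(patterns, url, base):
    """Hypothetical sketch: resolve relative patterns against a base URL."""
    for regex in patterns:
        # Assumption: a pattern without a scheme is relative to the base URL.
        if not re.match(r'\w+://', regex):
            regex = re.escape(base.rstrip('/')) + '/' + regex.lstrip('/')
        m = re.match(regex, url)  # anchored at the start only (prefix match)
        if m:
            return m
    return None


base = 'https://example.org/'
patterns = [r'/account/(?P<id>\d+)']
m = match(patterns, 'https://example.org/account/42', base)
print(m.group('id'))  # 42
print(match(patterns, 'https://other.org/account/42', base))  # None
```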

handle(response)

Handle an HTTP response: if it matches this URL, return an instance of the page class (klass).

id2url(func)

Helper decorator that builds a URL from the first parameter when that parameter is an ID rather than a URL.
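A hypothetical decorator in the spirit of id2url, written for plain functions rather than browser methods. It assumes the ID maps to a URL through a str.format template with an id field, and that a full URL which does not match the pattern yields None; both are assumptions of this sketch.

```python
import functools
import re


def id2url(template, pattern):
    """Hypothetical decorator factory: accept either an ID or a full URL."""

    def decorator(func):
        @functools.wraps(func)
        def inner(id_or_url, *args, **kwargs):
            if re.match(r'https?://', id_or_url):
                # Assumption: an unrecognized full URL is rejected with None.
                if not re.fullmatch(pattern, id_or_url):
                    return None
                url = id_or_url
            else:
                # The first argument is an ID: build the URL from it.
                url = template.format(id=id_or_url)
            return func(url, *args, **kwargs)
        return inner
    return decorator


@id2url('https://example.org/item/{id}', r'https://example\.org/item/\d+')
def get_item(url):
    return f'fetching {url}'


print(get_item('42'))                             # fetching https://example.org/item/42
print(get_item('https://example.org/item/7'))     # fetching https://example.org/item/7
print(get_item('https://elsewhere.org/item/7'))   # None
```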

with_page(cls)

Get a new URL with the same path but a different page class.

Parameters

cls – The new page class to use.

with_urls(*urls, clear=True, match_new_first=True)

Get a new URL object with the same page but with different paths.

Parameters
  • urls (str) – List of URLs handled by the page

  • clear (bool) – If True, the page will only handle the given URLs. Otherwise, the URLs are added to the already handled URLs.

  • match_new_first (bool) – If True, new paths are matched first for this URL; this parameter is ignored when clear is True.
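The interplay of clear and match_new_first is easiest to see on the list of patterns. This sketch uses a hypothetical frozen dataclass and dataclasses.replace so that, as described for with_page and with_urls, each call returns a new object and leaves the original untouched. MiniURL, HomePage and NewHomePage are illustrative names only.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class MiniURL:
    """Hypothetical immutable analogue of URL: patterns plus a page class."""

    urls: tuple
    klass: type = None

    def with_page(self, cls):
        # Same paths, different page class.
        return replace(self, klass=cls)

    def with_urls(self, *urls, clear=True, match_new_first=True):
        if clear:
            new = urls  # match_new_first is ignored in this case
        elif match_new_first:
            new = urls + self.urls  # new patterns are tried first
        else:
            new = self.urls + urls  # existing patterns keep priority
        return replace(self, urls=tuple(new))


class HomePage: pass
class NewHomePage: pass


home = MiniURL((r'/home',), HomePage)
print(home.with_urls(r'/start').urls)                                # ('/start',)
print(home.with_urls(r'/start', clear=False).urls)                   # ('/start', '/home')
print(home.with_urls(r'/start', clear=False, match_new_first=False).urls)  # ('/home', '/start')
print(home.with_page(NewHomePage).klass.__name__)                    # NewHomePage
print(home.urls)                                                     # ('/home',)
```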

class BrowserParamURL(*args, base='BASEURL')

Bases: URL

A URL that automatically fills some params from browser attributes.

URL patterns having groups named “browser_*” will pick the relevant attribute from the browser. For example:

foo = BrowserParamURL(r'/foo\?bar=(?P<browser_token>\w+)')

The browser is expected to have a .token attribute, which is passed automatically when calling foo.go(); this is equivalent to foo.go(browser_token=browser.token).

Warning: all browser_* parameters are passed, so having multiple patterns with different groups in one BrowserParamURL is risky.
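Collecting the browser_* values can be sketched with re.compile(...).groupindex, which maps a pattern's named groups to their numbers. FakeBrowser and browser_params are hypothetical names; letting explicit keyword arguments override browser attributes is an assumption of this sketch.

```python
import re


class FakeBrowser:
    """Hypothetical browser exposing a token attribute."""

    token = 'abc123'


def browser_params(pattern, browser, **kwargs):
    """Hypothetical sketch: fill browser_* groups from browser attributes."""
    prefix = 'browser_'
    params = {}
    for name in re.compile(pattern).groupindex:
        if name.startswith(prefix):
            # browser_token -> browser.token
            params[name] = getattr(browser, name[len(prefix):])
    # Assumption: explicit keyword arguments win over browser attributes.
    params.update(kwargs)
    return params


pattern = r'/foo\?bar=(?P<browser_token>\w+)'
print(browser_params(pattern, FakeBrowser()))                       # {'browser_token': 'abc123'}
print(browser_params(pattern, FakeBrowser(), browser_token='xyz'))  # {'browser_token': 'xyz'}
```

Because every browser_* group in the pattern is filled, a pattern list mixing different browser_* groups would request attributes that some patterns never use, which is the risk the warning above points at.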

build(**kwargs)

Build a URL from this object's regexps, using the given arguments.

Parameters

params – Query string parameters

Return type

str

Raises

UrlNotResolvable if no valid URL can be resolved from the given arguments.

normalize_url(url)

Normalize a URL by lower-casing the domain and applying other fixes.

Lower-cases the domain, removes the default port and a trailing dot.

>>> normalize_url('http://EXAMPLE:80')
'http://example'
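The three fixes listed above can be reproduced with the standard library's urllib.parse. This is a sketch, not woob's implementation; DEFAULT_PORTS is a hypothetical table covering only http and https.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical table of default ports; only http and https are covered.
DEFAULT_PORTS = {'http': 80, 'https': 443}


def normalize_url(url):
    """Sketch: lower-case the host, drop the default port and a trailing dot."""
    parts = urlsplit(url)  # urlsplit already lower-cases the scheme
    host = (parts.hostname or '').rstrip('.')  # .hostname is lower-cased
    if ':' in host:
        host = f'[{host}]'  # restore brackets around IPv6 literals
    netloc = host
    port = parts.port
    if port is not None and port != DEFAULT_PORTS.get(parts.scheme):
        netloc = f'{host}:{port}'  # keep non-default ports
    userinfo = parts.netloc.rpartition('@')[0]
    if userinfo:
        netloc = f'{userinfo}@{netloc}'  # keep credentials untouched
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


print(normalize_url('http://EXAMPLE:80'))               # http://example
print(normalize_url('https://Example.COM.:8443/Path'))  # https://example.com:8443/Path
```

Only the host is lower-cased; the path keeps its case, since paths are case-sensitive on most servers.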