woob.browser.url

exception UrlNotResolvable[source]

Bases: Exception

Raised when the pattern of a URL instance cannot be resolved into a real URL.

class URL(*args, base='BASEURL', headers=None, timeout=None, methods=(), content_type=None)[source]

Bases: object

A description of an URL on the PagesBrowser website.

It takes one or several regexps to match urls, and an optional Page class which is instantiated by PagesBrowser.open if the page matches a regex.

Warning

The methods parameter is only used for page matching, not for building requests with URL.go() or URL.open(); you must still set the method when calling these.

Parameters:
  • base (str) – The name of the browser’s property containing the base URL. (default: 'BASEURL')

  • headers (Dict[str, str] | None) – Headers to include on requests using this URL. (default: None)

  • timeout (float | None) – Timeout to use for this URL in particular. (default: None)

  • methods (Tuple[str, Ellipsis]) – Request HTTP methods to match the response. (default: ())

  • content_type (str | None) – MIME type of the content to match the response with. (default: None)

is_here(**kwargs)[source]

Returns True if the current page of browser matches this URL. If arguments are provided, and only then, they are checked against the arguments that were used to build the current page URL.

Return type:

bool
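
The rule above (arguments are compared only when they are provided) can be sketched with the standard library alone. This is an illustration under stated assumptions, not woob's implementation: `is_here_sketch` and `page_params` are hypothetical names, and `page_params` is assumed to hold the named groups matched on the current page's URL, or None when the current page does not belong to this URL.

```python
from typing import Mapping, Optional


def is_here_sketch(page_params: Optional[Mapping[str, str]], **kwargs: str) -> bool:
    """Return True if the current page belongs to the URL and, when keyword
    arguments are given, they agree with the parameters of the page.

    Assumption: `page_params` is None when the current page does not
    belong to this URL.
    """
    if page_params is None:
        return False
    # With no keyword arguments, all() over an empty sequence is True,
    # so simply being on the page is enough.
    return all(
        name in page_params and str(page_params[name]) == str(value)
        for name, value in kwargs.items()
    )
```

Calling `is_here_sketch({"pagename": "index"})` accepts any page of that URL, while passing `pagename="about"` narrows the check to one specific page.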

stay_or_go(params=None, data=None, json=None, method=None, headers=None, **kwargs)[source]

Request to go on this url only if we aren’t already here.

Arguments are optional parameters for url.

Return type:

Response | Page

>>> url = URL(r'https://exawple.org/(?P<pagename>.+)\.html')
>>> url.stay_or_go(pagename='index')

go(*, params=None, data=None, json=None, method=None, headers=None, timeout=None, **kwargs)[source]

Request to go on this url.

Arguments are optional parameters for url.

Return type:

Response | Page

>>> url = URL(r'https://exawple.org/(?P<pagename>.+)\.html')
>>> url.go(pagename='index')

open(*, params=None, data=None, json=None, method=None, headers=None, timeout=None, is_async=False, callback=lambda response: ..., **kwargs)[source]

Request to open on this url.

Arguments are optional parameters for url.

Return type:

Response | Page

>>> url = URL(r'https://exawple.org/(?P<pagename>.+)\.html')
>>> url.open(pagename='index')

get_base_url(browser=None, for_pattern=None)[source]

Get the browser’s base URL for the instance.

The for_pattern argument is optional and is only used to add detail to the ValueError message; it may be removed.

Return type:

str

build(**kwargs)[source]

Build an url with the given arguments from URL’s regexps.

Parameters:

param – Query string parameters

Return type:

str

Raises:

UrlNotResolvable if unable to resolve a correct url with the given arguments.
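
To make the idea of building a URL from a regexp concrete, here is a minimal standard-library sketch. It is not woob's implementation: `build_from_pattern` and `UnresolvablePattern` are hypothetical names (the latter stands in for UrlNotResolvable), and the sketch assumes simple patterns whose named groups are not nested and whose literal parts contain no regex operators other than backslash escapes.

```python
import re


class UnresolvablePattern(Exception):
    """Hypothetical stand-in for UrlNotResolvable in this sketch."""


# Matches a simple, non-nested named group such as (?P<name>\d+).
_NAMED_GROUP = re.compile(r"\(\?P<(\w+)>[^()]*\)")


def _unescape(literal: str) -> str:
    """Turn regex escapes such as "\\." back into plain characters."""
    return re.sub(r"\\(.)", r"\1", literal)


def build_from_pattern(pattern: str, **kwargs: object) -> str:
    """Fill every named group of `pattern` with the keyword argument of the same name."""
    parts = []
    used = set()
    pos = 0
    for m in _NAMED_GROUP.finditer(pattern):
        name = m.group(1)
        if name not in kwargs:
            raise UnresolvablePattern(f"no value for group {name!r}")
        # Literal text before the group, then the supplied value.
        parts.append(_unescape(pattern[pos:m.start()]))
        parts.append(str(kwargs[name]))
        used.add(name)
        pos = m.end()
    parts.append(_unescape(pattern[pos:]))
    unused = set(kwargs) - used
    if unused:
        # Arguments that fit no group mean this pattern cannot be resolved.
        raise UnresolvablePattern(f"arguments not in pattern: {sorted(unused)}")
    return "".join(parts)
```

With several patterns, one plausible strategy is to try each in turn and raise only if none of them can be resolved with the given arguments.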

match(url, base=None)[source]

Check whether the given url matches this object.

Returns None if no pattern matches.

Return type:

Match | None
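
As a rough illustration of this matching, the following standard-library sketch tries each pattern in turn and returns the first regex match, or None. It is not woob's implementation: `match_any` is a hypothetical helper, and treating patterns without a scheme as relative to the base URL is an assumption made for the example.

```python
import re
from typing import Optional, Sequence


def match_any(patterns: Sequence[str], url: str, base: str = "") -> Optional[re.Match]:
    """Return the first regex match of `url` against `patterns`, or None."""
    for pattern in patterns:
        # Assumption: a pattern that does not start with a scheme is
        # relative, so the (escaped) base URL is prepended to it.
        if not re.match(r"^[\w?]+://", pattern):
            pattern = re.escape(base.rstrip("/")) + "/" + pattern.lstrip("/")
        m = re.fullmatch(pattern, url)
        if m:
            return m
    return None


m = match_any(
    [r"/products/(?P<id>\d+)"],
    "https://example.org/products/42",
    base="https://example.org/",
)
product_id = m.group("id") if m else None
```

The named groups of the returned match (here `id`) are the values that a caller could later compare against, or pass back when rebuilding the URL.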

handle(response)[source]

Handle an HTTP response to get an instance of the page class (klass) if it matches.

Return type:

Page | None

id2url(func)[source]

Helper decorator to get an URL if the given first parameter is an ID.

with_headers(headers)[source]

Get the current URL with different stored headers.

For example, suppose that a browser needs to add an ‘Accept’ header to access a specific version of the API; see Using the Accept Header to version your API for more details.

class MyBrowser(PagesBrowser):
    products = URL('products')

class MyChildBrowser(MyBrowser):
    BASEURL = 'https://products-api.example/'

    products = MyBrowser.products.with_headers({
        'Accept': 'application/vnd.example.api+json;version=2',
    })

Parameters:

headers (Dict[str, str] | None) – The new headers to set to the URL.

Return type:

TypeVar(URLType, bound= URL)

Returns:

The URL using the different headers.
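
The with_* and without_* methods read as a "copy with one field changed" pattern. The sketch below shows that pattern with a frozen dataclass from the standard library. It rests on assumptions: `UrlSpec` is a hypothetical stand-in rather than the URL class, and leaving the original object untouched is assumed from the wording of these methods, not confirmed from woob's code.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class UrlSpec:
    """Hypothetical, simplified stand-in for a URL description."""

    patterns: Tuple[str, ...]
    headers: Optional[Dict[str, str]] = None
    timeout: Optional[float] = None

    def with_headers(self, headers: Optional[Dict[str, str]]) -> "UrlSpec":
        # Return a copy that differs only in its stored headers.
        copied = dict(headers) if headers is not None else None
        return replace(self, headers=copied)

    def without_headers(self) -> "UrlSpec":
        return self.with_headers(None)


products = UrlSpec(patterns=("products",))
products_v2 = products.with_headers(
    {"Accept": "application/vnd.example.api+json;version=2"}
)
```

Because `products` is never modified, a parent browser class and a child browser class can each hold their own variant of the same URL description.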

without_headers()[source]

Get the current URL without stored headers.

Return type:

TypeVar(URLType, bound= URL)

Returns:

The URL without the stored headers.

with_timeout(timeout)[source]

Get a new URL object with timeout.

Parameters:

timeout (float | None) – The new timeout to apply, or None if the default timeout from the browser is to be used.

Return type:

TypeVar(URLType, bound= URL)

Returns:

The URL using the different timeout.

without_timeout()[source]

Get a new URL object using the browser’s timeout.

Return type:

TypeVar(URLType, bound= URL)

Returns:

The URL without the custom timeout.

with_page(cls)[source]

Get a new URL with the same path but a different page class.

Parameters:

cls (Type[Page]) – The new page class to use.

Return type:

TypeVar(URLType, bound= URL)

Returns:

The URL object with the updated page class.

with_urls(*urls, clear=True, match_new_first=True)[source]

Get a new URL object with the same page but with different paths.

Parameters:
  • urls (str) – List of urls handled by the page.

  • clear (bool) – If True, the page will only handle the given urls; otherwise, the urls are added to those already handled. (default: True)

  • match_new_first (bool) – If True, new paths are matched first for this URL; this parameter is ignored when clear is True. (default: True)

Return type:

TypeVar(URLType, bound= URL)

Returns:

The URL object with the updated patterns.
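
The interaction between clear and match_new_first can be summarized by how the pattern list is assembled. The function below is only a sketch of that description: `combine_urls` is a hypothetical helper, and it assumes that patterns are tried in list order.

```python
from typing import List, Sequence


def combine_urls(
    existing: Sequence[str],
    new: Sequence[str],
    clear: bool = True,
    match_new_first: bool = True,
) -> List[str]:
    """Return the patterns to try, in matching order (assumed to be list order)."""
    if clear:
        # Only the new patterns are kept; match_new_first is irrelevant here.
        return list(new)
    if match_new_first:
        return list(new) + list(existing)
    return list(existing) + list(new)
```

For example, `combine_urls(["/a"], ["/b"], clear=False)` yields `["/b", "/a"]`, so the new pattern takes precedence whenever both could match.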

with_base(base='BASEURL')[source]

Get a new URL object with a custom base.

Parameters:

base (str) – The name of the new base, or None to use the default one. (default: 'BASEURL')

Return type:

TypeVar(URLType, bound= URL)

Returns:

The URL object with the updated base.

with_methods(methods)[source]

Get a new URL object with custom methods.

Parameters:

methods (Tuple[str, Ellipsis]) – The new methods to match the URL with.

Return type:

TypeVar(URLType, bound= URL)

Returns:

The URL object with the updated methods.

without_methods()[source]

Get a new URL object without matching on methods.

Return type:

TypeVar(URLType, bound= URL)

Returns:

The URL object with the updated methods.

with_content_type(content_type)[source]

Get a new URL object with custom Content-Type matching.

Parameters:

content_type (str | None) – The new content type to match with.

Return type:

TypeVar(URLType, bound= URL)

Returns:

The URL object with the updated content type to match.

without_content_type()[source]

Get a new URL object without Content-Type matching.

Return type:

TypeVar(URLType, bound= URL)

Returns:

The URL object with no content type matching.

class BrowserParamURL(*args, base='BASEURL', headers=None, timeout=None, methods=(), content_type=None)[source]

Bases: URL

A URL that automatically fills some params from browser attributes.

URL patterns having groups named “browser_*” will pick the relevant attribute from the browser. For example:

foo = BrowserParamURL(r'/foo\?bar=(?P<browser_token>\w+)')

The browser is expected to have a .token attribute, which is passed automatically when calling foo.go(); this is equivalent to foo.go(browser_token=browser.token).

Warning: all browser_* params are passed, so having multiple patterns with different groups in a BrowserParamURL is risky.
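
The lookup described above can be sketched with the standard library: collect every group named browser_* from a pattern and read the corresponding attribute from the browser. This is an illustration, not woob's implementation; `browser_params` is a hypothetical helper and the `SimpleNamespace` object merely stands in for a browser with a .token attribute.

```python
import re
from types import SimpleNamespace
from typing import Any, Dict

PREFIX = "browser_"


def browser_params(pattern: str, browser: Any) -> Dict[str, Any]:
    """Map each browser_* group of `pattern` to the matching browser attribute."""
    # groupindex maps every named group of the compiled pattern to its index.
    names = re.compile(pattern).groupindex
    return {
        name: getattr(browser, name[len(PREFIX):])
        for name in names
        if name.startswith(PREFIX)
    }


fake_browser = SimpleNamespace(token="abc123")  # stand-in for a real browser
params = browser_params(r"/foo\?bar=(?P<browser_token>\w+)", fake_browser)
```

Here `params` holds the single entry `browser_token`, read from `fake_browser.token`. A pattern naming an attribute the browser lacks would raise AttributeError in this sketch, which hints at why mixing patterns with different browser_* groups is risky.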

build(**kwargs)[source]

Build an url with the given arguments from URL’s regexps.

Parameters:

param – Query string parameters

Return type:

str

Raises:

UrlNotResolvable if unable to resolve a correct url with the given arguments.

normalize_url(url)[source]

Normalize URL by lower-casing the domain and other fixes.

Lower-cases the domain, removes the default port and a trailing dot.

Return type:

str

>>> normalize_url('https://EXAMPLE:443')
'https://example'
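
For readers who want to see the three fixes spelled out, here is a standard-library approximation. It is a sketch under stated assumptions, not the code of normalize_url: `normalize_url_sketch` is a hypothetical name, and "default port" is assumed to mean 80 for http and 443 for https.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: only these two schemes have a default port worth stripping.
DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_url_sketch(url: str) -> str:
    """Lower-case the host, drop a trailing dot and the scheme's default port."""
    parts = urlsplit(url)
    # urlsplit already lower-cases the hostname attribute.
    host = (parts.hostname or "").rstrip(".")
    if ":" in host:
        host = f"[{host}]"  # restore the brackets of an IPv6 literal
    netloc = host
    if parts.port is not None and parts.port != DEFAULT_PORTS.get(parts.scheme):
        netloc = f"{host}:{parts.port}"
    if parts.username is not None:
        userinfo = parts.username
        if parts.password is not None:
            userinfo += ":" + parts.password
        netloc = f"{userinfo}@{netloc}"
    return urlunsplit(parts._replace(netloc=netloc))
```

A non-default port is kept, and the path and query are left untouched, so only the authority part of the URL changes.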