woob.browser.url
- exception UrlNotResolvable
Bases:
Exception
Raised when trying to navigate to a URL instance whose pattern cannot be resolved to a real URL.
- class URL(*args, base='BASEURL', headers=None, timeout=None, methods=(), content_type=None)
Bases:
object
A description of a URL on the PagesBrowser website.
It takes one or several regexps to match URLs, and an optional Page class which is instantiated by PagesBrowser.open if the page matches a regex.
Warning
The methods parameter is only used for page matching, not for building requests with URL.go() or URL.open(); you must still set the method when calling these.
- Parameters:
base (str) – The name of the browser’s property containing the base URL. (default: 'BASEURL')
headers (Dict[str, str] | None) – Headers to include on requests using this URL. (default: None)
timeout (float | None) – Timeout to use for this URL in particular. (default: None)
methods (Tuple[str, Ellipsis]) – HTTP request methods to match the response against. (default: ())
content_type (str | None) – MIME type of the content to match the response with. (default: None)
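The difference between matching on a method and sending a request with that method can be illustrated with a stand-alone sketch. It does not use woob: `response_matches` is a hypothetical helper, and treating an empty `methods` tuple as "any method" is an assumption based on the default value `()`.

```python
import re
from typing import Tuple


def response_matches(pattern: str, methods: Tuple[str, ...],
                     url: str, request_method: str) -> bool:
    """Hypothetical helper: decide whether a response belongs to a page."""
    # Assumption: an empty tuple means "do not filter on the HTTP method".
    if methods and request_method.upper() not in methods:
        return False
    return re.fullmatch(pattern, url) is not None


pattern = r"https://example\.org/login"
login_url = "https://example.org/login"

# The same URL matches or not depending on the method used for the request.
post_ok = response_matches(pattern, ("POST",), login_url, "post")
get_ok = response_matches(pattern, ("POST",), login_url, "GET")
any_ok = response_matches(pattern, (), login_url, "GET")
```

Nothing in this check chooses the method of an outgoing request, which is why the method still has to be passed explicitly when the request is made.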
- is_here(**kwargs)
Returns True if the current page of browser matches this URL. If arguments are provided, and only then, they are checked against the arguments that were used to build the current page URL.
- Return type:
- stay_or_go(params=None, data=None, json=None, method=None, headers=None, **kwargs)
Go to this URL only if the browser is not already on it.
Keyword arguments are optional parameters for the URL pattern.
>>> url = URL('https://exawple.org/(?P<pagename>).html')
>>> url.stay_or_go(pagename='index')
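The "only if we aren't already here" behaviour can be sketched without woob. `FakeNavigator` below is a hypothetical stand-in that merges a browser and a single URL into one object; it only models the decision between staying and going, under the assumption that the check is the same one `is_here()` describes (arguments are compared only when they are provided).

```python
class FakeNavigator:
    """Hypothetical stand-in for a browser plus one URL; not woob code."""

    def __init__(self) -> None:
        self.current = None   # parameters of the page we are on, if any
        self.requests = 0     # number of simulated HTTP requests

    def is_here(self, **kwargs) -> bool:
        if self.current is None:
            return False
        # With no arguments, being on the page at all is enough.
        return all(self.current.get(k) == v for k, v in kwargs.items())

    def go(self, **kwargs) -> dict:
        self.requests += 1
        self.current = dict(kwargs)
        return self.current

    def stay_or_go(self, **kwargs) -> dict:
        if self.is_here(**kwargs):
            return self.current
        return self.go(**kwargs)


nav = FakeNavigator()
nav.stay_or_go(pagename="index")   # not there yet: one request
nav.stay_or_go(pagename="index")   # already there: no request
nav.stay_or_go(pagename="about")   # different parameters: one more request
```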
- go(*, params=None, data=None, json=None, method=None, headers=None, timeout=None, **kwargs)
Go to this URL.
Keyword arguments are optional parameters for the URL pattern.
>>> url = URL('https://exawple.org/(?P<pagename>).html')
>>> url.go(pagename='index')
- open(*, params=None, data=None, json=None, method=None, headers=None, timeout=None, is_async=False, callback=lambda response: ..., **kwargs)
Open this URL.
Keyword arguments are optional parameters for the URL pattern.
>>> url = URL('https://exawple.org/(?P<pagename>).html')
>>> url.open(pagename='index')
- get_base_url(browser=None, for_pattern=None)
Get the browser’s base URL for the instance.
The for_pattern argument is optional and only used to display more information in the ValueError exception (don’t know why, may be removed).
- Return type:
- build(**kwargs)
Build a URL with the given arguments from the URL’s regexps.
- Parameters:
param – Query string parameters
- Return type:
- Raises:
UrlNotResolvable if unable to resolve a correct URL with the given arguments.
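To make the idea of building a URL from a regexp concrete, here is a minimal stand-alone sketch. It is not woob's implementation: `build_from_pattern` and `UrlNotResolvableSketch` are hypothetical names, only non-nested named groups are handled, and the final re-match against the pattern is an assumption about what "a correct URL" means.

```python
import re


class UrlNotResolvableSketch(Exception):
    """Hypothetical stand-in for UrlNotResolvable."""


# Matches a named group without nested parentheses, e.g. (?P<name>\w+).
_GROUP = re.compile(r"\(\?P<(\w+)>[^()]*\)")


def build_from_pattern(pattern: str, **kwargs: str) -> str:
    missing = []

    def fill(match: "re.Match") -> str:
        name = match.group(1)
        if name not in kwargs:
            missing.append(name)
            return ""
        return str(kwargs[name])

    # Replace each named group with the value given for it.
    url = _GROUP.sub(fill, pattern)
    if missing:
        raise UrlNotResolvableSketch("missing arguments: %s" % ", ".join(missing))

    # Drop regex escapes such as \. and \? (values containing
    # backslashes are not supported by this sketch).
    url = re.sub(r"\\(.)", r"\1", url)

    # The result must itself match the pattern it was built from.
    if re.fullmatch(pattern, url) is None:
        raise UrlNotResolvableSketch("%r does not match %r" % (url, pattern))
    return url


page_pattern = r"https://example\.org/(?P<pagename>\w+)\.html"
index_url = build_from_pattern(page_pattern, pagename="index")
```

Calling `build_from_pattern(page_pattern)` without `pagename`, or with a value such as `"a/b"` that the group cannot match, raises the sketch's exception instead of returning a URL.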
- match(url, base=None)
Check if the given URL matches this object.
Returns None if no pattern matches.
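A stand-alone sketch of this kind of matching, written with the standard `re` module only. `match_any` is a hypothetical helper; joining relative patterns onto the base URL and requiring a full match are assumptions, not a description of woob's internals.

```python
import re
from typing import Optional, Sequence


def match_any(patterns: Sequence[str], url: str, base: str) -> Optional["re.Match"]:
    """Return the first successful match, or None if no pattern matches."""
    for pattern in patterns:
        if not pattern.startswith(("http://", "https://")):
            # Relative pattern: prefix it with the (escaped) base URL.
            pattern = re.escape(base.rstrip("/")) + "/" + pattern.lstrip("/")
        found = re.fullmatch(pattern, url)
        if found is not None:
            return found
    return None


patterns = [r"products/(?P<id>\d+)", r"https://cdn\.example\.org/img/(?P<name>\w+)"]
hit = match_any(patterns, "https://example.org/products/42", "https://example.org/")
miss = match_any(patterns, "https://example.org/cart", "https://example.org/")
```

Here `hit.group("id")` gives the captured product identifier, while `miss` is `None`.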
- with_headers(headers)
Get the current URL with different stored headers.
For example, suppose that a browser needs to add an ‘Accept’ header to access a specific version of the API; see Using the Accept Header to version your API for more details.
    class MyBrowser(PagesBrowser):
        products = URL('products')

    class MyChildBrowser(MyBrowser):
        BASEURL = 'https://products-api.example/'
        products = MyBrowser.products.with_headers({
            'Accept': 'application/vnd.example.api+json;version=2',
        })
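The pattern behind this kind of method, deriving a variant of an object for a child class while leaving the parent's object untouched, can be sketched with a frozen dataclass. `UrlSpec` is hypothetical and unrelated to woob's classes; that the original object is left unmodified is an assumption here.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class UrlSpec:
    """Hypothetical immutable description of a URL; not woob's URL class."""
    patterns: Tuple[str, ...]
    headers: Optional[Dict[str, str]] = None

    def with_headers(self, headers: Dict[str, str]) -> "UrlSpec":
        # Return a copy carrying the new headers; self is left as it was.
        return replace(self, headers=dict(headers))

    def without_headers(self) -> "UrlSpec":
        return replace(self, headers=None)


parent = UrlSpec(patterns=("products",))
child = parent.with_headers(
    {"Accept": "application/vnd.example.api+json;version=2"}
)
```

Because each call returns a new object, the parent browser and the child browser can share the same patterns while sending different headers.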
- without_headers()
Get the current URL without stored headers.
- Return type:
TypeVar(URLType, bound=URL)
- Returns:
The URL using the different headers.
- without_timeout()
Get a new URL object using the browser’s timeout.
- Return type:
TypeVar(URLType, bound=URL)
- Returns:
The URL without the custom timeout.
- with_urls(*urls, clear=True, match_new_first=True)
Get a new URL object with the same page but with different paths.
- Parameters:
urls (str) – List of URLs handled by the page.
clear (bool) – If True, the page will only handle the given URLs; otherwise, the URLs are added to the already handled URLs. (default: True)
match_new_first (bool) – If True, new paths will be matched first for this URL; this parameter is ignored when clear is True. (default: True)
- Return type:
TypeVar(URLType, bound=URL)
- Returns:
The URL object with the updated patterns.
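How `clear` and `match_new_first` interact is easier to see on a plain list of patterns. The function below is a hypothetical sketch that follows the parameter descriptions, not woob's own code.

```python
from typing import List, Sequence


def updated_patterns(existing: Sequence[str], new: Sequence[str],
                     clear: bool = True,
                     match_new_first: bool = True) -> List[str]:
    """Hypothetical helper: compute the pattern list after with_urls()."""
    if clear:
        # Only the new patterns are kept; match_new_first is ignored.
        return list(new)
    if match_new_first:
        return list(new) + list(existing)
    return list(existing) + list(new)


old = ["products/old"]
replaced = updated_patterns(old, ["products/v2"])
prepended = updated_patterns(old, ["products/v2"], clear=False)
appended = updated_patterns(old, ["products/v2"], clear=False,
                            match_new_first=False)
```

The order matters when two patterns can match the same address, since the first matching pattern is the one that applies.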
- without_methods()
Get a new URL object without matching on methods.
- Return type:
TypeVar(URLType, bound=URL)
- Returns:
The URL object with the updated methods.
- class BrowserParamURL(*args, base='BASEURL', headers=None, timeout=None, methods=(), content_type=None)
Bases:
URL
A URL that automatically fills some params from browser attributes.
URL patterns having groups named “browser_*” will pick the relevant attribute from the browser. For example:
foo = BrowserParamURL(r'/foo\?bar=(?P<browser_token>\w+)')
The browser is expected to have a .token attribute, which is passed automatically when calling foo.go(). This is equivalent to foo.go(browser_token=browser.token).
Warning: all browser_* params will be passed, so having multiple patterns with different groups in a BrowserParamURL is risky.
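The lookup described above can be sketched with the standard library alone. `browser_params` is a hypothetical helper and the browser is a plain `SimpleNamespace`; the sketch collects the `browser_*` groups of every pattern, which is the behaviour the warning refers to.

```python
import re
from types import SimpleNamespace
from typing import Any, Dict, Sequence

PREFIX = "browser_"


def browser_params(patterns: Sequence[str], browser: Any) -> Dict[str, Any]:
    """Hypothetical helper: gather browser_* group values from a browser."""
    params: Dict[str, Any] = {}
    for pattern in patterns:
        # groupindex maps each named group of the compiled regex to its index.
        for name in re.compile(pattern).groupindex:
            if name.startswith(PREFIX):
                params[name] = getattr(browser, name[len(PREFIX):])
    return params


browser = SimpleNamespace(token="abc123", session="s1")
single = browser_params([r"/foo\?bar=(?P<browser_token>\w+)"], browser)
both = browser_params(
    [r"/foo\?bar=(?P<browser_token>\w+)", r"/baz/(?P<browser_session>\w+)"],
    browser,
)
```

With two patterns, `both` contains values for groups of both patterns, even though any single built URL uses only one of them.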
- build(**kwargs)
Build a URL with the given arguments from the URL’s regexps.
- Parameters:
param – Query string parameters
- Return type:
- Raises:
UrlNotResolvable if unable to resolve a correct URL with the given arguments.