woob.browser.filters.html
- class CSS(selector=None, default=_NO_DEFAULT)
Bases: _Selector
Select HTML elements with a CSS selector.
For example:
obj_foo = CleanText(CSS('div.main'))
will take the text of all <div> elements having the CSS class “main”.
- class XPath(selector=None, default=_NO_DEFAULT)
Bases: _Selector
Select HTML elements with an XPath selector.
- exception XPathNotFound
Bases: ItemNotFound
- exception AttributeNotFound
Bases: ItemNotFound
- class Attr(selector, attr, default=_NO_DEFAULT)
Bases: Filter
Get the text value of an HTML attribute.
Get value from attribute attr of HTML element matched by selector.
For example:
obj_foo = Attr('//img[@id="thumbnail"]', 'src')
will take the “src” attribute of the <img> whose “id” is “thumbnail”.
- filter(el)
Raises: XPathNotFound if no element is found
Raises: AttributeNotFound if the element doesn’t have the requested attribute
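The two failure modes described for Attr can be illustrated with a standard-library sketch. It does not use woob: get_attr, ElementMissing and AttributeMissing are hypothetical stand-ins for Attr, XPathNotFound and AttributeNotFound, and xml.etree.ElementTree only understands a subset of XPath.

```python
import xml.etree.ElementTree as ET


class ElementMissing(Exception):
    """Hypothetical stand-in for XPathNotFound."""


class AttributeMissing(Exception):
    """Hypothetical stand-in for AttributeNotFound."""


def get_attr(root, path, attr):
    """Return attribute `attr` of the first element matching `path`."""
    el = root.find(path)
    if el is None:
        # No element matched the selector at all.
        raise ElementMissing(path)
    if attr not in el.attrib:
        # The element exists but lacks the requested attribute.
        raise AttributeMissing(attr)
    return el.attrib[attr]


# Illustrative markup, not taken from any real page.
doc = ET.fromstring(
    '<html><body>'
    '<img id="thumbnail" src="/static/thumb.png"/>'
    '<img id="banner"/>'
    '</body></html>'
)

thumb_src = get_attr(doc, './/img[@id="thumbnail"]', 'src')
```

In this sketch, asking for the “src” of the “banner” image raises AttributeMissing, while a selector that matches nothing raises ElementMissing.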
- class Link(selector=None, default=_NO_DEFAULT)
Bases: Attr
Get the link URI of an element.
If the <a> tag is not found, an IndexError exception is raised.
- class AbsoluteLink(selector=None, default=_NO_DEFAULT)
Bases: Link
Get the absolute link URI of an element.
- class CleanHTML(selector=None, options=None, default=_NO_DEFAULT)
Bases: Filter
Convert HTML to text (Markdown) using html2text.
- class FormValue(selector=None, default=_NO_DEFAULT)
Bases: Filter
Extract a Python value from a form element.
Checkboxes and radio buttons return booleans, while all other elements return text. For <select> tags, the user-visible text of the option is returned.
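The value rules above can be sketched with the standard library alone. This is not woob's implementation: form_value is a hypothetical helper, and picking the first option when none is marked as selected is an assumption made for this sketch.

```python
import xml.etree.ElementTree as ET


def form_value(el):
    """Hypothetical helper following the rules described for FormValue."""
    if el.tag == "input" and el.get("type") in ("checkbox", "radio"):
        # Checkboxes and radio buttons map to a boolean.
        return el.get("checked") is not None
    if el.tag == "select":
        options = el.findall("option")
        if not options:
            return ""
        selected = [o for o in options if o.get("selected") is not None]
        # Assumption: fall back to the first option if none is selected.
        chosen = selected[0] if selected else options[0]
        # Return the user-visible text, not the "value" attribute.
        return (chosen.text or "").strip()
    # Everything else is returned as text.
    return el.get("value", "")


# Illustrative form; boolean attributes are spelled out to keep it valid XML.
form = ET.fromstring(
    '<form>'
    '<input type="checkbox" name="news" checked="checked"/>'
    '<input type="radio" name="plan" value="pro"/>'
    '<input type="text" name="city" value="Paris"/>'
    '<select name="lang">'
    '<option value="fr">French</option>'
    '<option value="en" selected="selected">English</option>'
    '</select>'
    '</form>'
)

values = {el.get("name"): form_value(el) for el in form}
```

Note that the <select> yields “English” (the text shown to the user) rather than “en” (the submitted value).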
- class HasElement(selector, yesvalue=True, novalue=False)
Bases: Filter
Returns yesvalue if the selector finds elements, novalue otherwise.
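The yesvalue/novalue behaviour can be mimicked in a few lines of standard-library Python. The has_element function below is a hypothetical stand-in, not woob code, and ElementTree accepts only a limited XPath subset.

```python
import xml.etree.ElementTree as ET


def has_element(root, path, yesvalue=True, novalue=False):
    """Hypothetical stand-in: yesvalue if `path` matches anything, else novalue."""
    return yesvalue if root.findall(path) else novalue


# Illustrative markup.
page = ET.fromstring('<div><span class="error">Bad login</span></div>')

found = has_element(page, './/span[@class="error"]')
status = has_element(page, './/table', yesvalue="has table", novalue="no table")
```

Custom yesvalue and novalue arguments make it possible to map presence or absence directly onto domain values instead of booleans.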
- class TableCell(*names, **kwargs)
Bases: _Filter
Used with TableElement, gets the cell element from its name.
For example:
>>> from woob.capabilities.bank import Transaction
>>> from woob.browser.elements import TableElement, ItemElement
>>> from woob.browser.filters.html import TableCell
>>> from woob.browser.filters.standard import CleanText, Date
>>> class table(TableElement):
...     head_xpath = '//table/thead/th'
...     item_xpath = '//table/tbody/tr'
...     col_date = u'Date'
...     col_label = [u'Name', u'Label']
...     class item(ItemElement):
...         klass = Transaction
...         obj_date = Date(TableCell('date'))
...         obj_label = CleanText(TableCell('label'))
...
TableCell handles table tags that have a “colspan” attribute, which modifies the width of the column: for example, <td colspan="2"> occupies two columns instead of one. This shifts all following columns, and the shift must be taken into consideration when matching column values with column heads.
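The column shift caused by “colspan” can be made concrete with a small sketch. It only illustrates the bookkeeping and is not how woob implements TableCell; column_indexes, expand_row and the sample data are hypothetical.

```python
def column_indexes(head_cells):
    """Map each header label to the index of the first column it covers.

    head_cells is a list of (label, colspan) pairs in document order.
    """
    indexes = {}
    position = 0
    for label, colspan in head_cells:
        indexes[label] = position
        # A header spanning N columns pushes the next header N slots right.
        position += colspan
    return indexes


def expand_row(row_cells):
    """Repeat each cell value across the columns its colspan covers."""
    expanded = []
    for value, colspan in row_cells:
        expanded.extend([value] * colspan)
    return expanded


# Illustrative table: the "Label" header spans two columns.
head = [("Date", 1), ("Label", 2), ("Amount", 1)]
row = [("2024-01-05", 1), ("Coffee", 1), ("card", 1), ("-3.50", 1)]

cols = column_indexes(head)
cells = expand_row(row)
amount = cells[cols["Amount"]]
```

Without the shift, “Amount” would be looked up at index 2 (its position among the header cells) and would wrongly return “card”; accounting for the colspan places it at index 3.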
- exception ColumnNotFound
Bases: FilterError