woob.tools.regex_helper

Functions for reversing a regular expression (used in reverse URL resolving). Used internally by Django and not intended for external use.

This is not, and is not intended to be, a complete reg-exp decompiler. It should, however, be good enough for a large class of URLs.

class Choice(iterable=(), /)

Bases: list

Represent multiple possibilities at this point in a pattern string.

class Group(iterable=(), /)

Bases: list

Represent a capturing group in the pattern string.

class NonCapture(iterable=(), /)

Bases: list

Represent a non-capturing group in the pattern string.
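None of the three classes adds behaviour to list; the subclass itself is what a consumer branches on. The sketch below uses hypothetical stand-ins with the same names (not imported from woob). It assumes, for illustration only, that a Group holds a placeholder and a parameter name, and that an optional group is represented as a Choice between None ("absent") and the group.

```python
class Choice(list):
    """Stand-in: alternatives at one point in the pattern."""


class Group(list):
    """Stand-in: a capturing group, here [placeholder, parameter_name]."""


class NonCapture(list):
    """Stand-in: a non-capturing group."""


def kind(node) -> str:
    """Name a node by its subclass; list contents carry no type information."""
    if isinstance(node, Choice):
        return "choice"
    if isinstance(node, Group):
        return "group"
    if isinstance(node, NonCapture):
        return "non-capturing"
    return "literal"


# Hypothetical tree for r"page/(?P<pk>\d+)?": the optional group becomes a
# Choice between "absent" (None) and the group itself.
tree = ["page/", Choice([None, Group(["%(pk)s", "pk"])])]
```

Because the classes are plain list subclasses, they still compare equal to ordinary lists with the same contents; only isinstance checks distinguish them.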

normalize(pattern)

Given a reg-exp pattern, normalize it to an iterable of forms that suffice for reverse matching. This does the following:

  1. For any repeating sections, keep the minimum number of occurrences permitted (this means zero for optional groups).

  2. If an optional group includes parameters, include one occurrence of that group (along with the zero-occurrence case from step (1)).

  3. Select the first (essentially arbitrary) element from any character class. Select an arbitrary character for any unordered class (e.g. ‘.’ or ‘\w’) in the pattern.

  4. Ignore look-ahead and look-behind assertions.

  5. Raise an error on any disjunctive (‘|’) constructs.
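A minimal sketch of the idea behind steps 2 and 5, under heavy assumptions: normalize_sketch and NAMED_GROUP are hypothetical names, only non-nested named groups are recognized, and the literal text between groups is assumed to contain no regex metacharacters, so steps 1, 3 and 4 are not exercised. Each form is a %-style format string paired with the parameter names it needs.

```python
import re
from typing import List, Tuple

# Hypothetical: a named group without nested parentheses, plus an optional "?".
NAMED_GROUP = re.compile(r"\(\?P<(\w+)>[^()]*\)(\?)?")


def normalize_sketch(pattern: str) -> List[Tuple[str, List[str]]]:
    """Return (format_string, params) pairs for a tiny subset of regex syntax."""
    # Step 5 (naive: this also rejects '|' inside a character class).
    if "|" in pattern:
        raise ValueError("disjunctive '|' constructs are not supported")
    if pattern.startswith("^"):
        pattern = pattern[1:]
    if pattern.endswith("$"):
        pattern = pattern[:-1]
    forms: List[Tuple[str, List[str]]] = [("", [])]
    pos = 0
    for match in NAMED_GROUP.finditer(pattern):
        # Literal text before the group; '%' is doubled so it survives formatting.
        literal = pattern[pos:match.start()].replace("%", "%%")
        name, optional = match.group(1), match.group(2)
        placeholder = "%%(%s)s" % name  # e.g. "%(year)s"
        new_forms = []
        for fmt, params in forms:
            if optional:
                # Step 2: keep the zero-occurrence form alongside one occurrence.
                new_forms.append((fmt + literal, params))
            new_forms.append((fmt + literal + placeholder, params + [name]))
        forms = new_forms
        pos = match.end()
    tail = pattern[pos:].replace("%", "%%")
    return [(fmt + tail, params) for fmt, params in forms]


forms = normalize_sketch(r"^articles/(?P<year>[0-9]{4})/(?P<slug>[-\w]+)?$")
```

Here forms holds two entries, one without and one with the optional slug parameter. Reversing then amounts to picking a form whose parameter list matches the supplied arguments and applying the % operator: forms[1][0] % {"year": "2024", "slug": "hello"} gives "articles/2024/hello".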

Django’s URLs for forward resolving are either all positional arguments or all keyword arguments. That is assumed here, as well. Although reverse resolving can be done using positional args when keyword args are specified, the two cannot be mixed in the same reverse() call.
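The all-positional or all-keyword assumption can be checked up front with the standard re module: Pattern.groups counts capturing groups, and Pattern.groupindex maps group names to group numbers. argument_style is a hypothetical helper for illustration, not part of this module.

```python
import re


def argument_style(pattern: str) -> str:
    """Classify a URL regex by how its capturing groups are named."""
    compiled = re.compile(pattern)
    total = compiled.groups            # capturing groups only; (?:...) is not counted
    named = len(compiled.groupindex)   # groups written as (?P<name>...)
    if total == 0:
        return "none"
    if named == 0:
        return "positional"
    if named == total:
        return "keyword"
    return "mixed"  # violates the all-positional-or-all-keyword assumption
```

For example, r"^(?P<pk>\d+)/(\w+)/$" is reported as "mixed", while r"^(?:a)(?P<b>c)$" is "keyword" because the non-capturing group is ignored.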

next_char(input_iter)

An iterator that yields the next character from “input_iter”, respecting escape sequences. An escaped character is replaced by a representative of its class (e.g. \w -> “x”). If the escaped character is one that is skipped, it is not returned (the next character is returned instead).

Yield the next character, along with a boolean that is True if the character came from an escape sequence and False if it is a raw (unescaped) character.
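A sketch of an escape-aware character iterator in the same spirit. The ESCAPE_REPRESENTATIVES table and next_char_sketch are assumptions for illustration; the module's own mapping may differ. In this sketch the flag is True for characters produced by an escape sequence, and zero-width escapes such as \b map to None and are skipped.

```python
from typing import Dict, Iterator, Optional, Tuple

# Assumed representatives: one character each escape class can match, or
# None for zero-width escapes that consume no character.
ESCAPE_REPRESENTATIVES: Dict[str, Optional[str]] = {
    "d": "0", "D": "x", "s": " ", "S": "x", "w": "x", "W": "!",
    "A": None, "b": None, "B": None, "Z": None,
}


def next_char_sketch(input_iter: Iterator[str]) -> Iterator[Tuple[str, bool]]:
    """Yield (character, escaped) pairs from a pattern, resolving escapes."""
    for ch in input_iter:
        if ch != "\\":
            yield ch, False  # ordinary, unescaped character
            continue
        escaped = next(input_iter, None)
        if escaped is None:
            raise ValueError("pattern ends with a lone backslash")
        representative = ESCAPE_REPRESENTATIVES.get(escaped, escaped)
        if representative is None:
            continue  # zero-width escape: nothing to emit
        yield representative, True  # e.g. "\." yields (".", True)
```

Returning a default from next() instead of letting StopIteration escape matters here: since Python 3.7, a StopIteration raised inside a generator body is turned into a RuntimeError.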

walk_to_end(ch, input_iter)

The iterator is currently inside a capturing group. Walk to the close of this group, skipping over any nested groups and handling escaped parentheses correctly.
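A sketch of the nesting count involved, assuming the iterator yields (character, escaped) pairs like those described for next_char. walk_to_end_sketch is a hypothetical name; ch is the character just read, and if it is "(" a nested group has already been opened.

```python
from typing import Iterator, Tuple


def walk_to_end_sketch(ch: str, input_iter: Iterator[Tuple[str, bool]]) -> None:
    """Consume input_iter up to and including the ")" that closes the current group."""
    # If ch itself opened a nested group, one extra ")" must be seen first.
    nesting = 1 if ch == "(" else 0
    for c, escaped in input_iter:
        if escaped:
            continue  # escaped parentheses are literals, not group delimiters
        if c == "(":
            nesting += 1
        elif c == ")":
            if nesting == 0:
                return  # closed the group we started in
            nesting -= 1
```

After the call, the iterator is positioned just past the closing parenthesis, so the caller can continue reading (for example, a quantifier that applies to the whole group).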

get_quantifier(ch, input_iter)

Parse a quantifier from the input, where “ch” is the first character in the quantifier.

Return the minimum number of occurrences permitted by the quantifier and either None or the next character from the input_iter if the next character is not part of the quantifier.
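A sketch of the return contract, assuming ch is one of *, +, ? or { and that input_iter yields (character, escaped) pairs. get_quantifier_sketch is a hypothetical name. A trailing "?" (the lazy modifier, as in "*?" or "{2,5}?") is treated as part of the quantifier, so it is consumed and None is returned in its place.

```python
from typing import Iterator, Optional, Tuple


def get_quantifier_sketch(
    ch: str, input_iter: Iterator[Tuple[str, bool]]
) -> Tuple[int, Optional[str]]:
    """Return (minimum occurrences, next character or None)."""
    if ch in "*?+":
        nxt, _ = next(input_iter, (None, False))
        if nxt == "?":  # lazy modifier belongs to the quantifier
            nxt = None
        return (1 if ch == "+" else 0), nxt
    # Otherwise ch is "{": collect the body up to the closing brace.
    body = []
    for c, _ in input_iter:
        if c == "}":
            break
        body.append(c)
    else:
        raise ValueError("unterminated {m,n} quantifier")
    # "{m}", "{m,}", "{m,n}" give m; "{,n}" has an implicit minimum of 0.
    minimum = int("".join(body).split(",")[0] or 0)
    nxt, _ = next(input_iter, (None, False))
    if nxt == "?":
        nxt = None
    return minimum, nxt
```

Returning the lookahead character lets the caller process it normally instead of losing it, since iterators cannot be rewound.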

contains(source, inst)

Return True if “source” contains an instance of “inst”, False otherwise.
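A recursive sketch of such a check, which is the kind of test step 2 of normalize needs ("does this optional group include parameters?"). Group, NonCapture and contains_sketch are hypothetical stand-ins. The sketch assumes recursion into every nested list; the module's own function may restrict which containers it descends into.

```python
class Group(list):
    """Stand-in for a capturing-group node."""


class NonCapture(list):
    """Stand-in for a non-capturing-group node."""


def contains_sketch(source, inst) -> bool:
    """Return True if source, or any list nested inside it, is an instance of inst."""
    if isinstance(source, inst):
        return True
    if isinstance(source, list):
        return any(contains_sketch(item, inst) for item in source)
    return False  # strings and other leaves are not searched
```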

flatten_result(source)

Turn the given source sequence into a list of reg-exp possibilities and their arguments. Return a list of strings and a list of argument lists. Each of the two lists will be of the same length.
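A sketch of the flattening as a Cartesian product over alternatives, under an assumed input shape: each item is either a plain string or a list of (text, args) alternatives. flatten_sketch and that shape are hypothetical; the module works on its own node types.

```python
from itertools import product
from typing import List, Sequence, Tuple, Union

Alternative = Tuple[str, List[str]]
Item = Union[str, List[Alternative]]


def flatten_sketch(source: Sequence[Item]) -> Tuple[List[str], List[List[str]]]:
    """Expand every combination of alternatives into parallel string/argument lists."""
    # A plain string is a single alternative with no arguments.
    options = [[(item, [])] if isinstance(item, str) else item for item in source]
    strings: List[str] = []
    arg_lists: List[List[str]] = []
    for combo in product(*options):
        strings.append("".join(text for text, _ in combo))
        arg_lists.append([arg for _, args in combo for arg in args])
    return strings, arg_lists


strings, arg_lists = flatten_sketch(["page/", [("", []), ("%(pk)s", ["pk"])], "/"])
```

Each combination contributes exactly one string and one argument list, which is why the two returned lists always have the same length.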