linkify_it package#

Submodules#

linkify_it.main module#

class linkify_it.main.LinkifyIt(schemas=None, options=None)[source]#

Bases: object

Creates new linkifier instance with optional additional schemas.

By default understands:

  • http(s)://... , ftp://..., mailto:... & //... links

  • “fuzzy” links and emails (example.com, foo@bar.com).

schemas is a dict where each key/value pair describes a protocol/rule:

  • key - link prefix (usually a protocol name with : at the end, skype: for example). linkify-it makes sure that the prefix is not preceded by an alphanumeric character. Only whitespace and punctuation are allowed.

  • value - rule to check tail after link prefix

    • str - just alias to existing rule

    • dict

      • validate - either a re.Pattern, a regex str (starting with ^ and not including the link prefix itself), or a validator function which, given the arguments self, text and pos, returns the length of a match in text starting at index pos. pos is the index right after the link prefix.

      • normalize - optional function to normalize text & url of matched result (for example, for @twitter mentions).

options is a dict:

  • fuzzy_link - recognize URLs without http(s): prefix. Default True.

  • fuzzy_ip - allow IPs in fuzzy links above. Can conflict with some texts like version numbers. Default False.

  • fuzzy_email - recognize emails without mailto: prefix. Default True.

  • --- - set True to terminate link with --- (if it is considered a long dash).

Parameters:
  • schemas (dict) – Optional. Additional schemas to validate (prefix/validator)

  • options (dict) – { fuzzy_link | fuzzy_email | fuzzy_ip: True | False }. Default: {“fuzzy_link”: True, “fuzzy_email”: True, “fuzzy_ip”: False}.

add(schema, definition)[source]#

Add new rule definition. (chainable)

See the linkify_it.main.LinkifyIt init description for details. schema is a link prefix (skype:, for example), and definition is either a str that aliases another schema or a dict with validate and, optionally, normalize definitions. To disable an existing rule, use .add(<schema>, None).

Parameters:
  • schema (str) – rule name (fixed pattern prefix)

  • definition (str, dict or None) – schema definition

Returns:

linkify_it.main.LinkifyIt

match(text)[source]#

Returns a list of found link descriptions, or None if nothing is found.

For best speed, we strongly recommend calling linkify_it.main.LinkifyIt.test() first.

Parameters:

text (str) – text to search

Returns:

Result match description:
  • schema - link schema, can be empty for fuzzy links, or // for protocol-neutral links.

  • index - offset of matched text

  • last_index - offset of the position right after the matched text

  • raw - matched text

  • text - normalized text

  • url - link, generated from matched text

Return type:

list or None

match_at_start(text)[source]#

Returns a fully-formed (not fuzzy) link if it starts at the beginning of the string, and None otherwise.

Parameters:

text (str) – text to search

Returns:

Match or None

normalize(match)[source]#

Default normalizer (used if the schema does not define its own).

Parameters:

match (linkify_it.main.Match) – Match result

pretest(text)[source]#

Very quick check that can give false positives.

Returns True if a link may exist. Can be used as a speed optimization when you need to confirm that the text contains no links.

Parameters:

text (str) – text to search

Returns:

True if a linkable pattern was found, otherwise it is False.

Return type:

bool

set(options)[source]#

Override default options. (chainable)

Options missing from the dict are left unchanged.

Parameters:

options (dict) – keys: [fuzzy_link | fuzzy_email | fuzzy_ip]. values: [True | False]

Returns:

linkify_it.main.LinkifyIt

test(text)[source]#

Searches for a linkifiable pattern and returns True on success or False on failure.

Parameters:

text (str) – text to search

Returns:

True if a linkable pattern was found, otherwise it is False.

Return type:

bool

test_schema_at(text, name, position)[source]#

Similar to linkify_it.main.LinkifyIt.test() but checks only specific protocol tail exactly at given position.

Parameters:
  • text (str) – text to scan

  • name (str) – rule (schema) name

  • position (int) – text offset to check from (the index right after the schema prefix).

Returns:

length of found pattern (0 on fail).

Return type:

int

tlds(list_tlds, keep_old=False)[source]#

Load (or merge) new tlds list. (chainable)

These are used for fuzzy links (without prefix) to avoid false positives. By default this algorithm is used:

  • hostnames with any 2-letter root zone are ok.

  • biz|com|edu|gov|net|org|pro|web|xxx|aero|asia|coop|info|museum|name|shop|рф are ok.

  • encoded (xn--...) root zones are ok.

If the list is replaced, 2-character root zones are also checked for an exact match.

Parameters:
  • list_tlds (list or str) – list of tlds or tlds string

  • keep_old (bool) – merge with current list if True (False by default)

class linkify_it.main.Match(linkifyit, shift)[source]#

Bases: object

Match result.

schema#

Prefix (protocol) for matched string.

Type:

str

index#

First position of matched string.

Type:

int

last_index#

Next position after matched string.

Type:

int

raw#

Matched string.

Type:

str

text#

Normalized text of matched string.

Type:

str

url#

Normalized url of matched string.

Type:

str

exception linkify_it.main.SchemaError(name, val)[source]#

Bases: Exception

Linkify schema error

linkify_it.tlds module#

TLDS

Version 2020110600, Last Updated Fri Nov 6 07:07:02 2020 UTC

References

http://data.iana.org/TLD/tlds-alpha-by-domain.txt

linkify_it.ucre module#

linkify_it.ucre.build_re(opts)[source]#

Build regex

Parameters:

opts (dict) – options

Returns:

dict of regex strings

Return type:

dict