linkify_it package
Submodules
linkify_it.main module
- class linkify_it.main.LinkifyIt(schemas=None, options=None)
Bases: object
Creates a new linkifier instance with optional additional schemas.
By default it understands:
- http(s)://..., ftp://..., mailto:... & //... links
- "fuzzy" links and emails (example.com, foo@bar.com).
schemas is a dict where each key/value pair describes a protocol/rule:
- key - link prefix (usually a protocol name with : at the end, skype: for example). linkify-it makes sure that the prefix is not preceded by an alphanumeric character. Only whitespace and punctuation are allowed.
- value - rule to check the tail after the link prefix:
  - str - an alias to an existing rule
  - dict
    - validate - either a re.Pattern, a regex str (starting with ^ and not including the link prefix itself), or a validator function which, given the arguments self, text and pos, returns the length of a match in text starting at index pos. pos is the index right after the link prefix.
    - normalize - optional function to normalize the text and url of a matched result (for example, for @twitter mentions).
options is a dict:
- fuzzy_link - recognize URLs without the http(s): prefix. Default True.
- fuzzy_ip - allow IPs in fuzzy links above. Can conflict with some texts like version numbers. Default False.
- fuzzy_email - recognize emails without the mailto: prefix. Default True.
- --- - set True to terminate a link with --- (if it is considered a long dash).
- Parameters:
schemas (dict) – Optional. Additional schemas to validate (prefix/validator)
options (dict) – { fuzzy_link | fuzzy_email | fuzzy_ip: True | False }. Default: {"fuzzy_link": True, "fuzzy_email": True, "fuzzy_ip": False}.
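As a rough illustration of the validator contract described above, the sketch below defines a function that takes self, text and pos and returns the length of the match starting at pos (0 when nothing matches). The skype: prefix comes from the description; the tail pattern and the name validate_skype are assumptions made for this example, not something the library defines.

```python
import re

# Hypothetical tail pattern for a `skype:` link: a letter followed by
# letters, digits, dots, underscores or hyphens. This is an assumption
# for illustration; it is not taken from linkify-it.
SKYPE_TAIL = re.compile(r"^[a-zA-Z][a-zA-Z0-9._-]{2,31}")


def validate_skype(self, text, pos):
    """Return the length of the match in `text` starting at `pos`, or 0."""
    found = SKYPE_TAIL.match(text[pos:])
    return len(found.group()) if found else 0


# `pos` is the index right after the link prefix.
sample = "call skype:alice now"
pos = sample.index("skype:") + len("skype:")
tail_length = validate_skype(None, sample, pos)
print(sample[pos:pos + tail_length])
```

Under these assumptions, the function would be supplied as {"skype:": {"validate": validate_skype}} in schemas.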
- add(schema, definition)
Add new rule definition. (chainable)
See the linkify_it.main.LinkifyIt init description for details. schema is a link prefix (skype:, for example), and definition is a str to alias to another schema, or a dict with validate and optionally normalize definitions. To disable an existing rule, use .add(<schema>, None).
- Parameters:
schema (str) – rule name (fixed pattern prefix)
definition (str, dict or None) – schema definition
- Returns:
linkify_it.main.LinkifyIt
- match(text)
Returns a list of found link descriptions, or None on fail.
We strongly recommend using linkify_it.main.LinkifyIt.test() first, for best speed.
- Parameters:
text (str) – text to search
- Returns:
Result match description:
- schema - link schema; can be empty for fuzzy links, or // for protocol-neutral links.
- index - offset of the first character of the matched text
- last_index - offset of the position right after the matched text
- raw - the matched text
- text - normalized text
- url - link, generated from the matched text
- Return type:
list or None
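To make the field meanings concrete, here is a minimal sketch that turns match results into HTML anchors using index, last_index, text and url. The helper name render_links is hypothetical, and the SimpleNamespace object is a stand-in that carries the documented fields with illustrative values; it is not a real Match.

```python
from html import escape
from types import SimpleNamespace


def render_links(text, matches):
    """Wrap each matched span of `text` in an <a> tag."""
    if not matches:  # match() returns None when nothing is found
        return escape(text)
    parts = []
    cursor = 0
    for m in matches:
        parts.append(escape(text[cursor:m.index]))  # plain text before the link
        parts.append('<a href="{}">{}</a>'.format(escape(m.url), escape(m.text)))
        cursor = m.last_index  # continue right after the matched string
    parts.append(escape(text[cursor:]))
    return "".join(parts)


# Stand-in for a Match object; the values are illustrative.
fake = SimpleNamespace(schema="", index=5, last_index=15, raw="github.com",
                       text="github.com", url="http://github.com")
html = render_links("Site github.com!", [fake])
print(html)
```

Because last_index points just past the matched string, text[index:last_index] recovers raw, which is what lets the loop stitch the untouched segments back together.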
- match_at_start(text)
Returns a fully-formed (not fuzzy) link if it starts at the beginning of the string, and None otherwise.
- Parameters:
text (str) – text to search
- Returns:
Match or None
- normalize(match)
Default normalizer (used if a schema does not define its own).
- Parameters:
match (linkify_it.main.Match) – Match result
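A schema can supply its own normalizer instead of relying on the default one. The sketch below shows what such a function might look like for @ mentions. The (self, match) signature mirrors the validator convention and is an assumption here, as are the function name and the URL layout; the SimpleNamespace object stands in for a real Match.

```python
import re
from types import SimpleNamespace


def normalize_mention(self, match):
    """Rewrite match.url in place so that an @name points at a profile URL."""
    # Hypothetical URL layout, chosen only for the example.
    match.url = "https://twitter.com/" + re.sub(r"^@", "", match.url)


# Stand-in for a Match produced by a hypothetical "@" schema.
mention = SimpleNamespace(schema="@", raw="@example", text="@example", url="@example")
normalize_mention(None, mention)
print(mention.url)
```

The function mutates the match rather than returning a value, so text stays as written while url becomes the link target.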
- pretest(text)
Very quick check that can give false positives.
Returns True if a link may exist. Can be used for speed optimization when you need to check that a link does NOT exist.
- Parameters:
text (str) – text to search
- Returns:
True if a linkable pattern was found, otherwise False.
- Return type:
bool
- set(options)
Override default options. (chainable)
Missing properties will not be changed.
- Parameters:
options (dict) – keys: [fuzzy_link | fuzzy_email | fuzzy_ip], values: [True | False]
- Returns:
linkify_it.main.LinkifyIt
- test(text)
Searches for a linkifiable pattern and returns True on success or False on fail.
- Parameters:
text (str) – text to search
- Returns:
True if a linkable pattern was found, otherwise False.
- Return type:
bool
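The three search methods can be layered from cheapest to most expensive. The helper below is a small sketch of that idea; find_links is a hypothetical name, and it works with any object exposing pretest(), test() and match() as documented here.

```python
def find_links(linkifier, text):
    """Return a list of matches for `text`, or an empty list if there are none."""
    # pretest() is the cheap filter: False means there is definitely no link,
    # while True may still be a false positive.
    if not linkifier.pretest(text):
        return []
    # test() is the precise check recommended before calling match().
    if not linkifier.test(text):
        return []
    # match() may return None, so normalize that to an empty list.
    return linkifier.match(text) or []
```

With the library installed, this would be called as find_links(LinkifyIt(), "Site github.com!"), where LinkifyIt is imported from linkify_it.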
- test_schema_at(text, name, position)
Similar to linkify_it.main.LinkifyIt.test() but checks only a specific protocol tail exactly at the given position.
- Parameters:
text (str) – text to scan
name (str) – rule (schema) name
position (int) – text offset to check from
- Returns:
length of found pattern (0 on fail).
- Return type:
int
- tlds(list_tlds, keep_old=False)
Load (or merge) new tlds list. (chainable)
Those are used for fuzzy links (without prefix) to avoid false positives. By default this algorithm is used:
- hostnames with any 2-letter root zone are ok.
- biz|com|edu|gov|net|org|pro|web|xxx|aero|asia|coop|info|museum|name|shop|рф are ok.
- encoded (xn--…) root zones are ok.
If the list is replaced, then an exact match for 2-char root zones will be checked.
- Parameters:
list_tlds (list or str) – list of tlds or tlds string
keep_old (bool) – merge with current list if True (False by default)
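The default rules listed above can be read as a simple predicate on the root zone. The following is a plain-Python sketch of that reading, not the library's actual implementation; the names DEFAULT_TLDS and passes_default_tld_rule are hypothetical, and treating "2-letter" as two ASCII letters is an assumption.

```python
# Root zones listed in the description above.
DEFAULT_TLDS = frozenset(
    "biz|com|edu|gov|net|org|pro|web|xxx|aero|asia|coop|info|museum|name|shop|рф".split("|")
)


def passes_default_tld_rule(tld):
    """Apply the three documented default rules to a root zone string."""
    tld = tld.lower()
    if tld in DEFAULT_TLDS:
        return True
    if tld.startswith("xn--") and len(tld) > len("xn--"):
        return True  # encoded root zone
    # "2-letter" is read here as exactly two ASCII letters (an assumption).
    return len(tld) == 2 and tld.isascii() and tld.isalpha()


for zone in ("com", "de", "xn--p1ai", "onion"):
    print(zone, passes_default_tld_rule(zone))
```

Under this reading a zone such as onion is rejected by the defaults, which is the kind of case where merging extra entries via tlds(..., keep_old=True) would apply.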
- class linkify_it.main.Match(linkifyit, shift)
Bases: object
Match result.
- schema
Prefix (protocol) for matched string.
- Type:
str
- index
First position of matched string.
- Type:
int
- last_index
Next position after matched string.
- Type:
int
- raw
Matched string.
- Type:
str
- text
Normalized text of matched string.
- Type:
str
- url
Normalized url of matched string.
- Type:
str
- Parameters:
linkifyit (linkify_it.main.LinkifyIt)
shift (int) – text search position
linkify_it.tlds module
TLDS
Version 2020110600, Last Updated Fri Nov 6 07:07:02 2020 UTC
References