linkify_it package
Submodules
linkify_it.main module
- class linkify_it.main.LinkifyIt(schemas=None, options=None)
Bases: object
Creates a new linkifier instance with optional additional schemas.
By default it understands:
- http(s)://..., ftp://..., mailto:... and //... links
- “fuzzy” links and emails (example.com, foo@bar.com)
schemas is a dict where each key/value pair describes a protocol and its rule:
- key - the link prefix (usually a protocol name with : at the end, skype: for example). linkify-it makes sure that the prefix is not preceded by an alphanumeric character; only whitespace and punctuation are allowed.
- value - the rule used to check the tail after the link prefix:
  - str - an alias to an existing rule
  - dict - a rule definition with these keys:
    - validate - either a re.Pattern, a regular-expression str (it must start with ^ and must not include the link prefix itself), or a validator function which, given the arguments self, text and pos, returns the length of a match in text starting at index pos. pos is the index right after the link prefix.
    - normalize - an optional function to normalize the text and url of a matched result (for example, for @twitter mentions).
options is a dict:
- fuzzy_link - recognize URLs without the http(s): prefix. Default True.
- fuzzy_ip - allow IPs in fuzzy links above. Can conflict with some texts, such as version numbers. Default False.
- fuzzy_email - recognize emails without the mailto: prefix. Default True.
- --- - set True to terminate a link with --- (if it is considered a long dash).
- Parameters:
schemas (dict) – Optional. Additional schemas to validate (prefix/validator)
options (dict) – { fuzzy_link | fuzzy_email | fuzzy_ip: True | False }. Default: {"fuzzy_link": True, "fuzzy_email": True, "fuzzy_ip": False}.
- add(schema, definition)
Add a new rule definition. (chainable)
See the linkify_it.main.LinkifyIt init description for details. schema is a link prefix (skype:, for example), and definition is a str to alias to another schema, or a dict with validate and optionally normalize definitions. To disable an existing rule, use .add(<schema>, None).
- Parameters:
schema (str) – rule name (fixed pattern prefix)
definition (str, dict or None) – schema definition
- Returns:
linkify_it.main.LinkifyIt (the same instance, so calls can be chained)
- match(text)
Returns a list of found link descriptions, or None on fail. We strongly recommend using linkify_it.main.LinkifyIt.test() first, for best speed.
- Parameters:
text (str) – text to search
- Returns:
Result match description:
schema - link schema; can be empty for fuzzy links, or // for protocol-neutral links.
index - offset of the matched text
last_index - offset of the position right after the matched text
raw - the matched text
text - normalized text
url - link, generated from the matched text
- Return type:
list or None
- match_at_start(text)
Returns a fully-formed (not fuzzy) link if it starts at the beginning of the string, and None otherwise.
- Parameters:
text (str) – text to search
- Returns:
Match or None
- normalize(match)
Default normalizer (used if a schema does not define its own).
- Parameters:
match (linkify_it.main.Match) – Match result
- pretest(text)
Very quick check that can give false positives.
Returns True if a link may exist. Can be used as a speed optimization when you need to check that a link does not exist.
- Parameters:
text (str) – text to search
- Returns:
True if a linkable pattern was found, otherwise False.
- Return type:
bool
- set(options)
Override default options. (chainable)
Properties that are not passed are left unchanged.
- Parameters:
options (dict) – keys: [fuzzy_link | fuzzy_email | fuzzy_ip]. values: [True | False]
- Returns:
linkify_it.main.LinkifyIt (the same instance, so calls can be chained)
- test(text)
Searches for a linkifiable pattern and returns True on success or False on fail.
- Parameters:
text (str) – text to search
- Returns:
True if a linkable pattern was found, otherwise False.
- Return type:
bool
- test_schema_at(text, name, position)
Similar to linkify_it.main.LinkifyIt.test(), but checks only the tail of a specific protocol exactly at the given position.
- Parameters:
text (str) – text to scan
name (str) – rule (schema) name
position (int) – text offset to check from
- Returns:
length of the found pattern (0 on fail)
- Return type:
int
- tlds(list_tlds, keep_old=False)
Load (or merge) a new tlds list. (chainable)
These are used for fuzzy links (without a prefix) to avoid false positives. By default this algorithm is used:
- hostnames with any 2-letter root zone are ok.
- biz|com|edu|gov|net|org|pro|web|xxx|aero|asia|coop|info|museum|name|shop|рф are ok.
- encoded (xn--...) root zones are ok.
If the list is replaced, then an exact match for 2-char root zones will be checked.
- Parameters:
list_tlds (list or str) – list of tlds or tlds string
keep_old (bool) – merge with the current list if True (False by default)
- class linkify_it.main.Match(linkifyit, shift)
Bases: object
Match result.
- schema
Prefix (protocol) for matched string.
- Type: str
- index
First position of matched string.
- Type: int
- last_index
Next position after matched string.
- Type: int
- raw
Matched string.
- Type: str
- text
Normalized text of matched string.
- Type: str
- url
Normalized url of matched string.
- Type: str
- Parameters:
linkifyit (linkify_it.main.LinkifyIt) – the LinkifyIt instance
shift (int) – text search position
linkify_it.tlds module
TLDS
Version 2020110600, Last Updated Fri Nov 6 07:07:02 2020 UTC