Cyclomatic complexity is too high in method encode. (6) Open
def encode(self, text: str) -> str:
r"""Perform encoding of run-length-encoding (RLE).
Parameters
----------
Cyclomatic Complexity
Cyclomatic Complexity corresponds to the number of decisions a block of code contains plus 1. This number (also called McCabe number) is equal to the number of linearly independent paths through the code. This number can be used as a guide when testing conditional logic in blocks.
Radon analyzes the abstract syntax tree (AST) of a Python program to compute Cyclomatic Complexity. Statements have the following effects on Cyclomatic Complexity:
| Construct | Effect on CC | Reasoning |
|---|---|---|
| if | +1 | An if statement is a single decision. |
| elif | +1 | The elif statement adds another decision. |
| else | +0 | The else statement does not cause a new decision. The decision is at the if. |
| for | +1 | There is a decision at the start of the loop. |
| while | +1 | There is a decision at the while statement. |
| except | +1 | Each except branch adds a new conditional path of execution. |
| finally | +0 | The finally block is unconditionally executed. |
| with | +1 | The with statement roughly corresponds to a try/except block (see PEP 343 for details). |
| assert | +1 | The assert statement internally roughly equals a conditional statement. |
| Comprehension | +1 | A list/set/dict comprehension or generator expression is equivalent to a for loop. |
| Boolean Operator | +1 | Every boolean operator (and, or) adds a decision point. |
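The counting rules in the table can be sketched with the standard `ast` module. This is a simplified illustration and not Radon's implementation: the function name `approx_cyclomatic_complexity` and the `SAMPLE` snippet are made up for this example, every comprehension is counted once, and async variants and conditional expressions are ignored.

```python
import ast

# Node types that add one decision each, following the table above.
_DECISION_NODES = (
    ast.If, ast.For, ast.While, ast.ExceptHandler, ast.With, ast.Assert,
    ast.ListComp, ast.SetComp, ast.DictComp, ast.GeneratorExp,
)


def approx_cyclomatic_complexity(source: str) -> int:
    """Return 1 plus the number of decision points found in source."""
    complexity = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, _DECISION_NODES):
            # An elif is parsed as a nested If, so it is counted here too.
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` is one BoolOp with three values: two operators.
            complexity += len(node.values) - 1
    return complexity


SAMPLE = '''
def classify(n):
    if n < 0 and n % 2:
        return "negative odd"
    elif n == 0:
        return "zero"
    for _ in range(n):
        pass
    return "other"
'''

# if (+1), and (+1), elif (+1), for (+1), plus the base value of 1.
print(approx_cyclomatic_complexity(SAMPLE))  # prints 5
```

A real checker also handles constructs this sketch skips, so the numbers can differ from what Radon reports for the same code.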
Function decode has a Cognitive Complexity of 8 (exceeds 5 allowed). Consider refactoring. Open
def decode(self, text: str) -> str:
r"""Perform decoding of run-length-encoding (RLE).
Parameters
----------
Cognitive Complexity
Cognitive Complexity is a measure of how difficult a unit of code is to intuitively understand. Unlike Cyclomatic Complexity, which determines how difficult your code will be to test, Cognitive Complexity tells you how difficult your code will be to read and comprehend.
A method's cognitive complexity is based on a few simple rules:
- Code is not considered more complex when it uses shorthand that the language provides for collapsing multiple statements into one
- Code is considered more complex for each "break in the linear flow of the code"
- Code is considered more complex when "flow breaking structures are nested"
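To make the rules concrete, here is a minimal sketch of a toy run-length encoder written two ways. Neither function is the abydos implementation; `rle_nested` and `rle_flat` are hypothetical names, and the encoding scheme (count followed by character, count omitted for single characters) is an assumption chosen for brevity. The first version nests flow-breaking structures, while the second leans on `itertools.groupby` and comprehensions, which the first rule treats as shorthand.

```python
from itertools import groupby


def rle_nested(text: str) -> str:
    """Encode runs with explicit, nested control flow."""
    out = []
    i = 0
    while i < len(text):  # break in the linear flow
        run = 1
        # nested loop with a boolean operator: penalized for nesting
        while i + run < len(text) and text[i + run] == text[i]:
            run += 1
        if run > 1:  # nested conditional
            out.append(str(run))
        out.append(text[i])
        i += run
    return ''.join(out)


def rle_flat(text: str) -> str:
    """Encode runs with groupby, keeping a single level of structure."""
    runs = ((len(list(group)), char) for char, group in groupby(text))
    return ''.join((str(n) if n > 1 else '') + char for n, char in runs)


print(rle_nested('aaabccdddd'))  # prints 3ab2c4d
print(rle_flat('aaabccdddd'))    # prints 3ab2c4d
```

Both functions return the same string; the difference is only in how much nested branching a reader has to follow.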
Function encode has a Cognitive Complexity of 6 (exceeds 5 allowed). Consider refactoring. Open
def encode(self, text: str) -> str:
r"""Perform encoding of run-length-encoding (RLE).
Parameters
----------
Cyclic import (abydos.distance -> abydos.distance._rouge_l) Open
# Copyright 2014-2020 by Christopher C. Little.
Used when a cyclic import between two or more modules is detected.
Similar lines in 2 files Open
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.distance.sift4:59
==abydos.distance.sift4simplest:54
def dist_abs(self, src: str, tar: str) -> float:
    """Return the "common" Sift4 distance between two terms.
    Parameters
    ----------
    src : str
        Source string for comparison
    tar : str
        Target string for comparison
    Returns
    -------
    int
        The Sift4 distance according to the common formula
    Examples
    --------
    >>> cmp = Sift4()
    >>> cmp.dist_abs('cat', 'hat')
    1
    >>> cmp.dist_abs('Niall', 'Neil')
    2
    >>> cmp.dist_abs('Colin', 'Cuilen')
    3
    >>> cmp.dist_abs('ATCG', 'TAGC')
    2
    .. versionadded:: 0.3.0
    .. versionchanged:: 0.3.6
        Encapsulated in class
    """
    if not src:
        return len(tar)
    if not tar:
        return len(src)
    src_len = len(src)
    tar_len = len(tar)
    src_cur = 0
    tar_cur = 0
    lcss = 0
    local_cs = 0
Similar lines in 3 files Open
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.stemmer.snowballdanish:59
==abydos.stemmer.snowballnorwegian:56
==abydos.stemmer.snowballswedish:56
}
def stem(self, word: str) -> str:
    """Return Snowball Swedish stem.
    Parameters
    ----------
    word : str
        The word to stem
    Returns
    -------
    str
        Word stem
    Examples
    --------
    >>> stmr = SnowballSwedish()
    >>> stmr.stem('undervisa')
    'undervis'
    >>> stmr.stem('suspension')
    'suspension'
    >>> stmr.stem('visshet')
    'viss'
    .. versionadded:: 0.1.0
    .. versionchanged:: 0.3.6
        Encapsulated in class
    """
    # lowercase, normalize, and compose
    word = normalize('NFC', word.lower())
    r1_start = min(max(3, self._sb_r1(word)), len(word))
    # Step 1
    _r1 = word[r1_start:]
Similar lines in 2 files Open
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.distance.levenshtein:163
==abydos.distance.phoneticeditdistance:179
                else 0
            ),  # sub/==
        )
        dmat[i + 1, j + 1] = min(opts)
        if backtrace:
            tracemat[i + 1, j + 1] = int(np.argmin(opts))
        if self._mode == 'osa':
            if (
                i + 1 > 1
                and j + 1 > 1
Similar lines in 2 files Open
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.phonetic.lein:61
==abydos.phonetic.phonix:190
    self.zeropad = zero_pad
def encode_alpha(self, word: str) -> str:
    """Return the alphabetic LEIN code for a word.
    Parameters
    ----------
    word : str
        The word to transform
    Returns
    -------
    str
        The alphabetic LEIN code
    Examples
    --------
    >>> pe = LEIN()
    >>> pe.encode_alpha('Christopher')
    'CLKT'
    >>> pe.encode_alpha('Niall')
    'NL'
    >>> pe.encode_alpha('Smith')
    'SNT'
    >>> pe.encode_alpha('Schmidt')
    'SKNT'
    .. versionadded:: 0.4.0
    """
    code = self.encode(word).rstrip('0')
    return code[:1] + code[1:].translate(self._alphabetic)
def encode(self, word: str) -> str:
    """Return the LEIN code for a word.
    Parameters
    ----------
    word : str
        The word to transform
    Returns
    -------
    str
        The LEIN code
    Examples
    --------
    >>> pe = LEIN()
    >>> pe.encode('Christopher')
    'C351'
    >>> pe.encode('Niall')
    'N300'
    >>> pe.encode('Smith')
    'S210'
    >>> pe.encode('Schmidt')
    'S521'
    .. versionadded:: 0.3.0
    .. versionchanged:: 0.3.6
        Encapsulated in class
    """
    # uppercase, normalize, decompose, and filter non-A-Z out
Similar lines in 3 files Open
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.phonetic.fuzzysoundex:193
==abydos.phonetic.soundex:231
==abydos.phonetic.soundex_br:155
    sdx = sdx.replace('0', '')  # rule 1
    if self.zeropad:
        sdx += '0' * self.maxlength  # rule 4
    return sdx[: self.maxlength]
if __name__ == '__main__':
    import doctest
    doctest.testmod()
Similar lines in 3 files Open
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.tokenizer.corvcluster:76
==abydos.tokenizer.cvcluster:77
==abydos.tokenizer.vccluster:77
    if consonants:
        self.consonants = consonants
    else:
        self.consonants = set('bcdfghjklmnpqrstvwxzßBCDFGHJKLMNPQRSTVWXZ')
    if vowels:
        self.vowels = vowels
    else:
        self.vowels = set('aeiouyAEIOUY')
    self._regexp = re.compile(r'\w+|[^\w\s]+', flags=0)
Similar lines in 2 files Open
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.distance.levenshtein:323
==abydos.distance.phoneticeditdistance:253
    dmat = cast(
        np.ndarray, self.alignment_matrix(src, tar, backtrace=False)
    )
    if int(dmat[srclen, tarlen]) == dmat[srclen, tarlen]:
        return int(dmat[srclen, tarlen])
    else:
        return cast(float, dmat[srclen, tarlen])
def dist(self, src: str, tar: str) -> float:
    """Return the normalized phonetic edit distance between two strings.
    The edit distance is normalized by dividing the edit distance
    (calculated by either of the two supported methods) by the
    greater of the number of characters in src times the cost of a delete
    and the number of characters in tar times the cost of an insert.
    For the case in which all operations have :math:`cost = 1`, this is
    equivalent to the greater of the length of the two strings src & tar.
    Parameters
    ----------
    src : str
        Source string for comparison
    tar : str
        Target string for comparison
    Returns
    -------
    float
        The normalized Levenshtein distance between src & tar
    Examples
    --------
    >>> cmp = PhoneticEditDistance()
    >>> round(cmp.dist('cat', 'hat'), 12)
    0.059139784946
    >>> round(cmp.dist('Niall', 'Neil'), 12)
    0.232258064516
    >>> cmp.dist('aluminum', 'Catalan')
    0.3084677419354839
    >>> cmp.dist('ATCG', 'TAGC')
    0.2983870967741935
    .. versionadded:: 0.4.1
Similar lines in 3 files Open
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.distance.gotoh:147
==abydos.distance.needlemanwunsch:204
==abydos.distance.smith_waterman:115
def sim(self, src: str, tar: str) -> float:
    """Return the normalized Needleman-Wunsch score of two strings.
    Parameters
    ----------
    src : str
        Source string for comparison
    tar : str
        Target string for comparison
    Returns
    -------
    float
        Normalized Needleman-Wunsch score
    Examples
    --------
    >>> cmp = NeedlemanWunsch()
    >>> cmp.sim('cat', 'hat')
    0.6666666666666667
    >>> cmp.sim('Niall', 'Neil')
    0.22360679774997896
    >>> round(cmp.sim('aluminum', 'Catalan'), 12)
    0.0
    >>> cmp.sim('cat', 'hat')
    0.6666666666666667
    .. versionadded:: 0.4.1
    """
    if src == tar:
        return 1.0
    return max(0.0, self.sim_score(src, tar)) / (
        self.sim_score(src, src) ** 0.5 * self.sim_score(tar, tar) ** 0.5
    )
if __name__ == '__main__':
    import doctest
    doctest.testmod()
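One common way to address a duplicate like the one above is to move the shared `sim` body into a single base class or mixin that each measure inherits. The sketch below assumes that approach; `NormalizedSimMixin`, `MatchCount`, and `PrefixLength` are hypothetical names invented for illustration and are not part of abydos. Only the normalization formula is taken from the flagged excerpt.

```python
class NormalizedSimMixin:
    """Hypothetical mixin that holds the shared normalization once."""

    def sim_score(self, src: str, tar: str) -> float:
        raise NotImplementedError

    def sim(self, src: str, tar: str) -> float:
        # Same formula as the duplicated excerpt, now defined in one place.
        if src == tar:
            return 1.0
        return max(0.0, self.sim_score(src, tar)) / (
            self.sim_score(src, src) ** 0.5 * self.sim_score(tar, tar) ** 0.5
        )


class MatchCount(NormalizedSimMixin):
    """Toy scorer: counts positions at which both strings agree."""

    def sim_score(self, src: str, tar: str) -> float:
        return float(sum(a == b for a, b in zip(src, tar)))


class PrefixLength(NormalizedSimMixin):
    """Toy scorer: length of the common prefix."""

    def sim_score(self, src: str, tar: str) -> float:
        count = 0
        for a, b in zip(src, tar):
            if a != b:
                break
            count += 1
        return float(count)


print(MatchCount().sim('cat', 'hat'))    # roughly 2/3
print(PrefixLength().sim('cat', 'hat'))  # prints 0.0
```

Each subclass supplies only its own `sim_score`, so the similarity checker no longer sees the normalization repeated per file. As in the original excerpt, an empty string compared with a non-empty one would divide by zero; the sketch does not change that behavior.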
Similar lines in 2 files Open
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.fingerprint.count:56
==abydos.fingerprint.occurrence:55
    self.nbits = nbits
    self.mostcommon = mostcommon
def fingerprint(self, word: str) -> str:
    """Return the occurrence fingerprint.
    Parameters
    ----------
    word : str
        The word to fingerprint
    Returns
    -------
    str
        The occurrence fingerprint
    Examples
    --------
    >>> of = Occurrence()
    >>> of.fingerprint('hat')
    '0110000100000000'
    >>> of.fingerprint('niall')
    '0010110000100000'
    >>> of.fingerprint('colin')
    '0001110000110000'
    >>> of.fingerprint('atcg')
    '0110000000010000'
    >>> of.fingerprint('entreatment')
    '1110010010000100'
    .. versionadded:: 0.3.0
    .. versionchanged:: 0.3.6
        Encapsulated in class
    .. versionchanged:: 0.6.0
        Changed to return a str and added fingerprint_int method
    """
    return ('{:0' + str(self._n_bits) + 'b}').format(
        self.fingerprint_int(word)
    )
def fingerprint_int(self, word: str) -> int:
    """Return the occurrence fingerprint.
    Parameters
    ----------
    word : str
        The word to fingerprint
    Returns
    -------
    int
        The occurrence fingerprint as an int
    Examples
    --------
    >>> of = Occurrence()
    >>> of.fingerprint_int('hat')
    24832
    >>> of.fingerprint_int('niall')
    11296
    >>> of.fingerprint_int('colin')
    7216
    >>> of.fingerprint_int('atcg')
    24592
    >>> of.fingerprint_int('entreatment')
    58500
    .. versionadded:: 0.6.0
    """
    nbits = self.n_bits
Similar lines in 2 files Open
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.phonetic.pshpsoundexfirst:41
==abydos.phonetic.pshpsoundexlast:41
trans = dict(
    zip(
        (ord(_) for _ in 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'),
        '01230120022455012523010202',
    )
)
alphabetic = dict(zip((ord(_) for _ in '12345'), 'PKTLN'))
def __init__(self, max_length: int = 4, german: bool = False) -> None:
    """Initialize PSHPSoundexFirst instance.
    Parameters
    ----------
    max_length : int
        The length of the code returned (defaults to 4)
    german : bool
        Set to True if the name is German (different rules apply)
    .. versionadded:: 0.4.0
    """
    self._max_length = max_length
    self._german = german
Similar lines in 2 files Open
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.phonetic.haase:258
==abydos.phonetic.koelner:165
            ):
                sdx += '4'
            else:
                sdx += '8'
        elif _before(word, i, {'A', 'H', 'K', 'O', 'Q', 'U', 'X'}):
            sdx += '4'
        else:
            sdx += '8'
    elif word[i] == 'X':
        if _after(word, i, {'C', 'K', 'Q'}):
            sdx += '8'
        else:
            sdx += '48'
    elif word[i] == 'L':
        sdx += '5'
    elif word[i] in {'M', 'N'}:
        sdx += '6'
    elif word[i] == 'R':
        sdx += '7'
    elif word[i] in {'S', 'Z'}:
        sdx += '8'
sdx = self._delete_consecutive_repeats(sdx)
Similar lines in 2 files Open
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.tokenizer.legalipy:136
==abydos.tokenizer.sonoripy:101
    if not self.orderedtokens:
        self.orderedtokens = [self._string]
    self._scale_and_counterize()
    return self
if __name__ == '__main__':
    import doctest
    doctest.testmod()
Similar lines in 2 files Open
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.stemmer.snowballdanish:48
==abydos.stemmer.snowballnorwegian:46
    'l', 'm', 'n', 'o', 'p', 'r', 't', 'v', 'y', 'z',
Method could be a function Open
def encode(self, text: str) -> str:
Used when a method doesn't use its bound instance, and so could be written as a function.
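Two usual responses to this message are to move the logic to a module-level function or to mark the method as a `@staticmethod`. The sketch below shows both under stated assumptions: `rle_encode` and the `RLE` class here are hypothetical stand-ins with a toy encoding, not the flagged abydos code.

```python
from itertools import groupby


def rle_encode(text: str) -> str:
    """Module-level function: no instance state is needed."""
    runs = ((char, len(list(group))) for char, group in groupby(text))
    return ''.join((str(n) if n > 1 else '') + char for char, n in runs)


class RLE:
    """Hypothetical class kept so existing callers can still use RLE.encode."""

    @staticmethod
    def encode(text: str) -> str:
        # No `self` parameter, so there is no unused bound instance.
        return rle_encode(text)


print(RLE.encode('aaab'))    # prints 3ab
print(RLE().encode('aaab'))  # prints 3ab
```

Keeping the static method preserves the call style `RLE().encode(...)`, which matters if other code treats all encoders as objects with an `encode` method.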
Similar lines in 3 files Open
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.fingerprint.count:59
==abydos.fingerprint.occurrence:58
==abydos.fingerprint._position:60
def fingerprint(self, word: str) -> str:
    """Return the position fingerprint.
    Parameters
    ----------
    word : str
        The word to fingerprint
    Returns
    -------
    str
        The position fingerprint
    Examples
    --------
    >>> pf = Position()
    >>> pf.fingerprint('hat')
    '1110100011111111'
    >>> pf.fingerprint('niall')
    '1111110101110010'
    >>> pf.fingerprint('colin')
    '1111111110010111'
    >>> pf.fingerprint('atcg')
    '1110010001111111'
    >>> pf.fingerprint('entreatment')
    '0000101011111111'
    .. versionadded:: 0.3.0
    .. versionchanged:: 0.3.6
        Encapsulated in class
    .. versionchanged:: 0.6.0
        Changed to return a str and added fingerprint_int method
    """
    return ('{:0' + str(self._n_bits) + 'b}').format(
        self.fingerprint_int(word)
    )
def fingerprint_int(self, word: str) -> int:
    """Return the position fingerprint.
    Parameters
    ----------
    word : str
        The word to fingerprint
    Returns
    -------
    int
        The position fingerprint as an int
    Examples
    --------
    >>> pf = Position()
    >>> pf.fingerprint_int('hat')
    59647
    >>> pf.fingerprint_int('niall')
    64882
    >>> pf.fingerprint_int('colin')
    65431
    >>> pf.fingerprint_int('atcg')
    58495
    >>> pf.fingerprint_int('entreatment')
    2815
    .. versionadded:: 0.6.0
    """
    nbits = self.n_bits
Similar lines in 2 files Open
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.distance.editex:230
==abydos.distance.levenshtein:325
    )
    if int(dmat[srclen, tarlen]) == dmat[srclen, tarlen]:
        return int(dmat[srclen, tarlen])
    else:
        return cast(float, dmat[srclen, tarlen])
def dist(self, src: str, tar: str) -> float:
    """Return the normalized Levenshtein distance between two strings.
    The Levenshtein distance is normalized by dividing the Levenshtein
    distance (calculated by either of the two supported methods) by the
    greater of the number of characters in src times the cost of a delete
    and the number of characters in tar times the cost of an insert.
    For the case in which all operations have :math:`cost = 1`, this is
    equivalent to the greater of the length of the two strings src & tar.
    Parameters
    ----------
    src : str
        Source string for comparison
    tar : str
        Target string for comparison
    Returns
    -------
    float
        The normalized Levenshtein distance between src & tar
    Examples
    --------
    >>> cmp = Levenshtein()
    >>> round(cmp.dist('cat', 'hat'), 12)
    0.333333333333
    >>> round(cmp.dist('Niall', 'Neil'), 12)
    0.6
    >>> cmp.dist('aluminum', 'Catalan')
    0.875
    >>> cmp.dist('ATCG', 'TAGC')
    0.75
    .. versionadded:: 0.1.0
    .. versionchanged:: 0.3.6
        Encapsulated in class
    """
    if src == tar:
        return 0.0
Similar lines in 3 files Open
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.distance.discountedlevenshtein:287
==abydos.distance.editex:230
==abydos.distance.phoneticeditdistance:255
    )
    if int(dmat[srclen, tarlen]) == dmat[srclen, tarlen]:
        return int(dmat[srclen, tarlen])
    else:
        return cast(float, dmat[srclen, tarlen])
def dist(self, src: str, tar: str) -> float:
    """Return the normalized Levenshtein distance between two strings.
    The Levenshtein distance is normalized by dividing the Levenshtein
    distance (calculated by any of the three supported methods) by the
    greater of the number of characters in src times the cost of a delete
    and the number of characters in tar times the cost of an insert.
    For the case in which all operations have :math:`cost = 1`, this is
    equivalent to the greater of the length of the two strings src & tar.
    Parameters
    ----------
    src : str
        Source string for comparison
    tar : str
        Target string for comparison
    Returns
    -------
    float
        The normalized Levenshtein distance between src & tar
    Examples
    --------
    >>> cmp = DiscountedLevenshtein()
    >>> cmp.dist('cat', 'hat')
    0.3513958291799864
    >>> cmp.dist('Niall', 'Neil')
    0.5909885886270658
    >>> cmp.dist('aluminum', 'Catalan')
    0.8348163322045603
    >>> cmp.dist('ATCG', 'TAGC')
    0.7217609721523955
    .. versionadded:: 0.4.1
Similar lines in 2 files Open
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.distance.needlemanwunsch:202
==abydos.distance.smithwaterman:113
    return cast(float, d_mat[d_mat.shape[0] - 1, d_mat.shape[1] - 1])
def sim(self, src: str, tar: str) -> float:
    """Return the normalized Needleman-Wunsch score of two strings.
    Parameters
    ----------
    src : str
        Source string for comparison
    tar : str
        Target string for comparison
    Returns
    -------
    float
        Normalized Needleman-Wunsch score
    Examples
    --------
    >>> cmp = NeedlemanWunsch()
    >>> cmp.sim('cat', 'hat')
    0.6666666666666667
    >>> cmp.sim('Niall', 'Neil')
    0.22360679774997896
    >>> round(cmp.sim('aluminum', 'Catalan'), 12)
    0.0
    >>> cmp.sim('cat', 'hat')
    0.6666666666666667
    .. versionadded:: 0.4.1
    """
    if src == tar:
        return 1.0
    return max(0.0, self.sim_score(src, tar)) / (
        self.sim_score(src, src) ** 0.5 * self.sim_score(tar, tar) ** 0.5
    )
if __name__ == '__main__':
    import doctest
    doctest.testmod()
Similar lines in 2 files Open
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.distance.levenshtein:111
==abydos.distance.phoneticeditdistance:118
def alignment_matrix(
    self, src: str, tar: str, backtrace: bool = True
) -> Union[np.ndarray, Tuple[np.ndarray, np.ndarray]]:
    """Return the Levenshtein alignment matrix.
    Parameters
    ----------
    src : str
        Source string for comparison
    tar : str
        Target string for comparison
    backtrace : bool
        Return the backtrace matrix as well
    Returns
    -------
    numpy.ndarray or tuple(numpy.ndarray, numpy.ndarray)
        The alignment matrix and (optionally) the backtrace matrix
    .. versionadded:: 0.4.1
    """
    ins_cost, del_cost, sub_cost, trans_cost = self._cost
    src_len = len(src)
    tar_len = len(tar)
Similar lines in 2 files Open
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.tokenizer.corvcluster:130
==abydos.tokenizer.vccluster:127
                    mode = 1
                elif char in self.vowels:
                    if mode == 1:
                        self.orderedtokens.append(new_token)
                        new_token = char
                    else:
                        new_token += char
                    mode = 2
                else:  # This should cover combining marks, marks, etc.
                    new_token += char
            self.orderedtokens.append(new_token)
    self.orderedtokens = [
        unicodedata.normalize('NFC', token) for token in self.orderedtokens
    ]
    self._scale_and_counterize()
    return self
if __name__ == '__main__':
    import doctest
    doctest.testmod(optionflags=doctest.NORMALIZE_WHITESPACE)
Similar lines in 2 files Open
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.distance.shapirastoreri:160
==abydos.distance.typo:324
for i in range(len(src) + 1):
    dmat[i, 0] = i * delcost
for j in range(len(tar) + 1):
    dmat[0, j] = j * inscost
for i in range(len(src)):
    for j in range(len(tar)):
        dmat[i + 1, j + 1] = min(
            dmat[i + 1, j] + inscost,  # ins
            dmat[i, j + 1] + delcost,  # del
            dmat[i, j]
Similar lines in 2 files Open
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.stemmer.porter2:381
==abydos.stemmer.porter:395
        if word[i] == 'Y':
            word = word[:i] + 'y' + word[i + 1 :]
    return word
if __name__ == '__main__':
    import doctest
    doctest.testmod()
Similar lines in 2 files Open
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.phonetic.fuzzysoundex:62
==abydos.phonetic.phonex:54
    if maxlength != -1:
        self.maxlength = min(max(4, maxlength), 64)
    else:
        self.maxlength = 64
    self.zeropad = zeropad
def encode_alpha(self, word: str) -> str:
    """Return the alphabetic Fuzzy Soundex code for a word.
    Parameters
    ----------
    word : str
        The word to transform
    Returns
    -------
    str
        The alphabetic Fuzzy Soundex value
    Examples
    --------
    >>> pe = FuzzySoundex()
    >>> pe.encode_alpha('Christopher')
    'KRSTP'
    >>> pe.encode_alpha('Niall')
    'NL'
    >>> pe.encode_alpha('Smith')
    'SNT'
    >>> pe.encode_alpha('Schmidt')
    'SNT'
    .. versionadded:: 0.4.0
    """
    code = self.encode(word).rstrip('0')
    return code[:1] + code[1:].translate(self._alphabetic)
def encode(self, word: str) -> str:
    """Return the Fuzzy Soundex code for a word.
    Parameters
    ----------
    word : str
        The word to transform
    Returns
    -------
    str
        The Fuzzy Soundex value
    Examples
    --------
    >>> pe = FuzzySoundex()
    >>> pe.encode('Christopher')
    'K6931'
    >>> pe.encode('Niall')
    'N4000'
    >>> pe.encode('Smith')
    'S5300'
    >>> pe.encode('Smith')
    'S5300'
    .. versionadded:: 0.1.0
    .. versionchanged:: 0.3.6
        Encapsulated in class
Similar lines in 2 files Open
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.distance.chebyshev:21
==abydos.distance.unknown_f:22
from typing import (
    Any,
    Counter as TCounter,
    NoReturn,
    Optional,
    Sequence,
    Set,
    Union,
)
Similar lines in 2 files Open
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.distance.discountedlevenshtein:354
==abydos.distance.phoneticedit_distance:309
    )
    return self.dist_abs(src, tar) / normalizeterm
if __name__ == '__main__':
    import doctest
    doctest.testmod()
Similar lines in 4 files Open
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.phonetic.fuzzysoundex:66
==abydos.phonetic.lein:61
==abydos.phonetic.phonex:58
==abydos.phonetic.phonix:190
    self.zeropad = zeropad
def encode_alpha(self, word: str) -> str:
    """Return the alphabetic Phonex code for a word.
    Parameters
    ----------
    word : str
        The word to transform
    Returns
    -------
    str
        The alphabetic Phonex value
    Examples
    --------
    >>> pe = Phonex()
    >>> pe.encode_alpha('Christopher')
    'CRST'
    >>> pe.encode_alpha('Niall')
    'NL'
    >>> pe.encode_alpha('Smith')
    'SNT'
    >>> pe.encode_alpha('Schmidt')
    'SSNT'
    .. versionadded:: 0.4.0
    """
    code = self.encode(word).rstrip('0')
    return code[:1] + code[1:].translate(self._alphabetic)
def encode(self, word: str) -> str:
    """Return the Phonex code for a word.
    Parameters
    ----------
    word : str
        The word to transform
    Returns
    -------
    str
        The Phonex value
    Examples
    --------
    >>> pe = Phonex()
    >>> pe.encode('Christopher')
    'C623'
    >>> pe.encode('Niall')
    'N400'
    >>> pe.encode('Schmidt')
    'S253'
    >>> pe.encode('Smith')
    'S530'
    .. versionadded:: 0.1.0
    .. versionchanged:: 0.3.6
        Encapsulated in class
Similar lines in 2 files Open
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.distance.discountedlevenshtein:283
==abydos.distance._levenshtein:321
    )
    dmat = cast(
        np.ndarray, self.alignment_matrix(src, tar, backtrace=False)
    )
    if int(dmat[srclen, tarlen]) == dmat[srclen, tarlen]:
        return int(dmat[srclen, tarlen])
    else:
        return cast(float, dmat[srclen, tarlen])
def dist(self, src: str, tar: str) -> float:
    """Return the normalized Levenshtein distance between two strings.
    The Levenshtein distance is normalized by dividing the Levenshtein
    distance (calculated by any of the three supported methods) by the
    greater of the number of characters in src times the cost of a delete
    and the number of characters in tar times the cost of an insert.
    For the case in which all operations have :math:`cost = 1`, this is
    equivalent to the greater of the length of the two strings src & tar.
    Parameters
    ----------
    src : str
        Source string for comparison
    tar : str
        Target string for comparison
    Returns
    -------
    float
        The normalized Levenshtein distance between src & tar
    Examples
    --------
    >>> cmp = DiscountedLevenshtein()
    >>> cmp.dist('cat', 'hat')
    0.3513958291799864
    >>> cmp.dist('Niall', 'Neil')
    0.5909885886270658
    >>> cmp.dist('aluminum', 'Catalan')
    0.8348163322045603
    >>> cmp.dist('ATCG', 'TAGC')
    0.7217609721523955
    .. versionadded:: 0.4.1
Similar lines in 2 files Open
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.distance.discountedlevenshtein:219
==abydos.distance.phoneticeditdistance:197
                )
                if backtrace:
                    tracemat[i + 1, j + 1] = 2
    if backtrace:
        return d_mat, tracemat
    return d_mat
def dist_abs(self, src: str, tar: str) -> float:
    """Return the phonetic edit distance between two strings.
    Parameters
    ----------
    src : str
        Source string for comparison
    tar : str
        Target string for comparison
    Returns
    -------
    int (may return a float if cost has float values)
        The phonetic edit distance between src & tar
    Examples
    --------
    >>> cmp = PhoneticEditDistance()
    >>> cmp.dist_abs('cat', 'hat')
    0.17741935483870974
    >>> cmp.dist_abs('Niall', 'Neil')
    1.161290322580645
    >>> cmp.dist_abs('aluminum', 'Catalan')
    2.467741935483871
    >>> cmp.dist_abs('ATCG', 'TAGC')
    1.193548387096774
    >>> cmp = PhoneticEditDistance(mode='osa')
    >>> cmp.dist_abs('ATCG', 'TAGC')
    0.46236225806451603
    >>> cmp.dist_abs('ACTG', 'TAGC')
    1.2580645161290323
    .. versionadded:: 0.4.1
Similar lines in 2 files Open
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.distance.euclidean:38
==abydos.distance.manhattan:38
def __init__(
    self,
    alphabet: Optional[
        Union[TCounter[str], Sequence[str], Set[str], int]
    ] = 0,
    tokenizer: Optional[Tokenizer] = None,
    intersection_type: str = 'crisp',
    **kwargs: Any
) -> None:
    """Initialize Euclidean instance.
    Parameters
    ----------
    alphabet : collection or int
        The values or size of the alphabet
    tokenizer : Tokenizer
        A tokenizer instance from the :py:mod:`abydos.tokenizer` package
    intersection_type : str
        Specifies the intersection type, and set type as a result:
        See :ref:`intersection_type <intersection_type>` description in
        :py:class:`_TokenDistance` for details.
    **kwargs
        Arbitrary keyword arguments
    Other Parameters
    ----------------
    qval : int
        The length of each q-gram. Using this parameter and tokenizer=None
        will cause the instance to use the QGram tokenizer with this
        q value.
    metric : _Distance
        A string distance measure class for use in the ``soft`` and
        ``fuzzy`` variants.
    threshold : float
        A threshold value, similarities above which are counted as
        members of the intersection for the ``fuzzy`` variant.
    .. versionadded:: 0.4.0
Similar lines in 3 files Open
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.tokenizer.corvcluster:110
==abydos.tokenizer.cvcluster:111
==abydos.tokenizer.vccluster:111
self.string = string
self.orderedtokens = []
tokenlist = self.regexp.findall(self.string)
for token in tokenlist:
    if (
        token[0] not in self.consonants
        and token[0] not in self.vowels
    ):
        self.orderedtokens.append(token)
    else:
        token = unicodedata.normalize('NFD', token)
        mode = 0  # 0 = starting mode, 1 = cons, 2 = vowels
        newtoken = ''  # noqa: S105
        for char in token:
            if char in self._consonants:
Similar lines in 2 files Open
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.phonetic.lein:135
==abydos.phonetic.rogerroot:241
    if self.zeropad:
        code += '0' * self._max_length  # Rule 4
    return code[: self._max_length]
if __name__ == '__main__':
    import doctest
    doctest.testmod()
Cyclic import (abydos.distance -> abydos.distance._ozbay) Open
# Copyright 2014-2020 by Christopher C. Little.
Used when a cyclic import between two or more modules is detected.
Cyclic import (abydos.distance -> abydos.distance._rouge_su) Open
# Copyright 2014-2020 by Christopher C. Little.
Used when a cyclic import between two or more modules is detected.
Similar lines in 2 files Open
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.distance.lcprefix:68
==abydos.distance.lcsuffix:69
    def dist_abs(self, src: str, tar: str, *args: str) -> int:
        """Return the length of the longest common prefix of the strings.

        Parameters
        ----------
        src : str
            Source string for comparison
        tar : str
            Target string for comparison
        *args : strs
            Additional strings for comparison

        Raises
        ------
        ValueError
            All arguments must be of type str

        Returns
        -------
        int
            The length of the longest common prefix

        Examples
        --------
        >>> pfx = LCPrefix()
        >>> pfx.dist_abs('cat', 'hat')
        0
        >>> pfx.dist_abs('Niall', 'Neil')
        1
        >>> pfx.dist_abs('aluminum', 'Catalan')
        0
        >>> pfx.dist_abs('ATCG', 'TAGC')
        0

        .. versionadded:: 0.4.0

        """
        strings = [src, tar]
        for arg in args:
            if isinstance(arg, str):
                strings.append(arg)
            else:
                raise TypeError('All arguments must be of type str')
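One way to remove this kind of duplication is to move the argument check and the prefix computation into a single shared helper. The following is a minimal sketch under that assumption; the function name `lcprefix_len` is hypothetical and not part of abydos, and it relies on `os.path.commonprefix` from the standard library rather than the project's own loop.

```python
from os.path import commonprefix


def lcprefix_len(src: str, tar: str, *args: str) -> int:
    """Return the length of the longest common prefix of all strings.

    Hypothetical helper for illustration; not the abydos implementation.
    """
    strings = [src, tar]
    for arg in args:
        # Mirror the excerpt above: reject any non-str extra argument.
        if not isinstance(arg, str):
            raise TypeError('All arguments must be of type str')
        strings.append(arg)
    # commonprefix compares the strings character by character.
    return len(commonprefix(strings))
```

A suffix variant could reuse the same helper by passing each string reversed (`s[::-1]`), which is one possible way to collapse the two near-identical methods.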
Similar lines in 3 files Open
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.distance.ncdbz2:56
==abydos.distance.ncdlzma:55
==abydos.distance.ncdzlib:54
        super().__init__(**kwargs)
        self._level = level

    def dist(self, src: str, tar: str) -> float:
        """Return the NCD between two strings using LZMA compression.

        Parameters
        ----------
        src : str
            Source string for comparison
        tar : str
            Target string for comparison

        Returns
        -------
        float
            Compression distance

        Examples
        --------
        >>> cmp = NCDlzma()
        >>> cmp.dist('cat', 'hat')
        0.08695652173913043
        >>> cmp.dist('Niall', 'Neil')
        0.16
        >>> cmp.dist('aluminum', 'Catalan')
        0.16
        >>> cmp.dist('ATCG', 'TAGC')
        0.08695652173913043

        .. versionadded:: 0.3.5
        .. versionchanged:: 0.3.6
            Encapsulated in class

        """
        if src == tar:
            return 0.0

        src_b = src.encode('utf-8')
        tar_b = tar.encode('utf-8')
Similar lines in 3 files Open
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple file. This usually means that the code should be refactored to avoid this duplication. ==abydos.tokenizer.corvcluster:42 ==abydos.tokenizer.cvcluster:43 ==abydos.tokenizer.vccluster:43 def init( self, scaler: Optional[Union[str, Callable[[float], float]]] = None, consonants: Optional[Set[str]] = None, vowels: Optional[Set[str]] = None, ) -> None: ```Initialize tokenizer.
Parameters
scaler : None, str, or function A scaling function for the Counter:
- None : no scaling
- 'set' : All non-zero values are set to 1.
- 'length' : Each token has weight equal to its length.
- 'length-log' : Each token has weight equal to the log of its length + 1.
- 'length-exp' : Each token has weight equal to e raised to its length.
- a callable function : The function is applied to each value in the Counter. Some useful functions include math.exp, math.log1p, math.sqrt, and indexes into interesting integer sequences such as the Fibonacci sequence. consonants : None or set(str) The set of characters to treat as consonants vowels : None or set(str) The set of characters to treat as vowels
.. versionadded:: 0.4.0
Similar lines in 2 files Open
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple file. This usually means that the code should be refactored to avoid this duplication. ==abydos.tokenizer.whitespace:41 ==abydos.tokenizer.wordpunct:42 def init( self, scaler: Optional[Union[str, Callable[[float], float]]] = None, flags: int = 0, ) -> None: ```Initialize tokenizer.
Parameters
scaler : None, str, or function A scaling function for the Counter:
- None : no scaling
- 'set' : All non-zero values are set to 1.
- 'length' : Each token has weight equal to its length.
- 'length-log' : Each token has weight equal to the log of its length + 1.
- 'length-exp' : Each token has weight equal to e raised to its length.
- a callable function : The function is applied to each value
in the Counter. Some useful functions include math.exp,
math.log1p, math.sqrt, and indexes into interesting integer
sequences such as the Fibonacci sequence.
flags : int
Flags to pass to the regular expression matcher. See the
`documentation on Python's re module <https://docs.python.org/3/library/re.html#re.A>`_ for details.
.. versionadded:: 0.4.0
Similar lines in 2 files Open
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple file. This usually means that the code should be refactored to avoid this duplication. ==abydos.distance.phoneticeditdistance:107 ==abydos.phones.phones:956 if isinstance(weights, dict): weights = [ weights[feature] if feature in weights else 0 for feature in sorted( FEATUREMASK, key=FEATUREMASK.get, reverse=True ) ] elif isinstance(weights, (list, tuple)): weights = list(weights) + [0] * (len(FEATUREMASK) - len(weights))
Similar lines in 2 files Open
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple file. This usually means that the code should be refactored to avoid this duplication. ==abydos.distance.discountedlevenshtein:285 ==abydos.distance.phoneticeditdistance:253 dmat = cast( np.ndarray, self.alignmentmatrix(src, tar, backtrace=False) )
if int(dmat[srclen, tarlen]) == dmat[srclen, tarlen]: return int(dmat[srclen, tarlen]) else: return cast(float, dmat[srclen, tarlen])
def dist(self, src: str, tar: str) -> float: ```Return the normalized phonetic edit distance between two strings.
The edit distance is normalized by dividing the edit distance
(calculated by either of the two supported methods) by the
greater of the number of characters in src times the cost of a delete
and the number of characters in tar times the cost of an insert.
For the case in which all operations have :math:`cost = 1`, this is
equivalent to the greater of the length of the two strings src & tar.
Parameters
src : str Source string for comparison tar : str Target string for comparison
Returns
float The normalized Levenshtein distance between src & tar
Examples
cmp = PhoneticEditDistance() round(cmp.dist('cat', 'hat'), 12) 0.059139784946 round(cmp.dist('Niall', 'Neil'), 12) 0.232258064516 cmp.dist('aluminum', 'Catalan') 0.3084677419354839 cmp.dist('ATCG', 'TAGC') 0.2983870967741935
.. versionadded:: 0.4.1
if src == tar:
Similar lines in 2 files Open
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.distance.blocklevenshtein:138
==abydos.distance.dameraulevenshtein:232
    if src == tar:
        return 0.0
    ins_cost, del_cost = self.cost[:2]
    return self.dist_abs(src, tar) / (
        self.normalizer([len(src) * del_cost, len(tar) * ins_cost])
    )
if __name__ == '__main__':
    import doctest
    doctest.testmod()
Similar lines in 2 files Open
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple file. This usually means that the code should be refactored to avoid this duplication. ==abydos.distance.discountedlevenshtein:204 ==abydos.distance.phoneticeditdistance:181 ) dmat[i + 1, j + 1] = min(opts) if backtrace: trace_mat[i + 1, j + 1] = int(np.argmin(opts))
if self._mode == 'osa': if ( i + 1 > 1 and j + 1 > 1
Similar lines in 2 files Open
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple file. This usually means that the code should be refactored to avoid this duplication. ==abydos.tokenizer.cvcluster:133 ==abydos.tokenizer.vccluster:133 newtoken += char mode = 2 else: # This should cover combining marks, marks, etc. newtoken += char
self.orderedtokens.append(new_token)
self.orderedtokens = [ unicodedata.normalize('NFC', token) for token in self.orderedtokens ] self.scaleand_counterize() return self
if __name__ == '__main__':
    import doctest
    doctest.testmod(optionflags=doctest.NORMALIZE_WHITESPACE)
Similar lines in 2 files Open
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple file. This usually means that the code should be refactored to avoid this duplication. ==abydos.tokenizer.corvcluster:137 ==abydos.tokenizer.cvcluster:134 mode = 2 else: # This should cover combining marks, marks, etc. new_token += char
self.orderedtokens.append(new_token)
self.orderedtokens = [ unicodedata.normalize('NFC', token) for token in self.orderedtokens ] self.scaleand_counterize() return self
if __name__ == '__main__':
    import doctest
    doctest.testmod(optionflags=doctest.NORMALIZE_WHITESPACE)
Similar lines in 2 files Open
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple file. This usually means that the code should be refactored to avoid this duplication. ==abydos.distance.discountedlevenshtein:204 ==abydos.distance.levenshtein:165 ) dmat[i + 1, j + 1] = min(opts) if backtrace: trace_mat[i + 1, j + 1] = int(np.argmin(opts))
if self.mode == 'osa': if ( i + 1 > 1 and j + 1 > 1 and src[i] == tar[j - 1] and src[i - 1] == tar[j] ): # transposition dmat[i + 1, j + 1] = min(
Method could be a function Open
def decode(self, text: str) -> str:
Used when a method doesn't use its bound instance, and so could be written as a function.
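As an illustration of this check, the sketch below shows a decoder that never touches `self`, rewritten first as a static method and then as a plain module-level function. The class name `RLE` and the simplified decoding rule (an optional decimal count followed by one character) are assumptions made for the example, not the project's actual implementation.

```python
import re


class RLE:
    """Toy run-length decoder (hypothetical, for illustration only)."""

    @staticmethod
    def decode(text: str) -> str:
        # No instance state is needed, so the method is declared static.
        # Each match is an optional run count followed by one character.
        return ''.join(
            char * int(count or 1)
            for count, char in re.findall(r'(\d*)(\D)', text)
        )


def decode(text: str) -> str:
    """Module-level alternative: a plain function needs no class at all."""
    return RLE.decode(text)
```

Either form would typically silence the warning. Keeping the method on the class is reasonable when subclasses are expected to override it or when a uniform `encode`/`decode` interface matters more than the lint score.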
Similar lines in 2 files Open
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple file. This usually means that the code should be refactored to avoid this duplication. ==abydos.distance.prefix:32 ==abydos.distance.suffix:32 def sim(self, src: str, tar: str) -> float: ```Return the suffix similarity of two strings.
Suffix similarity is the ratio of the length of the shorter term that exactly matches the longer term to the length of the shorter term, beginning at the end of both terms.
Parameters
src : str Source string for comparison tar : str Target string for comparison
Returns
float Suffix similarity
Examples
cmp = Suffix() cmp.sim('cat', 'hat') 0.6666666666666666 cmp.sim('Niall', 'Neil') 0.25 cmp.sim('aluminum', 'Catalan') 0.0 cmp.sim('ATCG', 'TAGC') 0.0
.. versionadded:: 0.1.0 .. versionchanged:: 0.3.6 Encapsulated in class
if src == tar:
return 1.0
if not src or not tar:
return 0.0
min_word, max_word = (src, tar) if len(src) < len(tar) else (tar, src)
min_len = len(min_word)
for i in range(min_len, 0, -1):
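Prefix and suffix similarity differ only in which end of the two strings is compared, so the duplicated body could in principle be shared. A minimal sketch under that assumption follows; the function `affix_sim` and its `from_end` flag are hypothetical names introduced here, not part of the abydos API.

```python
def affix_sim(src: str, tar: str, from_end: bool = False) -> float:
    """Return prefix (default) or suffix similarity of two strings.

    Hypothetical shared helper; not part of the abydos API.
    """
    if src == tar:
        return 1.0
    if not src or not tar:
        return 0.0
    min_word, max_word = (src, tar) if len(src) < len(tar) else (tar, src)
    min_len = len(min_word)
    # Try the longest candidate affix first and shrink until one matches.
    for i in range(min_len, 0, -1):
        if from_end:
            matched = min_word[-i:] == max_word[-i:]
        else:
            matched = min_word[:i] == max_word[:i]
        if matched:
            return i / min_len
    return 0.0
```

With `from_end=True` this follows the suffix behaviour described in the docstring above; with the default it compares from the start of both strings.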
Similar lines in 2 files Open
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple file. This usually means that the code should be refactored to avoid this duplication. ==abydos.distance.euclidean:125 ==abydos.distance.minkowski:163 def dist(self, src: str, tar: str) -> float: ```Return the normalized Euclidean distance between two strings.
The normalized Euclidean distance is a distance
metric in :math:`L^2`-space, normalized to [0, 1].
Parameters
src : str Source string (or QGrams/Counter objects) for comparison tar : str Target string (or QGrams/Counter objects) for comparison
Returns
float The normalized Euclidean distance
Examples
cmp = Euclidean() round(cmp.dist('cat', 'hat'), 12) 0.57735026919 round(cmp.dist('Niall', 'Neil'), 12) 0.683130051064 round(cmp.dist('Colin', 'Cuilen'), 12) 0.727606875109 cmp.dist('ATCG', 'TAGC') 1.0
.. versionadded:: 0.3.0 .. versionchanged:: 0.3.6 Encapsulated in class
return self.dist_abs(src, tar, normalized=True)
if __name__ == '__main__':
import doctest
doctest.testmod()
Similar lines in 2 files Open
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple file. This usually means that the code should be refactored to avoid this duplication. ==abydos.tokenizer.corvcluster:110 ==abydos.tokenizer.cvcluster:111 self.string = string self.orderedtokens = [] tokenlist = self.regexp.findall(self.string) for token in tokenlist: if ( token[0] not in self.consonants and token[0] not in self.vowels ): self.orderedtokens.append(token) else: token = unicodedata.normalize('NFD', token) mode = 0 # 0 = starting mode, 1 = cons, 2 = vowels newtoken = '' # noqa: S105 for char in token: if char in self.consonants: if mode == 2: self.orderedtokens.append(newtoken) newtoken = char else: newtoken += char mode = 1 elif char in self._vowels:
Similar lines in 3 files Open
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple file. This usually means that the code should be refactored to avoid this duplication. ==abydos.distance.azzoo:21 ==abydos.distance.minkowski:21 ==abydos.distance.mutualinformation:22 from typing import ( Any, Counter as TCounter, Optional, Sequence, Set, Union, cast, )
from .tokendistance import _TokenDistance from ..tokenizer import _Tokenizer
Similar lines in 2 files Open
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple file. This usually means that the code should be refactored to avoid this duplication. ==abydos.stemmer.snowballnorwegian:143 ==abydos.stemmer.snowballswedish:143 word = word[:-3] elif _r1[-2:] == 'ig': word = word[:-2]
return word
if __name__ == '__main__':
    import doctest
    doctest.testmod()
Similar lines in 2 files Open
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple file. This usually means that the code should be refactored to avoid this duplication. ==abydos.distance.blocklevenshtein:40 ==abydos.distance.dameraulevenshtein:43 def init( self, cost: Tuple[float, float, float, float] = (1, 1, 1, 1), normalizer: Callable[[List[float]], float] = max, **kwargs: Any ): ```Initialize BlockLevenshtein instance.
Parameters
**kwargs Arbitrary keyword arguments
.. versionadded:: 0.4.0
Similar lines in 2 files Open
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple file. This usually means that the code should be refactored to avoid this duplication. ==abydos.stemmer.snowballdanish:40 ==abydos.stemmer.snowballswedish:39 'b', 'c', 'd', 'f', 'g', 'h', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'r', 't', 'v', 'y',
Similar lines in 2 files Open
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple file. This usually means that the code should be refactored to avoid this duplication. ==abydos.distance.jarowinkler:247 ==abydos.distance._strcmp95:265 )
return weight
if __name__ == '__main__':
    import doctest
    doctest.testmod()
Similar lines in 2 files Open
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple file. This usually means that the code should be refactored to avoid this duplication. ==abydos.distance.needlemanwunsch:149 ==abydos.distance.smithwaterman:64 self.gapcost = gapcost self.simfunc = cast( Callable[[str, str], float], NeedlemanWunsch.simmatrix if simfunc is None else simfunc, ) # type: Callable[[str, str], float]
def sim_score(self, src: str, tar: str) -> float: ```Return the Needleman-Wunsch score of two strings.
Parameters
src : str Source string for comparison tar : str Target string for comparison
Returns
float Needleman-Wunsch score
Examples
cmp = NeedlemanWunsch() cmp.simscore('cat', 'hat') 2.0 cmp.simscore('Niall', 'Neil') 1.0 cmp.simscore('aluminum', 'Catalan') -1.0 cmp.simscore('ATCG', 'TAGC') 0.0
.. versionadded:: 0.1.0 .. versionchanged:: 0.3.6 Encapsulated in class
d_mat = np_zeros((len(src) + 1, len(tar) + 1), dtype=np_float)
Similar lines in 3 files Open
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple file. This usually means that the code should be refactored to avoid this duplication. ==abydos.distance.euclidean:40 ==abydos.distance.manhattan:40 ==abydos.distance.minkowski:49 alphabet: Optional[ Union[TCounter[str], Sequence[str], Set[str], int] ] = 0, tokenizer: Optional[Tokenizer] = None, intersection_type: str = 'crisp', **kwargs: Any ) -> None: ```Initialize Euclidean instance.
Parameters
alphabet : collection or int
The values or size of the alphabet
tokenizer : Tokenizer
A tokenizer instance from the :py:mod:`abydos.tokenizer` package
intersection_type : str
Specifies the intersection type, and set type as a result:
See :ref:`intersection_type <intersection_type>` description in
:py:class:`_TokenDistance` for details.
**kwargs
Arbitrary keyword arguments
Other Parameters
qval : int
The length of each q-gram. Using this parameter and tokenizer=None
will cause the instance to use the QGram tokenizer with this
q value.
metric : _Distance
A string distance measure class for use in the soft and fuzzy variants.
threshold : float
A threshold value, similarities above which are counted as
members of the intersection for the fuzzy variant.
.. versionadded:: 0.4.0
Similar lines in 2 files Open
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple file. This usually means that the code should be refactored to avoid this duplication. ==abydos.phonetic.haase:234 ==abydos.phonetic.koelner:143 elif word[i] == 'B': sdx += '1' elif word[i] == 'P': if _before(word, i, {'H'}): sdx += '3' else: sdx += '1' elif word[i] in {'D', 'T'}: if _before(word, i, {'C', 'S', 'Z'}): sdx += '8' else: sdx += '2' elif word[i] in {'F', 'V', 'W'}: sdx += '3' elif word[i] in {'G', 'K', 'Q'}: sdx += '4' elif word[i] == 'C': if _after(word, i, {'S', 'Z'}): sdx += '8' elif i == 0: if _before(
Similar lines in 3 files Open
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple file. This usually means that the code should be refactored to avoid this duplication. ==abydos.fingerprint.count:37 ==abydos.fingerprint.occurrence:36 ==abydos.fingerprint.occurrencehalved:36 def init( self, nbits: int = 16, mostcommon: Tuple[str, ...] = MOSTCOMMONLETTERS_CG, ) -> None: ```Initialize Count instance.
Parameters
nbits : int Number of bits in the fingerprint returned mostcommon : list The most common tokens in the target language, ordered by frequency
.. versionadded:: 0.4.0
Similar lines in 2 files Open
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.fingerprint.occurrence:141
==abydos.fingerprint.occurrencehalved:153
    if n_bits > 0:
        fingerprint <<= n_bits

    return fingerprint
if __name__ == '__main__':
    import doctest
    doctest.testmod()
Similar lines in 2 files Open
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple file. This usually means that the code should be refactored to avoid this duplication. ==abydos.phonetic.lein:63 ==abydos.phonetic.refinedsoundex:72 def encodealpha(self, word: str) -> str: ```Return the alphabetic LEIN code for a word.
Parameters
word : str The word to transform
Returns
str The alphabetic LEIN code
Examples
pe = LEIN() pe.encodealpha('Christopher') 'CLKT' pe.encodealpha('Niall') 'NL' pe.encodealpha('Smith') 'SNT' pe.encodealpha('Schmidt') 'SKNT'
.. versionadded:: 0.4.0
code = self.encode(word).rstrip('0')
return code[:1] + code[1:].translate(self._alphabetic)
def encode(self, word: str) -> str:
```Return the LEIN code for a word.
Parameters
----------
word : str
The word to transform
Returns
-------
str
The LEIN code
Examples
--------
>>> pe = LEIN()
>>> pe.encode('Christopher')
'C351'
>>> pe.encode('Niall')
'N300'
>>> pe.encode('Smith')
'S210'
>>> pe.encode('Schmidt')
'S521'
.. versionadded:: 0.3.0
.. versionchanged:: 0.3.6
Encapsulated in class
# uppercase, normalize, decompose, and filter non-A-Z out word = unicodenormalize('NFKD', word.upper()) word = ''.join(c for c in word if c in self.uc_set)
Similar lines in 2 files Open
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple file. This usually means that the code should be refactored to avoid this duplication. ==abydos.phonetic.pshpsoundexfirst:203 ==abydos.phonetic.pshpsoundexlast:239 code = code.replace('0', '') # rule 1
if self.maxlength != -1: if len(code) < self.maxlength: code += '0' * (self.maxlength - len(code)) else: code = code[: self.maxlength]
return code
if __name__ == '__main__':
    import doctest
    doctest.testmod()
Similar lines in 2 files Open
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple file. This usually means that the code should be refactored to avoid this duplication. ==abydos.distance.levenshtein:393 ==abydos.distance.phoneticeditdistance:307 normalizeterm = self.normalizer( [srclen * delcost, tarlen * inscost] )
return self.distabs(src, tar) / normalizeterm
if __name__ == '__main__':
    import doctest
    doctest.testmod()
Similar lines in 2 files Open
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple file. This usually means that the code should be refactored to avoid this duplication. ==abydos.distance.euclidean:81 ==abydos.distance.manhattan:81 alphabet=alphabet, tokenizer=tokenizer, intersectiontype=intersectiontype, **kwargs )
def dist_abs(self, src: str, tar: str, normalized: bool = False) -> float: ```Return the Euclidean distance between two strings.
Parameters
src : str Source string (or QGrams/Counter objects) for comparison tar : str Target string (or QGrams/Counter objects) for comparison normalized : bool Normalizes to [0, 1] if True
Returns
float The Euclidean distance
Examples
cmp = Euclidean() cmp.distabs('cat', 'hat') 2.0 round(cmp.distabs('Niall', 'Neil'), 12) 2.645751311065 cmp.distabs('Colin', 'Cuilen') 3.0 round(cmp.distabs('ATCG', 'TAGC'), 12) 3.162277660168
.. versionadded:: 0.3.0 .. versionchanged:: 0.3.6 Encapsulated in class
Similar lines in 3 files Open
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple file. This usually means that the code should be refactored to avoid this duplication. ==abydos.distance.gotoh:67 ==abydos.distance.needlemanwunsch:150 ==abydos.distance.smithwaterman:65 self.simfunc = cast( Callable[[str, str], float], NeedlemanWunsch.simmatrix if simfunc is None else simfunc, ) # type: Callable[[str, str], float]
def sim_score(self, src: str, tar: str) -> float: ```Return the Gotoh score of two strings.
Parameters
src : str Source string for comparison tar : str Target string for comparison
Returns
float Gotoh score
Examples
cmp = Gotoh() cmp.simscore('cat', 'hat') 2.0 cmp.simscore('Niall', 'Neil') 1.0 round(cmp.simscore('aluminum', 'Catalan'), 12) -0.4 cmp.simscore('cat', 'hat') 2.0
.. versionadded:: 0.1.0 .. versionchanged:: 0.3.6 Encapsulated in class
d_mat = np_zeros((len(src) + 1, len(tar) + 1), dtype=np_float)
Similar lines in 5 files Open
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.distance.ncdbwtrle:80
==abydos.distance.ncdbz2:103
==abydos.distance.ncdlzma:102
==abydos.distance.ncdlzss:90
==abydos.distance.ncdrle:80
    return (
        min(len(concatcomp), len(concatcomp2))
        - min(len(srccomp), len(tarcomp))
    ) / max(len(srccomp), len(tarcomp))
if __name__ == '__main__':
    import doctest
    doctest.testmod()
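The return expression repeated across these five modules is the normalized compression distance formula itself, so it could plausibly be factored into one helper that takes the compressor as a parameter. The sketch below assumes exactly that; the function name `ncd` and its `compress` parameter are hypothetical, and the standard library's `zlib.compress` only stands in for whichever codec a given class uses, so the values it returns will not match the doctest figures quoted in this report.

```python
import zlib
from typing import Callable


def ncd(
    src: str,
    tar: str,
    compress: Callable[[bytes], bytes] = zlib.compress,
) -> float:
    """Return the normalized compression distance of two strings.

    Hypothetical shared helper; `compress` is any bytes -> bytes codec.
    """
    if src == tar:
        return 0.0
    src_b = src.encode('utf-8')
    tar_b = tar.encode('utf-8')
    src_comp = compress(src_b)
    tar_comp = compress(tar_b)
    # Compress both concatenation orders and keep the shorter result.
    concat_comp = compress(src_b + tar_b)
    concat_comp2 = compress(tar_b + src_b)
    return (
        min(len(concat_comp), len(concat_comp2))
        - min(len(src_comp), len(tar_comp))
    ) / max(len(src_comp), len(tar_comp))
```

Each codec-specific class would then only need to supply its own `compress` callable, for example `bz2.compress` or `lzma.compress`.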
Similar lines in 2 files Open
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.distance.jarowinkler:198
==abydos.distance.strcmp95:200
            num_com += 1
            break

    # If no characters in common - return
    if num_com == 0:
        return 0.0

    # Count the number of transpositions
    k = n_trans = 0
Similar lines in 3 files Open
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple file. This usually means that the code should be refactored to avoid this duplication. ==abydos.fingerprint.count:145 ==abydos.fingerprint.occurrence:142 ==abydos.fingerprint.occurrencehalved:154 fingerprint <<= n_bits
return fingerprint
if __name__ == '__main__':
    import doctest
    doctest.testmod()
Similar lines in 2 files Open
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.tokenizer.qgrams:167
==abydos.tokenizer.qskipgrams:183
        string = (
            self.startstop[0] * (qvali - 1)
            + self.string
            + self.startstop[-1] * (qvali - 1)
        )
    else:
        string = self.string
Similar lines in 2 files Open
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.distance.needlemanwunsch:194
==abydos.distance.smithwaterman:105
    for i in range(1, len(src) + 1):
        for j in range(1, len(tar) + 1):
            match = d_mat[i - 1, j - 1] + self.simfunc(
                src[i - 1], tar[j - 1]
            )
            delete = d_mat[i - 1, j] - self.gap_cost
            insert = d_mat[i, j - 1] - self.gap_cost
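The loop quoted above is the core dynamic-programming recurrence shared by both alignment classes. For readers unfamiliar with it, here is a minimal pure-Python sketch of the Needleman-Wunsch variant. It uses nested lists instead of NumPy, and both the function names and the default identity scoring (`sim_ident`: 1 for equal characters, 0 otherwise) are assumptions for illustration rather than the project's code.

```python
from typing import Callable


def sim_ident(a: str, b: str) -> float:
    """Assumed default scoring: 1.0 for identical characters, else 0.0."""
    return 1.0 if a == b else 0.0


def needleman_wunsch(
    src: str,
    tar: str,
    gap_cost: float = 1.0,
    sim_func: Callable[[str, str], float] = sim_ident,
) -> float:
    """Return the global alignment score of src and tar."""
    rows, cols = len(src) + 1, len(tar) + 1
    d_mat = [[0.0] * cols for _ in range(rows)]
    # Aligning a prefix against the empty string costs one gap per char.
    for i in range(rows):
        d_mat[i][0] = -(i * gap_cost)
    for j in range(cols):
        d_mat[0][j] = -(j * gap_cost)
    for i in range(1, rows):
        for j in range(1, cols):
            match = d_mat[i - 1][j - 1] + sim_func(src[i - 1], tar[j - 1])
            delete = d_mat[i - 1][j] - gap_cost
            insert = d_mat[i][j - 1] - gap_cost
            d_mat[i][j] = max(match, delete, insert)
    return d_mat[rows - 1][cols - 1]
```

A Smith-Waterman variant would differ mainly in clamping each cell at zero and returning the matrix maximum, which suggests the cell update could be a shared helper with the clamping passed in as an option.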
Similar lines in 2 files Open
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple file. This usually means that the code should be refactored to avoid this duplication. ==abydos.phonetic.alphasis:95 ==abydos.phonetic._phonic:45 'D': '1', 'T': '1', 'N': '2', 'M': '3', 'R': '4', 'L': '5', 'J': '6',
Similar lines in 2 files Open
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple file. This usually means that the code should be refactored to avoid this duplication. ==abydos.phonetic.daitchmokotoff:363 ==abydos.phonetic.soundex:206 word = ''.join(c for c in word if c in self.uc_set)
# Nothing to convert, return base case if not word: if self.zeropad: return '0' * self.maxlength return '0'
Similar lines in 2 files Open
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.stemmer.porter2:149
==abydos.stemmer.porter:204
    if word[0] == 'y':
        word = 'Y' + word[1:]
    for i in range(1, len(word)):
        if word[i] == 'y' and word[i - 1] in self._vowels:
            word = word[:i] + 'Y' + word[i + 1 :]
Similar lines in 2 files Open
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.phonetic.soundd:36
==abydos.phonetic.soundexbr:36
    trans = dict(
        zip(
            (ord(_) for _ in 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'),
            '01230120022455012623010202',
        )
    )

    alphabetic = dict(zip((ord(_) for _ in '0123456'), 'APKTLNR'))
Similar lines in 2 files Open
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.util.init:34
==abydos.util.data:37
__all__ = [
    'datapath',
    'downloadpackage',
    'listavailablepackages',
    'listinstalledpackages',
    'package_path',
]
Similar lines in 2 files Open
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple file. This usually means that the code should be refactored to avoid this duplication. ==abydos.stemmer.clefgerman:84 ==abydos.stemmer.sstemmer:73 return word[:-1] return word
if __name__ == '__main__':
    import doctest
    doctest.testmod()
Similar lines in 2 files Open
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple file. This usually means that the code should be refactored to avoid this duplication. ==abydos.stemmer.schinke:217 ==abydos.stemmer.snowball_norwegian:46 'l', 'm', 'n', 'o', 'p',
Similar lines in 2 files Open
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple file. This usually means that the code should be refactored to avoid this duplication. ==abydos.phonetic.rethschek:182 ==abydos.stemmer.snowballdanish:167 word = word[:-1]
return word
if __name__ == '__main__':
    import doctest
    doctest.testmod()
Similar lines in 2 files Open
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.distance.prefix:77
==abydos.distance.suffix:77
            return i / min_len
    return 0.0
if __name__ == '__main__':
    import doctest
    doctest.testmod()
Similar lines in 2 files Open
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.stemmer.porter2:218
==abydos.stemmer.porter:241
            word = word[:-3]
            step1b_flag = True

    if step1b_flag:
        if word[-2:] in {'at', 'bl', 'iz'}:
            word += 'e'
Similar lines in 2 files Open
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple file. This usually means that the code should be refactored to avoid this duplication. ==abydos.phonetic.haase:179 ==abydos.phonetic.koelner:129 word = unicode_normalize('NFKD', word.upper())
word = word.replace('Ä', 'AE') word = word.replace('Ö', 'OE') word = word.replace('Ü', 'UE') word = ''.join(c for c in word if c in self.ucset)
Similar lines in 2 files Open
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple file. This usually means that the code should be refactored to avoid this duplication. ==abydos.distance.jarowinkler:239 ==abydos.distance.strcmp95:256 if ( self.longstrings and (minv > 4) and (numcom > i + 1) and (2 * num_com >= minv + i) ):
Similar lines in 2 files Open
# Copyright 2014-2020 by Christopher C. Little.
==abydos.distance.synoname:27
==abydos.distance.token_distance:29
Optional,
Tuple,
Union,
cast,
)
Similar lines in 2 files Open
# Copyright 2014-2020 by Christopher C. Little.
==abydos.distance.flexmetric:31
==abydos.distance.meta_levenshtein:30
cast,
)

from numpy import float_ as npfloat
from numpy import zeros as npzeros

from ._distance import _Distance
Similar lines in 2 files Open
# Copyright 2014-2020 by Christopher C. Little.
==abydos.phonetic.koelner:225
==abydos.phonetic.russellindex:135
num = ''.join(c for c in self.encode(word) if c in self.numset)
return num.translate(self.num_trans)

if __name__ == '__main__':
    import doctest

    doctest.testmod()
Similar lines in 2 files Open
# Copyright 2014-2020 by Christopher C. Little.
==abydos.distance.dameraulevenshtein:113
==abydos.distance.shapirastoreri:152
if src == tar:
    return 0
if not src:
    return len(tar) * ins_cost
if not tar:
    return len(src) * del_cost
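Guard clauses like these, repeated at the top of several edit-distance methods, could be pulled into one shared helper. The sketch below is only an illustration under assumptions: the function name `trivial_edit_distance` and its default costs are hypothetical and are not part of abydos; the three checks mirror the flagged lines.

```python
from typing import Optional


def trivial_edit_distance(
    src: str, tar: str, ins_cost: float = 1, del_cost: float = 1
) -> Optional[float]:
    """Return the distance for trivial inputs, or None otherwise.

    Hypothetical helper: None signals that the caller still has to
    run its full dynamic-programming computation.
    """
    if src == tar:
        return 0
    if not src:
        # Building tar from nothing costs one insertion per character.
        return len(tar) * ins_cost
    if not tar:
        # Reducing src to nothing costs one deletion per character.
        return len(src) * del_cost
    return None
```

Each distance method would then start by calling the helper and returning its result when it is not `None`, so the base cases are written once.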
Similar lines in 4 files Open
# Copyright 2014-2020 by Christopher C. Little.
==abydos.tokenizer.character:79
==abydos.tokenizer.legalipy:139
==abydos.tokenizer.regexp:99
==abydos.tokenizer.sonoripy:104
self._scale_and_counterize()
return self

if __name__ == '__main__':
    import doctest

    doctest.testmod()
Similar lines in 2 files Open
# Copyright 2014-2020 by Christopher C. Little.
==abydos.distance.levenshtein:372
==abydos.distance.phoneticeditdistance:300
if src == tar:
    return 0.0
ins_cost, del_cost = self._cost[:2]

src_len = len(src)
tar_len = len(tar)
Similar lines in 2 files Open
# Copyright 2014-2020 by Christopher C. Little.
==abydos.phonetic.alphasis:102
==abydos.phonetic._phonic:54
'G': '7',
'Q': '7',
'X': '7',
'F': '8',
'V': '8',
'B': '9',
'P': '9',
Similar lines in 2 files Open
# Copyright 2014-2020 by Christopher C. Little.
==abydos.fingerprint.init:102
==abydos.fingerprint.fingerprint:86
'MOST_COMMON_LETTERS',
'MOST_COMMON_LETTERS_CG',
'MOST_COMMON_LETTERS_DE',
'MOST_COMMON_LETTERS_DE_LC',
'MOST_COMMON_LETTERS_EN_LC',
Similar lines in 2 files Open
# Copyright 2014-2020 by Christopher C. Little.
==abydos.stemmer.snowballnorwegian:38
==abydos.stemmer.snowballswedish:38
sendings = {
    'b',
    'c',
    'd',
    'f',
    'g',
    'h',
    'j',
Similar lines in 2 files Open
# Copyright 2014-2020 by Christopher C. Little.
==abydos.stemmer.snowballdutch:177
==abydos.stemmer.snowballgerman:175
if len(word[r2start:]) >= 3:
    word = word[:-3]
    if (
        word[-2:] == 'ig'
        and len(word[r2start:]) >= 2
        and word[-3] != 'e'
    ):
        word = word[:-2]
Similar lines in 3 files Open
# Copyright 2014-2020 by Christopher C. Little.
==abydos.stemmer.schinke:215
==abydos.stemmer.snowballdanish:46
==abydos.stemmer.snowball_swedish:45
'j',
'k',
'l',
'm',
'n',
'o',
'p',
Similar lines in 2 files Open
# Copyright 2014-2020 by Christopher C. Little.
==abydos.distance.editex:293
==abydos.distance.levenshtein:388
for pos in range(tar_len)
),
]
)
else:
Similar lines in 2 files Open
# Copyright 2014-2020 by Christopher C. Little.
==abydos.phonetic.alphasis:39
==abydos.phonetic.rogerroot:46
'GF': '08',
'GM': '03',
'GN': '02',
'KN': '02',
'PF': '08',
Similar lines in 2 files Open
# Copyright 2014-2020 by Christopher C. Little.
==abydos.stemmer.snowballdanish:40
==abydos.stemmer.snowballnorwegian:39
'b',
'c',
'd',
'f',
'g',
'h',
'j',
Similar lines in 7 files Open
# Copyright 2014-2020 by Christopher C. Little.
==abydos.tokenizer.corvcluster:147
==abydos.tokenizer.cvcluster:144
==abydos.tokenizer.nltk:104
==abydos.tokenizer.qgrams:189
==abydos.tokenizer.qskipgrams:213
==abydos.tokenizer.saps:114
==abydos.tokenizer.vccluster:144
self._scale_and_counterize()
return self

if __name__ == '__main__':
    import doctest

    doctest.testmod(optionflags=doctest.NORMALIZE_WHITESPACE)
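When the same closing steps of a `tokenize` method show up in many tokenizer classes, one possible refactoring is a template method: the base class owns the shared tail (count the tokens, then `return self`) and each subclass supplies only its splitting logic. The sketch below is an assumption-laden illustration, not the abydos design. All class names and the `_split` hook are hypothetical, and a plain `Counter` stands in for the counting call in the real code, whose behavior the report does not show.

```python
from collections import Counter
from typing import List


class BaseTokenizer:
    """Hypothetical base class that owns the shared tail of tokenize()."""

    def __init__(self) -> None:
        self.tokens: Counter = Counter()

    def _split(self, string: str) -> List[str]:
        """Hook: subclasses return the raw token list for a string."""
        raise NotImplementedError

    def tokenize(self, string: str) -> "BaseTokenizer":
        # The counting step and `return self` are written once here
        # instead of being repeated at the end of every subclass.
        self.tokens = Counter(self._split(string))
        return self


class CharacterTokenizer(BaseTokenizer):
    """Split a string into single characters."""

    def _split(self, string: str) -> List[str]:
        return list(string)


class WhitespaceTokenizer(BaseTokenizer):
    """Split a string on runs of whitespace."""

    def _split(self, string: str) -> List[str]:
        return string.split()
```

The trade-off is an extra level of indirection in exchange for removing the repeated lines from each subclass.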
Similar lines in 2 files Open
# Copyright 2014-2020 by Christopher C. Little.
==abydos.distance.inclusion:110
==abydos.distance.mlipns:119
return 1.0
return 0.0

if __name__ == '__main__':
    import doctest

    doctest.testmod()
Similar lines in 4 files Open
# Copyright 2014-2020 by Christopher C. Little.
==abydos.distance.azzoo:24
==abydos.distance.generalizedfleiss:27
==abydos.distance.minkowski:24
==abydos.distance.mutualinformation:25
Optional,
Sequence,
Set,
Union,
cast,
)

from .tokendistance import _TokenDistance
Similar lines in 2 files Open
# Copyright 2014-2020 by Christopher C. Little.
==abydos.distance.phoneticeditdistance:29
==abydos.distance.token_distance:30
Tuple,
Union,
cast,
)

import numpy as np
Similar lines in 2 files Open
# Copyright 2014-2020 by Christopher C. Little.
==abydos.distance.sift4:138
==abydos.distance.sift4simplest:109
for i in range(self.maxoffset):
    if not (
        (srccur + i < src_len)
        or (tarcur + i < tar_len)
    ):
        break
Similar lines in 3 files Open
# Copyright 2014-2020 by Christopher C. Little.
==abydos.phonetic.henryearly:245
==abydos.phonetic.pshpsoundexfirst:209
==abydos.phonetic.pshpsoundexlast:245
code = code[: self.maxlength]

return code

if __name__ == '__main__':
    import doctest

    doctest.testmod()
Similar lines in 2 files Open
# Copyright 2014-2020 by Christopher C. Little.
==abydos.stemmer.snowballnorwegian:46
==abydos.stemmer.snowballswedish:47
'l',
'm',
'n',
'o',
'p',
'r',
't',
'v',
'y',
Similar lines in 4 files Open
# Copyright 2014-2020 by Christopher C. Little.
==abydos.distance.averagelinkage:20
==abydos.distance.completelinkage:21
==abydos.distance.singlelinkage:21
==abydos.distance.softcosine:21
from typing import Any, Optional, cast

from ._distance import _Distance
from .levenshtein import Levenshtein
from .tokendistance import _TokenDistance
from ..tokenizer import _Tokenizer
Similar lines in 3 files Open
# Copyright 2014-2020 by Christopher C. Little.
==abydos.stemmer.clefgermanplus:97
==abydos.stemmer.snowballnorwegian:145
==abydos.stemmer.snowball_swedish:145
word = word[:-2]

return word

if __name__ == '__main__':
    import doctest

    doctest.testmod()