chrislit/abydos

abydos/compression/_rle.py

Summary

Maintainability
A
1 hr
Test Coverage

Cyclomatic complexity is too high in method encode. (6)
Open

    def encode(self, text: str) -> str:
        r"""Perform encoding of run-length-encoding (RLE).

        Parameters
        ----------
Severity: Minor
Found in abydos/compression/_rle.py by radon

Cyclomatic Complexity

Cyclomatic Complexity corresponds to the number of decisions a block of code contains plus 1. This number (also called McCabe number) is equal to the number of linearly independent paths through the code. This number can be used as a guide when testing conditional logic in blocks.

Radon analyzes the AST of a Python program to compute Cyclomatic Complexity. Statements have the following effects on Cyclomatic Complexity:

Construct          Effect on CC   Reasoning
if                 +1             An if statement is a single decision.
elif               +1             The elif statement adds another decision.
else               +0             The else statement does not cause a new decision. The decision is at the if.
for                +1             There is a decision at the start of the loop.
while              +1             There is a decision at the while statement.
except             +1             Each except branch adds a new conditional path of execution.
finally            +0             The finally block is unconditionally executed.
with               +1             The with statement roughly corresponds to a try/except block (see PEP 343 for details).
assert             +1             The assert statement internally roughly equals a conditional statement.
Comprehension      +1             A list/set/dict comprehension or generator expression is equivalent to a for loop.
Boolean Operator   +1             Every boolean operator (and, or) adds a decision point.

Source: http://radon.readthedocs.org/en/latest/intro.html
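
The table can be turned into a rough counter. The sketch below is only an approximation written for illustration: `approximate_cc` and `SAMPLE` are hypothetical names, and it does not reproduce Radon's actual implementation, which handles more constructs.

```python
import ast

# Node types that each add one decision, following the table above.
DECISION_NODES = (
    ast.If,             # covers elif too: an elif is a nested If in the AST
    ast.For,
    ast.While,
    ast.ExceptHandler,
    ast.With,
    ast.Assert,
    ast.comprehension,  # one per "for" clause in a comprehension
)


def approximate_cc(source: str) -> int:
    """Return 1 plus the number of decision points found in the source."""
    complexity = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, DECISION_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" holds three values but two operators.
            complexity += len(node.values) - 1
    return complexity


SAMPLE = '''
def classify(n):
    if n < 0 and n % 2:
        return "negative odd"
    elif n == 0:
        return "zero"
    for _ in range(n):
        pass
    return "other"
'''

# base 1 + if + and + elif + for
print(approximate_cc(SAMPLE))
```

For real measurements, run Radon itself rather than a counter like this one.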

Function decode has a Cognitive Complexity of 8 (exceeds 5 allowed). Consider refactoring.
Open

    def decode(self, text: str) -> str:
        r"""Perform decoding of run-length-encoding (RLE).

        Parameters
        ----------
Severity: Minor
Found in abydos/compression/_rle.py - About 45 mins to fix

Cognitive Complexity

Cognitive Complexity is a measure of how difficult a unit of code is to intuitively understand. Unlike Cyclomatic Complexity, which determines how difficult your code will be to test, Cognitive Complexity tells you how difficult your code will be to read and comprehend.

A method's cognitive complexity is based on a few simple rules:

  • Code is not considered more complex when it uses shorthand that the language provides for collapsing multiple statements into one
  • Code is considered more complex for each "break in the linear flow of the code"
  • Code is considered more complex when "flow breaking structures are nested"
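
As an illustration of the nesting rule, the two functions below return the same result. The names are hypothetical and no score is computed; the point is only that the second version has fewer nested flow-breaking structures.

```python
from typing import Iterable, Optional, Sequence


def first_even_nested(rows: Iterable[Sequence[int]]) -> Optional[int]:
    """Find the first even value using nested loops and conditions."""
    for row in rows:                    # break in linear flow
        if row:                         # nested one level
            for value in row:           # nested two levels
                if value % 2 == 0:      # nested three levels
                    return value
    return None


def first_even_flat(rows: Iterable[Sequence[int]]) -> Optional[int]:
    """Find the first even value with the nesting flattened away."""
    values = (value for row in rows for value in row)
    return next((value for value in values if value % 2 == 0), None)


print(first_even_nested([[1, 3], [], [5, 6, 8]]))
print(first_even_flat([[1, 3], [], [5, 6, 8]]))
```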

Function encode has a Cognitive Complexity of 6 (exceeds 5 allowed). Consider refactoring.
Open

    def encode(self, text: str) -> str:
        r"""Perform encoding of run-length-encoding (RLE).

        Parameters
        ----------
Severity: Minor
Found in abydos/compression/_rle.py - About 25 mins to fix
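
One common way to keep run-length encoding flat is to let `itertools.groupby` find the runs. The sketch below is not the abydos implementation, and its output format is an assumption: every run is written as a count followed by the character, and the input is assumed to contain no digits. `rle_encode` and `rle_decode` are hypothetical names.

```python
import re
from itertools import groupby


def rle_encode(text: str) -> str:
    """Encode each run as <count><char>, e.g. 'aaab' -> '3a1b'."""
    return ''.join(
        f'{len(list(group))}{char}' for char, group in groupby(text)
    )


def rle_decode(encoded: str) -> str:
    """Invert rle_encode, assuming the original text had no digits."""
    pairs = re.findall(r'(\d+)(\D)', encoded)
    return ''.join(char * int(count) for count, char in pairs)


print(rle_encode('aaabcc'))
print(rle_decode(rle_encode('aaabcc')))
```

Neither function contains an explicit loop or branch, so there is little for a complexity checker to count.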

Cyclic import (abydos.distance -> abydos.distance._rouge_l)
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Used when a cyclic import between two or more modules is detected.

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.distance.sift4:59
==abydos.distance.sift4simplest:54
    def dist_abs(self, src: str, tar: str) -> float:
        """Return the "common" Sift4 distance between two terms.

        Parameters
        ----------
        src : str
            Source string for comparison
        tar : str
            Target string for comparison

        Returns
        -------
        int
            The Sift4 distance according to the common formula

        Examples
        --------
        >>> cmp = Sift4()
        >>> cmp.dist_abs('cat', 'hat')
        1
        >>> cmp.dist_abs('Niall', 'Neil')
        2
        >>> cmp.dist_abs('Colin', 'Cuilen')
        3
        >>> cmp.dist_abs('ATCG', 'TAGC')
        2


        .. versionadded:: 0.3.0
        .. versionchanged:: 0.3.6
            Encapsulated in class

        """
        if not src:
            return len(tar)

        if not tar:
            return len(src)

        src_len = len(src)
        tar_len = len(tar)

        src_cur = 0
        tar_cur = 0
        lcss = 0
        local_cs = 0

Similar lines in 3 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.stemmer.snowballdanish:59
==abydos.stemmer.snowballnorwegian:56
==abydos.stemmer.snowballswedish:56
    }

    def stem(self, word: str) -> str:
        """Return Snowball Swedish stem.

        Parameters
        ----------
        word : str
            The word to stem

        Returns
        -------
        str
            Word stem

        Examples
        --------
        >>> stmr = SnowballSwedish()
        >>> stmr.stem('undervisa')
        'undervis'
        >>> stmr.stem('suspension')
        'suspension'
        >>> stmr.stem('visshet')
        'viss'


        .. versionadded:: 0.1.0
        .. versionchanged:: 0.3.6
            Encapsulated in class

        """
        # lowercase, normalize, and compose
        word = normalize('NFC', word.lower())

        r1_start = min(max(3, self._sb_r1(word)), len(word))

        # Step 1
        _r1 = word[r1_start:]

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.distance.levenshtein:163
==abydos.distance.phoneticeditdistance:179
                        else 0
                    ),  # sub/==
                )
                d_mat[i + 1, j + 1] = min(opts)
                if backtrace:
                    trace_mat[i + 1, j + 1] = int(np.argmin(opts))

                if self._mode == 'osa':
                    if (
                        i + 1 > 1
                        and j + 1 > 1

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.phonetic.lein:61
==abydos.phonetic.phonix:190
        self._zero_pad = zero_pad

    def encode_alpha(self, word: str) -> str:
        """Return the alphabetic LEIN code for a word.

        Parameters
        ----------
        word : str
            The word to transform

        Returns
        -------
        str
            The alphabetic LEIN code

        Examples
        --------
        >>> pe = LEIN()
        >>> pe.encode_alpha('Christopher')
        'CLKT'
        >>> pe.encode_alpha('Niall')
        'NL'
        >>> pe.encode_alpha('Smith')
        'SNT'
        >>> pe.encode_alpha('Schmidt')
        'SKNT'


        .. versionadded:: 0.4.0

        """
        code = self.encode(word).rstrip('0')
        return code[:1] + code[1:].translate(self._alphabetic)

    def encode(self, word: str) -> str:
        """Return the LEIN code for a word.

        Parameters
        ----------
        word : str
            The word to transform

        Returns
        -------
        str
            The LEIN code

        Examples
        --------
        >>> pe = LEIN()
        >>> pe.encode('Christopher')
        'C351'
        >>> pe.encode('Niall')
        'N300'
        >>> pe.encode('Smith')
        'S210'
        >>> pe.encode('Schmidt')
        'S521'


        .. versionadded:: 0.3.0
        .. versionchanged:: 0.3.6
            Encapsulated in class

        """
        # uppercase, normalize, decompose, and filter non-A-Z out

Similar lines in 3 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.phonetic.fuzzysoundex:193
==abydos.phonetic.soundex:231
==abydos.phonetic.soundex_br:155
        sdx = sdx.replace('0', '')  # rule 1

        if self._zero_pad:
            sdx += '0' * self._max_length  # rule 4

        return sdx[: self._max_length]


if __name__ == '__main__':
    import doctest

    doctest.testmod()
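
The pad-then-truncate tail repeated across these encoders is one candidate for a shared helper. This is a minimal sketch, assuming a free function is acceptable; `pad_and_truncate` is a hypothetical name and not part of abydos.

```python
def pad_and_truncate(code: str, max_length: int, zero_pad: bool) -> str:
    """Optionally right-pad a code with zeros, then cut it to max_length."""
    if zero_pad:
        # Over-padding is harmless because the slice below trims the excess.
        code += '0' * max_length
    return code[:max_length]


print(pad_and_truncate('S53', 4, zero_pad=True))
print(pad_and_truncate('C62345', 4, zero_pad=True))
```

Each encoder could then end with a single call, passing its own maximum length and padding flag.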

Similar lines in 3 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.tokenizer.corvcluster:76
==abydos.tokenizer.cvcluster:77
==abydos.tokenizer.vccluster:77
        if consonants:
            self._consonants = consonants
        else:
            self._consonants = set('bcdfghjklmnpqrstvwxzßBCDFGHJKLMNPQRSTVWXZ')
        if vowels:
            self._vowels = vowels
        else:
            self._vowels = set('aeiouyAEIOUY')
        self._regexp = re.compile(r'\w+|[^\w\s]+', flags=0)

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.distance.levenshtein:323
==abydos.distance.phoneticeditdistance:253
        d_mat = cast(
            np.ndarray, self.alignment_matrix(src, tar, backtrace=False)
        )

        if int(d_mat[src_len, tar_len]) == d_mat[src_len, tar_len]:
            return int(d_mat[src_len, tar_len])
        else:
            return cast(float, d_mat[src_len, tar_len])

    def dist(self, src: str, tar: str) -> float:
        """Return the normalized phonetic edit distance between two strings.

        The edit distance is normalized by dividing the edit distance
        (calculated by either of the two supported methods) by the
        greater of the number of characters in src times the cost of a
        delete and the number of characters in tar times the cost of an
        insert. For the case in which all operations have
        :math:`cost = 1`, this is equivalent to the greater of the length
        of the two strings src & tar.

        Parameters
        ----------
        src : str
            Source string for comparison
        tar : str
            Target string for comparison

        Returns
        -------
        float
            The normalized Levenshtein distance between src & tar

        Examples
        --------
        >>> cmp = PhoneticEditDistance()
        >>> round(cmp.dist('cat', 'hat'), 12)
        0.059139784946
        >>> round(cmp.dist('Niall', 'Neil'), 12)
        0.232258064516
        >>> cmp.dist('aluminum', 'Catalan')
        0.3084677419354839
        >>> cmp.dist('ATCG', 'TAGC')
        0.2983870967741935


        .. versionadded:: 0.4.1

Similar lines in 3 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.distance.gotoh:147
==abydos.distance.needlemanwunsch:204
==abydos.distance.smith_waterman:115
    def sim(self, src: str, tar: str) -> float:
        """Return the normalized Needleman-Wunsch score of two strings.

        Parameters
        ----------
        src : str
            Source string for comparison
        tar : str
            Target string for comparison

        Returns
        -------
        float
            Normalized Needleman-Wunsch score

        Examples
        --------
        >>> cmp = NeedlemanWunsch()
        >>> cmp.sim('cat', 'hat')
        0.6666666666666667
        >>> cmp.sim('Niall', 'Neil')
        0.22360679774997896
        >>> round(cmp.sim('aluminum', 'Catalan'), 12)
        0.0
        >>> cmp.sim('cat', 'hat')
        0.6666666666666667


        .. versionadded:: 0.4.1

        """
        if src == tar:
            return 1.0
        return max(0.0, self.sim_score(src, tar)) / (
            self.sim_score(src, src) ** 0.5 * self.sim_score(tar, tar) ** 0.5
        )


if __name__ == '__main__':
    import doctest

    doctest.testmod()

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.fingerprint.count:56
==abydos.fingerprint.occurrence:55
        self._n_bits = n_bits
        self._most_common = most_common

    def fingerprint(self, word: str) -> str:
        """Return the occurrence fingerprint.

        Parameters
        ----------
        word : str
            The word to fingerprint

        Returns
        -------
        str
            The occurrence fingerprint

        Examples
        --------
        >>> of = Occurrence()
        >>> of.fingerprint('hat')
        '0110000100000000'
        >>> of.fingerprint('niall')
        '0010110000100000'
        >>> of.fingerprint('colin')
        '0001110000110000'
        >>> of.fingerprint('atcg')
        '0110000000010000'
        >>> of.fingerprint('entreatment')
        '1110010010000100'


        .. versionadded:: 0.3.0
        .. versionchanged:: 0.3.6
            Encapsulated in class
        .. versionchanged:: 0.6.0
            Changed to return a str and added fingerprint_int method

        """
        return ('{:0' + str(self._n_bits) + 'b}').format(
            self.fingerprint_int(word)
        )

    def fingerprint_int(self, word: str) -> int:
        """Return the occurrence fingerprint.

        Parameters
        ----------
        word : str
            The word to fingerprint

        Returns
        -------
        int
            The occurrence fingerprint as an int

        Examples
        --------
        >>> of = Occurrence()
        >>> of.fingerprint_int('hat')
        24832
        >>> of.fingerprint_int('niall')
        11296
        >>> of.fingerprint_int('colin')
        7216
        >>> of.fingerprint_int('atcg')
        24592
        >>> of.fingerprint_int('entreatment')
        58500


        .. versionadded:: 0.6.0

        """
        n_bits = self._n_bits

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.phonetic.pshpsoundexfirst:41
==abydos.phonetic.pshpsoundexlast:41
    _trans = dict(
        zip(
            (ord(_) for _ in 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'),
            '01230120022455012523010202',
        )
    )

    _alphabetic = dict(zip((ord(_) for _ in '12345'), 'PKTLN'))

    def __init__(self, max_length: int = 4, german: bool = False) -> None:
        """Initialize PSHPSoundexFirst instance.

        Parameters
        ----------
        max_length : int
            The length of the code returned (defaults to 4)
        german : bool
            Set to True if the name is German (different rules apply)


        .. versionadded:: 0.4.0

        """
        self._max_length = max_length
        self._german = german

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.phonetic.haase:258
==abydos.phonetic.koelner:165
                    ):
                        sdx += '4'
                    else:
                        sdx += '8'
                elif _before(word, i, {'A', 'H', 'K', 'O', 'Q', 'U', 'X'}):
                    sdx += '4'
                else:
                    sdx += '8'
            elif word[i] == 'X':
                if _after(word, i, {'C', 'K', 'Q'}):
                    sdx += '8'
                else:
                    sdx += '48'
            elif word[i] == 'L':
                sdx += '5'
            elif word[i] in {'M', 'N'}:
                sdx += '6'
            elif word[i] == 'R':
                sdx += '7'
            elif word[i] in {'S', 'Z'}:
                sdx += '8'

        sdx = self._delete_consecutive_repeats(sdx)

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.tokenizer.legalipy:136
==abydos.tokenizer.sonoripy:101
        if not self._ordered_tokens:
            self._ordered_tokens = [self._string]

        self._scale_and_counterize()
        return self


if __name__ == '__main__':
    import doctest

    doctest.testmod()

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.stemmer.snowballdanish:48
==abydos.stemmer.snowballnorwegian:46
        'l', 'm', 'n', 'o', 'p', 'r', 't', 'v', 'y', 'z',

Method could be a function
Open

    def encode(self, text: str) -> str:
Severity: Info
Found in abydos/compression/_rle.py by pylint

Used when a method doesn't use its bound instance, and so could be written as a function.
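
A minimal illustration of what this check reports, using a hypothetical class rather than the flagged code: the first method never touches `self`, so it can be declared a `@staticmethod` (or moved to module level) without changing its behavior.

```python
class RunCounter:
    """Hypothetical class used only to illustrate the pylint message."""

    def longest_run(self, text: str) -> int:
        """Instance method that never uses self (what pylint flags)."""
        return RunCounter.longest_run_static(text)

    @staticmethod
    def longest_run_static(text: str) -> int:
        """Same logic as a static method: no bound instance required."""
        best = current = 0
        previous = None
        for char in text:
            current = current + 1 if char == previous else 1
            best = max(best, current)
            previous = char
        return best


print(RunCounter().longest_run('aaabcc'))
print(RunCounter.longest_run_static('aaabcc'))
```

Keeping the method on the class can still be reasonable when subclasses are expected to override it or when a shared interface requires it.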

Similar lines in 3 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.fingerprint.count:59
==abydos.fingerprint.occurrence:58
==abydos.fingerprint._position:60
    def fingerprint(self, word: str) -> str:
        """Return the position fingerprint.

        Parameters
        ----------
        word : str
            The word to fingerprint

        Returns
        -------
        str
            The position fingerprint

        Examples
        --------
        >>> pf = Position()
        >>> pf.fingerprint('hat')
        '1110100011111111'
        >>> pf.fingerprint('niall')
        '1111110101110010'
        >>> pf.fingerprint('colin')
        '1111111110010111'
        >>> pf.fingerprint('atcg')
        '1110010001111111'
        >>> pf.fingerprint('entreatment')
        '0000101011111111'


        .. versionadded:: 0.3.0
        .. versionchanged:: 0.3.6
            Encapsulated in class
        .. versionchanged:: 0.6.0
            Changed to return a str and added fingerprint_int method

        """
        return ('{:0' + str(self._n_bits) + 'b}').format(
            self.fingerprint_int(word)
        )

    def fingerprint_int(self, word: str) -> int:
        """Return the position fingerprint.

        Parameters
        ----------
        word : str
            The word to fingerprint

        Returns
        -------
        int
            The position fingerprint as an int

        Examples
        --------
        >>> pf = Position()
        >>> pf.fingerprint_int('hat')
        59647
        >>> pf.fingerprint_int('niall')
        64882
        >>> pf.fingerprint_int('colin')
        65431
        >>> pf.fingerprint_int('atcg')
        58495
        >>> pf.fingerprint_int('entreatment')
        2815


        .. versionadded:: 0.6.0

        """
        n_bits = self._n_bits

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.distance.editex:230
==abydos.distance.levenshtein:325
        )

        if int(d_mat[src_len, tar_len]) == d_mat[src_len, tar_len]:
            return int(d_mat[src_len, tar_len])
        else:
            return cast(float, d_mat[src_len, tar_len])

    def dist(self, src: str, tar: str) -> float:
        """Return the normalized Levenshtein distance between two strings.

        The Levenshtein distance is normalized by dividing the Levenshtein
        distance (calculated by either of the two supported methods) by the
        greater of the number of characters in src times the cost of a
        delete and the number of characters in tar times the cost of an
        insert. For the case in which all operations have
        :math:`cost = 1`, this is equivalent to the greater of the length
        of the two strings src & tar.

        Parameters
        ----------
        src : str
            Source string for comparison
        tar : str
            Target string for comparison

        Returns
        -------
        float
            The normalized Levenshtein distance between src & tar

        Examples
        --------
        >>> cmp = Levenshtein()
        >>> round(cmp.dist('cat', 'hat'), 12)
        0.333333333333
        >>> round(cmp.dist('Niall', 'Neil'), 12)
        0.6
        >>> cmp.dist('aluminum', 'Catalan')
        0.875
        >>> cmp.dist('ATCG', 'TAGC')
        0.75


        .. versionadded:: 0.1.0
        .. versionchanged:: 0.3.6
            Encapsulated in class

        """
        if src == tar:
            return 0.0
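
The "return an int when the value is whole, otherwise a float" tail appears in several of the duplicates listed in this report. One possible refactoring, shown only as a sketch with a hypothetical helper name that does not exist in abydos, is to move that branch into a small function.

```python
from typing import Union


def int_if_whole(value: float) -> Union[int, float]:
    """Return value as an int when it has no fractional part, else a float."""
    return int(value) if int(value) == value else float(value)


print(int_if_whole(3.0))
print(int_if_whole(2.5))
```

Each distance method could then return the result of calling the helper on the final matrix cell instead of repeating the if/else.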

Similar lines in 3 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.distance.discountedlevenshtein:287
==abydos.distance.editex:230
==abydos.distance.phoneticeditdistance:255
        )

        if int(d_mat[src_len, tar_len]) == d_mat[src_len, tar_len]:
            return int(d_mat[src_len, tar_len])
        else:
            return cast(float, d_mat[src_len, tar_len])

    def dist(self, src: str, tar: str) -> float:
        """Return the normalized Levenshtein distance between two strings.

        The Levenshtein distance is normalized by dividing the Levenshtein
        distance (calculated by any of the three supported methods) by the
        greater of the number of characters in src times the cost of a
        delete and the number of characters in tar times the cost of an
        insert. For the case in which all operations have
        :math:`cost = 1`, this is equivalent to the greater of the length
        of the two strings src & tar.

        Parameters
        ----------
        src : str
            Source string for comparison
        tar : str
            Target string for comparison

        Returns
        -------
        float
            The normalized Levenshtein distance between src & tar

        Examples
        --------
        >>> cmp = DiscountedLevenshtein()
        >>> cmp.dist('cat', 'hat')
        0.3513958291799864
        >>> cmp.dist('Niall', 'Neil')
        0.5909885886270658
        >>> cmp.dist('aluminum', 'Catalan')
        0.8348163322045603
        >>> cmp.dist('ATCG', 'TAGC')
        0.7217609721523955


        .. versionadded:: 0.4.1

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.distance.needlemanwunsch:202
==abydos.distance.smithwaterman:113
        return cast(float, d_mat[d_mat.shape[0] - 1, d_mat.shape[1] - 1])

    def sim(self, src: str, tar: str) -> float:
        """Return the normalized Needleman-Wunsch score of two strings.

        Parameters
        ----------
        src : str
            Source string for comparison
        tar : str
            Target string for comparison

        Returns
        -------
        float
            Normalized Needleman-Wunsch score

        Examples
        --------
        >>> cmp = NeedlemanWunsch()
        >>> cmp.sim('cat', 'hat')
        0.6666666666666667
        >>> cmp.sim('Niall', 'Neil')
        0.22360679774997896
        >>> round(cmp.sim('aluminum', 'Catalan'), 12)
        0.0
        >>> cmp.sim('cat', 'hat')
        0.6666666666666667


        .. versionadded:: 0.4.1

        """
        if src == tar:
            return 1.0
        return max(0.0, self.sim_score(src, tar)) / (
            self.sim_score(src, src) ** 0.5 * self.sim_score(tar, tar) ** 0.5
        )


if __name__ == '__main__':
    import doctest

    doctest.testmod()

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.distance.levenshtein:111
==abydos.distance.phoneticeditdistance:118
    def alignment_matrix(
        self, src: str, tar: str, backtrace: bool = True
    ) -> Union[np.ndarray, Tuple[np.ndarray, np.ndarray]]:
        """Return the Levenshtein alignment matrix.

        Parameters
        ----------
        src : str
            Source string for comparison
        tar : str
            Target string for comparison
        backtrace : bool
            Return the backtrace matrix as well

        Returns
        -------
        numpy.ndarray or tuple(numpy.ndarray, numpy.ndarray)
            The alignment matrix and (optionally) the backtrace matrix


        .. versionadded:: 0.4.1

        """
        ins_cost, del_cost, sub_cost, trans_cost = self._cost

        src_len = len(src)
        tar_len = len(tar)

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.tokenizer.corvcluster:130
==abydos.tokenizer.vccluster:127
                        mode = 1
                    elif char in self._vowels:
                        if mode == 1:
                            self._ordered_tokens.append(new_token)
                            new_token = char
                        else:
                            new_token += char
                        mode = 2
                    else:  # This should cover combining marks, marks, etc.
                        new_token += char

                self._ordered_tokens.append(new_token)

        self._ordered_tokens = [
            unicodedata.normalize('NFC', token)
            for token in self._ordered_tokens
        ]
        self._scale_and_counterize()
        return self


if __name__ == '__main__':
    import doctest

    doctest.testmod(optionflags=doctest.NORMALIZE_WHITESPACE)

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.distance.shapirastoreri:160
==abydos.distance.typo:324
        for i in range(len(src) + 1):
            d_mat[i, 0] = i * del_cost
        for j in range(len(tar) + 1):
            d_mat[0, j] = j * ins_cost

        for i in range(len(src)):
            for j in range(len(tar)):
                d_mat[i + 1, j + 1] = min(
                    d_mat[i + 1, j] + ins_cost,  # ins
                    d_mat[i, j + 1] + del_cost,  # del
                    d_mat[i, j]
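
The fragment above is cut off in the middle of the dynamic-programming fill. For orientation, here is a self-contained sketch of the same kind of matrix fill for a plain edit distance. It uses nested lists instead of NumPy, and the substitution term is an assumption (a flat `sub_cost` on mismatch), since the original continuation is not shown.

```python
def edit_distance(src, tar, ins_cost=1, del_cost=1, sub_cost=1):
    """Fill an edit-distance matrix and return its bottom-right cell."""
    d_mat = [[0] * (len(tar) + 1) for _ in range(len(src) + 1)]

    # First column: delete every character of src.
    for i in range(len(src) + 1):
        d_mat[i][0] = i * del_cost
    # First row: insert every character of tar.
    for j in range(len(tar) + 1):
        d_mat[0][j] = j * ins_cost

    for i in range(len(src)):
        for j in range(len(tar)):
            d_mat[i + 1][j + 1] = min(
                d_mat[i + 1][j] + ins_cost,  # ins
                d_mat[i][j + 1] + del_cost,  # del
                # sub/== (assumed: flat cost on mismatch, free on match)
                d_mat[i][j] + (sub_cost if src[i] != tar[j] else 0),
            )
    return d_mat[len(src)][len(tar)]


print(edit_distance('cat', 'hat'))
print(edit_distance('Niall', 'Neil'))
```

If both flagged modules share this skeleton, only the cost of the substitution term would need to vary between them.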

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.stemmer.porter2:381
==abydos.stemmer.porter:395
            if word[i] == 'Y':
                word = word[:i] + 'y' + word[i + 1 :]

        return word


if __name__ == '__main__':
    import doctest

    doctest.testmod()

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.phonetic.fuzzysoundex:62
==abydos.phonetic.phonex:54
        if max_length != -1:
            self._max_length = min(max(4, max_length), 64)
        else:
            self._max_length = 64
        self._zero_pad = zero_pad

    def encode_alpha(self, word: str) -> str:
        """Return the alphabetic Fuzzy Soundex code for a word.

        Parameters
        ----------
        word : str
            The word to transform

        Returns
        -------
        str
            The alphabetic Fuzzy Soundex value

        Examples
        --------
        >>> pe = FuzzySoundex()
        >>> pe.encode_alpha('Christopher')
        'KRSTP'
        >>> pe.encode_alpha('Niall')
        'NL'
        >>> pe.encode_alpha('Smith')
        'SNT'
        >>> pe.encode_alpha('Schmidt')
        'SNT'


        .. versionadded:: 0.4.0

        """
        code = self.encode(word).rstrip('0')
        return code[:1] + code[1:].translate(self._alphabetic)

    def encode(self, word: str) -> str:
        """Return the Fuzzy Soundex code for a word.

        Parameters
        ----------
        word : str
            The word to transform

        Returns
        -------
        str
            The Fuzzy Soundex value

        Examples
        --------
        >>> pe = FuzzySoundex()
        >>> pe.encode('Christopher')
        'K6931'
        >>> pe.encode('Niall')
        'N4000'
        >>> pe.encode('Smith')
        'S5300'
        >>> pe.encode('Smith')
        'S5300'


        .. versionadded:: 0.1.0
        .. versionchanged:: 0.3.6
            Encapsulated in class

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.distance.chebyshev:21
==abydos.distance.unknown_f:22
from typing import (
    Any,
    Counter as TCounter,
    NoReturn,
    Optional,
    Sequence,
    Set,
    Union,
)

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.distance.discountedlevenshtein:354
==abydos.distance.phoneticedit_distance:309
        )

        return self.dist_abs(src, tar) / normalize_term


if __name__ == '__main__':
    import doctest

    doctest.testmod()

Similar lines in 4 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.phonetic.fuzzysoundex:66
==abydos.phonetic.lein:61
==abydos.phonetic.phonex:58
==abydos.phonetic.phonix:190
        self._zero_pad = zero_pad

    def encode_alpha(self, word: str) -> str:
        """Return the alphabetic Phonex code for a word.

        Parameters
        ----------
        word : str
            The word to transform

        Returns
        -------
        str
            The alphabetic Phonex value

        Examples
        --------
        >>> pe = Phonex()
        >>> pe.encode_alpha('Christopher')
        'CRST'
        >>> pe.encode_alpha('Niall')
        'NL'
        >>> pe.encode_alpha('Smith')
        'SNT'
        >>> pe.encode_alpha('Schmidt')
        'SSNT'


        .. versionadded:: 0.4.0

        """
        code = self.encode(word).rstrip('0')
        return code[:1] + code[1:].translate(self._alphabetic)

    def encode(self, word: str) -> str:
        """Return the Phonex code for a word.

        Parameters
        ----------
        word : str
            The word to transform

        Returns
        -------
        str
            The Phonex value

        Examples
        --------
        >>> pe = Phonex()
        >>> pe.encode('Christopher')
        'C623'
        >>> pe.encode('Niall')
        'N400'
        >>> pe.encode('Schmidt')
        'S253'
        >>> pe.encode('Smith')
        'S530'


        .. versionadded:: 0.1.0
        .. versionchanged:: 0.3.6
            Encapsulated in class

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.distance.discountedlevenshtein:283
==abydos.distance._levenshtein:321
        )

        d_mat = cast(
            np.ndarray, self.alignment_matrix(src, tar, backtrace=False)
        )

        if int(d_mat[src_len, tar_len]) == d_mat[src_len, tar_len]:
            return int(d_mat[src_len, tar_len])
        else:
            return cast(float, d_mat[src_len, tar_len])

    def dist(self, src: str, tar: str) -> float:
        """Return the normalized Levenshtein distance between two strings.

        The Levenshtein distance is normalized by dividing the Levenshtein
        distance (calculated by any of the three supported methods) by the
        greater of the number of characters in src times the cost of a
        delete and the number of characters in tar times the cost of an
        insert. For the case in which all operations have
        :math:`cost = 1`, this is equivalent to the greater of the length
        of the two strings src & tar.

        Parameters
        ----------
        src : str
            Source string for comparison
        tar : str
            Target string for comparison

        Returns
        -------
        float
            The normalized Levenshtein distance between src & tar

        Examples
        --------
        >>> cmp = DiscountedLevenshtein()
        >>> cmp.dist('cat', 'hat')
        0.3513958291799864
        >>> cmp.dist('Niall', 'Neil')
        0.5909885886270658
        >>> cmp.dist('aluminum', 'Catalan')
        0.8348163322045603
        >>> cmp.dist('ATCG', 'TAGC')
        0.7217609721523955


        .. versionadded:: 0.4.1

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.distance.discountedlevenshtein:219
==abydos.distance.phoneticeditdistance:197
                        )
                        if backtrace:
                            trace_mat[i + 1, j + 1] = 2
        if backtrace:
            return d_mat, trace_mat
        return d_mat

    def dist_abs(self, src: str, tar: str) -> float:
        """Return the phonetic edit distance between two strings.

        Parameters
        ----------
        src : str
            Source string for comparison
        tar : str
            Target string for comparison

        Returns
        -------
        int (may return a float if cost has float values)
            The phonetic edit distance between src & tar

        Examples
        --------
        >>> cmp = PhoneticEditDistance()
        >>> cmp.dist_abs('cat', 'hat')
        0.17741935483870974
        >>> cmp.dist_abs('Niall', 'Neil')
        1.161290322580645
        >>> cmp.dist_abs('aluminum', 'Catalan')
        2.467741935483871
        >>> cmp.dist_abs('ATCG', 'TAGC')
        1.193548387096774

        >>> cmp = PhoneticEditDistance(mode='osa')
        >>> cmp.dist_abs('ATCG', 'TAGC')
        0.46236225806451603
        >>> cmp.dist_abs('ACTG', 'TAGC')
        1.2580645161290323


        .. versionadded:: 0.4.1

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.distance.euclidean:38
==abydos.distance.manhattan:38
    def __init__(
        self,
        alphabet: Optional[
            Union[TCounter[str], Sequence[str], Set[str], int]
        ] = 0,
        tokenizer: Optional[_Tokenizer] = None,
        intersection_type: str = 'crisp',
        **kwargs: Any
    ) -> None:
        """Initialize Euclidean instance.

        Parameters
        ----------
        alphabet : collection or int
            The values or size of the alphabet
        tokenizer : _Tokenizer
            A tokenizer instance from the :py:mod:`abydos.tokenizer` package
        intersection_type : str
            Specifies the intersection type, and set type as a result:
            See :ref:`intersection_type <intersection_type>` description in
            :py:class:`_TokenDistance` for details.
        **kwargs
            Arbitrary keyword arguments

        Other Parameters
        ----------------
        qval : int
            The length of each q-gram. Using this parameter and
            tokenizer=None will cause the instance to use the QGram
            tokenizer with this q value.
        metric : _Distance
            A string distance measure class for use in the ``soft`` and
            ``fuzzy`` variants.
        threshold : float
            A threshold value, similarities above which are counted as
            members of the intersection for the ``fuzzy`` variant.


        .. versionadded:: 0.4.0

Similar lines in 3 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.tokenizer.corvcluster:110
==abydos.tokenizer.cvcluster:111
==abydos.tokenizer.vccluster:111

        self._string = string
        self._ordered_tokens = []
        token_list = self._regexp.findall(self._string)
        for token in token_list:
            if (
                token[0] not in self._consonants
                and token[0] not in self._vowels
            ):
                self._ordered_tokens.append(token)
            else:
                token = unicodedata.normalize('NFD', token)
                mode = 0  # 0 = starting mode, 1 = cons, 2 = vowels
                new_token = ''  # noqa: S105
                for char in token:
                    if char in self._consonants:

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.phonetic.lein:135
==abydos.phonetic.rogerroot:241

        if self._zero_pad:
            code += '0' * self._max_length  # Rule 4

        return code[: self._max_length]


if __name__ == '__main__':
    import doctest

    doctest.testmod()

Cyclic import (abydos.distance -> abydos.distance._ozbay)
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Used when a cyclic import between two or more modules is detected.

Cyclic import (abydos.distance -> abydos.distance._rouge_su)
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Used when a cyclic import between two or more modules is detected.

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.distance.lcprefix:68
==abydos.distance.lcsuffix:69

    def dist_abs(self, src: str, tar: str, *args: str) -> int:
        """Return the length of the longest common prefix of the strings.

        Parameters
        ----------
        src : str
            Source string for comparison
        tar : str
            Target string for comparison
        *args : strs
            Additional strings for comparison

        Raises
        ------
        ValueError
            All arguments must be of type str

        Returns
        -------
        int
            The length of the longest common prefix

        Examples
        --------
        >>> pfx = LCPrefix()
        >>> pfx.dist_abs('cat', 'hat')
        0
        >>> pfx.dist_abs('Niall', 'Neil')
        1
        >>> pfx.dist_abs('aluminum', 'Catalan')
        0
        >>> pfx.dist_abs('ATCG', 'TAGC')
        0


        .. versionadded:: 0.4.0

        """
        strings = [src, tar]
        for arg in args:
            if isinstance(arg, str):
                strings.append(arg)
            else:
                raise TypeError('All arguments must be of type str')
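
The argument check and prefix length described in the docstring above can be sketched as a standalone function. This is a minimal illustration rather than the library's implementation: the name `lc_prefix_len` is hypothetical, and `os.path.commonprefix` is swapped in for whatever loop the original method uses.

```python
from os.path import commonprefix


def lc_prefix_len(src: str, tar: str, *args: str) -> int:
    """Return the length of the longest common prefix of all strings."""
    strings = [src, tar]
    for arg in args:
        # Mirror the type check shown in the duplicated snippet.
        if not isinstance(arg, str):
            raise TypeError('All arguments must be of type str')
        strings.append(arg)
    # commonprefix compares character by character, not path by path.
    return len(commonprefix(strings))


print(lc_prefix_len('cat', 'hat'))      # 0
print(lc_prefix_len('Niall', 'Neil'))   # 1
```

Because the prefix and suffix variants differ only in the direction of comparison, a shared helper of this kind is one way the duplication flagged by pylint could be reduced.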

Similar lines in 3 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.distance.ncdbz2:56
==abydos.distance.ncdlzma:55
==abydos.distance.ncdzlib:54

        super().__init__(**kwargs)
        self._level = level

    def dist(self, src: str, tar: str) -> float:
        """Return the NCD between two strings using LZMA compression.

        Parameters
        ----------
        src : str
            Source string for comparison
        tar : str
            Target string for comparison

        Returns
        -------
        float
            Compression distance

        Examples
        --------
        >>> cmp = NCDlzma()
        >>> cmp.dist('cat', 'hat')
        0.08695652173913043
        >>> cmp.dist('Niall', 'Neil')
        0.16
        >>> cmp.dist('aluminum', 'Catalan')
        0.16
        >>> cmp.dist('ATCG', 'TAGC')
        0.08695652173913043


        .. versionadded:: 0.3.5
        .. versionchanged:: 0.3.6
            Encapsulated in class

        """
        if src == tar:
            return 0.0

        src_b = src.encode('utf-8')
        tar_b = tar.encode('utf-8')

Similar lines in 3 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.tokenizer.corvcluster:42
==abydos.tokenizer.cvcluster:43
==abydos.tokenizer.vccluster:43

    def __init__(
        self,
        scaler: Optional[Union[str, Callable[[float], float]]] = None,
        consonants: Optional[Set[str]] = None,
        vowels: Optional[Set[str]] = None,
    ) -> None:
        """Initialize tokenizer.

        Parameters
        ----------
        scaler : None, str, or function
            A scaling function for the Counter:

                - None : no scaling
                - 'set' : All non-zero values are set to 1.
                - 'length' : Each token has weight equal to its length.
                - 'length-log' : Each token has weight equal to the log of
                  its length + 1.
                - 'length-exp' : Each token has weight equal to e raised to
                  its length.
                - a callable function : The function is applied to each
                  value in the Counter. Some useful functions include
                  math.exp, math.log1p, math.sqrt, and indexes into
                  interesting integer sequences such as the Fibonacci
                  sequence.

        consonants : None or set(str)
            The set of characters to treat as consonants
        vowels : None or set(str)
            The set of characters to treat as vowels


        .. versionadded:: 0.4.0
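
To make the `scaler` options above concrete, here is a minimal sketch covering only the `None`, `'set'`, and callable cases. The function name `scale_counter` is hypothetical and not part of the library; the length-based options are left out because the docstring alone does not pin down their exact arithmetic.

```python
import math
from collections import Counter
from typing import Callable, Optional, Union


def scale_counter(
    counts: Counter,
    scaler: Optional[Union[str, Callable[[float], float]]] = None,
) -> Counter:
    """Apply a scaler to token counts (hypothetical helper)."""
    if scaler is None:
        # No scaling: return an unchanged copy.
        return Counter(counts)
    if scaler == 'set':
        # All non-zero values are set to 1.
        return Counter({tok: 1 for tok, val in counts.items() if val})
    if callable(scaler):
        # The function is applied to each value in the Counter.
        return Counter({tok: scaler(val) for tok, val in counts.items()})
    raise ValueError('unsupported scaler in this sketch: %r' % (scaler,))


tokens = Counter(['ab', 'ab', 'ab', 'ab', 'cd'])
print(scale_counter(tokens, 'set'))
print(scale_counter(tokens, math.sqrt))
```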

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.tokenizer.whitespace:41
==abydos.tokenizer.wordpunct:42

    def __init__(
        self,
        scaler: Optional[Union[str, Callable[[float], float]]] = None,
        flags: int = 0,
    ) -> None:
        """Initialize tokenizer.

        Parameters
        ----------
        scaler : None, str, or function
            A scaling function for the Counter:

                - None : no scaling
                - 'set' : All non-zero values are set to 1.
                - 'length' : Each token has weight equal to its length.
                - 'length-log' : Each token has weight equal to the log of
                  its length + 1.
                - 'length-exp' : Each token has weight equal to e raised to
                  its length.
                - a callable function : The function is applied to each
                  value in the Counter. Some useful functions include
                  math.exp, math.log1p, math.sqrt, and indexes into
                  interesting integer sequences such as the Fibonacci
                  sequence.

        flags : int
            Flags to pass to the regular expression matcher. See the
            `documentation on Python's re module
            <https://docs.python.org/3/library/re.html#re.A>`_ for details.


        .. versionadded:: 0.4.0

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.distance.phoneticeditdistance:107
==abydos.phones.phones:956

        if isinstance(weights, dict):
            weights = [
                weights[feature] if feature in weights else 0
                for feature in sorted(
                    _FEATURE_MASK, key=_FEATURE_MASK.get, reverse=True
                )
            ]
        elif isinstance(weights, (list, tuple)):
            weights = list(weights) + [0] * (len(_FEATURE_MASK) - len(weights))

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.distance.discountedlevenshtein:285
==abydos.distance.phoneticeditdistance:253

        d_mat = cast(
            np.ndarray, self._alignment_matrix(src, tar, backtrace=False)
        )

        if int(d_mat[src_len, tar_len]) == d_mat[src_len, tar_len]:
            return int(d_mat[src_len, tar_len])
        else:
            return cast(float, d_mat[src_len, tar_len])

    def dist(self, src: str, tar: str) -> float:
        """Return the normalized phonetic edit distance between two strings.

        The edit distance is normalized by dividing the edit distance
        (calculated by either of the two supported methods) by the
        greater of the number of characters in src times the cost of a
        delete and the number of characters in tar times the cost of an
        insert. For the case in which all operations have :math:`cost = 1`,
        this is equivalent to the greater of the length of the two strings
        src & tar.

        Parameters
        ----------
        src : str
            Source string for comparison
        tar : str
            Target string for comparison

        Returns
        -------
        float
            The normalized Levenshtein distance between src & tar

        Examples
        --------
        >>> cmp = PhoneticEditDistance()
        >>> round(cmp.dist('cat', 'hat'), 12)
        0.059139784946
        >>> round(cmp.dist('Niall', 'Neil'), 12)
        0.232258064516
        >>> cmp.dist('aluminum', 'Catalan')
        0.3084677419354839
        >>> cmp.dist('ATCG', 'TAGC')
        0.2983870967741935


        .. versionadded:: 0.4.1

        """
        if src == tar:

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.distance.blocklevenshtein:138
==abydos.distance.dameraulevenshtein:232

        if src == tar:
            return 0.0
        ins_cost, del_cost = self._cost[:2]
        return self.dist_abs(src, tar) / (
            self._normalizer([len(src) * del_cost, len(tar) * ins_cost])
        )


if __name__ == '__main__':
    import doctest

    doctest.testmod()
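
The normalization in the duplicated snippet above divides an absolute edit distance by the larger of `len(src) * del_cost` and `len(tar) * ins_cost`. The sketch below illustrates that step with a plain Levenshtein distance standing in for the block and Damerau variants named in the report; the function names and the default costs of 1 are assumptions made for the example.

```python
from typing import Callable, List


def levenshtein(src: str, tar: str, ins_cost: float = 1,
                del_cost: float = 1, sub_cost: float = 1) -> float:
    """Plain Levenshtein distance using two rolling rows."""
    prev = [j * ins_cost for j in range(len(tar) + 1)]
    for i, s_char in enumerate(src, 1):
        cur = [i * del_cost]
        for j, t_char in enumerate(tar, 1):
            cur.append(min(
                prev[j] + del_cost,                               # delete
                cur[j - 1] + ins_cost,                            # insert
                prev[j - 1] + (0 if s_char == t_char else sub_cost),
            ))
        prev = cur
    return prev[-1]


def normalized_dist(src: str, tar: str, ins_cost: float = 1,
                    del_cost: float = 1,
                    normalizer: Callable[[List[float]], float] = max) -> float:
    """Divide the edit distance by the normalizer of the two cost bounds."""
    if src == tar:
        return 0.0
    return levenshtein(src, tar, ins_cost, del_cost) / normalizer(
        [len(src) * del_cost, len(tar) * ins_cost]
    )


print(normalized_dist('cat', 'hat'))      # 1 edit over length 3
print(normalized_dist('Niall', 'Neil'))   # 3 edits over length 5
```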

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.distance.discountedlevenshtein:204
==abydos.distance.phoneticeditdistance:181

                )
                d_mat[i + 1, j + 1] = min(opts)
                if backtrace:
                    trace_mat[i + 1, j + 1] = int(np.argmin(opts))

                if self._mode == 'osa':
                    if (
                        i + 1 > 1
                        and j + 1 > 1

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.tokenizer.cvcluster:133
==abydos.tokenizer.vccluster:133

                            new_token += char
                        mode = 2
                    else:
                        # This should cover combining marks, marks, etc.
                        new_token += char

                self._ordered_tokens.append(new_token)

        self._ordered_tokens = [
            unicodedata.normalize('NFC', token)
            for token in self._ordered_tokens
        ]
        self._scale_and_counterize()
        return self


if __name__ == '__main__':
    import doctest

    doctest.testmod(optionflags=doctest.NORMALIZE_WHITESPACE)

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.tokenizer.corvcluster:137
==abydos.tokenizer.cvcluster:134

                        mode = 2
                    else:
                        # This should cover combining marks, marks, etc.
                        new_token += char

                self._ordered_tokens.append(new_token)

        self._ordered_tokens = [
            unicodedata.normalize('NFC', token)
            for token in self._ordered_tokens
        ]
        self._scale_and_counterize()
        return self


if __name__ == '__main__':
    import doctest

    doctest.testmod(optionflags=doctest.NORMALIZE_WHITESPACE)

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.distance.discountedlevenshtein:204
==abydos.distance.levenshtein:165

                )
                d_mat[i + 1, j + 1] = min(opts)
                if backtrace:
                    trace_mat[i + 1, j + 1] = int(np.argmin(opts))

                if self._mode == 'osa':
                    if (
                        i + 1 > 1
                        and j + 1 > 1
                        and src[i] == tar[j - 1]
                        and src[i - 1] == tar[j]
                    ):
                        # transposition
                        d_mat[i + 1, j + 1] = min(

Method could be a function
Open

    def decode(self, text: str) -> str:
Severity: Info
Found in abydos/compression/_rle.py by pylint

Used when a method doesn't use its bound instance, and so could be written as a function.
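
As a small illustration of this finding, the sketch below shows a method that never touches `self`, next to two common ways of addressing the warning. The class and function names are hypothetical and unrelated to the actual `decode` implementation.

```python
class Codec:
    """Hypothetical class used only to illustrate the warning."""

    def repeat(self, char: str, count: int) -> str:
        # 'self' is never used here, so pylint would flag this method.
        return char * count

    @staticmethod
    def repeat_static(char: str, count: int) -> str:
        # Option 1: keep it on the class, but declare it static.
        return char * count


def repeat_char(char: str, count: int) -> str:
    # Option 2: move the logic to a module-level function.
    return char * count


print(Codec().repeat('a', 3), Codec.repeat_static('a', 3), repeat_char('a', 3))
```

Which option fits depends on whether subclasses are expected to override the method; when the method is part of a shared interface, keeping it as an instance method may be the deliberate choice.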

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.distance.prefix:32
==abydos.distance.suffix:32

    def sim(self, src: str, tar: str) -> float:
        """Return the suffix similarity of two strings.

        Suffix similarity is the ratio of the length of the shorter term
        that exactly matches the longer term to the length of the shorter
        term, beginning at the end of both terms.

        Parameters
        ----------
        src : str
            Source string for comparison
        tar : str
            Target string for comparison

        Returns
        -------
        float
            Suffix similarity

        Examples
        --------
        >>> cmp = Suffix()
        >>> cmp.sim('cat', 'hat')
        0.6666666666666666
        >>> cmp.sim('Niall', 'Neil')
        0.25
        >>> cmp.sim('aluminum', 'Catalan')
        0.0
        >>> cmp.sim('ATCG', 'TAGC')
        0.0


        .. versionadded:: 0.1.0
        .. versionchanged:: 0.3.6
            Encapsulated in class

        """
        if src == tar:
            return 1.0
        if not src or not tar:
            return 0.0
        min_word, max_word = (src, tar) if len(src) < len(tar) else (tar, src)
        min_len = len(min_word)
        for i in range(min_len, 0, -1):
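
The reported snippet is cut off at the loop header. A self-contained sketch of suffix similarity as the docstring defines it might look as follows; the comparison inside the loop is an assumption filled in from that definition, not a copy of the library's code.

```python
def suffix_sim(src: str, tar: str) -> float:
    """Ratio of the longest shared suffix to the shorter string's length."""
    if src == tar:
        return 1.0
    if not src or not tar:
        return 0.0
    min_word, max_word = (src, tar) if len(src) < len(tar) else (tar, src)
    min_len = len(min_word)
    for i in range(min_len, 0, -1):
        # Assumed loop body: compare the last i characters of both words.
        if min_word[-i:] == max_word[-i:]:
            return i / min_len
    return 0.0


print(suffix_sim('cat', 'hat'))      # 0.6666666666666666
print(suffix_sim('Niall', 'Neil'))   # 0.25
```

A prefix variant would differ only in slicing from the start (`[:i]`), which is consistent with pylint reporting the two modules as near-duplicates.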

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.distance.euclidean:125
==abydos.distance.minkowski:163

    def dist(self, src: str, tar: str) -> float:
        """Return the normalized Euclidean distance between two strings.

        The normalized Euclidean distance is a distance
        metric in :math:`L^2`-space, normalized to [0, 1].

        Parameters
        ----------
        src : str
            Source string (or QGrams/Counter objects) for comparison
        tar : str
            Target string (or QGrams/Counter objects) for comparison

        Returns
        -------
        float
            The normalized Euclidean distance

        Examples
        --------
        >>> cmp = Euclidean()
        >>> round(cmp.dist('cat', 'hat'), 12)
        0.57735026919
        >>> round(cmp.dist('Niall', 'Neil'), 12)
        0.683130051064
        >>> round(cmp.dist('Colin', 'Cuilen'), 12)
        0.727606875109
        >>> cmp.dist('ATCG', 'TAGC')
        1.0


        .. versionadded:: 0.3.0
        .. versionchanged:: 0.3.6
            Encapsulated in class

        """
        return self.dist_abs(src, tar, normalized=True)


if __name__ == '__main__':
    import doctest

    doctest.testmod()

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.tokenizer.corvcluster:110
==abydos.tokenizer.cvcluster:111

        self._string = string
        self._ordered_tokens = []
        token_list = self._regexp.findall(self._string)
        for token in token_list:
            if (
                token[0] not in self._consonants
                and token[0] not in self._vowels
            ):
                self._ordered_tokens.append(token)
            else:
                token = unicodedata.normalize('NFD', token)
                mode = 0  # 0 = starting mode, 1 = cons, 2 = vowels
                new_token = ''  # noqa: S105
                for char in token:
                    if char in self._consonants:
                        if mode == 2:
                            self._ordered_tokens.append(new_token)
                            new_token = char
                        else:
                            new_token += char
                        mode = 1
                    elif char in self._vowels:

Similar lines in 3 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.distance.azzoo:21
==abydos.distance.minkowski:21
==abydos.distance.mutualinformation:22

from typing import (
    Any,
    Counter as TCounter,
    Optional,
    Sequence,
    Set,
    Union,
    cast,
)

from ._token_distance import _TokenDistance
from ..tokenizer import _Tokenizer

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.stemmer.snowballnorwegian:143
==abydos.stemmer.snowballswedish:143

            word = word[:-3]
        elif _r1[-2:] == 'ig':
            word = word[:-2]

        return word


if __name__ == '__main__':
    import doctest

    doctest.testmod()

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.distance.blocklevenshtein:40
==abydos.distance.dameraulevenshtein:43

    def __init__(
        self,
        cost: Tuple[float, float, float, float] = (1, 1, 1, 1),
        normalizer: Callable[[List[float]], float] = max,
        **kwargs: Any
    ):
        """Initialize BlockLevenshtein instance.

        Parameters
        ----------
        **kwargs
            Arbitrary keyword arguments


        .. versionadded:: 0.4.0

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.stemmer.snowballdanish:40
==abydos.stemmer.snowballswedish:39

        'b', 'c', 'd', 'f', 'g', 'h', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'r', 't', 'v', 'y',

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.distance.jarowinkler:247
==abydos.distance._strcmp95:265

            )

        return weight


if __name__ == '__main__':
    import doctest

    doctest.testmod()

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.distance.needlemanwunsch:149
==abydos.distance.smithwaterman:64

        self._gap_cost = gap_cost
        self._sim_func = cast(
            Callable[[str, str], float],
            NeedlemanWunsch.sim_matrix if sim_func is None else sim_func,
        )  # type: Callable[[str, str], float]

    def sim_score(self, src: str, tar: str) -> float:
        """Return the Needleman-Wunsch score of two strings.

        Parameters
        ----------
        src : str
            Source string for comparison
        tar : str
            Target string for comparison

        Returns
        -------
        float
            Needleman-Wunsch score

        Examples
        --------
        >>> cmp = NeedlemanWunsch()
        >>> cmp.sim_score('cat', 'hat')
        2.0
        >>> cmp.sim_score('Niall', 'Neil')
        1.0
        >>> cmp.sim_score('aluminum', 'Catalan')
        -1.0
        >>> cmp.sim_score('ATCG', 'TAGC')
        0.0


        .. versionadded:: 0.1.0
        .. versionchanged:: 0.3.6
            Encapsulated in class

        """
        d_mat = np_zeros((len(src) + 1, len(tar) + 1), dtype=np_float)

Similar lines in 3 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.distance.euclidean:40
==abydos.distance.manhattan:40
==abydos.distance.minkowski:49

        alphabet: Optional[
            Union[TCounter[str], Sequence[str], Set[str], int]
        ] = 0,
        tokenizer: Optional[_Tokenizer] = None,
        intersection_type: str = 'crisp',
        **kwargs: Any
    ) -> None:
        """Initialize Euclidean instance.

        Parameters
        ----------
        alphabet : collection or int
            The values or size of the alphabet
        tokenizer : _Tokenizer
            A tokenizer instance from the :py:mod:`abydos.tokenizer` package
        intersection_type : str
            Specifies the intersection type, and set type as a result:
            See :ref:`intersection_type <intersection_type>` description in
            :py:class:`_TokenDistance` for details.
        **kwargs
            Arbitrary keyword arguments

        Other Parameters
        ----------------
        qval : int
            The length of each q-gram. Using this parameter and
            tokenizer=None will cause the instance to use the QGram
            tokenizer with this q value.
        metric : _Distance
            A string distance measure class for use in the ``soft`` and
            ``fuzzy`` variants.
        threshold : float
            A threshold value, similarities above which are counted as
            members of the intersection for the ``fuzzy`` variant.


        .. versionadded:: 0.4.0

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.phonetic.haase:234
==abydos.phonetic.koelner:143

            elif word[i] == 'B':
                sdx += '1'
            elif word[i] == 'P':
                if _before(word, i, {'H'}):
                    sdx += '3'
                else:
                    sdx += '1'
            elif word[i] in {'D', 'T'}:
                if _before(word, i, {'C', 'S', 'Z'}):
                    sdx += '8'
                else:
                    sdx += '2'
            elif word[i] in {'F', 'V', 'W'}:
                sdx += '3'
            elif word[i] in {'G', 'K', 'Q'}:
                sdx += '4'
            elif word[i] == 'C':
                if _after(word, i, {'S', 'Z'}):
                    sdx += '8'
                elif i == 0:
                    if _before(

Similar lines in 3 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.fingerprint.count:37
==abydos.fingerprint.occurrence:36
==abydos.fingerprint.occurrencehalved:36

    def __init__(
        self,
        n_bits: int = 16,
        most_common: Tuple[str, ...] = MOST_COMMON_LETTERS_CG,
    ) -> None:
        """Initialize Count instance.

        Parameters
        ----------
        n_bits : int
            Number of bits in the fingerprint returned
        most_common : list
            The most common tokens in the target language, ordered by
            frequency


        .. versionadded:: 0.4.0

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.fingerprint.occurrence:141
==abydos.fingerprint.occurrencehalved:153

        if n_bits > 0:
            fingerprint <<= n_bits

        return fingerprint


if __name__ == '__main__':
    import doctest

    doctest.testmod()

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.phonetic.lein:63
==abydos.phonetic.refinedsoundex:72

    def encode_alpha(self, word: str) -> str:
        """Return the alphabetic LEIN code for a word.

        Parameters
        ----------
        word : str
            The word to transform

        Returns
        -------
        str
            The alphabetic LEIN code

        Examples
        --------
        >>> pe = LEIN()
        >>> pe.encode_alpha('Christopher')
        'CLKT'
        >>> pe.encode_alpha('Niall')
        'NL'
        >>> pe.encode_alpha('Smith')
        'SNT'
        >>> pe.encode_alpha('Schmidt')
        'SKNT'


        .. versionadded:: 0.4.0

        """
        code = self.encode(word).rstrip('0')
        return code[:1] + code[1:].translate(self._alphabetic)

    def encode(self, word: str) -> str:
        """Return the LEIN code for a word.

        Parameters
        ----------
        word : str
            The word to transform

        Returns
        -------
        str
            The LEIN code

        Examples
        --------
        >>> pe = LEIN()
        >>> pe.encode('Christopher')
        'C351'
        >>> pe.encode('Niall')
        'N300'
        >>> pe.encode('Smith')
        'S210'
        >>> pe.encode('Schmidt')
        'S521'


        .. versionadded:: 0.3.0
        .. versionchanged:: 0.3.6
            Encapsulated in class

        """
        # uppercase, normalize, decompose, and filter non-A-Z out
        word = unicode_normalize('NFKD', word.upper())
        word = ''.join(c for c in word if c in self._uc_set)

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.phonetic.pshpsoundexfirst:203
==abydos.phonetic.pshpsoundexlast:239

        code = code.replace('0', '')  # rule 1

        if self._max_length != -1:
            if len(code) < self._max_length:
                code += '0' * (self._max_length - len(code))
            else:
                code = code[: self._max_length]

        return code


if __name__ == '__main__':
    import doctest

    doctest.testmod()

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.distance.levenshtein:393
==abydos.distance.phoneticeditdistance:307

        normalize_term = self._normalizer(
            [src_len * del_cost, tar_len * ins_cost]
        )

        return self.dist_abs(src, tar) / normalize_term


if __name__ == '__main__':
    import doctest

    doctest.testmod()

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.distance.euclidean:81
==abydos.distance.manhattan:81

            alphabet=alphabet,
            tokenizer=tokenizer,
            intersection_type=intersection_type,
            **kwargs
        )

    def dist_abs(self, src: str, tar: str, normalized: bool = False) -> float:
        """Return the Euclidean distance between two strings.

        Parameters
        ----------
        src : str
            Source string (or QGrams/Counter objects) for comparison
        tar : str
            Target string (or QGrams/Counter objects) for comparison
        normalized : bool
            Normalizes to [0, 1] if True

        Returns
        -------
        float
            The Euclidean distance

        Examples
        --------
        >>> cmp = Euclidean()
        >>> cmp.dist_abs('cat', 'hat')
        2.0
        >>> round(cmp.dist_abs('Niall', 'Neil'), 12)
        2.645751311065
        >>> cmp.dist_abs('Colin', 'Cuilen')
        3.0
        >>> round(cmp.dist_abs('ATCG', 'TAGC'), 12)
        3.162277660168


        .. versionadded:: 0.3.0
        .. versionchanged:: 0.3.6
            Encapsulated in class
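
For readers unfamiliar with token-based distances, the unnormalized Euclidean distance in the docstring above can be sketched over q-gram counts. This is an illustration only: the helper names are hypothetical, and the choice of bigrams padded with `$` and `#` is an assumption made here (it happens to reproduce the docstring's example values).

```python
from collections import Counter
from math import sqrt


def qgrams(text: str, q: int = 2, start: str = '$', stop: str = '#') -> Counter:
    """Count q-grams of text, padded with assumed start/stop symbols."""
    padded = start * (q - 1) + text + stop * (q - 1)
    return Counter(padded[i:i + q] for i in range(len(padded) - q + 1))


def euclidean_dist_abs(src: str, tar: str, q: int = 2) -> float:
    """Square root of the summed squared differences of q-gram counts."""
    src_counts, tar_counts = qgrams(src, q), qgrams(tar, q)
    tokens = src_counts.keys() | tar_counts.keys()
    # Counter returns 0 for tokens missing from one side.
    return sqrt(sum((src_counts[t] - tar_counts[t]) ** 2 for t in tokens))


print(euclidean_dist_abs('cat', 'hat'))                  # 2.0
print(round(euclidean_dist_abs('Niall', 'Neil'), 12))    # 2.645751311065
```

A Manhattan variant would replace the squared differences and square root with a sum of absolute differences, which is why the two modules share so much boilerplate.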

Similar lines in 3 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.distance.gotoh:67
==abydos.distance.needlemanwunsch:150
==abydos.distance.smithwaterman:65

        self._sim_func = cast(
            Callable[[str, str], float],
            NeedlemanWunsch.sim_matrix if sim_func is None else sim_func,
        )  # type: Callable[[str, str], float]

    def sim_score(self, src: str, tar: str) -> float:
        """Return the Gotoh score of two strings.

        Parameters
        ----------
        src : str
            Source string for comparison
        tar : str
            Target string for comparison

        Returns
        -------
        float
            Gotoh score

        Examples
        --------
        >>> cmp = Gotoh()
        >>> cmp.sim_score('cat', 'hat')
        2.0
        >>> cmp.sim_score('Niall', 'Neil')
        1.0
        >>> round(cmp.sim_score('aluminum', 'Catalan'), 12)
        -0.4
        >>> cmp.sim_score('cat', 'hat')
        2.0


        .. versionadded:: 0.1.0
        .. versionchanged:: 0.3.6
            Encapsulated in class

        """
        d_mat = np_zeros((len(src) + 1, len(tar) + 1), dtype=np_float)

Similar lines in 5 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.distance.ncdbwtrle:80
==abydos.distance.ncdbz2:103
==abydos.distance.ncdlzma:102
==abydos.distance.ncdlzss:90
==abydos.distance.ncdrle:80

        return (
            min(len(concat_comp), len(concat_comp2))
            - min(len(src_comp), len(tar_comp))
        ) / max(len(src_comp), len(tar_comp))


if __name__ == '__main__':
    import doctest

    doctest.testmod()
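
The return expression shared by these five modules is the normalized compression distance (NCD). A minimal standalone sketch using the standard-library `zlib` module is shown below; the function name `ncd_zlib` is hypothetical, and any header stripping or compression settings the library applies are not reproduced here.

```python
import zlib


def ncd_zlib(src: str, tar: str, level: int = zlib.Z_DEFAULT_COMPRESSION) -> float:
    """Normalized compression distance between two strings via zlib."""
    if src == tar:
        return 0.0

    src_b = src.encode('utf-8')
    tar_b = tar.encode('utf-8')

    src_comp = zlib.compress(src_b, level)
    tar_comp = zlib.compress(tar_b, level)
    # Compress both concatenation orders and keep the shorter result.
    concat_comp = zlib.compress(src_b + tar_b, level)
    concat_comp2 = zlib.compress(tar_b + src_b, level)

    return (
        min(len(concat_comp), len(concat_comp2))
        - min(len(src_comp), len(tar_comp))
    ) / max(len(src_comp), len(tar_comp))


print(ncd_zlib('cat', 'cat'))   # 0.0
print(ncd_zlib('aluminum', 'Catalan'))
```

Since only the compressor differs between the variants, passing the compression function in as a parameter of a shared base method is one plausible way to remove the duplication.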

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.distance.jarowinkler:198
==abydos.distance.strcmp95:200

                    num_com += 1
                    break

        # If no characters in common - return
        if num_com == 0:
            return 0.0

        # Count the number of transpositions
        k = n_trans = 0

Similar lines in 3 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.fingerprint.count:145
==abydos.fingerprint.occurrence:142
==abydos.fingerprint.occurrencehalved:154

            fingerprint <<= n_bits

        return fingerprint


if __name__ == '__main__':
    import doctest

    doctest.testmod()

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.tokenizer.qgrams:167
==abydos.tokenizer.qskipgrams:183

                string = (
                    self._start_stop[0] * (qval_i - 1)
                    + self._string
                    + self._start_stop[-1] * (qval_i - 1)
                )
            else:
                string = self._string

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.distance.needlemanwunsch:194
==abydos.distance.smithwaterman:105

        for i in range(1, len(src) + 1):
            for j in range(1, len(tar) + 1):
                match = d_mat[i - 1, j - 1] + self._sim_func(
                    src[i - 1], tar[j - 1]
                )
                delete = d_mat[i - 1, j] - self._gap_cost
                insert = d_mat[i, j - 1] - self._gap_cost
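
The `match`/`delete`/`insert` recurrence above is the core of Needleman-Wunsch global alignment. A self-contained sketch using plain lists instead of NumPy follows; the default scoring (1 for a match, 0 for a mismatch, gap cost 1) is an assumption chosen for the example, and the function name is hypothetical.

```python
from typing import Callable, Optional


def nw_score(src: str, tar: str, gap_cost: float = 1.0,
             sim_func: Optional[Callable[[str, str], float]] = None) -> float:
    """Needleman-Wunsch global alignment score (illustrative sketch)."""
    if sim_func is None:
        # Assumed default scoring: 1 for identical characters, else 0.
        sim_func = lambda a, b: 1.0 if a == b else 0.0

    rows, cols = len(src) + 1, len(tar) + 1
    d_mat = [[0.0] * cols for _ in range(rows)]
    # Aligning a prefix against the empty string costs one gap per character.
    for i in range(rows):
        d_mat[i][0] = -i * gap_cost
    for j in range(cols):
        d_mat[0][j] = -j * gap_cost

    for i in range(1, rows):
        for j in range(1, cols):
            match = d_mat[i - 1][j - 1] + sim_func(src[i - 1], tar[j - 1])
            delete = d_mat[i - 1][j] - gap_cost
            insert = d_mat[i][j - 1] - gap_cost
            d_mat[i][j] = max(match, delete, insert)
    return d_mat[-1][-1]


print(nw_score('cat', 'hat'))      # 2.0
print(nw_score('Niall', 'Neil'))   # 1.0
```

Smith-Waterman uses the same three candidates but also clamps each cell at zero and reports the matrix maximum, which explains why pylint sees the inner loops as duplicates.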

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.phonetic.alphasis:95
==abydos.phonetic._phonic:45

        'D': '1',
        'T': '1',
        'N': '2',
        'M': '3',
        'R': '4',
        'L': '5',
        'J': '6',

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.phonetic.daitchmokotoff:363
==abydos.phonetic.soundex:206

        word = ''.join(c for c in word if c in self._uc_set)

        # Nothing to convert, return base case
        if not word:
            if self._zero_pad:
                return '0' * self._max_length
            return '0'

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.stemmer.porter2:149
==abydos.stemmer.porter:204

        if word[0] == 'y':
            word = 'Y' + word[1:]
        for i in range(1, len(word)):
            if word[i] == 'y' and word[i - 1] in self._vowels:
                word = word[:i] + 'Y' + word[i + 1 :]
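
The snippet above marks each `y` that acts as a consonant (word-initial, or directly after a vowel) by uppercasing it. A standalone sketch of that step is given below; the function name and the vowel set `aeiouy` are assumptions for illustration, since the report does not show how `self._vowels` is defined.

```python
def mark_consonant_y(word: str, vowels: frozenset = frozenset('aeiouy')) -> str:
    """Uppercase 'y' when it starts the word or follows a vowel."""
    if not word:
        return word
    if word[0] == 'y':
        word = 'Y' + word[1:]
    for i in range(1, len(word)):
        if word[i] == 'y' and word[i - 1] in vowels:
            word = word[:i] + 'Y' + word[i + 1:]
    return word


print(mark_consonant_y('yellow'))   # Yellow
print(mark_consonant_y('say'))      # saY
print(mark_consonant_y('cry'))      # cry
```

Extracting this into a helper shared by both Porter stemmers would be one straightforward way to address the duplication finding.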

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.phonetic.soundd:36
==abydos.phonetic.soundexbr:36

    _trans = dict(
        zip(
            (ord(_) for _ in 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'),
            '01230120022455012623010202',
        )
    )

    _alphabetic = dict(zip((ord(_) for _ in '0123456'), 'APKTLNR'))

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.util.init:34
==abydos.util.data:37

__all__ = [
    'data_path',
    'download_package',
    'list_available_packages',
    'list_installed_packages',
    'package_path',
]

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.stemmer.clefgerman:84
==abydos.stemmer.sstemmer:73
            return word[:-1]
        return word


    if __name__ == '__main__':
        import doctest

        doctest.testmod()

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.stemmer.schinke:217
==abydos.stemmer.snowball_norwegian:46
    'l',
    'm',
    'n',
    'o',
    'p',

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.phonetic.rethschek:182
==abydos.stemmer.snowballdanish:167
            word = word[:-1]

        return word


    if __name__ == '__main__':
        import doctest

        doctest.testmod()

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.distance.prefix:77
==abydos.distance.suffix:77
                return i / min_len
        return 0.0


    if __name__ == '__main__':
        import doctest

        doctest.testmod()

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.stemmer.porter2:218
==abydos.stemmer.porter:241
            word = word[:-3]
            step1b_flag = True

    if step1b_flag:
        if word[-2:] in {'at', 'bl', 'iz'}:
            word += 'e'

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.phonetic.haase:179
==abydos.phonetic.koelner:129
    word = unicode_normalize('NFKD', word.upper())

    word = word.replace('Ä', 'AE')
    word = word.replace('Ö', 'OE')
    word = word.replace('Ü', 'UE')
    word = ''.join(c for c in word if c in self.ucset)

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.distance.jarowinkler:239
==abydos.distance.strcmp95:256
    if (
        self.longstrings
        and (minv > 4)
        and (num_com > i + 1)
        and (2 * num_com >= minv + i)
    ):

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.distance.synoname:27
==abydos.distance.token_distance:29
        Optional,
        Tuple,
        Union,
        cast,
    )

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.distance.flexmetric:31
==abydos.distance.meta_levenshtein:30
        cast,
    )

    from numpy import float_ as npfloat
    from numpy import zeros as npzeros

    from ._distance import _Distance

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.phonetic.koelner:225
==abydos.phonetic.russellindex:135
        num = ''.join(c for c in self.encode(word) if c in self.numset)
        return num.translate(self.num_trans)


    if __name__ == '__main__':
        import doctest

        doctest.testmod()

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.distance.dameraulevenshtein:113
==abydos.distance.shapirastoreri:152
    if src == tar:
        return 0
    if not src:
        return len(tar) * ins_cost
    if not tar:
        return len(src) * del_cost
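
The three early returns reported here are the trivial cases of an edit distance: identical strings, an empty source, and an empty target. A minimal sketch of how they could be shared follows; the helper name `trivial_edit_cost` and its signature are hypothetical, and returning `None` to signal "no trivial case applies" is an assumption made for illustration.

```python
from typing import Optional


def trivial_edit_cost(
    src: str, tar: str, ins_cost: float, del_cost: float
) -> Optional[float]:
    """Return the cost for a trivial input pair, or None if none applies."""
    if src == tar:
        return 0
    if not src:
        # Everything in tar must be inserted.
        return len(tar) * ins_cost
    if not tar:
        # Everything in src must be deleted.
        return len(src) * del_cost
    return None
```

A caller would check the result against `None` and fall through to the full dynamic-programming computation only when no trivial case matched.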

Similar lines in 4 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.tokenizer.character:79
==abydos.tokenizer.legalipy:139
==abydos.tokenizer.regexp:99
==abydos.tokenizer.sonoripy:104
        self.scaleand_counterize()
        return self


    if __name__ == '__main__':
        import doctest

        doctest.testmod()

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.distance.levenshtein:372
==abydos.distance.phoneticeditdistance:300
    if src == tar:
        return 0.0
    inscost, delcost = self._cost[:2]

    srclen = len(src)
    tarlen = len(tar)

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.phonetic.alphasis:102
==abydos.phonetic._phonic:54
    'G': '7',
    'Q': '7',
    'X': '7',
    'F': '8',
    'V': '8',
    'B': '9',
    'P': '9',

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.fingerprint.init:102
==abydos.fingerprint.fingerprint:86
    'MOSTCOMMONLETTERS',
    'MOSTCOMMONLETTERSCG',
    'MOSTCOMMONLETTERSDE',
    'MOSTCOMMONLETTERSDELC',
    'MOSTCOMMONLETTERSEN_LC',

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.stemmer.snowballnorwegian:38
==abydos.stemmer.snowballswedish:38
    sendings = {
        'b',
        'c',
        'd',
        'f',
        'g',
        'h',
        'j',

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.stemmer.snowballdutch:177
==abydos.stemmer.snowballgerman:175
    if len(word[r2start:]) >= 3:
        word = word[:-3]
        if (
            word[-2:] == 'ig'
            and len(word[r2start:]) >= 2
            and word[-3] != 'e'
        ):
            word = word[:-2]

Similar lines in 3 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.stemmer.schinke:215
==abydos.stemmer.snowballdanish:46
==abydos.stemmer.snowball_swedish:45
    'j',
    'k',
    'l',
    'm',
    'n',
    'o',
    'p',

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.distance.editex:293
==abydos.distance.levenshtein:388
                    for pos in range(tar_len)
                ),
            ]
        )
    else:

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.phonetic.alphasis:39
==abydos.phonetic.rogerroot:46
    'GF': '08',
    'GM': '03',
    'GN': '02',
    'KN': '02',
    'PF': '08',

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.stemmer.snowballdanish:40
==abydos.stemmer.snowballnorwegian:39
    'b',
    'c',
    'd',
    'f',
    'g',
    'h',
    'j',

Similar lines in 7 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.tokenizer.corvcluster:147
==abydos.tokenizer.cvcluster:144
==abydos.tokenizer.nltk:104
==abydos.tokenizer.qgrams:189
==abydos.tokenizer.qskipgrams:213
==abydos.tokenizer.saps:114
==abydos.tokenizer.vccluster:144
        self.scaleand_counterize()
        return self


    if __name__ == '__main__':
        import doctest

        doctest.testmod(optionflags=doctest.NORMALIZE_WHITESPACE)

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.distance.inclusion:110
==abydos.distance.mlipns:119
            return 1.0
        return 0.0


    if __name__ == '__main__':
        import doctest

        doctest.testmod()

Similar lines in 4 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.distance.azzoo:24
==abydos.distance.generalizedfleiss:27
==abydos.distance.minkowski:24
==abydos.distance.mutualinformation:25
        Optional,
        Sequence,
        Set,
        Union,
        cast,
    )

    from .tokendistance import _TokenDistance

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.distance.phoneticeditdistance:29
==abydos.distance.token_distance:30
        Tuple,
        Union,
        cast,
    )

    import numpy as np

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.distance.sift4:138
==abydos.distance.sift4simplest:109
    for i in range(self.maxoffset):
        if not (
            (srccur + i < srclen) or (tarcur + i < tar_len)
        ):
            break

Similar lines in 3 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.phonetic.henryearly:245
==abydos.phonetic.pshpsoundexfirst:209
==abydos.phonetic.pshpsoundexlast:245
            code = code[: self.maxlength]

        return code


    if __name__ == '__main__':
        import doctest

        doctest.testmod()

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.stemmer.snowballnorwegian:46
==abydos.stemmer.snowballswedish:47
    'l',
    'm',
    'n',
    'o',
    'p',
    'r',
    't',
    'v',
    'y',

Similar lines in 4 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.distance.averagelinkage:20
==abydos.distance.completelinkage:21
==abydos.distance.singlelinkage:21
==abydos.distance.softcosine:21
    from typing import Any, Optional, cast

    from .distance import _Distance
    from .levenshtein import Levenshtein
    from .tokendistance import _TokenDistance
    from ..tokenizer import _Tokenizer

Similar lines in 3 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.stemmer.clefgermanplus:97
==abydos.stemmer.snowballnorwegian:145
==abydos.stemmer.snowball_swedish:145
            word = word[:-2]

        return word


    if __name__ == '__main__':
        import doctest

        doctest.testmod()
