chrislit/abydos


Showing 4,191 of 4,191 total issues

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.distance.levenshtein:393
==abydos.distance.phoneticeditdistance:307

            normalizeterm = self.normalizer(
                [srclen * delcost, tarlen * inscost]
            )

            return self.dist_abs(src, tar) / normalizeterm


    if __name__ == '__main__':
        import doctest

        doctest.testmod()

Wrong hanging indentation before block (add 4 spaces).
Open

        scaler: Optional[Union[str, Callable[[float], float]]] = None,
Severity: Info
Found in abydos/tokenizer/_cv_cluster.py by pylint

TODO scaler: Optional[Union[str, Callable[[float], float]]] = None, ^ |

Wrong hanging indentation before block (add 4 spaces).
Open

        consonants: Optional[Set[str]] = None,
Severity: Info
Found in abydos/tokenizer/_vc_cluster.py by pylint

TODO consonants: Optional[Set[str]] = None, ^ |

Wrong hanging indentation before block (add 4 spaces).
Open

        self, scaler: Optional[Union[str, Callable[[float], float]]] = None,
Severity: Info
Found in abydos/tokenizer/_saps.py by pylint

TODO self, scaler: Optional[Union[str, Callable[[float], float]]] = None, ^ |

Wrong hanging indentation before block (add 4 spaces).
Open

                    or (len(w[i:]) == 1 and w[i : i + 1] not in _vowels)
Severity: Info
Found in abydos/tokenizer/_saps.py by pylint

TODO or (len(w[i:]) == 1 and w[i : i + 1] not in _vowels) ^ |

Wrong hanging indentation before block (add 4 spaces).
Open

                and ipa[pos : pos + i] in _PHONETIC_FEATURES
Severity: Info
Found in abydos/phones/_phones.py by pylint

TODO and ipa[pos : pos + i] in _PHONETIC_FEATURES ^ |

Consider iterating the dictionary directly instead of calling .keys()
Open

                for feature in _FEATURE_MASK.keys():
Severity: Info
Found in abydos/phones/_phones.py by pylint

Emitted when the keys of a dictionary are iterated through the .keys() method. It is enough to just iterate through the dictionary itself, as in "for key in dictionary".
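As a minimal sketch of what this message is asking for, the two loops below visit the same keys in the same order. The dictionary `feature_mask` and its contents are hypothetical stand-ins, not the actual `_FEATURE_MASK` table from the flagged file.

```python
# Hypothetical stand-in for a feature-mask table; names and values are illustrative only.
feature_mask = {"syllabic": 0b01, "consonantal": 0b10}

# Flagged style: calling .keys() just to loop over the keys.
via_keys = [feature for feature in feature_mask.keys()]

# Preferred style: iterating a dict yields its keys directly.
direct = [feature for feature in feature_mask]
```

Both forms behave identically for iteration, so the change is purely stylistic.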

Wrong hanging indentation before block (add 4 spaces).
Open

        word_tokenizer: Optional[_Tokenizer] = None,
Severity: Info
Found in abydos/corpus/_corpus.py by pylint

TODO word_tokenizer: Optional[_Tokenizer] = None, ^ |

Wrong hanging indentation before block (add 4 spaces).
Open

        n_val: int = 1,
Severity: Info
Found in abydos/corpus/_ngram_corpus.py by pylint

TODO n_val: int = 1, ^ |

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.distance.euclidean:81
==abydos.distance.manhattan:81

                alphabet=alphabet,
                tokenizer=tokenizer,
                intersectiontype=intersectiontype,
                **kwargs
            )

        def dist_abs(self, src: str, tar: str, normalized: bool = False) -> float:
            """Return the Euclidean distance between two strings.

            Parameters
            ----------
            src : str
                Source string (or QGrams/Counter objects) for comparison
            tar : str
                Target string (or QGrams/Counter objects) for comparison
            normalized : bool
                Normalizes to [0, 1] if True

            Returns
            -------
            float
                The Euclidean distance

            Examples
            --------
            >>> cmp = Euclidean()
            >>> cmp.dist_abs('cat', 'hat')
            2.0
            >>> round(cmp.dist_abs('Niall', 'Neil'), 12)
            2.645751311065
            >>> cmp.dist_abs('Colin', 'Cuilen')
            3.0
            >>> round(cmp.dist_abs('ATCG', 'TAGC'), 12)
            3.162277660168

            .. versionadded:: 0.3.0
            .. versionchanged:: 0.3.6
                Encapsulated in class
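The docstring examples above can be reproduced with a short stand-alone sketch. It assumes the strings are tokenized into padded bigrams with `$` and `#` as start and stop symbols; that tokenization, and the helper names `qgram_counts` and `euclidean_distance`, are assumptions made for illustration and are not taken from the library.

```python
from collections import Counter
from math import sqrt


def qgram_counts(word: str, q: int = 2, start: str = "$", stop: str = "#") -> Counter:
    """Count the q-grams of a word padded with q - 1 start/stop symbols (assumed scheme)."""
    padded = start * (q - 1) + word + stop * (q - 1)
    return Counter(padded[i : i + q] for i in range(len(padded) - q + 1))


def euclidean_distance(src: str, tar: str) -> float:
    """Euclidean distance between the q-gram count vectors of two strings."""
    src_counts, tar_counts = qgram_counts(src), qgram_counts(tar)
    grams = src_counts.keys() | tar_counts.keys()
    # A Counter returns 0 for missing keys, so absent q-grams need no special case.
    return sqrt(sum((src_counts[g] - tar_counts[g]) ** 2 for g in grams))
```

Under these assumptions, `euclidean_distance('cat', 'hat')` gives 2.0, because the two words differ in four bigrams, each with a count difference of one.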

Similar lines in 3 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.distance.gotoh:67
==abydos.distance.needlemanwunsch:150
==abydos.distance.smithwaterman:65

            self.simfunc = cast(
                Callable[[str, str], float],
                NeedlemanWunsch.simmatrix if simfunc is None else simfunc,
            )  # type: Callable[[str, str], float]

        def sim_score(self, src: str, tar: str) -> float:
            """Return the Gotoh score of two strings.

            Parameters
            ----------
            src : str
                Source string for comparison
            tar : str
                Target string for comparison

            Returns
            -------
            float
                Gotoh score

            Examples
            --------
            >>> cmp = Gotoh()
            >>> cmp.sim_score('cat', 'hat')
            2.0
            >>> cmp.sim_score('Niall', 'Neil')
            1.0
            >>> round(cmp.sim_score('aluminum', 'Catalan'), 12)
            -0.4
            >>> cmp.sim_score('cat', 'hat')
            2.0

            .. versionadded:: 0.1.0
            .. versionchanged:: 0.3.6
                Encapsulated in class

            """
            d_mat = np_zeros((len(src) + 1, len(tar) + 1), dtype=np_float)

Similar lines in 5 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.distance.ncdbwtrle:80
==abydos.distance.ncdbz2:103
==abydos.distance.ncdlzma:102
==abydos.distance.ncdlzss:90
==abydos.distance.ncdrle:80

            return (
                min(len(concatcomp), len(concatcomp2))
                - min(len(srccomp), len(tarcomp))
            ) / max(len(srccomp), len(tarcomp))


    if __name__ == '__main__':
        import doctest

        doctest.testmod()
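One possible way to remove this duplication is to move the shared return expression into a single helper that every compressor-specific class calls. The sketch below is an assumption about how that could look: the function names are hypothetical, and `zlib` is used only as a convenient standard-library compressor, not one of the five flagged modules.

```python
import zlib


def ncd_from_compressed(src_comp, tar_comp, concat_comp, concat_comp2) -> float:
    """Shared final step: the expression repeated across the flagged modules."""
    return (
        min(len(concat_comp), len(concat_comp2))
        - min(len(src_comp), len(tar_comp))
    ) / max(len(src_comp), len(tar_comp))


def ncd_zlib(src: str, tar: str) -> float:
    """Illustrative caller: compress each input and both concatenations."""
    src_b, tar_b = src.encode("utf-8"), tar.encode("utf-8")
    return ncd_from_compressed(
        zlib.compress(src_b),
        zlib.compress(tar_b),
        zlib.compress(src_b + tar_b),
        zlib.compress(tar_b + src_b),
    )
```

With a helper like this, each module would only need to supply its own compression step, and the formula would live in one place.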

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.distance.jarowinkler:198
==abydos.distance.strcmp95:200

                        num_com += 1
                        break

            # If no characters in common - return
            if num_com == 0:
                return 0.0

            # Count the number of transpositions
            k = n_trans = 0

Similar lines in 3 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.fingerprint.count:145
==abydos.fingerprint.occurrence:142
==abydos.fingerprint.occurrencehalved:154

                fingerprint <<= n_bits

            return fingerprint


    if __name__ == '__main__':
        import doctest

        doctest.testmod()

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.tokenizer.qgrams:167
==abydos.tokenizer.qskipgrams:183

                    string = (
                        self.startstop[0] * (qvali - 1)
                        + self.string
                        + self.startstop[-1] * (qvali - 1)
                    )
                else:
                    string = self.string

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.distance.needlemanwunsch:194
==abydos.distance.smithwaterman:105

            for i in range(1, len(src) + 1):
                for j in range(1, len(tar) + 1):
                    match = d_mat[i - 1, j - 1] + self.simfunc(
                        src[i - 1], tar[j - 1]
                    )
                    delete = d_mat[i - 1, j] - self.gap_cost
                    insert = d_mat[i, j - 1] - self.gap_cost
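The duplicated inner loop computes the same three candidate scores in both algorithms; only the way the candidates are combined differs. A minimal sketch of factoring that out is shown below. It uses textbook global and local alignment over plain lists rather than the library's own classes, and the names `cell_candidates`, `global_score`, `local_score`, and `unit_sim` are hypothetical.

```python
def cell_candidates(d_mat, src, tar, i, j, sim_func, gap_cost):
    """Return the (match, delete, insert) candidates shared by both algorithms."""
    match = d_mat[i - 1][j - 1] + sim_func(src[i - 1], tar[j - 1])
    delete = d_mat[i - 1][j] - gap_cost
    insert = d_mat[i][j - 1] - gap_cost
    return match, delete, insert


def global_score(src, tar, sim_func, gap_cost=1.0):
    """Textbook global alignment: borders carry gap penalties."""
    d_mat = [[0.0] * (len(tar) + 1) for _ in range(len(src) + 1)]
    for i in range(len(src) + 1):
        d_mat[i][0] = -i * gap_cost
    for j in range(len(tar) + 1):
        d_mat[0][j] = -j * gap_cost
    for i in range(1, len(src) + 1):
        for j in range(1, len(tar) + 1):
            d_mat[i][j] = max(cell_candidates(d_mat, src, tar, i, j, sim_func, gap_cost))
    return d_mat[-1][-1]


def local_score(src, tar, sim_func, gap_cost=1.0):
    """Textbook local alignment: cells are floored at zero, best cell wins."""
    d_mat = [[0.0] * (len(tar) + 1) for _ in range(len(src) + 1)]
    best = 0.0
    for i in range(1, len(src) + 1):
        for j in range(1, len(tar) + 1):
            d_mat[i][j] = max(0.0, *cell_candidates(d_mat, src, tar, i, j, sim_func, gap_cost))
            best = max(best, d_mat[i][j])
    return best


def unit_sim(a, b):
    """Illustrative scoring function: +1 for a match, -1 for a mismatch."""
    return 1.0 if a == b else -1.0
```

With `unit_sim` and a gap cost of 1.0, `global_score('cat', 'hat', unit_sim)` is 1.0 and `local_score('cat', 'hat', unit_sim)` is 2.0. These values depend on the illustrative scoring function and need not match the library's defaults.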

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.phonetic.alphasis:95
==abydos.phonetic._phonic:45

            'D': '1',
            'T': '1',
            'N': '2',
            'M': '3',
            'R': '4',
            'L': '5',
            'J': '6',

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.phonetic.daitchmokotoff:363
==abydos.phonetic.soundex:206

            word = ''.join(c for c in word if c in self.uc_set)

            # Nothing to convert, return base case
            if not word:
                if self.zeropad:
                    return '0' * self.maxlength
                return '0'

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.stemmer.porter2:149
==abydos.stemmer.porter:204

            if word[0] == 'y':
                word = 'Y' + word[1:]
            for i in range(1, len(word)):
                if word[i] == 'y' and word[i - 1] in self._vowels:
                    word = word[:i] + 'Y' + word[i + 1 :]
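Since both stemmers repeat this step verbatim, it could be pulled into one shared function. The following is a sketch under assumptions: the name `mark_consonantal_y` is hypothetical, the default vowel set is a guess at what `self._vowels` contains, and the `word[:1]` slice is a small deviation from the original that avoids an `IndexError` on an empty string.

```python
def mark_consonantal_y(word: str, vowels: frozenset = frozenset("aeiouy")) -> str:
    """Uppercase a leading 'y' and any 'y' that directly follows a vowel."""
    # Slicing instead of indexing keeps the empty string safe (deviation from the original).
    if word[:1] == "y":
        word = "Y" + word[1:]
    for i in range(1, len(word)):
        if word[i] == "y" and word[i - 1] in vowels:
            word = word[:i] + "Y" + word[i + 1 :]
    return word
```

For example, `mark_consonantal_y('say')` returns `'saY'`, while `'happy'` is returned unchanged because its final `y` follows a consonant.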

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.phonetic.soundd:36
==abydos.phonetic.soundexbr:36

        trans = dict(
            zip(
                (ord(_) for _ in 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'),
                '01230120022455012623010202',
            )
        )

        alphabetic = dict(zip((ord(_) for _ in '0123456'), 'APKTLNR'))
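Tables built this way map Unicode code points to replacement characters, which is the form `str.translate` expects. The sketch below rebuilds the two tables from the snippet as module-level variables and applies them to a sample word. It shows only the translation step, not a full phonetic algorithm, and the variable names `digits` and `letters` are illustrative.

```python
# Letter -> digit table, keyed by code point as str.translate requires.
trans = dict(
    zip(
        (ord(ch) for ch in 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'),
        '01230120022455012623010202',
    )
)

# Digit -> letter table for an alphabetic rendering of the code.
alphabetic = dict(zip((ord(ch) for ch in '0123456'), 'APKTLNR'))

digits = 'ROBERT'.translate(trans)       # each letter replaced by its digit
letters = digits.translate(alphabetic)   # each digit replaced by its letter
```

Here `digits` is `'601063'` and `letters` is `'RAPART'`. A complete encoder would additionally handle the first letter, repeated codes, and padding.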
