chrislit/abydos


Showing 4,191 of 4,191 total issues

Unnecessary else after raise
Open

            if self._terminator not in code:
Severity: Info
Found in abydos/compression/_bwt.py by pylint

Used in order to highlight an unnecessary block of code following an if containing a raise statement. As such, it will warn when it encounters an else following a chain of ifs, all of them containing a raise statement.
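As a minimal sketch of what this check flags, the two functions below behave identically. The function names and the terminator handling are hypothetical illustrations, not the actual code in `_bwt.py`.

```python
def decode_flagged(code: str, terminator: str = "\0") -> str:
    """Hypothetical version pylint would flag: else is redundant after raise."""
    if terminator not in code:
        raise ValueError("Terminator missing from input.")
    else:
        return code.replace(terminator, "")


def decode_clean(code: str, terminator: str = "\0") -> str:
    """Equivalent version: the raise already leaves the function."""
    if terminator not in code:
        raise ValueError("Terminator missing from input.")
    return code.replace(terminator, "")
```

Dropping the `else` removes one indentation level without changing behavior.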

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.phonetic.fuzzysoundex:62
==abydos.phonetic.phonex:54

            if maxlength != -1:
                self.maxlength = min(max(4, maxlength), 64)
            else:
                self.maxlength = 64
            self.zeropad = zeropad

        def encode_alpha(self, word: str) -> str:
            """Return the alphabetic Fuzzy Soundex code for a word.

            Parameters
            ----------
            word : str
                The word to transform

            Returns
            -------
            str
                The alphabetic Fuzzy Soundex value

            Examples
            --------
            >>> pe = FuzzySoundex()
            >>> pe.encode_alpha('Christopher')
            'KRSTP'
            >>> pe.encode_alpha('Niall')
            'NL'
            >>> pe.encode_alpha('Smith')
            'SNT'
            >>> pe.encode_alpha('Schmidt')
            'SNT'


            .. versionadded:: 0.4.0

            """
            code = self.encode(word).rstrip('0')
            return code[:1] + code[1:].translate(self._alphabetic)

        def encode(self, word: str) -> str:
            """Return the Fuzzy Soundex code for a word.

            Parameters
            ----------
            word : str
                The word to transform

            Returns
            -------
            str
                The Fuzzy Soundex value

            Examples
            --------
            >>> pe = FuzzySoundex()
            >>> pe.encode('Christopher')
            'K6931'
            >>> pe.encode('Niall')
            'N4000'
            >>> pe.encode('Smith')
            'S5300'
            >>> pe.encode('Smith')
            'S5300'


            .. versionadded:: 0.1.0
            .. versionchanged:: 0.3.6
                Encapsulated in class
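One way such duplication is commonly removed is to hoist the shared constructor logic into a common base class. The sketch below assumes hypothetical class and attribute names (`ZeroPaddedEncoder`, `max_length`, `zero_pad`, `pad`); it mirrors the clamping shown in the duplicated lines but is not the library's actual class hierarchy.

```python
class ZeroPaddedEncoder:
    """Hypothetical shared base class for the duplicated constructor logic."""

    def __init__(self, max_length: int = 4, zero_pad: bool = True) -> None:
        # Clamp to the range [4, 64]; -1 selects the cap of 64.
        if max_length != -1:
            self.max_length = min(max(4, max_length), 64)
        else:
            self.max_length = 64
        self.zero_pad = zero_pad

    def pad(self, code: str) -> str:
        """Zero-pad (if enabled) and truncate a code to max_length."""
        if self.zero_pad:
            code += "0" * self.max_length
        return code[: self.max_length]


class FuzzySoundexLike(ZeroPaddedEncoder):
    """Stand-in subclass; inherits the constructor instead of repeating it."""


class PhonexLike(ZeroPaddedEncoder):
    """Second stand-in subclass sharing the same inherited logic."""
```

With this layout, each phonetic class would only define what actually differs, such as its translation table.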

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.distance.chebyshev:21
==abydos.distance.unknown_f:22

    from typing import (
        Any,
        Counter as TCounter,
        NoReturn,
        Optional,
        Sequence,
        Set,
        Union,
    )

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.distance.discountedlevenshtein:354
==abydos.distance.phoneticedit_distance:309

                )

            return self.dist_abs(src, tar) / normalizeterm


    if __name__ == '__main__':
        import doctest

        doctest.testmod()

Useless super delegation in method '__init__'
Open

    def __init__(
Severity: Minor
Found in abydos/tokenizer/_character.py by pylint

Used whenever we can detect that an overridden method is useless, relying on super() delegation to do the same thing as another method from the MRO.
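A minimal sketch of the pattern this message describes, using hypothetical classes rather than the tokenizer in `_character.py`: an `__init__` that only forwards the same arguments to its parent can simply be omitted.

```python
class Base:
    def __init__(self, scaler=None):
        self.scaler = scaler


class Flagged(Base):
    # This override only forwards identical arguments to the parent,
    # which is the situation the message reports.
    def __init__(self, scaler=None):
        super().__init__(scaler)


class Clean(Base):
    """No __init__ override: Base.__init__ is inherited unchanged."""
```

Both subclasses construct equivalent objects, so the override in `Flagged` adds nothing.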

Variable name n doesn't conform to snake_case naming style
Open

            n = len(self._ordered_tokens)
Severity: Info
Found in abydos/tokenizer/_tokenizer.py by pylint

Used when the name doesn't conform to naming rules associated to its type (constant, variable, class...).

Wrong hanging indentation before block (add 4 spaces).
Open

        scaler: Optional[Union[str, Callable[[float], float]]] = None,
Severity: Info
Found in abydos/tokenizer/_nltk.py by pylint

TODO

        scaler: Optional[Union[str, Callable[[float], float]]] = None,
        ^   |

Similar lines in 4 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.phonetic.fuzzysoundex:66
==abydos.phonetic.lein:61
==abydos.phonetic.phonex:58
==abydos.phonetic.phonix:190

            self.zeropad = zeropad

        def encode_alpha(self, word: str) -> str:
            """Return the alphabetic Phonex code for a word.

            Parameters
            ----------
            word : str
                The word to transform

            Returns
            -------
            str
                The alphabetic Phonex value

            Examples
            --------
            >>> pe = Phonex()
            >>> pe.encode_alpha('Christopher')
            'CRST'
            >>> pe.encode_alpha('Niall')
            'NL'
            >>> pe.encode_alpha('Smith')
            'SNT'
            >>> pe.encode_alpha('Schmidt')
            'SSNT'


            .. versionadded:: 0.4.0

            """
            code = self.encode(word).rstrip('0')
            return code[:1] + code[1:].translate(self._alphabetic)

        def encode(self, word: str) -> str:
            """Return the Phonex code for a word.

            Parameters
            ----------
            word : str
                The word to transform

            Returns
            -------
            str
                The Phonex value

            Examples
            --------
            >>> pe = Phonex()
            >>> pe.encode('Christopher')
            'C623'
            >>> pe.encode('Niall')
            'N400'
            >>> pe.encode('Schmidt')
            'S253'
            >>> pe.encode('Smith')
            'S530'


            .. versionadded:: 0.1.0
            .. versionchanged:: 0.3.6
                Encapsulated in class

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.distance.discountedlevenshtein:283
==abydos.distance._levenshtein:321

                )

            dmat = cast(
                np.ndarray, self.alignment_matrix(src, tar, backtrace=False)
            )

            if int(dmat[srclen, tarlen]) == dmat[srclen, tarlen]:
                return int(dmat[srclen, tarlen])
            else:
                return cast(float, dmat[srclen, tarlen])

        def dist(self, src: str, tar: str) -> float:
            """Return the normalized Levenshtein distance between two strings.

            The Levenshtein distance is normalized by dividing the Levenshtein
            distance (calculated by any of the three supported methods) by the
            greater of the number of characters in src times the cost of a
            delete and the number of characters in tar times the cost of an
            insert. For the case in which all operations have
            :math:`cost = 1`, this is equivalent to the greater of the length
            of the two strings src & tar.

            Parameters
            ----------
            src : str
                Source string for comparison
            tar : str
                Target string for comparison

            Returns
            -------
            float
                The normalized Levenshtein distance between src & tar

            Examples
            --------
            >>> cmp = DiscountedLevenshtein()
            >>> cmp.dist('cat', 'hat')
            0.3513958291799864
            >>> cmp.dist('Niall', 'Neil')
            0.5909885886270658
            >>> cmp.dist('aluminum', 'Catalan')
            0.8348163322045603
            >>> cmp.dist('ATCG', 'TAGC')
            0.7217609721523955


            .. versionadded:: 0.4.1

Wrong hanging indentation before block (add 4 spaces).
Open

                token[0] not in self._consonants
Severity: Info
Found in abydos/tokenizer/_vc_cluster.py by pylint

TODO

                token[0] not in self._consonants
                ^   |

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.distance.discountedlevenshtein:219
==abydos.distance.phoneticeditdistance:197

                        )
                        if backtrace:
                            tracemat[i + 1, j + 1] = 2
            if backtrace:
                return d_mat, tracemat
            return d_mat

        def dist_abs(self, src: str, tar: str) -> float:
            """Return the phonetic edit distance between two strings.

            Parameters
            ----------
            src : str
                Source string for comparison
            tar : str
                Target string for comparison

            Returns
            -------
            int (may return a float if cost has float values)
                The phonetic edit distance between src & tar

            Examples
            --------
            >>> cmp = PhoneticEditDistance()
            >>> cmp.dist_abs('cat', 'hat')
            0.17741935483870974
            >>> cmp.dist_abs('Niall', 'Neil')
            1.161290322580645
            >>> cmp.dist_abs('aluminum', 'Catalan')
            2.467741935483871
            >>> cmp.dist_abs('ATCG', 'TAGC')
            1.193548387096774

            >>> cmp = PhoneticEditDistance(mode='osa')
            >>> cmp.dist_abs('ATCG', 'TAGC')
            0.46236225806451603
            >>> cmp.dist_abs('ACTG', 'TAGC')
            1.2580645161290323


            .. versionadded:: 0.4.1

Too many boolean expressions in if statement (6/5)
Open

                if syll[-1] in _vowels and (
Severity: Info
Found in abydos/tokenizer/_saps.py by pylint

Used when an if statement contains too many boolean expressions.
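A common way to get under the limit is to give groups of sub-conditions their own names. The sketch below uses a hypothetical predicate and vowel set; it does not reproduce the condition in `_saps.py`.

```python
_VOWELS = frozenset("aeiouy")  # hypothetical vowel set for illustration


def _is_vowel(char: str) -> bool:
    return char.lower() in _VOWELS


def should_split(syll: str, nxt: str) -> bool:
    """Hypothetical predicate: split when a vowel is followed by a consonant."""
    # Each named intermediate replaces several inline boolean terms.
    ends_in_vowel = bool(syll) and _is_vowel(syll[-1])
    next_is_consonant = (
        bool(nxt) and nxt[0].isalpha() and not _is_vowel(nxt[0])
    )
    return ends_in_vowel and next_is_consonant
```

The final `if` or `return` then combines only two named terms, which also documents the intent of each group.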

Wrong hanging indentation before block (add 4 spaces).
Open

        stop_words: Optional[Union[List[str], Set[str], Tuple[str]]] = None,
Severity: Info
Found in abydos/corpus/_corpus.py by pylint

TODO

        stop_words: Optional[Union[List[str], Set[str], Tuple[str]]] = None,
        ^   |

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.distance.euclidean:38
==abydos.distance.manhattan:38

        def __init__(
            self,
            alphabet: Optional[
                Union[TCounter[str], Sequence[str], Set[str], int]
            ] = 0,
            tokenizer: Optional[Tokenizer] = None,
            intersection_type: str = 'crisp',
            **kwargs: Any
        ) -> None:
            """Initialize Euclidean instance.

            Parameters
            ----------
            alphabet : collection or int
                The values or size of the alphabet
            tokenizer : Tokenizer
                A tokenizer instance from the :py:mod:`abydos.tokenizer` package
            intersection_type : str
                Specifies the intersection type, and set type as a result:
                See :ref:`intersection_type <intersection_type>` description in
                :py:class:`_TokenDistance` for details.
            **kwargs
                Arbitrary keyword arguments

            Other Parameters
            ----------------
            qval : int
                The length of each q-gram. Using this parameter and
                tokenizer=None will cause the instance to use the QGram
                tokenizer with this q value.
            metric : _Distance
                A string distance measure class for use in the soft and
                fuzzy variants.
            threshold : float
                A threshold value, similarities above which are counted as
                members of the intersection for the fuzzy variant.


            .. versionadded:: 0.4.0

Too many arguments (7/5)
Open

    def __init__(
Severity: Info
Found in abydos/corpus/_corpus.py by pylint

Used when a function or method takes too many arguments.
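One conventional remedy is to bundle related parameters into a single options object. The sketch below is a hypothetical illustration; `CorpusOptions`, `SimpleCorpus`, and their fields are assumed names, not the signature of the constructor in `_corpus.py`.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class CorpusOptions:
    """Hypothetical bundle of parsing options."""

    doc_split: str = "\n\n"
    sent_split: str = "\n"
    stop_words: Tuple[str, ...] = ()


class SimpleCorpus:
    """Hypothetical corpus taking two arguments instead of many."""

    def __init__(
        self, text: str = "", options: Optional[CorpusOptions] = None
    ) -> None:
        self.options = options or CorpusOptions()
        stop = set(self.options.stop_words)
        # Documents -> sentences -> words, dropping stop words.
        self.docs: List[List[List[str]]] = [
            [
                [word for word in sent.split() if word not in stop]
                for sent in doc.split(self.options.sent_split)
            ]
            for doc in text.split(self.options.doc_split)
        ]
```

A caller would then write, for example, `SimpleCorpus(text, CorpusOptions(stop_words=("the",)))`, and new options can be added without widening the constructor.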

Similar lines in 3 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.tokenizer.corvcluster:110
==abydos.tokenizer.cvcluster:111
==abydos.tokenizer.vccluster:111

            self.string = string
            self._ordered_tokens = []
            tokenlist = self.regexp.findall(self.string)
            for token in tokenlist:
                if (
                    token[0] not in self._consonants
                    and token[0] not in self._vowels
                ):
                    self._ordered_tokens.append(token)
                else:
                    token = unicodedata.normalize('NFD', token)
                    mode = 0  # 0 = starting mode, 1 = cons, 2 = vowels
                    newtoken = ''  # noqa: S105
                    for char in token:
                        if char in self._consonants:

Wrong hanging indentation before block (add 4 spaces).
Open

        ngram: Union[str, List[str]],
Severity: Info
Found in abydos/corpus/_ngram_corpus.py by pylint

TODO

        ngram: Union[str, List[str]],
        ^   |

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.phonetic.lein:135
==abydos.phonetic.rogerroot:241

            if self.zeropad:
                code += '0' * self.max_length  # Rule 4

            return code[: self.max_length]


    if __name__ == '__main__':
        import doctest

        doctest.testmod()

Cyclic import (abydos.distance -> abydos.distance._ozbay)
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Used when a cyclic import between two or more modules is detected.

Cyclic import (abydos.distance -> abydos.distance._rouge_su)
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Used when a cyclic import between two or more modules is detected.
