chrislit/abydos

Showing 4,191 of 4,191 total issues

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.tokenizer._cv_cluster:133
==abydos.tokenizer._vc_cluster:133
                        new_token += char
                        mode = 2
                    else:  # This should cover combining marks, marks, etc.
                        new_token += char

                self._ordered_tokens.append(new_token)

        self._ordered_tokens = [
            unicodedata.normalize('NFC', token) for token in self._ordered_tokens
        ]
        self._scale_and_counterize()
        return self


if __name__ == '__main__':
    import doctest

    doctest.testmod(optionflags=doctest.NORMALIZE_WHITESPACE)

Wrong hanging indentation before block (add 4 spaces).
Open

        scaler: Optional[Union[str, Callable[[float], float]]] = None,
Severity: Info
Found in abydos/tokenizer/_whitespace.py by pylint

TODO scaler: Optional[Union[str, Callable[[float], float]]] = None, ^ |
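The contrast behind this message can be sketched as follows. This is only an illustration: the function names are hypothetical, and the reading of the check (continuation lines of a signature should be indented deeper than the body that follows) is inferred from the message text rather than taken from the project.

```python
from typing import Callable, Optional


# Shape that the message appears to target: the parameters sit at the same
# depth as the function body, so the end of the signature is harder to spot.
def apply_scaler_flagged(
    value: float,
    scaler: Optional[Callable[[float], float]] = None,
) -> float:
    return value if scaler is None else scaler(value)


# Shape the message asks for: four more spaces on each continuation line.
def apply_scaler_suggested(
        value: float,
        scaler: Optional[Callable[[float], float]] = None,
) -> float:
    return value if scaler is None else scaler(value)
```

Both definitions behave identically; the finding is purely about layout, so whether to follow it or to silence the check is a style decision for the project.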

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.tokenizer._c_or_v_cluster:137
==abydos.tokenizer._cv_cluster:134
                        mode = 2
                    else:  # This should cover combining marks, marks, etc.
                        new_token += char

                self._ordered_tokens.append(new_token)

        self._ordered_tokens = [
            unicodedata.normalize('NFC', token) for token in self._ordered_tokens
        ]
        self._scale_and_counterize()
        return self


if __name__ == '__main__':
    import doctest

    doctest.testmod(optionflags=doctest.NORMALIZE_WHITESPACE)

Wrong hanging indentation before block (add 4 spaces).
Open

    feat2: int,
Severity: Info
Found in abydos/phones/_phones.py by pylint

TODO feat2: int, ^ |

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.distance._discounted_levenshtein:204
==abydos.distance._levenshtein:165
                )
                d_mat[i + 1, j + 1] = min(opts)
                if backtrace:
                    trace_mat[i + 1, j + 1] = int(np.argmin(opts))

                if self._mode == 'osa':
                    if (
                        i + 1 > 1
                        and j + 1 > 1
                        and src[i] == tar[j - 1]
                        and src[i - 1] == tar[j]
                    ):
                        # transposition
                        d_mat[i + 1, j + 1] = min(

String statement has no effect
Open

    """
Severity: Minor
Found in abydos/phones/_phones.py by pylint

Used when a string is used as a statement (which of course has no effect). This is a particular case of W0104 with its own message so you can easily disable it if you're using those strings as documentation, instead of comments.
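A minimal sketch of the difference between a docstring and a string statement, using hypothetical function names. Only a string that is the first statement of a module, class, or function body is stored as documentation; anywhere else it is evaluated and thrown away, which is what the check reports.

```python
def documented():
    """First statement in the body, so it becomes the docstring."""
    return 1


def commented():
    result = 1
    """Not the first statement: evaluated and discarded, no effect."""
    return result


print(documented.__doc__)  # the docstring text
print(commented.__doc__)   # None
```

If such strings are meant as block comments, converting them to `#` comments removes the finding without changing behavior.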

Variable name n doesn't conform to snake_case naming style
Open

                n = tokens[tok] * count
Severity: Info
Found in abydos/corpus/_unigram_corpus.py by pylint

Used when the name doesn't conform to the naming rules associated with its type (constant, variable, class...).
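As a rough illustration of why a one-letter name such as `n` can fail a snake_case rule, here is a simplified check. The pattern is an assumption made for this sketch (lowercase letters, digits, and underscores, at least three characters) and is not pylint's exact expression; the helper name is hypothetical.

```python
import re

# Simplified approximation of a snake_case rule (an assumption, not the
# linter's real pattern): lowercase start, then at least two more
# lowercase letters, digits, or underscores.
SNAKE_CASE = re.compile(r"[a-z_][a-z0-9_]{2,}$")


def looks_like_snake_case(name: str) -> bool:
    """Return True if the name satisfies the simplified pattern above."""
    return SNAKE_CASE.match(name) is not None
```

Under this approximation `token_count` passes, while `n` (too short) and `tokenCount` (contains an uppercase letter) do not.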

Method could be a function
Open

    def decode(self, text: str) -> str:
Severity: Info
Found in abydos/compression/_rle.py by pylint

Used when a method doesn't use its bound instance, and so could be written as a function.
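One common way to address this finding is sketched below. The class name, the helper `rle_decode`, and the simplified run-length format (a count followed by a character) are all hypothetical and are not taken from `abydos/compression/_rle.py`; the point is only that a method that never touches `self` can be a module-level function or a `@staticmethod`.

```python
import re


def rle_decode(text: str) -> str:
    """Expand runs such as '3a' into 'aaa' (simplified, hypothetical format)."""
    return re.sub(r"(\d+)(\D)", lambda m: m.group(2) * int(m.group(1)), text)


class RunLengthCodec:
    """Keeps the method on the class, but without a bound instance."""

    @staticmethod
    def decode(text: str) -> str:
        # No 'self' parameter, so there is no unused bound instance.
        return rle_decode(text)
```

Keeping `decode` as a regular method can still be the right call when subclasses are expected to override it with instance state; in that case the finding is usually suppressed rather than fixed.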

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.distance._prefix:32
==abydos.distance._suffix:32
    def sim(self, src: str, tar: str) -> float:
        """Return the suffix similarity of two strings.

        Suffix similarity is the ratio of the length of the shorter term that
        exactly matches the longer term to the length of the shorter term,
        beginning at the end of both terms.

        Parameters
        ----------
        src : str
            Source string for comparison
        tar : str
            Target string for comparison

        Returns
        -------
        float
            Suffix similarity

        Examples
        --------
        >>> cmp = Suffix()
        >>> cmp.sim('cat', 'hat')
        0.6666666666666666
        >>> cmp.sim('Niall', 'Neil')
        0.25
        >>> cmp.sim('aluminum', 'Catalan')
        0.0
        >>> cmp.sim('ATCG', 'TAGC')
        0.0

        .. versionadded:: 0.1.0
        .. versionchanged:: 0.3.6
            Encapsulated in class

        """
        if src == tar:
            return 1.0
        if not src or not tar:
            return 0.0
        min_word, max_word = (src, tar) if len(src) < len(tar) else (tar, src)
        min_len = len(min_word)
        for i in range(min_len, 0, -1):
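One possible way to remove this kind of duplication is sketched below: the suffix case is expressed as the prefix case on reversed strings, so the matching logic lives in one place. The function names are hypothetical and this is not the project's implementation, only an illustration of the refactoring the message suggests.

```python
def _common_prefix_len(first: str, second: str) -> int:
    """Count how many leading characters the two strings share."""
    count = 0
    for a, b in zip(first, second):
        if a != b:
            break
        count += 1
    return count


def prefix_sim(src: str, tar: str) -> float:
    """Shared-prefix length divided by the length of the shorter string."""
    if src == tar:
        return 1.0
    if not src or not tar:
        return 0.0
    return _common_prefix_len(src, tar) / min(len(src), len(tar))


def suffix_sim(src: str, tar: str) -> float:
    """Suffix similarity, reusing the prefix logic on reversed strings."""
    return prefix_sim(src[::-1], tar[::-1])
```

With these definitions, `suffix_sim('cat', 'hat')` gives 0.6666666666666666 and `suffix_sim('Niall', 'Neil')` gives 0.25, in line with the doctest values quoted in the excerpt.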

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.distance._euclidean:125
==abydos.distance._minkowski:163
    def dist(self, src: str, tar: str) -> float:
        """Return the normalized Euclidean distance between two strings.

        The normalized Euclidean distance is a distance
        metric in :math:`L^2`-space, normalized to [0, 1].

        Parameters
        ----------
        src : str
            Source string (or QGrams/Counter objects) for comparison
        tar : str
            Target string (or QGrams/Counter objects) for comparison

        Returns
        -------
        float
            The normalized Euclidean distance

        Examples
        --------
        >>> cmp = Euclidean()
        >>> round(cmp.dist('cat', 'hat'), 12)
        0.57735026919
        >>> round(cmp.dist('Niall', 'Neil'), 12)
        0.683130051064
        >>> round(cmp.dist('Colin', 'Cuilen'), 12)
        0.727606875109
        >>> cmp.dist('ATCG', 'TAGC')
        1.0

        .. versionadded:: 0.3.0
        .. versionchanged:: 0.3.6
            Encapsulated in class

        """
        return self.dist_abs(src, tar, normalized=True)


if __name__ == '__main__':
    import doctest

    doctest.testmod()
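The doctest values in this excerpt can be reproduced with a small standalone sketch. Everything here is an assumption made for illustration: the strings are split into padded character bigrams (`$` at the start, `#` at the end), and the Euclidean distance between the bigram counts is divided by the Euclidean norm of the summed counts. The function names are hypothetical, and the sketch is not claimed to be how the library computes the value.

```python
from collections import Counter
from math import sqrt


def bigrams(word: str) -> Counter:
    """Count character bigrams of the word padded with '$' and '#'."""
    padded = f"${word}#"
    return Counter(padded[i:i + 2] for i in range(len(padded) - 1))


def normalized_euclidean(src: str, tar: str) -> float:
    """Euclidean distance between bigram counts, scaled into [0, 1]."""
    if src == tar:
        return 0.0
    a, b = bigrams(src), bigrams(tar)
    keys = set(a) | set(b)
    # Counter returns 0 for missing keys, so absent bigrams count as zero.
    diff = sqrt(sum((a[k] - b[k]) ** 2 for k in keys))
    norm = sqrt(sum((a[k] + b[k]) ** 2 for k in keys))
    return diff / norm
```

For 'cat' and 'hat' the bigram sets differ in four entries, giving sqrt(4) / sqrt(12), which is about 0.57735; strings with no bigrams in common, such as 'ATCG' and 'TAGC', reach the maximum of 1.0.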

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.tokenizer._c_or_v_cluster:110
==abydos.tokenizer._cv_cluster:111
        self._string = string
        self._ordered_tokens = []
        token_list = self._regexp.findall(self._string)
        for token in token_list:
            if (
                token[0] not in self._consonants
                and token[0] not in self._vowels
            ):
                self._ordered_tokens.append(token)
            else:
                token = unicodedata.normalize('NFD', token)
                mode = 0  # 0 = starting mode, 1 = cons, 2 = vowels
                new_token = ''  # noqa: S105
                for char in token:
                    if char in self._consonants:
                        if mode == 2:
                            self._ordered_tokens.append(new_token)
                            new_token = char
                        else:
                            new_token += char
                        mode = 1
                    elif char in self._vowels:

Similar lines in 3 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.distance._azzoo:21
==abydos.distance._minkowski:21
==abydos.distance._mutual_information:22
from typing import (
    Any,
    Counter as TCounter,
    Optional,
    Sequence,
    Set,
    Union,
    cast,
)

from ._token_distance import _TokenDistance
from ..tokenizer import _Tokenizer

Wrong hanging indentation before block (add 4 spaces).
Open

        **kwargs: Any
Severity: Info
Found in abydos/tokenizer/_tokenizer.py by pylint

TODO **kwargs: Any ^ |

Wrong hanging indentation before block (add 4 spaces).
Open

        self,
Severity: Info
Found in abydos/corpus/_corpus.py by pylint

TODO self, ^ |

Wrong hanging indentation before block (add 4 spaces).
Open

        word_tokenizer: Optional[_Tokenizer] = None,
Severity: Info
Found in abydos/corpus/_unigram_corpus.py by pylint

TODO word_tokenizer: Optional[_Tokenizer] = None, ^ |

Wrong hanging indentation before block (add 4 spaces).
Open

            counts.items(), key=lambda x: (x[1], x[0]), reverse=True
Severity: Info
Found in abydos/compression/_arithmetic.py by pylint

TODO counts.items(), key=lambda x: (x[1], x[0]), reverse=True ^ |

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.stemmer._snowball_norwegian:143
==abydos.stemmer._snowball_swedish:143
            word = word[:-3]
        elif _r1[-2:] == 'ig':
            word = word[:-2]

        return word


if __name__ == '__main__':
    import doctest

    doctest.testmod()

Wrong hanging indentation before block (add 4 spaces).
Open

        self,
Severity: Info
Found in abydos/tokenizer/_cv_cluster.py by pylint

TODO self, ^ |

Wrong hanging indentation before block (add 4 spaces).
Open

                token[0] not in self._consonants
Severity: Info
Found in abydos/tokenizer/_cv_cluster.py by pylint

TODO token[0] not in self._consonants ^ |

Keyword argument before variable positional arguments list in the definition of __init__ function
Open

    def __init__(
Severity: Minor
Found in abydos/tokenizer/_tokenizer.py by pylint

When a keyword argument is defined before the variable positional arguments, the parameter can end up receiving multiple values if the function is called with keyword arguments.
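The failure mode can be reproduced with a short sketch. The function names and the `scaler` parameter are hypothetical stand-ins, not the signature from `abydos/tokenizer/_tokenizer.py`.

```python
def flagged(scaler=None, *args):
    """Keyword-style parameter placed before *args (the flagged shape)."""
    return scaler, args


def keyword_only(*args, scaler=None):
    """Same parameters, but 'scaler' can only be passed by keyword."""
    return scaler, args


# The first positional value is silently bound to 'scaler'.
print(flagged("a", "b"))  # ('a', ('b',))

# Passing positionals and the keyword together is an error.
try:
    flagged("a", "b", scaler="x")
except TypeError as err:
    print(err)

# With the keyword-only form the call is unambiguous.
print(keyword_only("a", "b", scaler="x"))  # ('x', ('a', 'b'))
```

Moving the parameter after `*args` makes it keyword-only, which removes both the silent binding and the "multiple values" error.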
