chrislit/abydos


Showing 4,191 of 4,191 total issues

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.tokenizer.whitespace:41
==abydos.tokenizer.wordpunct:42

    def __init__(
        self,
        scaler: Optional[Union[str, Callable[[float], float]]] = None,
        flags: int = 0,
    ) -> None:
        """Initialize tokenizer.

        Parameters
        ----------
        scaler : None, str, or function
            A scaling function for the Counter:

                - None : no scaling
                - 'set' : All non-zero values are set to 1.
                - 'length' : Each token has weight equal to its length.
                - 'length-log' : Each token has weight equal to the log of
                  its length + 1.
                - 'length-exp' : Each token has weight equal to e raised to
                  its length.
                - a callable function : The function is applied to each
                  value in the Counter. Some useful functions include
                  math.exp, math.log1p, math.sqrt, and indexes into
                  interesting integer sequences such as the Fibonacci
                  sequence.
        flags : int
            Flags to pass to the regular expression matcher. See the
            `documentation on Python's re module
            <https://docs.python.org/3/library/re.html#re.A>`_ for details.

        .. versionadded:: 0.4.0
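The scaler options described in the docstring above can be illustrated with a small sketch. This is not the abydos implementation: `scale_counts` is a hypothetical helper, and it covers only `None`, `'set'`, and a callable, because the text does not say how the `'length'` variants combine with the raw counts.

```python
import math
from collections import Counter
from typing import Callable, Iterable, Optional, Union

# Hypothetical type alias mirroring the annotated `scaler` parameter.
Scaler = Optional[Union[str, Callable[[float], float]]]


def scale_counts(tokens: Iterable[str], scaler: Scaler = None) -> Counter:
    """Count tokens, then apply a scaler to the counts (illustrative only)."""
    counts = Counter(tokens)
    if scaler is None:
        # No scaling: return the raw counts.
        return counts
    if scaler == 'set':
        # Every non-zero count collapses to 1.
        return Counter({tok: 1 for tok, n in counts.items() if n != 0})
    if callable(scaler):
        # Apply the function to each value in the Counter.
        return Counter({tok: scaler(n) for tok, n in counts.items()})
    raise ValueError(f'Unsupported scaler in this sketch: {scaler!r}')


raw = scale_counts('abca')
as_set = scale_counts('abca', 'set')
rooted = scale_counts('abca', math.sqrt)
```

Passing `math.log1p` or `math.exp` instead of `math.sqrt` works the same way, since any one-argument numeric function is accepted.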

Wrong hanging indentation before block (add 4 spaces).
Open

        self,
Severity: Info
Found in abydos/tokenizer/_vc_cluster.py by pylint

        self,
        ^   |
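As a rough illustration of what this message asks for, the sketch below shows the same signature twice: once with continuation lines at a single indent, and once with the extra 4 spaces. It assumes the checker wants parameters set apart from the body, which is what the "add 4 spaces" hint suggests. Both function names are hypothetical.

```python
# Hypothetical functions; only the indentation of the parameter lines differs.


def single_indent(
    first: int,
    second: int = 0,
) -> int:
    # Parameters line up with the body below: the style reported above.
    return first + second


def double_indent(
        first: int,
        second: int = 0,
) -> int:
    # Four extra spaces visually separate the parameters from the body.
    return first + second
```

Both definitions behave identically at runtime; the message concerns layout only.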

Unnecessary else after raise
Open

            if self._terminator not in code:
Severity: Info
Found in abydos/compression/_bwt.py by pylint

Used in order to highlight an unnecessary block of code following an if containing a raise statement. As such, it will warn when it encounters an else following a chain of ifs, all of them containing a raise statement.
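A minimal sketch of the pattern this message describes, using a hypothetical `strip_terminator` function (not the code in `_bwt.py`). The flagged form is shown as a comment; the function body shows the version without the `else`.

```python
def strip_terminator(code: str, terminator: str = '\0') -> str:
    """Hypothetical helper: remove a required terminator from ``code``."""
    # Flagged form, for comparison:
    #
    #     if terminator not in code:
    #         raise ValueError(...)
    #     else:
    #         return code.replace(terminator, '')
    #
    if terminator not in code:
        raise ValueError(f'Terminator {terminator!r} not found in input.')
    # No else needed: this line runs only when the check above passed.
    return code.replace(terminator, '')
```

Because `raise` always leaves the function, the code after the `if` is reached only on the non-error path, so dropping the `else` removes one indentation level without changing behavior.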

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.distance.euclidean:38
==abydos.distance.manhattan:38

    def __init__(
        self,
        alphabet: Optional[
            Union[TCounter[str], Sequence[str], Set[str], int]
        ] = 0,
        tokenizer: Optional[Tokenizer] = None,
        intersection_type: str = 'crisp',
        **kwargs: Any
    ) -> None:
        """Initialize Euclidean instance.

        Parameters
        ----------
        alphabet : collection or int
            The values or size of the alphabet
        tokenizer : Tokenizer
            A tokenizer instance from the :py:mod:`abydos.tokenizer` package
        intersection_type : str
            Specifies the intersection type, and set type as a result:
            See :ref:`intersection_type <intersection_type>` description in
            :py:class:`_TokenDistance` for details.
        **kwargs
            Arbitrary keyword arguments

        Other Parameters
        ----------------
        qval : int
            The length of each q-gram. Using this parameter and
            tokenizer=None will cause the instance to use the QGram
            tokenizer with this q value.
        metric : _Distance
            A string distance measure class for use in the soft and fuzzy
            variants.
        threshold : float
            A threshold value, similarities above which are counted as
            members of the intersection for the fuzzy variant.

        .. versionadded:: 0.4.0
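One common way to remove this kind of duplicated constructor is to define it once in a shared base class and let each measure override only what differs. The sketch below assumes hypothetical class names and a plain q-gram count with no start/stop padding, so its values are not expected to match abydos output.

```python
from collections import Counter
from typing import Any


class TokenDistanceSketch:
    """Hypothetical base class: the shared constructor lives here once."""

    pval = 1  # exponent of the Minkowski-style distance

    def __init__(self, qval: int = 2, **kwargs: Any) -> None:
        self.qval = qval
        self.params = kwargs  # extra options kept for subclasses

    def _qgrams(self, text: str) -> Counter:
        # Count overlapping substrings of length qval (no padding).
        return Counter(
            text[i:i + self.qval] for i in range(len(text) - self.qval + 1)
        )

    def dist_abs(self, src: str, tar: str) -> float:
        a, b = self._qgrams(src), self._qgrams(tar)
        total = sum(abs(a[k] - b[k]) ** self.pval for k in a.keys() | b.keys())
        return total ** (1 / self.pval)


class ManhattanSketch(TokenDistanceSketch):
    pval = 1


class EuclideanSketch(TokenDistanceSketch):
    pval = 2
```

With this layout the two subclasses carry no constructor or docstring of their own, so there is nothing left for a duplicate-code check to match between them.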

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.distance.levenshtein:111
==abydos.distance.phoneticeditdistance:118

    def alignmentmatrix(
        self, src: str, tar: str, backtrace: bool = True
    ) -> Union[np.ndarray, Tuple[np.ndarray, np.ndarray]]:
        """Return the Levenshtein alignment matrix.

        Parameters
        ----------
        src : str
            Source string for comparison
        tar : str
            Target string for comparison
        backtrace : bool
            Return the backtrace matrix as well

        Returns
        -------
        numpy.ndarray or tuple(numpy.ndarray, numpy.ndarray)
            The alignment matrix and (optionally) the backtrace matrix

        .. versionadded:: 0.4.1

        """
        ins_cost, del_cost, sub_cost, trans_cost = self._cost

        src_len = len(src)
        tar_len = len(tar)
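For readers unfamiliar with the matrix this method returns, here is a minimal pure-Python sketch of a Levenshtein alignment matrix. It is an illustration under assumptions, not the abydos code: it uses nested lists rather than numpy arrays, leaves out transpositions and the backtrace matrix, and the function name and unit default costs are hypothetical.

```python
from typing import List, Tuple


def levenshtein_matrix(
    src: str, tar: str, cost: Tuple[int, int, int] = (1, 1, 1)
) -> List[List[int]]:
    """Return the (len(src)+1) x (len(tar)+1) edit-distance matrix."""
    ins_cost, del_cost, sub_cost = cost
    src_len = len(src)
    tar_len = len(tar)

    d = [[0] * (tar_len + 1) for _ in range(src_len + 1)]
    # First column: delete every character of src.
    for i in range(1, src_len + 1):
        d[i][0] = i * del_cost
    # First row: insert every character of tar.
    for j in range(1, tar_len + 1):
        d[0][j] = j * ins_cost

    for i in range(1, src_len + 1):
        for j in range(1, tar_len + 1):
            same = src[i - 1] == tar[j - 1]
            d[i][j] = min(
                d[i - 1][j] + del_cost,                      # deletion
                d[i][j - 1] + ins_cost,                      # insertion
                d[i - 1][j - 1] + (0 if same else sub_cost),  # match/substitute
            )
    return d
```

The bottom-right cell holds the distance between the full strings; every other cell holds the distance between the corresponding prefixes.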

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.tokenizer.corvcluster:110
==abydos.tokenizer.cvcluster:111

    self.string = string
    self.orderedtokens = []
    tokenlist = self.regexp.findall(self.string)
    for token in tokenlist:
        if (
            token[0] not in self._consonants
            and token[0] not in self._vowels
        ):
            self.orderedtokens.append(token)
        else:
            token = unicodedata.normalize('NFD', token)
            mode = 0  # 0 = starting mode, 1 = cons, 2 = vowels
            newtoken = ''  # noqa: S105
            for char in token:
                if char in self._consonants:
                    if mode == 2:
                        self.orderedtokens.append(newtoken)
                        newtoken = char
                    else:
                        newtoken += char
                    mode = 1
                elif char in self._vowels:

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.distance.blocklevenshtein:138
==abydos.distance.dameraulevenshtein:232

    if src == tar:
        return 0.0
    ins_cost, del_cost = self._cost[:2]
    return self.dist_abs(src, tar) / (
        self.normalizer([len(src) * del_cost, len(tar) * ins_cost])
    )


if __name__ == '__main__':
    import doctest

    doctest.testmod()

Similar lines in 3 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.phonetic.fuzzysoundex:193
==abydos.phonetic.soundex:231
==abydos.phonetic.soundex_br:155

    sdx = sdx.replace('0', '')  # rule 1

    if self.zeropad:
        sdx += '0' * self.maxlength  # rule 4

    return sdx[: self.maxlength]


if __name__ == '__main__':
    import doctest

    doctest.testmod()
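The three shared lines above (drop zeros, pad, truncate) are a natural candidate for one helper that all three encoders call. A minimal sketch, assuming a hypothetical free function `finish_code` with `max_length` and `zero_pad` passed as arguments rather than read from `self`:

```python
def finish_code(sdx: str, max_length: int = 4, zero_pad: bool = True) -> str:
    """Hypothetical shared tail of a Soundex-style encoder."""
    # Rule 1: remove the placeholder zeros produced during encoding.
    sdx = sdx.replace('0', '')

    # Rule 4: pad with zeros so short codes still reach max_length.
    if zero_pad:
        sdx += '0' * max_length

    # Truncate to the requested length.
    return sdx[:max_length]
```

Padding with `max_length` zeros is more than is ever needed, but the final slice discards the surplus, which keeps the logic free of length arithmetic.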

Wrong hanging indentation before block (add 4 spaces).
Open

        scaler: Optional[Union[str, Callable[[float], float]]] = None,
Severity: Info
Found in abydos/tokenizer/_vc_cluster.py by pylint

        scaler: Optional[Union[str, Callable[[float], float]]] = None,
        ^   |

Wrong hanging indentation before block (add 4 spaces).
Open

                token[0] not in self._consonants
Severity: Info
Found in abydos/tokenizer/_vc_cluster.py by pylint

                token[0] not in self._consonants
                ^   |

Wrong hanging indentation before block (add 4 spaces).
Open

                pos + i - 1 <= len(ipa)
Severity: Info
Found in abydos/phones/_phones.py by pylint

                pos + i - 1 <= len(ipa)
                ^   |

Wrong hanging indentation before block (add 4 spaces).
Open

        self,
Severity: Info
Found in abydos/corpus/_unigram_corpus.py by pylint

        self,
        ^   |

Cyclic import (abydos.distance -> abydos.distance._ozbay)
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Used when a cyclic import between two or more modules is detected.

Similar lines in 3 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.fingerprint.count:59
==abydos.fingerprint.occurrence:58
==abydos.fingerprint._position:60

    def fingerprint(self, word: str) -> str:
        """Return the position fingerprint.

        Parameters
        ----------
        word : str
            The word to fingerprint

        Returns
        -------
        str
            The position fingerprint

        Examples
        --------
        >>> pf = Position()
        >>> pf.fingerprint('hat')
        '1110100011111111'
        >>> pf.fingerprint('niall')
        '1111110101110010'
        >>> pf.fingerprint('colin')
        '1111111110010111'
        >>> pf.fingerprint('atcg')
        '1110010001111111'
        >>> pf.fingerprint('entreatment')
        '0000101011111111'


        .. versionadded:: 0.3.0
        .. versionchanged:: 0.3.6
            Encapsulated in class
        .. versionchanged:: 0.6.0
            Changed to return a str and added fingerprint_int method

        """
        return ('{:0' + str(self._n_bits) + 'b}').format(
            self.fingerprint_int(word)
        )

    def fingerprint_int(self, word: str) -> int:
        """Return the position fingerprint.

        Parameters
        ----------
        word : str
            The word to fingerprint

        Returns
        -------
        int
            The position fingerprint as an int

        Examples
        --------
        >>> pf = Position()
        >>> pf.fingerprint_int('hat')
        59647
        >>> pf.fingerprint_int('niall')
        64882
        >>> pf.fingerprint_int('colin')
        65431
        >>> pf.fingerprint_int('atcg')
        58495
        >>> pf.fingerprint_int('entreatment')
        2815


        .. versionadded:: 0.6.0

        """
        nbits = self.n_bits
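The string form of the fingerprint is just the integer form rendered as zero-padded binary, which is also why the same wrapper appears in several fingerprint classes. A small sketch of that conversion, using a hypothetical `to_bit_string` helper and assuming a 16-bit width (the width the doctest values above imply):

```python
def to_bit_string(value: int, n_bits: int = 16) -> str:
    """Render ``value`` as a binary string left-padded with zeros."""
    # Nested replacement field: the width is filled in from n_bits.
    return '{:0{width}b}'.format(value, width=n_bits)


hat = to_bit_string(59647)
entreatment = to_bit_string(2815)
```

The nested `{width}` field does the same job as building the format string by concatenation, as in `'{:0' + str(n_bits) + 'b}'`.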

Similar lines in 4 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.phonetic.fuzzysoundex:66
==abydos.phonetic.lein:61
==abydos.phonetic.phonex:58
==abydos.phonetic.phonix:190

        self.zeropad = zeropad

    def encode_alpha(self, word: str) -> str:
        """Return the alphabetic Phonex code for a word.

        Parameters
        ----------
        word : str
            The word to transform

        Returns
        -------
        str
            The alphabetic Phonex value

        Examples
        --------
        >>> pe = Phonex()
        >>> pe.encode_alpha('Christopher')
        'CRST'
        >>> pe.encode_alpha('Niall')
        'NL'
        >>> pe.encode_alpha('Smith')
        'SNT'
        >>> pe.encode_alpha('Schmidt')
        'SSNT'


        .. versionadded:: 0.4.0

        """
        code = self.encode(word).rstrip('0')
        return code[:1] + code[1:].translate(self._alphabetic)

    def encode(self, word: str) -> str:
        """Return the Phonex code for a word.

        Parameters
        ----------
        word : str
            The word to transform

        Returns
        -------
        str
            The Phonex value

        Examples
        --------
        >>> pe = Phonex()
        >>> pe.encode('Christopher')
        'C623'
        >>> pe.encode('Niall')
        'N400'
        >>> pe.encode('Schmidt')
        'S253'
        >>> pe.encode('Smith')
        'S530'


        .. versionadded:: 0.1.0
        .. versionchanged:: 0.3.6
            Encapsulated in class
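The relationship between the numeric and alphabetic codes can be sketched as a digit-to-letter translation applied after the trailing zeros are stripped. The mapping below is an assumption inferred only from the four doctest pairs above (it covers digits 2 to 6 and nothing else), and `alpha_from_code` is a hypothetical stand-alone function.

```python
# Digit-to-letter table inferred from the example pairs only;
# digits not seen in those examples are deliberately left out.
ALPHABETIC = str.maketrans({'2': 'S', '3': 'T', '4': 'L', '5': 'N', '6': 'R'})


def alpha_from_code(code: str) -> str:
    """Convert a numeric code such as 'C623' to its alphabetic form."""
    # Trailing zeros are padding, so they carry no sound information.
    code = code.rstrip('0')
    # Keep the leading letter; translate only the digits after it.
    return code[:1] + code[1:].translate(ALPHABETIC)
```

Because the conversion depends only on the translation table, a single shared method taking the table as a parameter could serve every encoder that repeats this pattern.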

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.distance.sift4:59
==abydos.distance.sift4simplest:54

    def dist_abs(self, src: str, tar: str) -> float:
        """Return the "common" Sift4 distance between two terms.

        Parameters
        ----------
        src : str
            Source string for comparison
        tar : str
            Target string for comparison

        Returns
        -------
        int
            The Sift4 distance according to the common formula

        Examples
        --------
        >>> cmp = Sift4()
        >>> cmp.dist_abs('cat', 'hat')
        1
        >>> cmp.dist_abs('Niall', 'Neil')
        2
        >>> cmp.dist_abs('Colin', 'Cuilen')
        3
        >>> cmp.dist_abs('ATCG', 'TAGC')
        2


        .. versionadded:: 0.3.0
        .. versionchanged:: 0.3.6
            Encapsulated in class

        """
        if not src:
            return len(tar)

        if not tar:
            return len(src)

        src_len = len(src)
        tar_len = len(tar)

        src_cur = 0
        tar_cur = 0
        lcss = 0
        local_cs = 0

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.distance.euclidean:125
==abydos.distance.minkowski:163

    def dist(self, src: str, tar: str) -> float:
        """Return the normalized Euclidean distance between two strings.

        The normalized Euclidean distance is a distance
        metric in :math:`L^2`-space, normalized to [0, 1].

        Parameters
        ----------
        src : str
            Source string (or QGrams/Counter objects) for comparison
        tar : str
            Target string (or QGrams/Counter objects) for comparison

        Returns
        -------
        float
            The normalized Euclidean distance

        Examples
        --------
        >>> cmp = Euclidean()
        >>> round(cmp.dist('cat', 'hat'), 12)
        0.57735026919
        >>> round(cmp.dist('Niall', 'Neil'), 12)
        0.683130051064
        >>> round(cmp.dist('Colin', 'Cuilen'), 12)
        0.727606875109
        >>> cmp.dist('ATCG', 'TAGC')
        1.0


        .. versionadded:: 0.3.0
        .. versionchanged:: 0.3.6
            Encapsulated in class

        """
        return self.dist_abs(src, tar, normalized=True)


if __name__ == '__main__':
    import doctest

    doctest.testmod()

Similar lines in 3 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.distance.euclidean:40
==abydos.distance.manhattan:40
==abydos.distance.minkowski:49

        alphabet: Optional[
            Union[TCounter[str], Sequence[str], Set[str], int]
        ] = 0,
        tokenizer: Optional[Tokenizer] = None,
        intersection_type: str = 'crisp',
        **kwargs: Any
    ) -> None:
        """Initialize Euclidean instance.

        Parameters
        ----------
        alphabet : collection or int
            The values or size of the alphabet
        tokenizer : Tokenizer
            A tokenizer instance from the :py:mod:`abydos.tokenizer` package
        intersection_type : str
            Specifies the intersection type, and set type as a result:
            See :ref:`intersection_type <intersection_type>` description in
            :py:class:`_TokenDistance` for details.
        **kwargs
            Arbitrary keyword arguments

        Other Parameters
        ----------------
        qval : int
            The length of each q-gram. Using this parameter and
            tokenizer=None will cause the instance to use the QGram
            tokenizer with this q value.
        metric : _Distance
            A string distance measure class for use in the soft and fuzzy
            variants.
        threshold : float
            A threshold value, similarities above which are counted as
            members of the intersection for the fuzzy variant.

        .. versionadded:: 0.4.0

Wrong hanging indentation before block (add 4 spaces).
Open

        scaler: Optional[Union[str, Callable[[float], float]]] = None,
Severity: Info
Found in abydos/tokenizer/_nltk.py by pylint

        scaler: Optional[Union[str, Callable[[float], float]]] = None,
        ^   |

Wrong hanging indentation before block (add 4 spaces).
Open

            hasattr(nltk_tokenizer, 'tokenize')
Severity: Info
Found in abydos/tokenizer/_nltk.py by pylint

            hasattr(nltk_tokenizer, 'tokenize')
            ^   |
