chrislit/abydos

Showing 4,191 of 4,191 total issues

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.distance.lcprefix:68
==abydos.distance.lcsuffix:69

    def dist_abs(self, src: str, tar: str, *args: str) -> int:
        """Return the length of the longest common prefix of the strings.

        Parameters
        ----------
        src : str
            Source string for comparison
        tar : str
            Target string for comparison
        *args : strs
            Additional strings for comparison

        Raises
        ------
        ValueError
            All arguments must be of type str

        Returns
        -------
        int
            The length of the longest common prefix

        Examples
        --------
        >>> pfx = LCPrefix()
        >>> pfx.dist_abs('cat', 'hat')
        0
        >>> pfx.dist_abs('Niall', 'Neil')
        1
        >>> pfx.dist_abs('aluminum', 'Catalan')
        0
        >>> pfx.dist_abs('ATCG', 'TAGC')
        0

        .. versionadded:: 0.4.0

        """
        strings = [src, tar]
        for arg in args:
            if isinstance(arg, str):
                strings.append(arg)
            else:
                raise TypeError('All arguments must be of type str')
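
One way to remove the duplication reported here is to move the shared argument check into a single helper that both classes call. The following is a minimal sketch under that assumption; the helper name `collect_strings` is hypothetical and is not part of abydos.

```python
from typing import List


def collect_strings(src: str, tar: str, *args: str) -> List[str]:
    """Gather src, tar, and any extra arguments, rejecting non-str values.

    Hypothetical helper: the name and its placement are assumptions,
    not existing abydos API.
    """
    strings = [src, tar]
    for arg in args:
        if not isinstance(arg, str):
            raise TypeError('All arguments must be of type str')
        strings.append(arg)
    return strings
```

Each `dist_abs` could then start with `strings = collect_strings(src, tar, *args)` and keep only the prefix- or suffix-specific logic.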

Similar lines in 3 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.distance.ncdbz2:56
==abydos.distance.ncdlzma:55
==abydos.distance.ncdzlib:54

        super().__init__(**kwargs)
        self._level = level

    def dist(self, src: str, tar: str) -> float:
        """Return the NCD between two strings using LZMA compression.

        Parameters
        ----------
        src : str
            Source string for comparison
        tar : str
            Target string for comparison

        Returns
        -------
        float
            Compression distance

        Examples
        --------
        >>> cmp = NCDlzma()
        >>> cmp.dist('cat', 'hat')
        0.08695652173913043
        >>> cmp.dist('Niall', 'Neil')
        0.16
        >>> cmp.dist('aluminum', 'Catalan')
        0.16
        >>> cmp.dist('ATCG', 'TAGC')
        0.08695652173913043

        .. versionadded:: 0.3.5
        .. versionchanged:: 0.3.6
            Encapsulated in class

        """
        if src == tar:
            return 0.0

        src_b = src.encode('utf-8')
        tar_b = tar.encode('utf-8')
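
The three NCD classes share the same opening steps and differ mainly in the compressor they call. The sketch below shows one possible refactoring, assuming the common textbook form NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)). The function `ncd` and its `compress` parameter are hypothetical, and the formula and return values used by abydos itself may differ.

```python
import bz2
import lzma
import zlib
from typing import Callable


def ncd(src: str, tar: str, compress: Callable[[bytes], bytes]) -> float:
    """Return a normalized compression distance for any bytes compressor.

    Hypothetical helper; not part of abydos.
    """
    # Shared prelude reported as duplicated by pylint.
    if src == tar:
        return 0.0
    src_b = src.encode('utf-8')
    tar_b = tar.encode('utf-8')

    # Compressed sizes of each input and of their concatenation.
    src_len = len(compress(src_b))
    tar_len = len(compress(tar_b))
    concat_len = len(compress(src_b + tar_b))
    return (concat_len - min(src_len, tar_len)) / max(src_len, tar_len)
```

A caller would pass the compressor in, for example `ncd('cat', 'hat', zlib.compress)`, `ncd('cat', 'hat', bz2.compress)`, or `ncd('cat', 'hat', lzma.compress)`.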

Similar lines in 3 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.tokenizer.corvcluster:42
==abydos.tokenizer.cvcluster:43
==abydos.tokenizer.vccluster:43

    def __init__(
        self,
        scaler: Optional[Union[str, Callable[[float], float]]] = None,
        consonants: Optional[Set[str]] = None,
        vowels: Optional[Set[str]] = None,
    ) -> None:
        """Initialize tokenizer.

        Parameters
        ----------
        scaler : None, str, or function
            A scaling function for the Counter:

                - None : no scaling
                - 'set' : All non-zero values are set to 1.
                - 'length' : Each token has weight equal to its length.
                - 'length-log' : Each token has weight equal to the log of its
                  length + 1.
                - 'length-exp' : Each token has weight equal to e raised to its
                  length.
                - a callable function : The function is applied to each value
                  in the Counter. Some useful functions include math.exp,
                  math.log1p, math.sqrt, and indexes into interesting integer
                  sequences such as the Fibonacci sequence.

        consonants : None or set(str)
            The set of characters to treat as consonants
        vowels : None or set(str)
            The set of characters to treat as vowels

        .. versionadded:: 0.4.0
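
The scaler options listed in this docstring can be illustrated with a small standalone function. This is only a sketch of the documented behaviour, not the abydos implementation: the name `scale_counts` is hypothetical, and 'length-log' is read here as log(length + 1), which is an assumption about the docstring's wording.

```python
import math
from collections import Counter
from typing import Callable, Optional, Union

Scaler = Optional[Union[str, Callable[[float], float]]]


def scale_counts(counts: Counter, scaler: Scaler = None) -> Counter:
    """Return a new Counter with the documented scaling applied.

    Hypothetical helper; not part of abydos.
    """
    if scaler is None:
        return Counter(counts)  # no scaling
    if scaler == 'set':
        # Every non-zero value becomes 1.
        return Counter({tok: 1 for tok, val in counts.items() if val != 0})
    if scaler == 'length':
        return Counter({tok: len(tok) for tok in counts})
    if scaler == 'length-log':
        # Assumption: "log of its length + 1" means log(length + 1).
        return Counter({tok: math.log1p(len(tok)) for tok in counts})
    if scaler == 'length-exp':
        return Counter({tok: math.exp(len(tok)) for tok in counts})
    if callable(scaler):
        # A callable is applied to each value in the Counter.
        return Counter({tok: scaler(val) for tok, val in counts.items()})
    raise ValueError(f'Unknown scaler: {scaler!r}')
```

For example, `scale_counts(Counter({'ab': 4}), math.sqrt)` applies the callable to the stored value, while `scale_counts(Counter({'ab': 4}), 'length')` ignores the value and uses the token length.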

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.tokenizer.whitespace:41
==abydos.tokenizer.wordpunct:42

    def __init__(
        self,
        scaler: Optional[Union[str, Callable[[float], float]]] = None,
        flags: int = 0,
    ) -> None:
        """Initialize tokenizer.

        Parameters
        ----------
        scaler : None, str, or function
            A scaling function for the Counter:

                - None : no scaling
                - 'set' : All non-zero values are set to 1.
                - 'length' : Each token has weight equal to its length.
                - 'length-log' : Each token has weight equal to the log of its
                  length + 1.
                - 'length-exp' : Each token has weight equal to e raised to its
                  length.
                - a callable function : The function is applied to each value
                  in the Counter. Some useful functions include math.exp,
                  math.log1p, math.sqrt, and indexes into interesting integer
                  sequences such as the Fibonacci sequence.

        flags : int
            Flags to pass to the regular expression matcher. See the
            `documentation on Python's re module
            <https://docs.python.org/3/library/re.html#re.A>`_ for details.

        .. versionadded:: 0.4.0

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.distance.phoneticeditdistance:107
==abydos.phones.phones:956

        if isinstance(weights, dict):
            weights = [
                weights[feature] if feature in weights else 0
                for feature in sorted(
                    FEATUREMASK, key=FEATUREMASK.get, reverse=True
                )
            ]
        elif isinstance(weights, (list, tuple)):
            weights = list(weights) + [0] * (len(FEATUREMASK) - len(weights))
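
The duplicated weight handling could live in one function that both modules import. The sketch below assumes the feature mask is a dict mapping feature names to integer masks; the function name `normalize_weights`, the `feature_mask` parameter, and the `TypeError` for other input types are all hypothetical choices made for this illustration.

```python
from typing import Dict, List, Sequence, Union


def normalize_weights(
    weights: Union[Dict[str, float], Sequence[float]],
    feature_mask: Dict[str, int],
) -> List[float]:
    """Return one weight per feature, ordered by descending mask value.

    Hypothetical helper; not part of abydos.
    """
    if isinstance(weights, dict):
        # Missing features get weight 0.
        return [
            weights.get(feature, 0)
            for feature in sorted(
                feature_mask, key=feature_mask.get, reverse=True
            )
        ]
    if isinstance(weights, (list, tuple)):
        # Pad short sequences with zeros up to the number of features.
        padded = list(weights)
        return padded + [0] * (len(feature_mask) - len(padded))
    raise TypeError('weights must be a dict, list, or tuple')
```

With a toy mask such as `{'a': 4, 'b': 2, 'c': 1}`, a dict input `{'b': 0.5}` yields `[0, 0.5, 0]` and a list input `[1.0]` yields `[1.0, 0, 0]`.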

Wrong hanging indentation before block (add 4 spaces).
Open

        self,
Severity: Info
Found in abydos/tokenizer/_wordpunct.py by pylint

TODO

        self,
        ^   |

Wrong hanging indentation before block (add 4 spaces).
Open

        self, scaler: Optional[Union[str, Callable[[float], float]]] = None,
Severity: Info
Found in abydos/tokenizer/_sonoripy.py by pylint

TODO

        self, scaler: Optional[Union[str, Callable[[float], float]]] = None,
        ^   |

Wrong hanging indentation before block (add 4 spaces).
Open

                    (
Severity: Info
Found in abydos/tokenizer/_saps.py by pylint

TODO

                    (
                    ^   |

Wrong hanging indentation before block (add 4 spaces).
Open

        flags: int = 0,
Severity: Info
Found in abydos/tokenizer/_whitespace.py by pylint

TODO

        flags: int = 0,
        ^   |

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.distance.discountedlevenshtein:285
==abydos.distance.phoneticeditdistance:253

        dmat = cast(
            np.ndarray, self.alignmentmatrix(src, tar, backtrace=False)
        )

        if int(dmat[srclen, tarlen]) == dmat[srclen, tarlen]:
            return int(dmat[srclen, tarlen])
        else:
            return cast(float, dmat[srclen, tarlen])

    def dist(self, src: str, tar: str) -> float:
        """Return the normalized phonetic edit distance between two strings.

        The edit distance is normalized by dividing the edit distance
        (calculated by either of the two supported methods) by the
        greater of the number of characters in src times the cost of a delete
        and the number of characters in tar times the cost of an insert.
        For the case in which all operations have :math:`cost = 1`, this is
        equivalent to the greater of the length of the two strings src & tar.

        Parameters
        ----------
        src : str
            Source string for comparison
        tar : str
            Target string for comparison

        Returns
        -------
        float
            The normalized Levenshtein distance between src & tar

        Examples
        --------
        >>> cmp = PhoneticEditDistance()
        >>> round(cmp.dist('cat', 'hat'), 12)
        0.059139784946
        >>> round(cmp.dist('Niall', 'Neil'), 12)
        0.232258064516
        >>> cmp.dist('aluminum', 'Catalan')
        0.3084677419354839
        >>> cmp.dist('ATCG', 'TAGC')
        0.2983870967741935

        .. versionadded:: 0.4.1

        """
        if src == tar:

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.distance.blocklevenshtein:138
==abydos.distance.dameraulevenshtein:232

        if src == tar:
            return 0.0
        ins_cost, del_cost = self.cost[:2]
        return self.dist_abs(src, tar) / (
            self.normalizer([len(src) * del_cost, len(tar) * ins_cost])
        )


if __name__ == '__main__':
    import doctest

    doctest.testmod()

Similar lines in 2 files
Open

# Copyright 2014-2020 by Christopher C. Little.
Severity: Info
Found in abydos/compression/_rle.py by pylint

Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.

==abydos.distance.discountedlevenshtein:204
==abydos.distance.phoneticeditdistance:181

                    )
                    dmat[i + 1, j + 1] = min(opts)
                    if backtrace:
                        trace_mat[i + 1, j + 1] = int(np.argmin(opts))

                    if self._mode == 'osa':
                        if (
                            i + 1 > 1
                            and j + 1 > 1

Wrong hanging indentation before block (add 4 spaces).
Open

        vowels: Optional[Set[str]] = None,
Severity: Info
Found in abydos/tokenizer/_cv_cluster.py by pylint

TODO

        vowels: Optional[Set[str]] = None,
        ^   |

Wrong hanging indentation before block (add 4 spaces).
Open

        scaler: Optional[Union[str, Callable[[float], float]]] = None,
Severity: Info
Found in abydos/tokenizer/_wordpunct.py by pylint

TODO

        scaler: Optional[Union[str, Callable[[float], float]]] = None,
        ^   |

Wrong hanging indentation before block (add 4 spaces).
Open

        vowels: Optional[Set[str]] = None,
Severity: Info
Found in abydos/tokenizer/_vc_cluster.py by pylint

TODO

        vowels: Optional[Set[str]] = None,
        ^   |

Wrong hanging indentation before block (add 4 spaces).
Open

                and token[0] not in self._vowels
Severity: Info
Found in abydos/tokenizer/_cv_cluster.py by pylint

TODO

                and token[0] not in self._vowels
                ^   |

Unused variable 'count'
Open

            count, term_doc_count = self.corpus[term]
Severity: Minor
Found in abydos/corpus/_unigram_corpus.py by pylint

Used when a variable is defined but not used.
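
A common way to address this finding, when the tuple must still be unpacked, is to bind the unused element to `_`, which pylint's default dummy-variable pattern ignores. The sketch below uses a hypothetical stand-in for `self.corpus`; the dictionary contents and the function name `doc_count` are made up for illustration.

```python
# Hypothetical stand-in for self.corpus: term -> (count, term_doc_count)
corpus = {'the': (120, 40), 'cat': (7, 5)}


def doc_count(term: str) -> int:
    """Return the number of documents containing the term."""
    # The underscore marks the first tuple element as intentionally unused.
    _, term_doc_count = corpus[term]
    return term_doc_count
```

Indexing the tuple directly, as in `corpus[term][1]`, would avoid the unpacking altogether.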

Wrong hanging indentation before block (add 4 spaces).
Open

        flags: int = 0,
Severity: Info
Found in abydos/tokenizer/_wordpunct.py by pylint

TODO

        flags: int = 0,
        ^   |

Wrong hanging indentation before block (add 4 spaces).
Open

        corpus: Corpus,
Severity: Info
Found in abydos/corpus/_ngram_corpus.py by pylint

TODO

        corpus: Corpus,
        ^   |

Unused argument 'args'
Open

        self,
Severity: Minor
Found in abydos/tokenizer/_tokenizer.py by pylint

Used when a function or method argument is not used.
