Showing 4,191 of 4,191 total issues
Similar lines in 2 files
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.tokenizer.whitespace:41
==abydos.tokenizer.wordpunct:42
def __init__(
    self,
    scaler: Optional[Union[str, Callable[[float], float]]] = None,
    flags: int = 0,
) -> None:
"""Initialize tokenizer.
Parameters
scaler : None, str, or function
A scaling function for the Counter:
- None : no scaling
- 'set' : All non-zero values are set to 1.
- 'length' : Each token has weight equal to its length.
- 'length-log' : Each token has weight equal to the log of its length + 1.
- 'length-exp' : Each token has weight equal to e raised to its length.
- a callable function : The function is applied to each value
in the Counter. Some useful functions include math.exp,
math.log1p, math.sqrt, and indexes into interesting integer
sequences such as the Fibonacci sequence.
flags : int
Flags to pass to the regular expression matcher. See the
documentation on Python's re module <https://docs.python.org/3/library/re.html#re.A>
for details.
.. versionadded:: 0.4.0
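The scaler options listed in the docstring above can be sketched roughly as follows. This is an illustration only, not the library's code: `scale_counter` is a hypothetical helper, the 'length' variants are read literally (the weight depends only on the token's length), and 'length-log' is assumed to mean log(length + 1).

```python
import math
from collections import Counter
from typing import Callable, Optional, Union

Scaler = Optional[Union[str, Callable[[float], float]]]


def scale_counter(tokens: Counter, scaler: Scaler = None) -> Counter:
    """Apply one of the documented scaler options to a token Counter.

    Hypothetical helper for illustration; not part of abydos.
    """
    if scaler is None:
        return Counter(tokens)  # no scaling
    if scaler == 'set':
        # Every non-zero count collapses to 1.
        return Counter({tok: 1 for tok, val in tokens.items() if val})
    if scaler == 'length':
        # Assumption: the weight is the token's length, regardless of count.
        return Counter({tok: len(tok) for tok in tokens})
    if scaler == 'length-log':
        # Assumption: "log of its length + 1" means log(length + 1).
        return Counter({tok: math.log1p(len(tok)) for tok in tokens})
    if scaler == 'length-exp':
        return Counter({tok: math.exp(len(tok)) for tok in tokens})
    if callable(scaler):
        # The callable is applied to each value in the Counter.
        return Counter({tok: scaler(val) for tok, val in tokens.items()})
    raise ValueError(f'Unknown scaler: {scaler!r}')


counts = Counter('the cat and the hat'.split())
print(scale_counter(counts, 'set'))
print(scale_counter(counts, 'length'))
print(scale_counter(counts, math.sqrt))
```

Passing a callable such as `math.sqrt` dampens high counts without discarding them, which sits between no scaling and the 'set' option.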
Wrong hanging indentation before block (add 4 spaces).
self,
TODO self, ^ |
Unnecessary "else" after "raise"
if self._terminator not in code:
Used in order to highlight an unnecessary block of code following an if containing a raise statement. As such, it will warn when it encounters an else following a chain of ifs, all of them containing a raise statement.
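A minimal sketch of the pattern this check describes, and of the usual fix. The function names and the '#' terminator are hypothetical, loosely modeled on the flagged line; they are not taken from the abydos source.

```python
def check_flagged(code: str, terminator: str = '#') -> str:
    """Shape that the check flags: an else after a branch that raises."""
    if terminator not in code:
        raise ValueError('code is missing its terminator')
    else:  # flagged: this branch is only reached when nothing was raised
        return code[: code.index(terminator)]


def check_clean(code: str, terminator: str = '#') -> str:
    """Same behavior with the else removed and its body dedented."""
    if terminator not in code:
        raise ValueError('code is missing its terminator')
    return code[: code.index(terminator)]


print(check_clean('ab#cd'))
```

Both functions behave identically; dropping the `else` only removes one level of nesting, because control can never fall through a `raise`.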
Similar lines in 2 files
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.distance.euclidean:38
==abydos.distance.manhattan:38
def __init__(
    self,
    alphabet: Optional[
        Union[TCounter[str], Sequence[str], Set[str], int]
    ] = 0,
    tokenizer: Optional[Tokenizer] = None,
    intersection_type: str = 'crisp',
    **kwargs: Any
) -> None:
"""Initialize Euclidean instance.
Parameters
alphabet : collection or int
The values or size of the alphabet
tokenizer : Tokenizer
A tokenizer instance from the :py:mod:`abydos.tokenizer` package
intersection_type : str
Specifies the intersection type, and set type as a result:
See :ref:`intersection_type <intersection_type>` description in
:py:class:`_TokenDistance` for details.
**kwargs
Arbitrary keyword arguments
Other Parameters
qval : int
The length of each q-gram. Using this parameter and tokenizer=None
will cause the instance to use the QGram tokenizer with this
q value.
metric : _Distance
A string distance measure class for use in the ``soft`` and ``fuzzy`` variants.
threshold : float
A threshold value, similarities above which are counted as
members of the intersection for the ``fuzzy`` variant.
.. versionadded:: 0.4.0
Similar lines in 2 files
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.distance.levenshtein:111
==abydos.distance.phoneticeditdistance:118
def alignmentmatrix(
    self, src: str, tar: str, backtrace: bool = True
) -> Union[np.ndarray, Tuple[np.ndarray, np.ndarray]]:
"""Return the Levenshtein alignment matrix.
Parameters
src : str
Source string for comparison
tar : str
Target string for comparison
backtrace : bool
Return the backtrace matrix as well
Returns
numpy.ndarray or tuple(numpy.ndarray, numpy.ndarray)
The alignment matrix and (optionally) the backtrace matrix
.. versionadded:: 0.4.1
ins_cost, del_cost, sub_cost, trans_cost = self._cost
src_len = len(src)
tar_len = len(tar)
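For orientation, a pure-Python sketch of how such an alignment matrix is typically filled. It is an assumption-laden illustration and not the duplicated abydos code: `levenshtein_matrix` is a hypothetical name, nested lists stand in for the numpy arrays, and the transposition cost and backtrace matrix are left out.

```python
from typing import List


def levenshtein_matrix(
    src: str,
    tar: str,
    ins_cost: float = 1,
    del_cost: float = 1,
    sub_cost: float = 1,
) -> List[List[float]]:
    """Fill a (len(src) + 1) x (len(tar) + 1) edit-distance matrix."""
    src_len = len(src)
    tar_len = len(tar)
    d_mat = [[0] * (tar_len + 1) for _ in range(src_len + 1)]
    # First column: delete every character of the src prefix.
    for i in range(1, src_len + 1):
        d_mat[i][0] = i * del_cost
    # First row: insert every character of the tar prefix.
    for j in range(1, tar_len + 1):
        d_mat[0][j] = j * ins_cost
    for i in range(1, src_len + 1):
        for j in range(1, tar_len + 1):
            d_mat[i][j] = min(
                d_mat[i - 1][j] + del_cost,  # deletion
                d_mat[i][j - 1] + ins_cost,  # insertion
                d_mat[i - 1][j - 1]
                + (0 if src[i - 1] == tar[j - 1] else sub_cost),  # match/sub
            )
    return d_mat


matrix = levenshtein_matrix('cat', 'hat')
print(matrix[-1][-1])  # bottom-right cell holds the distance
```

Because two modules repeat this setup, moving it into one shared function or base-class method is the kind of refactoring the duplication check is asking for.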
Similar lines in 2 files
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.tokenizer.corvcluster:110
==abydos.tokenizer.cvcluster:111
self.string = string
self.orderedtokens = []
tokenlist = self.regexp.findall(self.string)
for token in tokenlist:
    if (
        token[0] not in self._consonants
        and token[0] not in self._vowels
    ):
        self.orderedtokens.append(token)
    else:
        token = unicodedata.normalize('NFD', token)
        mode = 0  # 0 = starting mode, 1 = cons, 2 = vowels
        newtoken = ''  # noqa: S105
        for char in token:
            if char in self._consonants:
                if mode == 2:
                    self.orderedtokens.append(newtoken)
                    newtoken = char
                else:
                    newtoken += char
                mode = 1
            elif char in self._vowels:
Similar lines in 2 files
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.distance.blocklevenshtein:138
==abydos.distance.dameraulevenshtein:232
if src == tar:
    return 0.0
ins_cost, del_cost = self._cost[:2]
return self.dist_abs(src, tar) / (
    self.normalizer([len(src) * del_cost, len(tar) * ins_cost])
)
if __name__ == '__main__':
    import doctest
    doctest.testmod()
Similar lines in 3 files
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.phonetic.fuzzysoundex:193
==abydos.phonetic.soundex:231
==abydos.phonetic.soundex_br:155
sdx = sdx.replace('0', '')  # rule 1
if self.zeropad:
    sdx += '0' * self.maxlength  # rule 4
return sdx[: self.maxlength]
if __name__ == '__main__':
    import doctest
    doctest.testmod()
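One way to remove this three-way duplication would be a single shared helper that each encoder calls as its last step. The sketch below is only a suggestion under stated assumptions: `finalize_code` and its parameter names are hypothetical and do not exist in abydos.

```python
def finalize_code(sdx: str, max_length: int = 4, zero_pad: bool = True) -> str:
    """Strip placeholder zeros, optionally pad, then truncate."""
    sdx = sdx.replace('0', '')  # rule 1: drop the placeholder digit
    if zero_pad:
        sdx += '0' * max_length  # rule 4: pad so truncation always fills
    return sdx[:max_length]


print(finalize_code('S0503'))
print(finalize_code('S0503', zero_pad=False))
```

Appending `max_length` zeros before slicing guarantees the result is exactly `max_length` characters long whenever padding is enabled, which is why no separate length check is needed.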
Wrong hanging indentation before block (add 4 spaces).
scaler: Optional[Union[str, Callable[[float], float]]] = None,
TODO scaler: Optional[Union[str, Callable[[float], float]]] = None, ^ |
Wrong hanging indentation before block (add 4 spaces).
token[0] not in self._consonants
TODO token[0] not in self._consonants ^ |
Wrong hanging indentation before block (add 4 spaces).
pos + i - 1 <= len(ipa)
TODO pos + i - 1 <= len(ipa) ^ |
Wrong hanging indentation before block (add 4 spaces).
self,
TODO self, ^ |
Cyclic import (abydos.distance -> abydos.distance._ozbay)
# Copyright 2014-2020 by Christopher C. Little.
Used when a cyclic import between two or more modules is detected.
Similar lines in 3 files
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.fingerprint.count:59
==abydos.fingerprint.occurrence:58
==abydos.fingerprint._position:60
def fingerprint(self, word: str) -> str:
"""Return the position fingerprint.
Parameters
word : str
The word to fingerprint
Returns
str
The position fingerprint
Examples
>>> pf = Position()
>>> pf.fingerprint('hat')
'1110100011111111'
>>> pf.fingerprint('niall')
'1111110101110010'
>>> pf.fingerprint('colin')
'1111111110010111'
>>> pf.fingerprint('atcg')
'1110010001111111'
>>> pf.fingerprint('entreatment')
'0000101011111111'
.. versionadded:: 0.3.0
.. versionchanged:: 0.3.6
Encapsulated in class
.. versionchanged:: 0.6.0
Changed to return a str and added fingerprint_int method
return ('{:0' + str(self._n_bits) + 'b}').format(
self.fingerprint_int(word)
)
def fingerprint_int(self, word: str) -> int:
"""Return the position fingerprint.
Parameters
----------
word : str
The word to fingerprint
Returns
-------
int
The position fingerprint as an int
Examples
--------
>>> pf = Position()
>>> pf.fingerprint_int('hat')
59647
>>> pf.fingerprint_int('niall')
64882
>>> pf.fingerprint_int('colin')
65431
>>> pf.fingerprint_int('atcg')
58495
>>> pf.fingerprint_int('entreatment')
2815
.. versionadded:: 0.6.0
n_bits = self._n_bits
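The relationship between the `fingerprint` and `fingerprint_int` examples above comes down to zero-padded binary formatting. A small sketch, assuming a 16-bit width (inferred from the 16-digit strings in the examples) and using a hypothetical helper name:

```python
def to_bit_string(value: int, n_bits: int = 16) -> str:
    """Render value as a zero-padded binary string of n_bits digits."""
    return ('{:0' + str(n_bits) + 'b}').format(value)


print(to_bit_string(59647))  # '1110100011111111', the 'hat' example
print(to_bit_string(2815))   # '0000101011111111', the 'entreatment' example
```

Since the string form is derived entirely from the integer form, only `fingerprint_int` needs a per-class implementation; the formatting wrapper is identical across the three flagged modules and could live in one shared place.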
Similar lines in 4 files
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.phonetic.fuzzysoundex:66
==abydos.phonetic.lein:61
==abydos.phonetic.phonex:58
==abydos.phonetic.phonix:190
self.zeropad = zeropad
def encode_alpha(self, word: str) -> str:
"""Return the alphabetic Phonex code for a word.
Parameters
word : str
The word to transform
Returns
str
The alphabetic Phonex value
Examples
>>> pe = Phonex()
>>> pe.encode_alpha('Christopher')
'CRST'
>>> pe.encode_alpha('Niall')
'NL'
>>> pe.encode_alpha('Smith')
'SNT'
>>> pe.encode_alpha('Schmidt')
'SSNT'
.. versionadded:: 0.4.0
code = self.encode(word).rstrip('0')
return code[:1] + code[1:].translate(self._alphabetic)
def encode(self, word: str) -> str:
"""Return the Phonex code for a word.
Parameters
----------
word : str
The word to transform
Returns
-------
str
The Phonex value
Examples
--------
>>> pe = Phonex()
>>> pe.encode('Christopher')
'C623'
>>> pe.encode('Niall')
'N400'
>>> pe.encode('Schmidt')
'S253'
>>> pe.encode('Smith')
'S530'
.. versionadded:: 0.1.0
.. versionchanged:: 0.3.6
Encapsulated in class
Similar lines in 2 files
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.distance.sift4:59
==abydos.distance.sift4simplest:54
def dist_abs(self, src: str, tar: str) -> float:
"""Return the "common" Sift4 distance between two terms.
Parameters
src : str
Source string for comparison
tar : str
Target string for comparison
Returns
int
The Sift4 distance according to the common formula
Examples
>>> cmp = Sift4()
>>> cmp.dist_abs('cat', 'hat')
1
>>> cmp.dist_abs('Niall', 'Neil')
2
>>> cmp.dist_abs('Colin', 'Cuilen')
3
>>> cmp.dist_abs('ATCG', 'TAGC')
2
.. versionadded:: 0.3.0
.. versionchanged:: 0.3.6
Encapsulated in class
if not src:
return len(tar)
if not tar:
return len(src)
src_len = len(src)
tar_len = len(tar)
src_cur = 0
tar_cur = 0
lcss = 0
local_cs = 0
Similar lines in 2 files
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.distance.euclidean:125
==abydos.distance.minkowski:163
def dist(self, src: str, tar: str) -> float:
"""Return the normalized Euclidean distance between two strings.
The normalized Euclidean distance is a distance
metric in :math:`L^2`-space, normalized to [0, 1].
Parameters
src : str
Source string (or QGrams/Counter objects) for comparison
tar : str
Target string (or QGrams/Counter objects) for comparison
Returns
float
The normalized Euclidean distance
Examples
>>> cmp = Euclidean()
>>> round(cmp.dist('cat', 'hat'), 12)
0.57735026919
>>> round(cmp.dist('Niall', 'Neil'), 12)
0.683130051064
>>> round(cmp.dist('Colin', 'Cuilen'), 12)
0.727606875109
>>> cmp.dist('ATCG', 'TAGC')
1.0
.. versionadded:: 0.3.0
.. versionchanged:: 0.3.6
Encapsulated in class
return self.dist_abs(src, tar, normalized=True)
if __name__ == '__main__':
import doctest
doctest.testmod()
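The doctest values above can be reproduced with a short standalone sketch. Everything in it is an assumption rather than the library's implementation: the formula (the L2 norm of the count difference divided by the L2 norm of the count sum) is inferred from the example outputs, the tokens are taken to be 2-grams padded with '$' and '#', and both function names are hypothetical.

```python
import math
from collections import Counter


def qgrams(word: str, qval: int = 2) -> Counter:
    """Count q-grams of word padded with '$' and '#' (assumed padding)."""
    padded = '$' * (qval - 1) + word + '#' * (qval - 1)
    return Counter(padded[i : i + qval] for i in range(len(padded) - qval + 1))


def normalized_euclidean(src: str, tar: str, qval: int = 2) -> float:
    """Return ||src - tar||_2 / ||src + tar||_2 over q-gram counts."""
    if src == tar:
        return 0.0
    src_tok, tar_tok = qgrams(src, qval), qgrams(tar, qval)
    keys = set(src_tok) | set(tar_tok)
    # Counter returns 0 for missing keys, so unshared q-grams contribute fully.
    diff = math.sqrt(sum((src_tok[k] - tar_tok[k]) ** 2 for k in keys))
    total = math.sqrt(sum((src_tok[k] + tar_tok[k]) ** 2 for k in keys))
    return diff / total


print(round(normalized_euclidean('cat', 'hat'), 12))
print(normalized_euclidean('ATCG', 'TAGC'))
```

Dividing by the norm of the summed counts bounds the result to [0, 1]: it is 0 for identical inputs and 1 when the two strings share no q-grams, as in the 'ATCG' / 'TAGC' example.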
Similar lines in 3 files
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.distance.euclidean:40
==abydos.distance.manhattan:40
==abydos.distance.minkowski:49
    alphabet: Optional[
        Union[TCounter[str], Sequence[str], Set[str], int]
    ] = 0,
    tokenizer: Optional[Tokenizer] = None,
    intersection_type: str = 'crisp',
    **kwargs: Any
) -> None:
"""Initialize Euclidean instance.
Parameters
alphabet : collection or int
The values or size of the alphabet
tokenizer : Tokenizer
A tokenizer instance from the :py:mod:`abydos.tokenizer` package
intersection_type : str
Specifies the intersection type, and set type as a result:
See :ref:`intersection_type <intersection_type>` description in
:py:class:`_TokenDistance` for details.
**kwargs
Arbitrary keyword arguments
Other Parameters
qval : int
The length of each q-gram. Using this parameter and tokenizer=None
will cause the instance to use the QGram tokenizer with this
q value.
metric : _Distance
A string distance measure class for use in the ``soft`` and ``fuzzy`` variants.
threshold : float
A threshold value, similarities above which are counted as
members of the intersection for the ``fuzzy`` variant.
.. versionadded:: 0.4.0
Wrong hanging indentation before block (add 4 spaces).
scaler: Optional[Union[str, Callable[[float], float]]] = None,
TODO scaler: Optional[Union[str, Callable[[float], float]]] = None, ^ |
Wrong hanging indentation before block (add 4 spaces).
hasattr(nltk_tokenizer, 'tokenize')
TODO hasattr(nltk_tokenizer, 'tokenize') ^ |