Showing 4,191 of 4,191 total issues
Similar lines in 2 files
# Copyright 2014-2020 by Christopher C. Little.
Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.distance.lcprefix:68
==abydos.distance.lcsuffix:69
    def dist_abs(self, src: str, tar: str, *args: str) -> int:
        """Return the length of the longest common prefix of the strings.

        Parameters
        ----------
        src : str
            Source string for comparison
        tar : str
            Target string for comparison
        *args : strs
            Additional strings for comparison

        Raises
        ------
        ValueError
            All arguments must be of type str

        Returns
        -------
        int
            The length of the longest common prefix

        Examples
        --------
        >>> pfx = LCPrefix()
        >>> pfx.dist_abs('cat', 'hat')
        0
        >>> pfx.dist_abs('Niall', 'Neil')
        1
        >>> pfx.dist_abs('aluminum', 'Catalan')
        0
        >>> pfx.dist_abs('ATCG', 'TAGC')
        0

        .. versionadded:: 0.4.0

        """
        strings = [src, tar]
        for arg in args:
            if isinstance(arg, str):
                strings.append(arg)
            else:
                raise TypeError('All arguments must be of type str')
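One way to remove this duplication, sketched under the assumption that both classes need only the same argument check, is to move the validation into a shared helper. The name `collect_strings` is hypothetical and is not part of abydos.

```python
from typing import List


def collect_strings(src: str, tar: str, *args: str) -> List[str]:
    """Gather src, tar, and any extra arguments, rejecting non-strings.

    Hypothetical helper: the name is an assumption, not an abydos API.
    """
    strings = [src, tar]
    for arg in args:
        if not isinstance(arg, str):
            raise TypeError('All arguments must be of type str')
        strings.append(arg)
    return strings
```

Each `dist_abs` method could then begin with `strings = collect_strings(src, tar, *args)` and keep only its prefix-specific or suffix-specific logic.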
Similar lines in 3 files
Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.distance.ncdbz2:56
==abydos.distance.ncdlzma:55
==abydos.distance.ncdzlib:54
        super().__init__(**kwargs)
        self._level = level

    def dist(self, src: str, tar: str) -> float:
        """Return the NCD between two strings using LZMA compression.

        Parameters
        ----------
        src : str
            Source string for comparison
        tar : str
            Target string for comparison

        Returns
        -------
        float
            Compression distance

        Examples
        --------
        >>> cmp = NCDlzma()
        >>> cmp.dist('cat', 'hat')
        0.08695652173913043
        >>> cmp.dist('Niall', 'Neil')
        0.16
        >>> cmp.dist('aluminum', 'Catalan')
        0.16
        >>> cmp.dist('ATCG', 'TAGC')
        0.08695652173913043

        .. versionadded:: 0.3.5
        .. versionchanged:: 0.3.6
            Encapsulated in class

        """
        if src == tar:
            return 0.0

        src_b = src.encode('utf-8')
        tar_b = tar.encode('utf-8')
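The three flagged modules appear to differ mainly in the compressor they call, so the shared steps could live in one function that takes the compressor as a parameter. The sketch below uses the commonly given NCD formula and the standard-library `compress` functions. The function name `ncd` is an assumption, and the sketch is not expected to reproduce the example values in the docstring above, since the library may treat compressor headers and levels differently.

```python
import bz2
import lzma
import zlib
from typing import Callable


def ncd(src: str, tar: str, compress: Callable[[bytes], bytes]) -> float:
    """Return a normalized compression distance for any byte compressor."""
    if src == tar:
        return 0.0

    src_b = src.encode('utf-8')
    tar_b = tar.encode('utf-8')

    src_len = len(compress(src_b))
    tar_len = len(compress(tar_b))
    # Compress both concatenation orders and keep the shorter result.
    concat_len = min(
        len(compress(src_b + tar_b)), len(compress(tar_b + src_b))
    )
    return (concat_len - min(src_len, tar_len)) / max(src_len, tar_len)
```

A call such as `ncd('cat', 'hat', zlib.compress)` selects the compressor at the call site; `bz2.compress` and `lzma.compress` fit the same signature.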
Similar lines in 3 files
Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.tokenizer.corvcluster:42
==abydos.tokenizer.cvcluster:43
==abydos.tokenizer.vccluster:43
    def __init__(
        self,
        scaler: Optional[Union[str, Callable[[float], float]]] = None,
        consonants: Optional[Set[str]] = None,
        vowels: Optional[Set[str]] = None,
    ) -> None:
        """Initialize tokenizer.

        Parameters
        ----------
        scaler : None, str, or function
            A scaling function for the Counter:

                - None : no scaling
                - 'set' : All non-zero values are set to 1.
                - 'length' : Each token has weight equal to its length.
                - 'length-log' : Each token has weight equal to the log of its
                  length + 1.
                - 'length-exp' : Each token has weight equal to e raised to its
                  length.
                - a callable function : The function is applied to each value
                  in the Counter. Some useful functions include math.exp,
                  math.log1p, math.sqrt, and indexes into interesting integer
                  sequences such as the Fibonacci sequence.
        consonants : None or set(str)
            The set of characters to treat as consonants
        vowels : None or set(str)
            The set of characters to treat as vowels

        .. versionadded:: 0.4.0

        """
Similar lines in 2 files
Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.tokenizer.whitespace:41
==abydos.tokenizer.wordpunct:42
    def __init__(
        self,
        scaler: Optional[Union[str, Callable[[float], float]]] = None,
        flags: int = 0,
    ) -> None:
        """Initialize tokenizer.

        Parameters
        ----------
        scaler : None, str, or function
            A scaling function for the Counter:

                - None : no scaling
                - 'set' : All non-zero values are set to 1.
                - 'length' : Each token has weight equal to its length.
                - 'length-log' : Each token has weight equal to the log of its
                  length + 1.
                - 'length-exp' : Each token has weight equal to e raised to its
                  length.
                - a callable function : The function is applied to each value
                  in the Counter. Some useful functions include math.exp,
                  math.log1p, math.sqrt, and indexes into interesting integer
                  sequences such as the Fibonacci sequence.
        flags : int
            Flags to pass to the regular expression matcher. See the
            `documentation on Python's re module
            <https://docs.python.org/3/library/re.html#re.A>`_ for details.

        .. versionadded:: 0.4.0

        """
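The scaler options listed in these docstrings can be read as a small dispatch over a Counter of tokens. The sketch below is one possible reading, not the library's implementation: the function name `scale_tokens` is hypothetical, and 'length-log' is assumed to mean log(1 + length).

```python
import math
from collections import Counter
from typing import Callable, Optional, Union

Scaler = Optional[Union[str, Callable[[float], float]]]


def scale_tokens(tokens: Counter, scaler: Scaler = None) -> Counter:
    """Apply one of the documented scaler options to a token Counter."""
    if scaler is None:
        return Counter(tokens)  # no scaling, return a copy
    if scaler == 'set':
        # Non-zero values become 1; zero values stay 0.
        return Counter({tok: int(bool(val)) for tok, val in tokens.items()})
    if scaler == 'length':
        return Counter({tok: len(tok) for tok in tokens})
    if scaler == 'length-log':
        # Assumption: "log of its length + 1" is read as log(1 + length).
        return Counter({tok: math.log1p(len(tok)) for tok in tokens})
    if scaler == 'length-exp':
        return Counter({tok: math.exp(len(tok)) for tok in tokens})
    if callable(scaler):
        return Counter({tok: scaler(val) for tok, val in tokens.items()})
    raise ValueError(f'Unknown scaler: {scaler!r}')
```

Placing this logic in one shared base class, if the tokenizers do not already have one, would let each flagged `__init__` forward `scaler` instead of repeating the docstring and handling.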
Similar lines in 2 files
Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.distance.phoneticeditdistance:107
==abydos.phones.phones:956
        if isinstance(weights, dict):
            weights = [
                weights[feature] if feature in weights else 0
                for feature in sorted(
                    FEATUREMASK, key=FEATUREMASK.get, reverse=True
                )
            ]
        elif isinstance(weights, (list, tuple)):
            weights = list(weights) + [0] * (len(FEATUREMASK) - len(weights))
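Because the same weight normalization appears in both modules, it could be extracted into one function that takes the feature mask as an argument. This is a minimal sketch: `normalize_weights` and the three-entry `EXAMPLE_MASK` are hypothetical stand-ins, not the library's names or values.

```python
from typing import Dict

# Hypothetical feature mask used only for illustration.
EXAMPLE_MASK: Dict[str, int] = {'syllabic': 4, 'consonantal': 2, 'sonorant': 1}


def normalize_weights(weights, feature_mask: Dict[str, int]):
    """Return weights as a list ordered by descending mask value."""
    if isinstance(weights, dict):
        # Missing features get weight 0.
        return [
            weights.get(feature, 0)
            for feature in sorted(
                feature_mask, key=feature_mask.get, reverse=True
            )
        ]
    if isinstance(weights, (list, tuple)):
        # Pad short sequences with zeros up to the number of features.
        return list(weights) + [0] * (len(feature_mask) - len(weights))
    return weights
```

For example, `normalize_weights({'sonorant': 2}, EXAMPLE_MASK)` yields a list with the weight in the position of the lowest mask value.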
Wrong hanging indentation before block (add 4 spaces).
        self,
Wrong hanging indentation before block (add 4 spaces).
        self,
        scaler: Optional[Union[str, Callable[[float], float]]] = None,
Wrong hanging indentation before block (add 4 spaces).
        flags: int = 0,
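The message asks for continuation lines in a signature to be indented four more spaces, so that parameters are visually distinct from the body that follows. A sketch of the layout the check appears to expect, using a hypothetical `ExampleTokenizer` class:

```python
from typing import Callable, Optional, Union


class ExampleTokenizer:
    """Hypothetical class used only to illustrate the indentation."""

    def __init__(
            self,
            scaler: Optional[Union[str, Callable[[float], float]]] = None,
            flags: int = 0,
    ) -> None:
        # Parameters sit 8 spaces in, the body 4 spaces in from `def`.
        self._scaler = scaler
        self._flags = flags
```

If the project's formatter produces 4-space hanging indents by design, an alternative is to disable this check in the linter configuration rather than reformat every signature.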
Similar lines in 2 files
Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.distance.discountedlevenshtein:285
==abydos.distance.phoneticeditdistance:253
        dmat = cast(
            np.ndarray, self.alignmentmatrix(src, tar, backtrace=False)
        )

        if int(dmat[srclen, tarlen]) == dmat[srclen, tarlen]:
            return int(dmat[srclen, tarlen])
        else:
            return cast(float, dmat[srclen, tarlen])

    def dist(self, src: str, tar: str) -> float:
        """Return the normalized phonetic edit distance between two strings.

        The edit distance is normalized by dividing the edit distance
        (calculated by either of the two supported methods) by the
        greater of the number of characters in src times the cost of a delete
        and the number of characters in tar times the cost of an insert.
        For the case in which all operations have :math:`cost = 1`, this is
        equivalent to the greater of the length of the two strings src & tar.

        Parameters
        ----------
        src : str
            Source string for comparison
        tar : str
            Target string for comparison

        Returns
        -------
        float
            The normalized Levenshtein distance between src & tar

        Examples
        --------
        >>> cmp = PhoneticEditDistance()
        >>> round(cmp.dist('cat', 'hat'), 12)
        0.059139784946
        >>> round(cmp.dist('Niall', 'Neil'), 12)
        0.232258064516
        >>> cmp.dist('aluminum', 'Catalan')
        0.3084677419354839
        >>> cmp.dist('ATCG', 'TAGC')
        0.2983870967741935

        .. versionadded:: 0.4.1

        """
        if src == tar:
Similar lines in 2 files
Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.distance.blocklevenshtein:138
==abydos.distance.dameraulevenshtein:232
        if src == tar:
            return 0.0
        ins_cost, del_cost = self.cost[:2]
        return self.dist_abs(src, tar) / (
            self.normalizer([len(src) * del_cost, len(tar) * ins_cost])
        )


if __name__ == '__main__':
    import doctest

    doctest.testmod()
Similar lines in 2 files
Indicates that a set of similar lines has been detected among multiple files. This usually means that the code should be refactored to avoid this duplication.
==abydos.distance.discountedlevenshtein:204
==abydos.distance.phoneticeditdistance:181
                )
                dmat[i + 1, j + 1] = min(opts)
                if backtrace:
                    trace_mat[i + 1, j + 1] = int(np.argmin(opts))

                if self._mode == 'osa':
                    if (
                        i + 1 > 1
                        and j + 1 > 1
Wrong hanging indentation before block (add 4 spaces).
        vowels: Optional[Set[str]] = None,
Wrong hanging indentation before block (add 4 spaces).
        scaler: Optional[Union[str, Callable[[float], float]]] = None,
Wrong hanging indentation before block (add 4 spaces).
        vowels: Optional[Set[str]] = None,
Wrong hanging indentation before block (add 4 spaces).
        and token[0] not in self._vowels
Unused variable 'count'
        count, term_doc_count = self.corpus[term]
Used when a variable is defined but not used.
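A common way to resolve this warning, assuming the first element of the pair really is not needed, is to unpack it into an underscore. The `corpus` dictionary and `doc_count` function below are hypothetical stand-ins for the flagged code.

```python
from typing import Dict, Tuple

# Hypothetical corpus mapping: term -> (total count, document count).
corpus: Dict[str, Tuple[int, int]] = {'cat': (7, 3)}


def doc_count(term: str) -> int:
    """Return the number of documents containing the term."""
    # The underscore marks the first element as intentionally unused.
    _, term_doc_count = corpus[term]
    return term_doc_count
```

Indexing the pair directly, as in `corpus[term][1]`, avoids the unpacking altogether, at some cost to readability.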
Wrong hanging indentation before block (add 4 spaces).
        flags: int = 0,
Wrong hanging indentation before block (add 4 spaces).
        corpus: Corpus,
Unused argument 'args'
        self,
Used when a function or method argument is not used.