Showing 82 of 82 total issues
File tokens.py
has 603 lines of code (exceeds 250 allowed). Consider refactoring. Open
# SPDX-FileCopyrightText: 2020 2020 Hlib Babii <hlibbabii@gmail.com>
#
# SPDX-License-Identifier: Apache-2.0
from abc import ABC, abstractmethod
Function encode
has a Cognitive Complexity of 44 (exceeds 5 allowed). Consider refactoring. Open
def encode(words: Dict[str, int], merges: MergeList) -> Dict[str, int]:
    letters_list = {" ".join(to_char_list(k)): v for k, v in words.items()}
    new_letters_list = {}
    for letters, freq in letters_list.items():
Cognitive Complexity
Cognitive Complexity is a measure of how difficult a unit of code is to intuitively understand. Unlike Cyclomatic Complexity, which determines how difficult your code will be to test, Cognitive Complexity tells you how difficult your code will be to read and comprehend.
A method's cognitive complexity is based on a few simple rules:
- Code is not considered more complex when it uses shorthand that the language provides for collapsing multiple statements into one
- Code is considered more complex for each "break in the linear flow of the code"
- Code is considered more complex when "flow breaking structures are nested"
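The rules above can be illustrated with a small sketch. The functions below are hypothetical and are not taken from the analysed project; they only show how the same behaviour can be written with deep nesting or with a flatter structure that, under the quoted rules, would likely score lower.

```python
from typing import Iterable, List


def collect_nested(paths: Iterable[str], extensions: List[str]) -> List[str]:
    """Nested version: every loop and branch sits inside another one."""
    result = []
    for path in paths:                      # break in the linear flow
        if path:                            # nested one level deep
            for ext in extensions:          # nested two levels deep
                if path.endswith(ext):      # nested three levels deep
                    result.append(path)
                    break
    return result


def has_extension(path: str, extensions: List[str]) -> bool:
    """Helper without nesting: any() collapses the inner loop."""
    return any(path.endswith(ext) for ext in extensions)


def collect_flat(paths: Iterable[str], extensions: List[str]) -> List[str]:
    """Flatter version: one comprehension plus a small helper."""
    return [p for p in paths if p and has_extension(p, extensions)]
```

Both functions return the same list for the same input; only the amount of nesting differs.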
Function update_neighbour_index
has a Cognitive Complexity of 42 (exceeds 5 allowed). Consider refactoring. Open
def update_neighbour_index(location_index, neighbour_index, pair_to_merge):
    for side in Side:
        disappearing_pairs = neighbour_index[pair_to_merge][side]
        for disappearing_pair in disappearing_pairs:
            if can_be_concat(disappearing_pair, pair_to_merge, side):
File wild_bpe.py
has 395 lines of code (exceeds 250 allowed). Consider refactoring. Open
# SPDX-FileCopyrightText: 2020 Hlib Babii <hlibbabii@gmail.com>
#
# SPDX-License-Identifier: Apache-2.0
import logging
File text.py
has 368 lines of code (exceeds 250 allowed). Consider refactoring. Open
# SPDX-FileCopyrightText: 2020 Hlib Babii <hlibbabii@gmail.com>
#
# SPDX-License-Identifier: Apache-2.0
from pathlib import Path
Function walk_and_save
has a Cognitive Complexity of 32 (exceeds 5 allowed). Consider refactoring. Open
def walk_and_save(path: str, dir_list_path: str, file_list_path: str, return_dirs_instead_of_regular_files: bool,
                  extensions: Optional[List[str]]) -> Generator[bytes, None, None]:
    with open(dir_list_path, 'w') as d, open(file_list_path, 'w') as f:
        path_bin = path.encode()
        extensions_bin = list(map(lambda e: e.encode(), extensions)) if extensions else None
Function get_dir_last_modification
has a Cognitive Complexity of 29 (exceeds 5 allowed). Consider refactoring. Open
def get_dir_last_modification(path: str, limit: int = LIMIT_FILES_ON_LAST_MODIFICATION_CHECK) -> datetime:
    def walk_path(path):
        counter = 0
        if os.path.isfile(path) or len(os.listdir(path)) == 0:
Function update_location_index
has a Cognitive Complexity of 28 (exceeds 5 allowed). Consider refactoring. Open
def update_location_index(location_index, neighbour_index, pair_to_merge):
    occurence_changes = []
    disappearing_pairs = neighbour_index[pair_to_merge]
    main_list = location_index[pair_to_merge]
    if pair_to_merge in neighbour_index[pair_to_merge][Side.any()]:
Dataset
has 29 functions (exceeds 20 allowed). Consider refactoring. Open
class Dataset(object):
    """
    Abstraction that encapsulates the location of the dataset in the file system and ensures the integrity of the
    intermediate representation of data when the data preprocessing operation consists of multiple steps.
    """
File vocab.py
has 302 lines of code (exceeds 250 allowed). Consider refactoring. Open
# SPDX-FileCopyrightText: 2020 Hlib Babii <hlibbabii@gmail.com>
#
# SPDX-License-Identifier: Apache-2.0
import logging.config
Function run
has a Cognitive Complexity of 20 (exceeds 5 allowed). Consider refactoring. Open
def run(dataset: Dataset, custom_bpe_config: Optional[CustomBpeConfig]) -> None:
    path_to_parsed_dataset = dataset.parsed.path
    if not os.path.exists(path_to_parsed_dataset):
        logger.error(f"Dir does not exist: {path_to_parsed_dataset}")
File codestructure.py
has 274 lines of code (exceeds 250 allowed). Consider refactoring. Open
# SPDX-FileCopyrightText: 2020 2020 Hlib Babii <hlibbabii@gmail.com>
#
# SPDX-License-Identifier: Apache-2.0
import bisect
Function create_split_value
has a Cognitive Complexity of 17 (exceeds 5 allowed). Consider refactoring. Open
def create_split_value(split_type: str, bpe_codes_id: Optional[str] = None, full_strings: bool = False,
                       split_numbers: bool = False, ronin: bool = False, stem: bool = False):
    if split_type == 'nosplit':
        return 'F' if full_strings else '0'
    elif split_type == 'chars':
Function merge_vocab
has a Cognitive Complexity of 17 (exceeds 5 allowed). Consider refactoring. Open
def merge_vocab(pair: Tuple[str, str], input_vocab: Dict[str, int]) -> Tuple[Dict[str, int], List]:
    """
    >>> pair = ('w', 'o')
    >>> input_vocab = {'b i r d @': 3, 'w o r d @': 7, 'w o g @': 13}
    >>> new_vocab, new_pairs = merge_vocab(pair, input_vocab)
File dataset.py
has 260 lines of code (exceeds 250 allowed). Consider refactoring. Open
# SPDX-FileCopyrightText: 2020 Hlib Babii <hlibbabii@gmail.com>
#
# SPDX-License-Identifier: Apache-2.0
import ast
Function run
has a Cognitive Complexity of 16 (exceeds 5 allowed). Consider refactoring. Open
def run(generator: Generator[str, None, None], n_merges: int = sys.maxsize,
        include_performance_stats_every_n_merges: int = 0) \
        -> Tuple[str, int, Optional[List[BpePerformanceStatsEntry]]]:
    checkpoint = time.time()
Consider simplifying this complex logical expression. Open
if isinstance(o, Dataset):
    return self._path == o._path and \
           self._prep_config == o._prep_config and \
           self._normalized_extension_list == o._normalized_extension_list and \
           self._custom_bpe_config == o._custom_bpe_config and \
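A long chain of `and` comparisons like the one flagged above can often be collapsed by comparing tuples of the relevant fields. The sketch below is one possible approach, not the project's code: the `Record` class and its field names are hypothetical stand-ins chosen for illustration.

```python
class Record:
    """Hypothetical stand-in class; field names are invented for illustration."""

    def __init__(self, path: str, config: str, extensions: tuple):
        self._path = path
        self._config = config
        self._extensions = extensions

    def _key(self) -> tuple:
        # Every field that takes part in equality is listed once, in one place.
        return (self._path, self._config, self._extensions)

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, Record):
            return NotImplemented
        # One tuple comparison replaces the chain of `and` clauses.
        return self._key() == other._key()

    def __hash__(self) -> int:
        # Keep hashing consistent with equality.
        return hash(self._key())
```

Adding a field to the comparison then means touching only `_key`, and `__hash__` stays in sync with `__eq__` automatically.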
Function getsize
has a Cognitive Complexity of 15 (exceeds 5 allowed). Consider refactoring. Open
def getsize(obj):
    zero_depth_bases = (str, bytes, Number, range, bytearray)
    iteritems = 'items'
    def _getsize(obj_0):
Function init_bpe_data
has a Cognitive Complexity of 15 (exceeds 5 allowed). Consider refactoring. Open
def init_bpe_data(prep_config: PrepConfig, custom_bpe_config: Optional[CustomBpeConfig], force_reinit: bool = True):
    if get_global_bpe_data_if_available() and not force_reinit:
        return  # already initialized
    global global_bpe_data
    global_bpe_data = BpeData()
Function basic
has 14 arguments (exceeds 4 allowed). Consider refactoring. Open
def basic(path: str, extensions: Optional[str] = None, split_numbers: bool = False, ronin = False, stem: bool = False,
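A common way to address an "too many arguments" finding like this one is to group related flags into a single options object. The sketch below is an assumption about how that could look, not the project's actual refactoring: `SplitOptions` and `basic_sketch` are hypothetical names, and only the three boolean flags visible in the truncated signature above are modelled.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class SplitOptions:
    """Hypothetical options object; fields mirror flags visible in the signature above."""
    split_numbers: bool = False
    ronin: bool = False
    stem: bool = False


def basic_sketch(path: str, extensions: Optional[str] = None,
                 options: SplitOptions = SplitOptions()) -> str:
    """Illustrative only: returns a description instead of doing real work."""
    # Collect the names of the flags that are switched on, in field order.
    enabled = [name for name, on in asdict(options).items() if on]
    return f"{path}|{extensions or '*'}|{','.join(enabled) or 'none'}"
```

Because `SplitOptions` is frozen, a shared default instance is safe to use as a default argument, and callers pass three parameters instead of one per flag.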