giganticode/codeprep

Showing 82 of 82 total issues

File tokens.py has 603 lines of code (exceeds 250 allowed). Consider refactoring.
Open

# SPDX-FileCopyrightText: 2020 2020 Hlib Babii <hlibbabii@gmail.com>
#
# SPDX-License-Identifier: Apache-2.0

from abc import ABC, abstractmethod
Severity: Major
Found in codeprep/preprocess/tokens.py - About 1 day to fix

    Function encode has a Cognitive Complexity of 44 (exceeds 5 allowed). Consider refactoring.
    Open

    def encode(words: Dict[str, int], merges: MergeList) -> Dict[str, int]:
        letters_list = {" ".join(to_char_list(k)): v for k, v in words.items()}
    
        new_letters_list = {}
        for letters, freq in letters_list.items():
    Severity: Minor
    Found in codeprep/bpepkg/bpe_encode.py - About 6 hrs to fix

    Cognitive Complexity

    Cognitive Complexity is a measure of how difficult a unit of code is to intuitively understand. Unlike Cyclomatic Complexity, which determines how difficult your code will be to test, Cognitive Complexity tells you how difficult your code will be to read and comprehend.

    A method's cognitive complexity is based on a few simple rules:

    • Code is not considered more complex when it uses shorthand that the language provides for collapsing multiple statements into one
    • Code is considered more complex for each "break in the linear flow of the code"
    • Code is considered more complex when "flow breaking structures are nested"
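The last two rules can be sketched with a small Python example. Both functions below are hypothetical and not taken from codeprep; they return the same result, but the first stacks four flow-breaking structures inside each other, while the second moves the inner filtering into a helper so no function nests more than one level deep. The exact score an analyzer assigns to either version is not claimed here.

```python
from typing import Iterable, List


def collect_nested(rows: Iterable[List[int]]) -> List[int]:
    """Nested version: each loop or branch sits inside the previous one."""
    result = []
    for row in rows:                  # break in the linear flow
        if row:                       # nested one level deeper
            for value in row:         # nested two levels deeper
                if value > 0:         # nested three levels deeper
                    result.append(value)
    return result


def positive_values(row: List[int]) -> List[int]:
    """Helper that holds the inner filtering on its own."""
    return [value for value in row if value > 0]


def collect_flat(rows: Iterable[List[int]]) -> List[int]:
    """Flattened version: one loop, with the inner logic delegated to a helper."""
    result = []
    for row in rows:
        result.extend(positive_values(row))
    return result
```

Extracting a helper is only one way to reduce nesting; early returns and guard clauses are common alternatives.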

    Function update_neighbour_index has a Cognitive Complexity of 42 (exceeds 5 allowed). Consider refactoring.
    Open

    def update_neighbour_index(location_index, neighbour_index, pair_to_merge):
        for side in Side:
            disappearing_pairs = neighbour_index[pair_to_merge][side]
            for disappearing_pair in disappearing_pairs:
                if can_be_concat(disappearing_pair, pair_to_merge, side):
    Severity: Minor
    Found in codeprep/bpepkg/wild_bpe.py - About 6 hrs to fix

    File wild_bpe.py has 395 lines of code (exceeds 250 allowed). Consider refactoring.
    Open

    # SPDX-FileCopyrightText: 2020 Hlib Babii <hlibbabii@gmail.com>
    #
    # SPDX-License-Identifier: Apache-2.0
    
    import logging
    Severity: Minor
    Found in codeprep/bpepkg/wild_bpe.py - About 5 hrs to fix

      File text.py has 368 lines of code (exceeds 250 allowed). Consider refactoring.
      Open

      # SPDX-FileCopyrightText: 2020 Hlib Babii <hlibbabii@gmail.com>
      #
      # SPDX-License-Identifier: Apache-2.0
      
      from pathlib import Path
      Severity: Minor
      Found in codeprep/api/text.py - About 4 hrs to fix

        Function walk_and_save has a Cognitive Complexity of 32 (exceeds 5 allowed). Consider refactoring.
        Open

        def walk_and_save(path: str, dir_list_path: str, file_list_path: str, return_dirs_instead_of_regular_files: bool,
                          extensions: Optional[List[str]]) -> Generator[bytes, None, None]:
            with open(dir_list_path, 'w') as d, open(file_list_path, 'w') as f:
                path_bin = path.encode()
                extensions_bin = list(map(lambda e: e.encode(), extensions)) if extensions else None
        Severity: Minor
        Found in codeprep/util/dir.py - About 4 hrs to fix

        Function get_dir_last_modification has a Cognitive Complexity of 29 (exceeds 5 allowed). Consider refactoring.
        Open

        def get_dir_last_modification(path: str, limit: int = LIMIT_FILES_ON_LAST_MODIFICATION_CHECK) -> datetime:
        
            def walk_path(path):
                counter = 0
                if os.path.isfile(path) or len(os.listdir(path)) == 0:
        Severity: Minor
        Found in codeprep/util/dir.py - About 4 hrs to fix

        Function update_location_index has a Cognitive Complexity of 28 (exceeds 5 allowed). Consider refactoring.
        Open

        def update_location_index(location_index, neighbour_index, pair_to_merge):
            occurence_changes = []
            disappearing_pairs = neighbour_index[pair_to_merge]
            main_list = location_index[pair_to_merge]
            if pair_to_merge in neighbour_index[pair_to_merge][Side.any()]:
        Severity: Minor
        Found in codeprep/bpepkg/wild_bpe.py - About 4 hrs to fix

        Dataset has 29 functions (exceeds 20 allowed). Consider refactoring.
        Open

        class Dataset(object):
            """
            Abstraction that encapsulates the location of the dataset in the file system and assures integrity of intermediate
            representation of data when the data preprocessing operation consists of multiple steps.
            """
        Severity: Minor
        Found in codeprep/pipeline/dataset.py - About 3 hrs to fix

          File vocab.py has 302 lines of code (exceeds 250 allowed). Consider refactoring.
          Open

          # SPDX-FileCopyrightText: 2020 Hlib Babii <hlibbabii@gmail.com>
          #
          # SPDX-License-Identifier: Apache-2.0
          
          import logging.config
          Severity: Minor
          Found in codeprep/pipeline/vocab.py - About 3 hrs to fix

            Function run has a Cognitive Complexity of 20 (exceeds 5 allowed). Consider refactoring.
            Open

            def run(dataset: Dataset, custom_bpe_config: Optional[CustomBpeConfig]) -> None:
                path_to_parsed_dataset = dataset.parsed.path
            
                if not os.path.exists(path_to_parsed_dataset):
                    logger.error(f"Dir does not exist: {path_to_parsed_dataset}")
            Severity: Minor
            Found in codeprep/pipeline/to_repr.py - About 2 hrs to fix

            File codestructure.py has 274 lines of code (exceeds 250 allowed). Consider refactoring.
            Open

            # SPDX-FileCopyrightText: 2020 2020 Hlib Babii <hlibbabii@gmail.com>
            #
            # SPDX-License-Identifier: Apache-2.0
            
            import bisect
            Severity: Minor
            Found in codeprep/preprocess/codestructure.py - About 2 hrs to fix

              Function create_split_value has a Cognitive Complexity of 17 (exceeds 5 allowed). Consider refactoring.
              Open

              def create_split_value(split_type: str, bpe_codes_id: Optional[str] = None, full_strings: bool = False,
                                     split_numbers: bool = False, ronin: bool = False, stem: bool = False):
                  if split_type == 'nosplit':
                      return 'F' if full_strings else '0'
                  elif split_type == 'chars':
              Severity: Minor
              Found in codeprep/api/common.py - About 2 hrs to fix

              Function merge_vocab has a Cognitive Complexity of 17 (exceeds 5 allowed). Consider refactoring.
              Open

              def merge_vocab(pair: Tuple[str, str], input_vocab: Dict[str, int]) -> Tuple[Dict[str, int], List]:
                  """
                  >>> pair = ('w', 'o')
                  >>> input_vocab = {'b i r d @': 3, 'w o r d @': 7, 'w o g @': 13}
                  >>> new_vocab, new_pairs = merge_vocab(pair, input_vocab)
              Severity: Minor
              Found in codeprep/bpepkg/bpe_learn.py - About 2 hrs to fix

              File dataset.py has 260 lines of code (exceeds 250 allowed). Consider refactoring.
              Open

              # SPDX-FileCopyrightText: 2020 Hlib Babii <hlibbabii@gmail.com>
              #
              # SPDX-License-Identifier: Apache-2.0
              
              import ast
              Severity: Minor
              Found in codeprep/pipeline/dataset.py - About 2 hrs to fix

                Function run has a Cognitive Complexity of 16 (exceeds 5 allowed). Consider refactoring.
                Open

                def run(generator: Generator[str, None, None], n_merges: int=sys.maxsize,
                        include_performance_stats_every_n_merges: int = 0) \
                        -> Tuple[str, int, Optional[List[BpePerformanceStatsEntry]]]:
                
                    checkpoint = time.time()
                Severity: Minor
                Found in codeprep/bpepkg/wild_bpe.py - About 2 hrs to fix

                Consider simplifying this complex logical expression.
                Open

                        if isinstance(o, Dataset):
                            return self._path == o._path and \
                                   self._prep_config == o._prep_config and \
                                   self._normalized_extension_list == o._normalized_extension_list and \
                                   self._custom_bpe_config == o._custom_bpe_config and \
                Severity: Critical
                Found in codeprep/pipeline/dataset.py - About 2 hrs to fix
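One common way to shorten a long `and` chain in an equality check is to compare tuples of the relevant attributes. The sketch below is an assumption-laden illustration: `DatasetLike` and `_key` are hypothetical names, not part of codeprep, and the class carries only the four attributes visible in the truncated excerpt above. Whatever else the real expression compares is not reproduced.

```python
from typing import Any, Optional, Tuple


class DatasetLike:
    """Hypothetical stand-in for illustration; not the real codeprep Dataset."""

    def __init__(self, path: str, prep_config: Any,
                 normalized_extension_list: Optional[Tuple[str, ...]],
                 custom_bpe_config: Any) -> None:
        self._path = path
        self._prep_config = prep_config
        self._normalized_extension_list = normalized_extension_list
        self._custom_bpe_config = custom_bpe_config

    def _key(self) -> Tuple[Any, ...]:
        # Every attribute that takes part in equality is listed once, here.
        return (self._path, self._prep_config,
                self._normalized_extension_list, self._custom_bpe_config)

    def __eq__(self, o: object) -> bool:
        if not isinstance(o, DatasetLike):
            return NotImplemented
        # A single tuple comparison replaces the chained `and` expression.
        return self._key() == o._key()
```

With this shape, adding a field to the comparison means touching `_key` only, and the logical expression itself stays a single comparison.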

                  Function getsize has a Cognitive Complexity of 15 (exceeds 5 allowed). Consider refactoring.
                  Open

                  def getsize(obj):
                      zero_depth_bases = (str, bytes, Number, range, bytearray)
                      iteritems = 'items'
                  
                      def _getsize(obj_0):
                  Severity: Minor
                  Found in codeprep/util/misc.py - About 1 hr to fix

                  Function init_bpe_data has a Cognitive Complexity of 15 (exceeds 5 allowed). Consider refactoring.
                  Open

                  def init_bpe_data(prep_config: PrepConfig, custom_bpe_config: Optional[CustomBpeConfig], force_reinit: bool=True):
                      if get_global_bpe_data_if_available() and not force_reinit:
                          return # already initialized
                      global global_bpe_data
                      global_bpe_data = BpeData()
                  Severity: Minor
                  Found in codeprep/pipeline/to_repr.py - About 1 hr to fix

                  Function basic has 14 arguments (exceeds 4 allowed). Consider refactoring.
                  Open

                  def basic(path: str, extensions: Optional[str] = None, split_numbers: bool = False, ronin = False, stem: bool = False,
                  Severity: Major
                  Found in codeprep/api/corpus.py - About 1 hr to fix
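A frequent remedy for a long parameter list is to group related flags into one options object. The sketch below assumes a hypothetical `SplitOptions` dataclass and a hypothetical `basic_sketch` function; neither exists in codeprep. The three flag names are copied from the visible part of the signature above, and the remaining parameters of the real `basic` are not shown in the excerpt, so they are not guessed at here.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class SplitOptions:
    """Hypothetical bundle of the boolean flags visible in the excerpt."""
    split_numbers: bool = False
    ronin: bool = False
    stem: bool = False


def basic_sketch(path: str, extensions: Optional[str] = None,
                 options: SplitOptions = SplitOptions()) -> str:
    """Illustrative only: summarizes its inputs instead of preprocessing anything."""
    # Collect the names of the flags that are switched on, in field order.
    enabled = [name for name, is_on in asdict(options).items() if is_on]
    return f"{path} [{extensions or 'all'}] {','.join(enabled) or 'defaults'}"


print(basic_sketch("repo", "java", SplitOptions(ronin=True, stem=True)))
# prints: repo [java] ronin,stem
```

Because `SplitOptions` is frozen, using one instance as a default argument is safe: it cannot be mutated between calls.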