giganticode/codeprep

codeprep/pipeline/dataset.py

Summary

Maintainability: C (estimated remediation: 1 day)

Dataset has 29 functions (exceeds 20 allowed). Consider refactoring.
Status: Open

class Dataset(object):
    """
    Abstraction that encapsulates the location of the dataset in the file system and ensures the integrity of the
    intermediate representation of the data when preprocessing consists of multiple steps.
    """
Severity: Minor
Found in codeprep/pipeline/dataset.py - About 3 hrs to fix
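With 29 methods against a limit of 20, one common remedy is to move a cohesive group of methods into a collaborator object and delegate to it. The sketch below assumes path derivation is such a group; `DatasetPaths`, `SlimDataset`, their properties and the `_filelist` naming scheme are all hypothetical and not taken from codeprep.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetPaths:
    """Hypothetical collaborator that owns all path derivation for a dataset."""
    root: str
    name: str

    @property
    def original(self) -> str:
        # Location of the raw, unprocessed dataset.
        return os.path.join(self.root, self.name)

    @property
    def file_list_folder(self) -> str:
        # Hypothetical naming scheme for the cached file list.
        return os.path.join(self.root, f"{self.name}_filelist")


class SlimDataset:
    """Hypothetical Dataset variant: path logic is delegated, not reimplemented."""

    def __init__(self, paths: DatasetPaths) -> None:
        self.paths = paths

    @property
    def path_to_file_list_folder(self) -> str:
        return self.paths.file_list_folder
```

Each method moved this way lowers the class's count by one; whether path handling is the right seam depends on how the real methods cluster.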

File dataset.py has 260 lines of code (exceeds 250 allowed). Consider refactoring.
Status: Open

# SPDX-FileCopyrightText: 2020 Hlib Babii <hlibbabii@gmail.com>
#
# SPDX-License-Identifier: Apache-2.0

import ast
Severity: Minor
Found in codeprep/pipeline/dataset.py - About 2 hrs to fix

Consider simplifying this complex logical expression.
Status: Open

if isinstance(o, Dataset):
    return self._path == o._path and \
           self._prep_config == o._prep_config and \
           self._normalized_extension_list == o._normalized_extension_list and \
           self._custom_bpe_config == o._custom_bpe_config and \
Severity: Critical
Found in codeprep/pipeline/dataset.py - About 2 hrs to fix
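One way to collapse the `and` chain is to compare a single tuple of the relevant attributes. This is a minimal sketch, assuming equality depends only on the four attributes visible in the excerpt (the excerpt is truncated, so the real list is longer); the `_key` helper and the simplified constructor are hypothetical.

```python
from typing import Any, List, Optional


class Dataset:
    def __init__(self, path: str, prep_config: Any,
                 normalized_extension_list: Optional[List[str]],
                 custom_bpe_config: Any = None) -> None:
        self._path = path
        self._prep_config = prep_config
        self._normalized_extension_list = normalized_extension_list
        self._custom_bpe_config = custom_bpe_config

    def _key(self) -> tuple:
        # Hypothetical helper: the one place listing the attributes that define equality.
        return (self._path, self._prep_config,
                self._normalized_extension_list, self._custom_bpe_config)

    def __eq__(self, o: object) -> bool:
        if isinstance(o, Dataset):
            return self._key() == o._key()
        # Let Python fall back to its default comparison for foreign types.
        return NotImplemented
```

Adding an attribute now means editing one tuple instead of a multi-line boolean expression. Because `__eq__` is defined without `__hash__`, Python makes these instances unhashable; if the real class needs hashing, it would hash the same key with the list converted to a tuple.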

Function get_all_files has a Cognitive Complexity of 12 (exceeds 5 allowed). Consider refactoring.
Status: Open

def get_all_files(self, return_dirs_instead_of_regular_files: bool=False) -> Generator[bytes, None, None]:
    if self.files_need_to_be_saved():
        if not os.path.exists(self.path_to_file_list_folder):
            os.makedirs(self.path_to_file_list_folder)
        for filepath in walk_and_save(self.original.path,
Severity: Minor
Found in codeprep/pipeline/dataset.py - About 1 hr to fix
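The nesting visible in the excerpt (an `if` inside an `if`, then a loop) is what drives the score up. A sketch of a flatter shape, assuming the function either walks the tree while caching the file list or replays an existing cache: each branch moves into a helper, and `os.makedirs(..., exist_ok=True)` removes the inner existence check. The helpers `_walk`, `_save_and_yield` and `_load`, the `list_file` parameter, and the use of file existence in place of `files_need_to_be_saved()` are hypothetical; codeprep's `walk_and_save` is not reproduced here.

```python
import os
from typing import Iterator


def _walk(root: str, want_dirs: bool) -> Iterator[bytes]:
    """Yield paths of directories or of regular files under root, as bytes."""
    for dirpath, dirnames, filenames in os.walk(os.fsencode(root)):
        for name in (dirnames if want_dirs else filenames):
            yield os.path.join(dirpath, name)


def _save_and_yield(paths: Iterator[bytes], list_file: str) -> Iterator[bytes]:
    """Write each path to list_file while passing it through to the caller."""
    # exist_ok=True replaces the nested "if not os.path.exists(...)" check.
    os.makedirs(os.path.dirname(os.path.abspath(list_file)), exist_ok=True)
    with open(list_file, "wb") as f:
        for path in paths:
            f.write(path + b"\n")
            yield path


def _load(list_file: str) -> Iterator[bytes]:
    """Replay a previously saved file list."""
    with open(list_file, "rb") as f:
        for line in f:
            yield line.rstrip(b"\n")


def get_all_files(root: str, list_file: str,
                  return_dirs_instead_of_regular_files: bool = False) -> Iterator[bytes]:
    # A single decision with no nesting: walk and cache, or replay the cache.
    # Assumes callers consume the generator fully; a partial run leaves a truncated list_file.
    if not os.path.exists(list_file):
        return _save_and_yield(_walk(root, return_dirs_instead_of_regular_files), list_file)
    return _load(list_file)
```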

Cognitive Complexity

Cognitive Complexity is a measure of how difficult a unit of code is to intuitively understand. Unlike Cyclomatic Complexity, which determines how difficult your code will be to test, Cognitive Complexity tells you how difficult your code will be to read and comprehend.

A method's cognitive complexity is based on a few simple rules:

• Code is not considered more complex when it uses shorthand that the language provides for collapsing multiple statements into one
• Code is considered more complex for each "break in the linear flow of the code"
• Code is considered more complex when "flow breaking structures are nested"
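To make the rules concrete, the sketch below scores two equivalent functions using the increments from SonarSource's Cognitive Complexity definition: +1 for each flow-breaking structure, plus +1 for each level of nesting it sits in. Code Climate's counting may differ in details, so the totals in the comments are illustrative; the function names are hypothetical.

```python
from typing import Iterable


def positive_sum_nested(items: Iterable[int], enabled: bool) -> int:
    total = 0
    for x in items:          # +1 (loop at nesting level 0)
        if enabled:          # +2 (if +1, nested one level +1)
            if x > 0:        # +3 (if +1, nested two levels +2)
                total += x
    return total             # cognitive complexity: 6


def positive_sum_flat(items: Iterable[int], enabled: bool) -> int:
    if not enabled:          # +1 (if at nesting level 0)
        return 0
    total = 0
    for x in items:          # +1 (loop at nesting level 0)
        total += max(x, 0)   # a plain function call adds nothing
    return total             # cognitive complexity: 2
```

Both functions return the same result; the guard clause and the call to `max` remove two levels of nesting, which is where most of the first function's score comes from.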


Function __init__ has 7 arguments (exceeds 4 allowed). Consider refactoring.
Status: Open

def __init__(self, path: str, prep_config: PrepConfig, normalized_extension_list: Optional[List[str]],
Severity: Major
Found in codeprep/pipeline/dataset.py - About 50 mins to fix
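A standard fix for a long argument list is a parameter object that groups values which always travel together. A minimal sketch, assuming the configuration fields are the ones named elsewhere in this report; `DatasetSettings` is hypothetical, and `Any` stands in for codeprep's `PrepConfig`, `CustomBpeConfig` and `BpeConfig` types so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class DatasetSettings:
    """Hypothetical parameter object for Dataset's configuration arguments."""
    prep_config: Any
    normalized_extension_list: Optional[List[str]] = None
    custom_bpe_config: Any = None
    bpe_config: Any = None


class Dataset:
    def __init__(self, path: str, settings: DatasetSettings) -> None:
        # Two arguments besides self; the configuration values travel together.
        self._path = path
        self._settings = settings

    @property
    def prep_config(self) -> Any:
        return self._settings.prep_config
```

A side benefit: `@dataclass` generates a field-wise `__eq__`, so an equality check on such a class could compare `_path` and `_settings` instead of every configuration attribute individually.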

Function create has 7 arguments (exceeds 4 allowed). Consider refactoring.
Status: Open

def create(cls: Type['Dataset'], path_to_dataset: str, prep_config: PrepConfig, extensions: Optional[str],
Severity: Major
Found in codeprep/pipeline/dataset.py - About 50 mins to fix
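The count of 7 for `create` appears to match the seven parameters after `cls` in its full signature, so `cls` is evidently not counted. A sketch that reaches the limit of 4 by bundling the four optional arguments; `CreateOptions` and the simplified constructor are hypothetical, and `Any` replaces codeprep's config types.

```python
from dataclasses import dataclass
from typing import Any, Optional, Type, TypeVar

D = TypeVar("D", bound="Dataset")


@dataclass
class CreateOptions:
    """Hypothetical bundle for create's four optional arguments."""
    custom_bpe_config: Any = None
    bpe_config: Any = None
    overriden_path_to_prep_dataset: Optional[str] = None
    suppress_caching: bool = False


class Dataset:
    def __init__(self, path: str, prep_config: Any, extensions: Optional[str],
                 options: CreateOptions) -> None:
        self.path = path
        self.prep_config = prep_config
        self.extensions = extensions
        self.options = options

    @classmethod
    def create(cls: Type[D], path_to_dataset: str, prep_config: Any,
               extensions: Optional[str] = None,
               options: Optional[CreateOptions] = None) -> D:
        # Four arguments besides cls, which is exactly the reported limit.
        # A fresh CreateOptions per call avoids sharing a mutable default.
        if options is None:
            options = CreateOptions()
        return cls(path_to_dataset, prep_config, extensions, options)
```

Callers that set only one option write `CreateOptions(suppress_caching=True)`, which also makes the call site more self-describing than a long positional list.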

Function create has a Cognitive Complexity of 6 (exceeds 5 allowed). Consider refactoring.
Status: Open

def create(cls: Type['Dataset'], path_to_dataset: str, prep_config: PrepConfig, extensions: Optional[str],
           custom_bpe_config: Optional[CustomBpeConfig],
           bpe_config: Optional[BpeConfig] = None,
           overriden_path_to_prep_dataset: Optional[str] = None, suppress_caching: bool = False) -> 'Dataset':
    if not os.path.exists(path_to_dataset):
Severity: Minor
Found in codeprep/pipeline/dataset.py - About 25 mins to fix
