HazureChi/kenchi

Showing 37 of 37 total issues

File statistical.py has 534 lines of code (exceeds 250 allowed). Consider refactoring.
Open

import numpy as np
from sklearn.cluster import affinity_propagation
from sklearn.covariance import GraphicalLasso
from sklearn.mixture import GaussianMixture
from sklearn.neighbors import KernelDensity
Severity: Major
Found in kenchi/outlier_detection/statistical.py - About 1 day to fix

    File base.py has 330 lines of code (exceeds 250 allowed). Consider refactoring.
    Open

    import os
    
    import numpy as np
    from sklearn.datasets import load_breast_cancer
    from sklearn.utils import check_random_state, Bunch
    Severity: Minor
    Found in kenchi/datasets/base.py - About 3 hrs to fix

      Function plot_anomaly_score has a Cognitive Complexity of 26 (exceeds 5 allowed). Consider refactoring.
      Open

      def plot_anomaly_score(
          anomaly_score, ax=None, bins='auto', figsize=None,
          filename=None, hist=True, kde=True, threshold=None,
          title=None, xlabel='Samples', xlim=None, ylabel='Anomaly score',
          ylim=None, **kwargs
      Severity: Minor
      Found in kenchi/plotting.py - About 3 hrs to fix

      Cognitive Complexity

      Cognitive Complexity is a measure of how difficult a unit of code is to intuitively understand. Unlike Cyclomatic Complexity, which determines how difficult your code will be to test, Cognitive Complexity tells you how difficult your code will be to read and comprehend.

      A method's cognitive complexity is based on a few simple rules:

      • Code is not considered more complex when it uses shorthand that the language provides for collapsing multiple statements into one
      • Code is considered more complex for each "break in the linear flow of the code"
      • Code is considered more complex when "flow breaking structures are nested"
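As a rough illustration of the rules above, the two hypothetical functions below (not taken from kenchi) return the same result. The first breaks the linear flow four times and nests each break inside the previous one. The second collapses the nesting with language shorthand. Exact scores depend on the analyzer, so none are claimed here.

```python
def first_negative_nested(rows):
    """Nested loops and conditionals: every level of nesting adds to the score."""
    for row in rows:
        if row is not None:
            for value in row:
                if value < 0:
                    return value
    return None


def first_negative_flat(rows):
    """Same behaviour, with the nesting collapsed into generator expressions."""
    values = (v for row in rows if row is not None for v in row)
    return next((v for v in values if v < 0), None)


if __name__ == "__main__":
    data = [[1, 2], None, [3, -4, -5]]
    print(first_negative_nested(data), first_negative_flat(data))
```

Both versions skip `None` rows and stop at the first negative value. Only the shape of the control flow differs.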

      File plotting.py has 312 lines of code (exceeds 250 allowed). Consider refactoring.
      Open

      import numpy as np
      from scipy.stats import gaussian_kde
      from sklearn.metrics import auc, roc_curve
      from sklearn.utils.validation import check_array, check_symmetric, column_or_1d
      
      
      Severity: Minor
      Found in kenchi/plotting.py - About 3 hrs to fix

        File base.py has 296 lines of code (exceeds 250 allowed). Consider refactoring.
        Open

        from abc import abstractmethod, ABC
        
        import numpy as np
        from scipy.stats import norm
        from sklearn.base import BaseEstimator
        Severity: Minor
        Found in kenchi/outlier_detection/base.py - About 3 hrs to fix

          Cyclomatic complexity is too high in function plot_anomaly_score. (14)
          Open

          def plot_anomaly_score(
              anomaly_score, ax=None, bins='auto', figsize=None,
              filename=None, hist=True, kde=True, threshold=None,
              title=None, xlabel='Samples', xlim=None, ylabel='Anomaly score',
              ylim=None, **kwargs
          Severity: Minor
          Found in kenchi/plotting.py by radon

          Cyclomatic Complexity

          Cyclomatic Complexity corresponds to the number of decisions a block of code contains plus 1. This number (also called McCabe number) is equal to the number of linearly independent paths through the code. This number can be used as a guide when testing conditional logic in blocks.

          Radon analyzes the AST of a Python program to compute Cyclomatic Complexity. Statements have the following effects on Cyclomatic Complexity:

          Construct         Effect on CC  Reasoning
          if                +1            An if statement is a single decision.
          elif              +1            The elif statement adds another decision.
          else              +0            The else statement does not cause a new decision. The decision is at the if.
          for               +1            There is a decision at the start of the loop.
          while             +1            There is a decision at the while statement.
          except            +1            Each except branch adds a new conditional path of execution.
          finally           +0            The finally block is unconditionally executed.
          with              +1            The with statement roughly corresponds to a try/except block (see PEP 343 for details).
          assert            +1            The assert statement internally roughly equals a conditional statement.
          Comprehension     +1            A list/set/dict comprehension or generator expression is equivalent to a for loop.
          Boolean Operator  +1            Every boolean operator (and, or) adds a decision point.

          Source: http://radon.readthedocs.org/en/latest/intro.html
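The effects listed in the table can be approximated with Python's standard `ast` module. The sketch below is a simplified counter written for illustration, not radon's implementation. The function name `approx_cyclomatic_complexity` and the sample snippet are assumptions, and constructs outside the table are ignored.

```python
import ast

# Node types that add exactly one decision, following the table above.
# An elif is parsed as an ast.If nested in the orelse branch, so it is
# counted like any other if; else and finally add nothing.
_ONE_DECISION = (
    ast.If, ast.For, ast.While, ast.ExceptHandler, ast.With, ast.Assert,
    ast.ListComp, ast.SetComp, ast.DictComp, ast.GeneratorExp,
)


def approx_cyclomatic_complexity(source):
    """Return 1 plus the number of decisions found in `source`."""
    complexity = 1  # code without decisions has a single path
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, _ONE_DECISION):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" is one BoolOp holding two operators
            complexity += len(node.values) - 1
    return complexity


SAMPLE = '''
def classify(x, limit):
    if x < 0 and limit > 0:
        return "negative"
    elif x == 0:
        return "zero"
    else:
        for i in range(limit):
            if i == x:
                return "found"
    return "other"
'''

if __name__ == "__main__":
    print(approx_cyclomatic_complexity(SAMPLE))  # 6
```

For the sample, the count is 1 (base) + 1 (`if`) + 1 (`and`) + 1 (`elif`) + 1 (`for`) + 1 (inner `if`) = 6.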

          File pipeline.py has 258 lines of code (exceeds 250 allowed). Consider refactoring.
          Open

          from sklearn.externals.joblib import dump
          from sklearn.pipeline import _name_estimators, Pipeline as _Pipeline
          from sklearn.utils.metaestimators import if_delegate_has_method
          
          __all__ = ['make_pipeline', 'Pipeline']
          Severity: Minor
          Found in kenchi/pipeline.py - About 2 hrs to fix

            Similar blocks of code found in 2 locations. Consider refactoring.
            Open

                def __init__(
                    self, contamination=0.1, iterated_power='auto', n_components=None,
                    random_state=None, svd_solver='auto', tol=0., whiten=False
                ):
                    self.contamination  = contamination
            Severity: Major
            Found in kenchi/outlier_detection/reconstruction_based.py and 1 other location - About 2 hrs to fix
            kenchi/outlier_detection/ensemble.py on lines 89..99

            Duplicated Code

            Duplicated code can lead to software that is hard to understand and difficult to change. The Don't Repeat Yourself (DRY) principle states:

            Every piece of knowledge must have a single, unambiguous, authoritative representation within a system.

            When you violate DRY, bugs and maintenance problems are sure to follow. Duplicated code has a tendency to both continue to replicate and also to diverge (leaving bugs as two similar implementations differ in subtle ways).
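One common way to remove this kind of duplication is to move the shared logic into a single helper that every class reuses. The sketch below is hypothetical: `ContaminationMixin`, `DetectorA`, and `DetectorB` are illustrative names and do not reflect how kenchi is actually structured.

```python
class ContaminationMixin:
    """Single authoritative home for the contamination check."""

    def _check_contamination(self):
        if not 0.0 < self.contamination <= 0.5:
            raise ValueError(
                "contamination must be in (0, 0.5], got %r" % self.contamination
            )


class DetectorA(ContaminationMixin):
    def __init__(self, contamination=0.1):
        self.contamination = contamination

    def fit(self, X):
        self._check_contamination()  # shared rule, not a private copy
        return self


class DetectorB(ContaminationMixin):
    def __init__(self, contamination=0.1, n_estimators=100):
        self.contamination = contamination
        self.n_estimators = n_estimators

    def fit(self, X):
        self._check_contamination()  # same rule, so the two cannot diverge
        return self
```

If the valid range ever changes, it changes in one place, which is the point of the DRY principle quoted above.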

            Tuning

            This issue has a mass of 50.

            We set useful threshold defaults for the languages we support but you may want to adjust these settings based on your project guidelines.

            The threshold configuration represents the minimum mass a code block must have to be analyzed for duplication. The lower the threshold, the more fine-grained the comparison.

            If the engine is too easily reporting duplication, try raising the threshold. If you suspect that the engine isn't catching enough duplication, try lowering the threshold. The best setting tends to differ from language to language.

            See codeclimate-duplication's documentation for more information about tuning the mass threshold in your .codeclimate.yml.

            Similar blocks of code found in 2 locations. Consider refactoring.
            Open

                def __init__(
                    self, bootstrap=False, contamination='auto', max_features=1.0,
                    max_samples='auto', n_estimators=100, n_jobs=1, random_state=None
                ):
                    self.bootstrap     = bootstrap
            Severity: Major
            Found in kenchi/outlier_detection/ensemble.py and 1 other location - About 2 hrs to fix
            kenchi/outlier_detection/reconstruction_based.py on lines 121..131

            Function plot_anomaly_score has 14 arguments (exceeds 4 allowed). Consider refactoring.
            Open

            def plot_anomaly_score(
            Severity: Major
            Found in kenchi/plotting.py - About 1 hr to fix
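A typical remedy for a long argument list is to group related options into a parameter object. The sketch below is only an illustration: `AxesOptions` and `summarize_plot` are hypothetical names, and the function is a stand-in that returns a summary instead of drawing anything. The default labels are the ones shown in the `plot_anomaly_score` excerpt.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class AxesOptions:
    """Hypothetical bundle of the axis-related keyword arguments."""
    title: Optional[str] = None
    xlabel: str = "Samples"
    ylabel: str = "Anomaly score"
    xlim: Optional[Tuple[float, float]] = None
    ylim: Optional[Tuple[float, float]] = None


def summarize_plot(
    anomaly_score: Sequence[float],
    threshold: Optional[float] = None,
    axes: AxesOptions = AxesOptions(),  # safe default: the dataclass is frozen
) -> dict:
    """Stand-in for a plotting function with three parameters instead of many."""
    if threshold is None:
        n_flagged = 0
    else:
        n_flagged = sum(score > threshold for score in anomaly_score)
    return {
        "title": axes.title,
        "xlabel": axes.xlabel,
        "ylabel": axes.ylabel,
        "n_samples": len(anomaly_score),
        "n_flagged": n_flagged,
    }


if __name__ == "__main__":
    print(summarize_plot([0.1, 0.9, 0.4], threshold=0.5,
                         axes=AxesOptions(title="Scores")))
```

The trade-off is that callers must build an options object, which changes the public signature, so this kind of refactoring may not suit an API that mirrors scikit-learn's keyword-argument conventions.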

              Function __init__ has 13 arguments (exceeds 4 allowed). Consider refactoring.
              Open

                  def __init__(
              Severity: Major
              Found in kenchi/outlier_detection/statistical.py - About 1 hr to fix

                Similar blocks of code found in 2 locations. Consider refactoring.
                Open

                    @if_delegate_has_method(delegate='_final_estimator')
                    def anomaly_score(self, X=None, **kwargs):
                        """Apply transforms, and compute the anomaly score for each sample with
                        the final estimator.
                
                
                Severity: Major
                Found in kenchi/pipeline.py and 1 other location - About 1 hr to fix
                kenchi/pipeline.py on lines 179..236

                This issue has a mass of 44.

                Similar blocks of code found in 2 locations. Consider refactoring.
                Open

                    @if_delegate_has_method(delegate='_final_estimator')
                    def plot_anomaly_score(self, X=None, **kwargs):
                        """Apply transforms, and plot the anomaly score for each sample with
                        the final estimator.
                
                
                Severity: Major
                Found in kenchi/pipeline.py and 1 other location - About 1 hr to fix
                kenchi/pipeline.py on lines 117..138

                This issue has a mass of 44.

                Function __init__ has 11 arguments (exceeds 4 allowed). Consider refactoring.
                Open

                    def __init__(
                Severity: Major
                Found in kenchi/outlier_detection/clustering_based.py - About 1 hr to fix

                  Function __init__ has 10 arguments (exceeds 4 allowed). Consider refactoring.
                  Open

                      def __init__(
                  Severity: Major
                  Found in kenchi/outlier_detection/distance_based.py - About 1 hr to fix

                    Function load_pendigits has a Cognitive Complexity of 11 (exceeds 5 allowed). Consider refactoring.
                    Open

                    def load_pendigits(random_state=None, return_X_y=False, subset='kriegel11'):
                        """Load and return the pendigits dataset.
                    
                        Kriegel's structure (subset='kriegel11') :
                    
                    
                    Severity: Minor
                    Found in kenchi/datasets/base.py - About 1 hr to fix

                    Function __init__ has 10 arguments (exceeds 4 allowed). Consider refactoring.
                    Open

                        def __init__(
                    Severity: Major
                    Found in kenchi/outlier_detection/statistical.py - About 1 hr to fix

                      Function plot_roc_curve has 9 arguments (exceeds 4 allowed). Consider refactoring.
                      Open

                      def plot_roc_curve(
                      Severity: Major
                      Found in kenchi/plotting.py - About 1 hr to fix

                        Function __init__ has 9 arguments (exceeds 4 allowed). Consider refactoring.
                        Open

                            def __init__(
                        Severity: Major
                        Found in kenchi/outlier_detection/density_based.py - About 1 hr to fix

                          Function __init__ has 9 arguments (exceeds 4 allowed). Consider refactoring.
                          Open

                              def __init__(
                          Severity: Major
                          Found in kenchi/outlier_detection/angle_based.py - About 1 hr to fix