yzhao062/Pyod

Showing 181 of 269 total issues

Similar blocks of code found in 2 locations. Consider refactoring.
Open

    def plot_learning_curves(self, start_ind=0, window_smoothening=10):
        fig = plt.figure(figsize=(12, 5))

        l_gen = pd.Series(self.hist_loss_gen[start_ind:]).rolling(
            window_smoothening).mean()
Severity: Major
Found in pyod/models/alad.py and 1 other location - About 1 day to fix
pyod/models/anogan.py on lines 206..227

Duplicated Code

Duplicated code can lead to software that is hard to understand and difficult to change. The Don't Repeat Yourself (DRY) principle states:

Every piece of knowledge must have a single, unambiguous, authoritative representation within a system.

When you violate DRY, bugs and maintenance problems are sure to follow. Duplicated code has a tendency to both continue to replicate and also to diverge (leaving bugs as two similar implementations differ in subtle ways).

Tuning

This issue has a mass of 208.

We set useful threshold defaults for the languages we support, but you may want to adjust these settings based on your project guidelines.

The threshold configuration represents the minimum mass a code block must have to be analyzed for duplication. The lower the threshold, the more fine-grained the comparison.

If the engine is too easily reporting duplication, try raising the threshold. If you suspect that the engine isn't catching enough duplication, try lowering the threshold. The best setting tends to differ from language to language.

See codeclimate-duplication's documentation for more information about tuning the mass threshold in your .codeclimate.yml.
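To give a feel for what "mass" measures, the sketch below counts the nodes in Python's own syntax tree for a snippet. This is only an analogy: it assumes mass grows roughly with the number of syntax-tree nodes, and approximate_mass is a hypothetical helper, not part of Code Climate. The engine uses its own parser, so its numbers will differ.

```python
import ast


def approximate_mass(source):
    """Count AST nodes as a rough stand-in for a block's 'mass'."""
    tree = ast.parse(source)
    return sum(1 for _ in ast.walk(tree))


small = "x = 1"
large = "for i in range(10):\n    if i % 2 == 0:\n        print(i)"

# A longer, more structured block has more nodes, so it is more likely
# to clear a given threshold and be compared for duplication.
print(approximate_mass(small), approximate_mass(large))
```

Under this reading, raising the threshold excludes small blocks like the first snippet from comparison, while lowering it brings them back in.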

Similar blocks of code found in 2 locations. Consider refactoring.
Open

    def plot_learning_curves(self, start_ind=0,
                             window_smoothening=10):  # pragma: no cover
        fig = plt.figure(figsize=(12, 5))

        l_gen = pd.Series(self.hist_loss_generator[start_ind:]).rolling(
Severity: Major
Found in pyod/models/anogan.py and 1 other location - About 1 day to fix
pyod/models/alad.py on lines 445..465

This issue has a mass of 208.

Similar blocks of code found in 2 locations. Consider refactoring.
Open

def _parallel_ecdf(n_dims, X):
    """Private method to calculate ecdf in parallel.    
    Parameters
    ----------
    n_dims : int
Severity: Major
Found in pyod/models/copod.py and 1 other location - About 1 day to fix
pyod/models/ecod.py on lines 28..52

This issue has a mass of 162.

Similar blocks of code found in 2 locations. Consider refactoring.
Open

def _parallel_ecdf(n_dims, X):
    """Private method to calculate ecdf in parallel.
    Parameters
    ----------
    n_dims : int
Severity: Major
Found in pyod/models/ecod.py and 1 other location - About 1 day to fix
pyod/models/copod.py on lines 27..51

This issue has a mass of 162.

File thresholds.py has 467 lines of code (exceeds 250 allowed). Consider refactoring.
Open

def AUCP(**kwargs):
    """AUCP class for Area Under Curve Percentage thresholder.

       Use the area under the curve to evaluate a non-parametric means
       to threshold scores generated by the decision_scores where outliers
Severity: Minor
Found in pyod/models/thresholds.py - About 7 hrs to fix

Similar blocks of code found in 2 locations. Consider refactoring.
Open

        for i in range(X_norm.shape[0]):
            if (self.verbose == 1):
                print('query sample {} / {}'.format(i + 1, X_norm.shape[0]))

            sample = X_norm[[i],]
Severity: Major
Found in pyod/models/anogan.py and 1 other location - About 5 hrs to fix
pyod/models/anogan.py on lines 386..392

This issue has a mass of 89.

Similar blocks of code found in 2 locations. Consider refactoring.
Open

        for i in range(X_norm.shape[0]):
            if (self.verbose == 1):
                print('query sample {} / {}'.format(i + 1, X_norm.shape[0]))

            sample = X_norm[[i],]
Severity: Major
Found in pyod/models/anogan.py and 1 other location - About 5 hrs to fix
pyod/models/anogan.py on lines 427..433

This issue has a mass of 89.

Function generate_data_clusters has a Cognitive Complexity of 30 (exceeds 5 allowed). Consider refactoring.
Open

def generate_data_clusters(n_train=1000, n_test=500, n_clusters=2,
                           n_features=2, contamination=0.1, size='same',
                           density='same', dist=0.25, random_state=None,
                           return_in_clusters=False):
    """Utility function to generate synthesized data in clusters.
Severity: Minor
Found in pyod/utils/data.py - About 4 hrs to fix

Cognitive Complexity

Cognitive Complexity is a measure of how difficult a unit of code is to intuitively understand. Unlike Cyclomatic Complexity, which determines how difficult your code will be to test, Cognitive Complexity tells you how difficult your code will be to read and comprehend.

A method's cognitive complexity is based on a few simple rules:

• Code is not considered more complex when it uses shorthand that the language provides for collapsing multiple statements into one
• Code is considered more complex for each "break in the linear flow of the code"
• Code is considered more complex when "flow breaking structures are nested"
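As an illustration of the nesting rule, here are two hypothetical functions (not taken from pyod) that return the same result. The first nests its flow-breaking structures; the second flattens them into guard clauses, which a nesting-sensitive metric would typically score lower.

```python
def label_nested(score, threshold, strict):
    # Each extra level of nesting makes the flow harder to follow.
    if score is not None:
        if score >= threshold:
            if strict:
                return "outlier"
            else:
                return "suspect"
        else:
            return "inlier"
    else:
        return "unknown"


def label_flat(score, threshold, strict):
    # Same behaviour, written as guard clauses with no nesting.
    if score is None:
        return "unknown"
    if score < threshold:
        return "inlier"
    return "outlier" if strict else "suspect"
```

Both versions make the same decisions; only the shape of the control flow changes.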

Identical blocks of code found in 2 locations. Consider refactoring.
Open

        all_results = Parallel(n_jobs=n_jobs, max_nbytes=None,
                               verbose=True)(
            delayed(_parallel_ecdf)(
                n_dims_list[i],
                X[:, starts[i]:starts[i + 1]],
Severity: Major
Found in pyod/models/copod.py and 1 other location - About 4 hrs to fix
pyod/models/ecod.py on lines 181..187

This issue has a mass of 77.

Similar blocks of code found in 4 locations. Consider refactoring.
Open

        for i, l_dim in enumerate(self.G_layers):
            layer_name = 'hl_{}'.format(i)
            G_hl_dict[layer_name] = Dropout(self.dropout_rate)(
                Dense(l_dim, activation=self.activation_hidden)(last_layer))
            last_layer = G_hl_dict[layer_name]
Severity: Major
Found in pyod/models/anogan.py and 3 other locations - About 3 hrs to fix
pyod/models/alad.py on lines 204..209
pyod/models/alad.py on lines 224..229
pyod/models/anogan.py on lines 187..191

This issue has a mass of 72.

Similar blocks of code found in 4 locations. Consider refactoring.
Open

        for i, l_dim in enumerate(self.D_layers):
            layer_name = 'hl_{}'.format(i)
            D_hl_dict[layer_name] = Dropout(self.dropout_rate)(
                Dense(l_dim, activation=self.activation_hidden)(last_layer))
            last_layer = D_hl_dict[layer_name]
Severity: Major
Found in pyod/models/anogan.py and 3 other locations - About 3 hrs to fix
pyod/models/alad.py on lines 204..209
pyod/models/alad.py on lines 224..229
pyod/models/anogan.py on lines 169..173

This issue has a mass of 72.
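One possible way to remove the repeated hidden-layer loops is a single helper that takes the layer sizes and a layer-building callable. The sketch below is an assumption about how such a refactoring might look, not code from pyod: stack_hidden_layers and make_layer are hypothetical names, and plain Python callables stand in for the Keras Dense and Dropout layers so the example runs without TensorFlow.

```python
def stack_hidden_layers(layer_dims, last_layer, make_layer):
    """Chain one hidden layer per entry in layer_dims.

    make_layer(l_dim) must return a callable that maps the previous
    layer's output to the new layer's output.
    """
    hl_dict = {}
    for i, l_dim in enumerate(layer_dims):
        layer_name = 'hl_{}'.format(i)
        hl_dict[layer_name] = make_layer(l_dim)(last_layer)
        last_layer = hl_dict[layer_name]
    return hl_dict, last_layer


# Stand-in for Dropout(rate)(Dense(l_dim, ...)(x)): it only records
# the order in which layer sizes were applied.
def fake_layer(l_dim):
    return lambda previous: previous + [l_dim]


hl_dict, output = stack_hidden_layers([64, 32], [], fake_layer)
print(output)  # [64, 32]
```

In the models, the generator and discriminator loops could then each become one call that passes self.G_layers or self.D_layers along with a callable wrapping Dense and Dropout.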

File dif.py has 324 lines of code (exceeds 250 allowed). Consider refactoring.
Open

# -*- coding: utf-8 -*-
"""Deep Isolation Forest for Anomaly Detection (DIF)
"""
# Author: Hongzuo Xu <hongzuoxu@126.edu>
# License: BSD 2 clause
Severity: Minor
Found in pyod/models/dif.py - About 3 hrs to fix

File kpca.py has 324 lines of code (exceeds 250 allowed). Consider refactoring.
Open

# -*- coding: utf-8 -*-
"""Kernel Principal Component Analysis (KPCA) Outlier Detector
"""
# Author: Akira Tamamori <tamamori5917@gmail.com>
# License: BSD 2 clause
Severity: Minor
Found in pyod/models/kpca.py - About 3 hrs to fix

Function generate_data_categorical has a Cognitive Complexity of 24 (exceeds 5 allowed). Consider refactoring.
Open

def generate_data_categorical(n_train=1000, n_test=500, n_features=2,
                              n_informative=2, n_category_in=2,
                              n_category_out=2, contamination=0.1,
                              shuffle=True, random_state=None):
    """Utility function to generate synthesized categorical data.
Severity: Minor
Found in pyod/utils/data.py - About 3 hrs to fix

Similar blocks of code found in 2 locations. Consider refactoring.
Open

        if self.n_neighbors >= self.n_train_:
            self.n_neighbors = self.n_train_ - 1
            warnings.warn("n_neighbors is set to the number of "
                          "training points minus 1: {0}".format(self.n_train_))
Severity: Major
Found in pyod/models/abod.py and 1 other location - About 3 hrs to fix
pyod/models/cof.py on lines 112..118

This issue has a mass of 66.
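A shared helper is one way the n_neighbors check could be written once and reused by both detectors. The function name clamp_n_neighbors is an assumption made for illustration; it is not an existing pyod utility.

```python
import warnings


def clamp_n_neighbors(n_neighbors, n_train):
    """Return n_neighbors, reduced to n_train - 1 if it is too large."""
    if n_neighbors >= n_train:
        n_neighbors = n_train - 1
        # Unlike the flagged snippet, this message reports the adjusted
        # value rather than the number of training points.
        warnings.warn("n_neighbors is set to the number of "
                      "training points minus 1: {0}".format(n_neighbors))
    return n_neighbors
```

Each detector's fit method could then call it as self.n_neighbors = clamp_n_neighbors(self.n_neighbors, self.n_train_).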

Identical blocks of code found in 2 locations. Consider refactoring.
Open

if _get_tensorflow_version() < 200:
    raise NotImplementedError('Model not implemented for Tensorflow version 1')
elif 200 <= _get_tensorflow_version() <= 209:
    import tensorflow as tf
    from tensorflow.keras.models import Model
Severity: Major
Found in pyod/models/alad.py and 1 other location - About 3 hrs to fix
pyod/models/anogan.py on lines 22..34

This issue has a mass of 63.

Identical blocks of code found in 2 locations. Consider refactoring.
Open

if _get_tensorflow_version() < 200:
    raise NotImplementedError('Model not implemented for Tensorflow version 1')

elif 200 <= _get_tensorflow_version() <= 209:
    import tensorflow as tf
Severity: Major
Found in pyod/models/anogan.py and 1 other location - About 3 hrs to fix
pyod/models/alad.py on lines 21..32

This issue has a mass of 63.

Similar blocks of code found in 2 locations. Consider refactoring.
Open

                if dist <= bin_width * tol:
                    outlier_scores[j, i] = out_score_i[n_bins - 1]
                else:
                    outlier_scores[j, i] = np.min(out_score_i)
Severity: Major
Found in pyod/models/hbos.py and 1 other location - About 3 hrs to fix
pyod/models/hbos.py on lines 260..263

This issue has a mass of 63.

Similar blocks of code found in 2 locations. Consider refactoring.
Open

                if dist <= bin_width * tol:
                    outlier_scores[j, i] = out_score_i[optimal_n_bins - 1]
                else:
                    outlier_scores[j, i] = np.min(out_score_i)
Severity: Major
Found in pyod/models/hbos.py and 1 other location - About 3 hrs to fix
pyod/models/hbos.py on lines 346..349

This issue has a mass of 63.
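The two hbos.py blocks differ only in which bin count they index with (n_bins in one, optimal_n_bins in the other), so a small helper parameterised on that count is one way to merge them. The sketch below is an assumption, not pyod code: edge_score is a hypothetical name, and the built-in min stands in for np.min so the example needs no NumPy.

```python
def edge_score(dist, bin_width, tol, out_score_i, n_bins):
    """Pick the score for one sample, mirroring the flagged if/else.

    n_bins is the bin count used for indexing (n_bins at one call
    site, optimal_n_bins at the other).
    """
    if dist <= bin_width * tol:
        # Within tolerance: use the score of the last bin.
        return out_score_i[n_bins - 1]
    # Otherwise use the minimum score over all bins.
    return min(out_score_i)


scores = [3.0, 1.0, 2.0]
print(edge_score(0.1, 1.0, 0.5, scores, 3))  # 2.0
print(edge_score(1.0, 1.0, 0.5, scores, 3))  # 1.0
```

Each call site would then assign outlier_scores[j, i] = edge_score(...) with its own bin count.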

Consider simplifying this complex logical expression.
Open

    if (include_left and include_right) and (param < low or param > high):
        raise ValueError(
            '{param_name} is set to {param}. '
            'Not in the range of [{low}, {high}].'.format(
                param=param, low=low, high=high, param_name=param_name))
Severity: Critical
Found in pyod/utils/utility.py - About 3 hrs to fix
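The flagged condition mixes two inclusion flags with two comparisons. One way to simplify it is to evaluate each bound on its own and combine the results. The sketch below assumes the function's other branches (not shown in this excerpt) handle the remaining include_left / include_right combinations in the same style; is_in_range is a hypothetical helper name.

```python
def is_in_range(param, low, high, include_left=False, include_right=False):
    """Return True if param lies between low and high.

    include_left / include_right decide whether each bound is closed.
    """
    lower_ok = param >= low if include_left else param > low
    upper_ok = param <= high if include_right else param < high
    return lower_ok and upper_ok


# Closed interval [1, 5] accepts both endpoints.
print(is_in_range(1, 1, 5, include_left=True, include_right=True))  # True
# Open interval (1, 5) rejects them.
print(is_in_range(1, 1, 5))  # False
```

With a helper like this, the caller would need a single "if not is_in_range(...): raise ValueError(...)" and could choose the bracket characters for the message from the same two flags.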