pombo-lab/gamtools

Showing 473 of 473 total issues

Identical blocks of code found in 2 locations. Consider refactoring.
Open

    if only_visible:
        visible = segregation_table.loc[segregation_table.sum(axis=1) > 0]
    else:
        visible = segregation_table
Severity: Minor
Found in lib/gamtools/resolution.py and 1 other location - About 55 mins to fix
lib/gamtools/resolution.py on lines 158..161

Duplicated Code

Duplicated code can lead to software that is hard to understand and difficult to change. The Don't Repeat Yourself (DRY) principle states:

Every piece of knowledge must have a single, unambiguous, authoritative representation within a system.

When you violate DRY, bugs and maintenance problems are sure to follow. Duplicated code has a tendency to both continue to replicate and also to diverge (leaving bugs as two similar implementations differ in subtle ways).

Tuning

This issue has a mass of 37.

We set useful threshold defaults for the languages we support but you may want to adjust these settings based on your project guidelines.

The threshold configuration represents the minimum mass a code block must have to be analyzed for duplication. The lower the threshold, the more fine-grained the comparison.

If the engine is too easily reporting duplication, try raising the threshold. If you suspect that the engine isn't catching enough duplication, try lowering the threshold. The best setting tends to differ from language to language.

See codeclimate-duplication's documentation for more information about tuning the mass threshold in your .codeclimate.yml.

Identical blocks of code found in 2 locations. Consider refactoring.
Open

    if only_visible:
        visible = segregation_table.loc[segregation_table.sum(axis=1) > 0]
    else:
        visible = segregation_table
Severity: Minor
Found in lib/gamtools/resolution.py and 1 other location - About 55 mins to fix
lib/gamtools/resolution.py on lines 118..121

This issue has a mass of 37.

Similar blocks of code found in 3 locations. Consider refactoring.
Open

convert_parser.add_argument(
    '-o', '--output-format',
    choices=matrix.OUTPUT_FORMATS,
    help='Output matrix file format (choose from: {})'.format(
        ', '.join(matrix.OUTPUT_FORMATS.keys())))
Severity: Major
Found in lib/gamtools/main.py and 2 other locations - About 55 mins to fix
lib/gamtools/main.py on lines 161..165
lib/gamtools/main.py on lines 244..248
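
One possible way to remove this kind of duplication is to define the shared option once in a helper and call it for every subparser that needs it. The following is a minimal sketch, not the gamtools implementation: the helper name `add_output_format_argument`, the stand-in `OUTPUT_FORMATS` mapping, and the `other` subcommand are assumptions made for illustration.

```python
import argparse

# Hypothetical stand-in for matrix.OUTPUT_FORMATS (the real mapping lives in gamtools).
OUTPUT_FORMATS = {'csv': None, 'npz': None, 'txt': None}


def add_output_format_argument(parser, formats=OUTPUT_FORMATS):
    """Attach the shared -o/--output-format option to the given parser."""
    parser.add_argument(
        '-o', '--output-format',
        choices=formats,
        help='Output matrix file format (choose from: {})'.format(
            ', '.join(formats.keys())))


parser = argparse.ArgumentParser()
subparsers = parser.add_subparsers(dest='command')
convert_parser = subparsers.add_parser('convert')
other_parser = subparsers.add_parser('other')  # hypothetical second subcommand

# Each subcommand gets the option from the same single definition.
for subparser in (convert_parser, other_parser):
    add_output_format_argument(subparser)

print(parser.parse_args(['convert', '-o', 'csv']).output_format)
```

With this layout, a change to the help text or the accepted formats is made in one place rather than in three.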

This issue has a mass of 37.

Function threshold_file has 7 arguments (exceeds 4 allowed). Consider refactoring.
Open

def threshold_file(input_file, output_file, #pylint: disable=too-many-arguments
Severity: Major
Found in lib/gamtools/call_windows.py - About 50 mins to fix

Function convert has 7 arguments (exceeds 4 allowed). Consider refactoring.
Open

def convert(input_file, input_format, #pylint: disable=too-many-arguments
Severity: Major
Found in lib/gamtools/matrix.py - About 50 mins to fix
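
A common response to an argument-count warning is to group arguments that travel together into a single parameter object. The sketch below assumes Python 3.7+ for `dataclasses`. Only `input_file` and `output_file` appear in the excerpts above; `ThresholdJob`, `describe_job`, `min_threshold`, and `verbose` are hypothetical names standing in for options that are not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThresholdJob:
    """Parameter object grouping arguments that are passed around together."""
    input_file: str
    output_file: str
    # The fields below are hypothetical placeholders for the remaining options.
    min_threshold: int = 1
    verbose: bool = False


def describe_job(job):
    """Accept one object instead of a long positional argument list."""
    return '{} -> {} (min_threshold={}, verbose={})'.format(
        job.input_file, job.output_file, job.min_threshold, job.verbose)


job = ThresholdJob('coverage.table', 'segregation.table', min_threshold=3)
print(describe_job(job))
```

Whether this is worthwhile depends on the call sites: a function that is only called from an argparse wrapper may be clearer with explicit keyword arguments.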

Similar blocks of code found in 4 locations. Consider refactoring.
Open

def get_dprime(segregation_data, *location_strings):
    """Calculate the normalized linkage disequilibrium (D') matrix for a given
    genomic location or locations. Where only one location is given, normalized
    linkage is calculated for that region against itself.  Where two regions
    are given, linkage is calculated for region one against region two.
Severity: Major
Found in lib/gamtools/cosegregation.py and 3 other locations - About 50 mins to fix
lib/gamtools/cosegregation.py on lines 222..236
lib/gamtools/cosegregation.py on lines 275..289
lib/gamtools/cosegregation.py on lines 364..380
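
When several functions share the same "look up the regions, then compute a matrix" shape, one option is to generate them from a single factory. The function bodies are not shown in this listing, so the sketch below is only an assumption about that shape: `make_location_function`, `find_region`, and `count_pairs` are hypothetical names, and plain lists stand in for the real segregation data.

```python
def make_location_function(matrix_from_regions, find_region):
    """Build a function that resolves location strings, then computes a matrix."""
    def from_locations(segregation_data, *location_strings):
        regions = [find_region(segregation_data, location)
                   for location in location_strings]
        return matrix_from_regions(*regions)
    return from_locations


# Toy stand-ins so the sketch runs without gamtools or pandas.
def find_region(segregation_data, location):
    return segregation_data[location]


def count_pairs(region_one, region_two=None):
    # With a single region, compare that region against itself.
    if region_two is None:
        region_two = region_one
    return len(region_one) * len(region_two)


get_pair_count = make_location_function(count_pairs, find_region)

data = {'chr1': [1, 0, 1], 'chr2': [0, 1]}
print(get_pair_count(data, 'chr1'))
print(get_pair_count(data, 'chr1', 'chr2'))
```

One trade-off is that generated functions lose their individual docstrings unless these are assigned explicitly.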

This issue has a mass of 36.

Similar blocks of code found in 4 locations. Consider refactoring.
Open

def get_cosesgregation(segregation_data, *location_strings):
    """Calculate co-segregation frequencies for a given genomic
    location or locations. Where only one location is given,
    co-segregation is calculated for that region against itself.
Severity: Major
Found in lib/gamtools/cosegregation.py and 3 other locations - About 50 mins to fix
lib/gamtools/cosegregation.py on lines 275..289
lib/gamtools/cosegregation.py on lines 319..334
lib/gamtools/cosegregation.py on lines 364..380

This issue has a mass of 36.

Similar blocks of code found in 4 locations. Consider refactoring.
Open

def get_npmi(segregation_data, *location_strings):
    """Calculate the normalized pointwise mutual information (npmi) matrix for a given
    genomic location or locations. Where only one location is given, npmi
    is calculated for that region against itself.  Where two regions
    are given, linkage is calculated for region one against region two.
Severity: Major
Found in lib/gamtools/cosegregation.py and 3 other locations - About 50 mins to fix
lib/gamtools/cosegregation.py on lines 222..236
lib/gamtools/cosegregation.py on lines 275..289
lib/gamtools/cosegregation.py on lines 319..334

This issue has a mass of 36.

Similar blocks of code found in 4 locations. Consider refactoring.
Open

def get_linkage(segregation_data, *location_strings):
    """Calculate the linkage disequilibrium matrix for a given genomic
    location or locations. Where only one location is given,
    linkage is calculated for that region against itself.
Severity: Major
Found in lib/gamtools/cosegregation.py and 3 other locations - About 50 mins to fix
lib/gamtools/cosegregation.py on lines 222..236
lib/gamtools/cosegregation.py on lines 319..334
lib/gamtools/cosegregation.py on lines 364..380

This issue has a mass of 36.

Identical blocks of code found in 2 locations. Consider refactoring.
Open

    try:
        first_col = pd.read_csv(input_stats_files[0], delim_whitespace=True, dtype=str).iloc[:, 0]
Severity: Minor
Found in lib/gamtools/qc/merge.py and 1 other location - About 45 mins to fix
lib/gamtools/qc/merge.py on lines 47..47

This issue has a mass of 35.

Identical blocks of code found in 2 locations. Consider refactoring.
Open

        first_col = pd.read_csv(input_stats_files[0], delim_whitespace=True, dtype=str).iloc[:, 0]
Severity: Minor
Found in lib/gamtools/qc/merge.py and 1 other location - About 45 mins to fix
lib/gamtools/qc/merge.py on lines 43..44

This issue has a mass of 35.

Function do_segregation_qc has a Cognitive Complexity of 8 (exceeds 5 allowed). Consider refactoring.
Open

def do_segregation_qc(segregation_table, slice_thickness, nuclear_radius, coverage_q=0.2, #pylint: disable=too-many-arguments
                      skip_chroms=None, genome_size=None, plexity=1, only_visible=True):
    """
    Check that a GAM dataset has sufficient quality and depth.  Specifically,
    check that 80% of genomic windows have been detected at least 20 times
Severity: Minor
Found in lib/gamtools/resolution.py - About 45 mins to fix

Cognitive Complexity

Cognitive Complexity is a measure of how difficult a unit of code is to intuitively understand. Unlike Cyclomatic Complexity, which determines how difficult your code will be to test, Cognitive Complexity tells you how difficult your code will be to read and comprehend.

A method's cognitive complexity is based on a few simple rules:

• Code is not considered more complex when it uses shorthand that the language provides for collapsing multiple statements into one
• Code is considered more complex for each "break in the linear flow of the code"
• Code is considered more complex when "flow breaking structures are nested"
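
The nesting rule can be illustrated with two functions that return the same results. This is a generic sketch, not code from gamtools, and the function names are made up for the example. The first version nests three conditionals; the second uses guard clauses so that every branch sits at the top level, which a nesting-sensitive metric would be expected to score lower.

```python
def classify_nested(value):
    """Three levels of nested branching."""
    if value is not None:
        if value >= 0:
            if value > 100:
                return 'large'
            else:
                return 'small'
        else:
            return 'negative'
    else:
        return 'missing'


def classify_flat(value):
    """Same behaviour, with early returns instead of nesting."""
    if value is None:
        return 'missing'
    if value < 0:
        return 'negative'
    if value > 100:
        return 'large'
    return 'small'


for sample in (None, -1, 0, 100, 101):
    print(sample, classify_nested(sample), classify_flat(sample))
```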

Function parse_module has a Cognitive Complexity of 8 (exceeds 5 allowed). Consider refactoring.
Open

def parse_module(fastqc_module):
    """
    Parse a fastqc module from the table format to a line format (list).
    Input is list containing the module. One list-item per line. E.g.:
Severity: Minor
Found in lib/gamtools/qc/fastqc.py - About 45 mins to fix

Function matrix_from_args has a Cognitive Complexity of 8 (exceeds 5 allowed). Consider refactoring.
Open

def matrix_from_args(args):
    """Extract parameters from an argparse namespace object and pass them to
    create_and_save_contact_matrix.
    """
Severity: Minor
Found in lib/gamtools/cosegregation.py - About 45 mins to fix

Function create_doit_tasks has a Cognitive Complexity of 8 (exceeds 5 allowed). Consider refactoring.
Open

    def create_doit_tasks(self):
        """Generator function that yields doit tasks."""

        tasks = []
        task_generators = []
Severity: Minor
Found in lib/gamtools/pipeline.py - About 45 mins to fix

Function create_and_save_contact_matrix has 5 arguments (exceeds 4 allowed). Consider refactoring.
Open

def create_and_save_contact_matrix(segregation_file, location_strings,
Severity: Minor
Found in lib/gamtools/cosegregation.py - About 35 mins to fix

Function do_enrichment has 5 arguments (exceeds 4 allowed). Consider refactoring.
Open

def do_enrichment(
Severity: Minor
Found in lib/gamtools/enrichment.py - About 35 mins to fix

Function comparison_from_operator has a Cognitive Complexity of 7 (exceeds 5 allowed). Consider refactoring.
Open

def comparison_from_operator(operator, left, right):
    """
    Perform a comparison between left and right values in a QC
    conditions file.
Severity: Minor
Found in lib/gamtools/qc/pass_qc.py - About 35 mins to fix
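
A function that selects a comparison by symbol is often written as a chain of `if`/`elif` branches, and one way to flatten such a chain is a lookup table built on the standard-library `operator` module. The body of `comparison_from_operator` is not shown here, so the sketch below is an assumption about its general shape: `compare`, `COMPARISONS`, and the set of supported symbols are hypothetical.

```python
import operator

# Hypothetical set of supported symbols; the real QC conditions syntax may differ.
COMPARISONS = {
    '<': operator.lt,
    '<=': operator.le,
    '>': operator.gt,
    '>=': operator.ge,
    '==': operator.eq,
    '!=': operator.ne,
}


def compare(symbol, left, right):
    """Look up the comparison for a symbol and apply it to left and right."""
    try:
        comparison = COMPARISONS[symbol]
    except KeyError:
        raise ValueError('Unsupported operator: {!r}'.format(symbol))
    return comparison(left, right)


print(compare('<', 1, 2))
print(compare('>=', 2, 3))
```

Adding a new operator then means adding one dictionary entry rather than another branch.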

Similar blocks of code found in 2 locations. Consider refactoring.
Open

def bp_coverage_path(base_folder, window_size):
    """Get the path to a bp coverage table given a base folder and resolution.

    :param str base_folder: Path to the folder containing the bp coverage table.
    :param int window_size: Resolution in base pairs
Severity: Minor
Found in lib/gamtools/pipeline.py and 1 other location - About 30 mins to fix
lib/gamtools/pipeline.py on lines 97..108

This issue has a mass of 32.

Similar blocks of code found in 2 locations. Consider refactoring.
Open

def segregation_path(base_folder, window_size):
    """Get the path to a segregation table given a base folder and resolution.

    :param str base_folder: Path to the folder containing the segregation table.
    :param int window_size: Resolution in base pairs
Severity: Minor
Found in lib/gamtools/pipeline.py and 1 other location - About 30 mins to fix
lib/gamtools/pipeline.py on lines 70..81
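
Two path helpers that differ only in the file name they build could share one implementation. The sketch below is one way to do that under stated assumptions: the function bodies are not shown in this listing, so the file name templates, `make_path_function`, and the `_sketch` suffixes are hypothetical and do not reflect the names gamtools actually writes to disk.

```python
import os


def make_path_function(filename_template):
    """Return a function mapping (base_folder, window_size) to a table path."""
    def path_function(base_folder, window_size):
        return os.path.join(base_folder, filename_template.format(window_size))
    return path_function


# Hypothetical file name templates, for illustration only.
segregation_path_sketch = make_path_function('segregation_at_{0}bp.table')
bp_coverage_path_sketch = make_path_function('bp_coverage_at_{0}bp.table')

print(segregation_path_sketch('results', 50000))
print(bp_coverage_path_sketch('results', 50000))
```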

This issue has a mass of 32.
