KarrLab/datanator


Showing 791 of 791 total issues

Similar blocks of code found in 2 locations. Consider refactoring.

        for i, y_doc in enumerate(y_docs[start:]):
            if self.verbose and i % 50 == 0:
                print('Processing ymdb doc {} out of {}'.format(i+start, count_y))
            if y_doc['synonyms']:
                synonyms = y_doc['synonyms']['synonym']
datanator/data_source/metabolite_concentration/doi_10_1038_nchembio_2077.py on lines 153..165

Duplicated Code

Duplicated code can lead to software that is hard to understand and difficult to change. The Don't Repeat Yourself (DRY) principle states:

Every piece of knowledge must have a single, unambiguous, authoritative representation within a system.

When you violate DRY, bugs and maintenance problems are sure to follow. Duplicated code has a tendency to both continue to replicate and also to diverge (leaving bugs as two similar implementations differ in subtle ways).

Tuning

This issue has a mass of 181.

We set useful threshold defaults for the languages we support but you may want to adjust these settings based on your project guidelines.

The threshold configuration represents the minimum mass a code block must have to be analyzed for duplication. The lower the threshold, the more fine-grained the comparison.

If the engine is too easily reporting duplication, try raising the threshold. If you suspect that the engine isn't catching enough duplication, try lowering the threshold. The best setting tends to differ from language to language.

See codeclimate-duplication's documentation for more information about tuning the mass threshold in your .codeclimate.yml.


Similar blocks of code found in 2 locations. Consider refactoring.

        for i, e_doc in enumerate(e_docs[start:]):
            if self.verbose and i % 50 == 0:
                print('Processing ecmdb doc {} out of {}'.format(i+start, count_e))
            if e_doc['synonyms']:
                synonyms = e_doc['synonyms']['synonym']
datanator/data_source/metabolite_concentration/doi_10_1038_nchembio_2077.py on lines 167..179
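
The ymdb loop above and its ecmdb twin differ only in the document list, the progress label, and the total count. One way to apply DRY here is a shared generator; the helper name `iter_synonym_docs` is hypothetical, a sketch of the refactoring rather than code from the repository.

```python
def iter_synonym_docs(docs, start, total, label, verbose=False):
    """Yield (absolute index, synonym list) pairs.

    Factors out the loop shared by the ymdb and ecmdb branches:
    `docs` is a list of dicts with an optional 'synonyms' -> 'synonym'
    entry, exactly as in the duplicated blocks.
    """
    for i, doc in enumerate(docs[start:]):
        if verbose and i % 50 == 0:
            print('Processing {} doc {} out of {}'.format(label, i + start, total))
        if doc.get('synonyms'):
            yield i + start, doc['synonyms']['synonym']
```

Each caller then collapses to one line, e.g. `for i, synonyms in iter_synonym_docs(y_docs, start, count_y, 'ymdb', self.verbose): ...`, and a fix to the progress logic lands in a single place.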

This issue has a mass of 181.

Function run has a Cognitive Complexity of 77 (exceeds 5 allowed). Consider refactoring.

    def run(self, in_obo_filename=None, in_tsv_filename=None, in_monomers_filename=None, max_num_proteins=None,
            out_tsv_filename=None):
        """ Download PRO ontology, generate proteoforms, and encode with BpForms

        Args:
Severity: Minor
Found in datanator/data_source/protein_modification/pro.py - About 1 day to fix

Cognitive Complexity

Cognitive Complexity is a measure of how difficult a unit of code is to intuitively understand. Unlike Cyclomatic Complexity, which determines how difficult your code will be to test, Cognitive Complexity tells you how difficult your code will be to read and comprehend.

A method's cognitive complexity is based on a few simple rules:

  • Code is not considered more complex when it uses shorthand that the language provides for collapsing multiple statements into one
  • Code is considered more complex for each "break in the linear flow of the code"
  • Code is considered more complex when "flow breaking structures are nested"
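
A small, hypothetical illustration of the third rule: the two functions below behave identically, but the nested version pays a penalty for each inner `if`, while the guard-clause version keeps every branch at the top level and scores lower.

```python
def classify_nested(doc):
    # Each nested `if` adds an increment *plus* a penalty for its nesting depth.
    if doc is not None:
        if 'synonyms' in doc:
            if doc['synonyms']:
                return 'has-synonyms'
    return 'empty'


def classify_flat(doc):
    # Same behavior; early returns keep every branch at nesting depth zero.
    if doc is None:
        return 'empty'
    if not doc.get('synonyms'):
        return 'empty'
    return 'has-synonyms'
```

Flattening like this is often the quickest way to bring a function back under the threshold without changing what it does.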


File sabio_rk_json_mongo.py has 652 lines of code (exceeds 250 allowed). Consider refactoring.

'''Parse SabioRk json files into MongoDB documents
    (json files acquired by running sqlite_to_json.py)
:Author: Zhouyang Lian <zhouyang.lian@familian.life>
:Author: Jonathan <jonrkarr@gmail.com>
:Date: 2019-04-02
Severity: Major
Found in datanator/data_source/sabio_rk_json_mongo.py - About 1 day to fix

Function parse_protein has a Cognitive Complexity of 69 (exceeds 5 allowed). Consider refactoring.

    def parse_protein(self, protein):
        """ Parse the modification information from a term for a modified protein

        Args:
            protein (:obj:`dict`): term for a modified protein
Severity: Minor
Found in datanator/data_source/protein_modification/pro.py - About 1 day to fix


Function build_rna_observation has a Cognitive Complexity of 67 (exceeds 5 allowed). Consider refactoring.

    def build_rna_observation(self, obj):
        """Build RNA observation object from rna_haflife_new collection.

        Args:
            obj(:obj:`Obj`): object to be transformed.
Severity: Minor
Found in datanator/schema_2/transform.py - About 1 day to fix


File pro.py has 605 lines of code (exceeds 250 allowed). Consider refactoring.

""" Generate BpForms for all of the proteins in PRO, verify
them, and calculate their properties

:Author: Jonathan Karr <karr@mssm.edu>
:Date: 2019-06-24
Severity: Major
Found in datanator/data_source/protein_modification/pro.py - About 1 day to fix

File __main__.py has 593 lines of code (exceeds 250 allowed). Consider refactoring.

""" Command line utilities

:Author: Yosef Roth <yosefdroth@gmail.com>
:Author: Jonathan Karr <jonrkarr@gmail.com>
:Author: Saahith Pochiraju <saahith116@gmail.com>
Severity: Major
Found in datanator/__main__.py - About 1 day to fix

Similar blocks of code found in 3 locations. Consider refactoring.

    def download_ko(self, name):
        address = name.split('.')[0]
        try:
            info = requests.get("http://rest.kegg.jp/get/ko:{}".format(address))
            info.raise_for_status()
Severity: Major
Found in datanator/data_source/kegg_orthology.py and 2 other locations - About 1 day to fix
datanator/data_source/kegg_reaction_class.py on lines 158..170
datanator/data_source/kegg_reaction_class.py on lines 172..184

This issue has a mass of 146.

Similar blocks of code found in 3 locations. Consider refactoring.

    def download_rxn_cls(self, cls):
        address = cls.split('.')[0]
        try:
            info = requests.get("http://rest.kegg.jp/get/rclass:{}".format(address))
            info.raise_for_status()
Severity: Major
Found in datanator/data_source/kegg_reaction_class.py and 2 other locations - About 1 day to fix
datanator/data_source/kegg_orthology.py on lines 257..269
datanator/data_source/kegg_reaction_class.py on lines 158..170

This issue has a mass of 146.

Similar blocks of code found in 2 locations. Consider refactoring.

            if ob_e != {}:
                query = {"$and": [{"identifier": {'namespace': 'inchikey', 'value': doc["InChI_Key"]}},
                                {"source": {"$elemMatch": ob_e["source"][0]}}]}
                self.obs_col.update_one(query,
                                        {"$set": {"genotype": ob_e["genotype"],
Severity: Major
Found in datanator/schema_2/transform_metabolites_meta.py and 1 other location - About 1 day to fix
datanator/schema_2/transform_metabolites_meta.py on lines 42..51

This issue has a mass of 146.

Similar blocks of code found in 2 locations. Consider refactoring.

            if ob_y != {}:
                query = {"$and": [{"identifier": {'namespace': 'inchikey', 'value': doc["InChI_Key"]}},
                                {"source": {"$elemMatch": ob_y["source"][0]}}]}
                self.obs_col.update_one(query,
                                        {"$set": {"genotype": ob_y["genotype"],
Severity: Major
Found in datanator/schema_2/transform_metabolites_meta.py and 1 other location - About 1 day to fix
datanator/schema_2/transform_metabolites_meta.py on lines 53..62
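
The `ob_e` and `ob_y` branches build the same filter, differing only in which observation dict supplies the source. Extracting a pure query builder removes the duplication and is testable without MongoDB; the name `inchikey_source_query` is hypothetical.

```python
def inchikey_source_query(doc, ob):
    """Filter for update_one shared by the ob_e / ob_y branches:
    match a record by InChI key and by the first source of the
    observation being merged in."""
    return {"$and": [
        {"identifier": {"namespace": "inchikey", "value": doc["InChI_Key"]}},
        {"source": {"$elemMatch": ob["source"][0]}},
    ]}
```

Both branches then collapse to `self.obs_col.update_one(inchikey_source_query(doc, ob), {"$set": {...}})` with `ob` bound to `ob_e` or `ob_y`.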

This issue has a mass of 146.

Similar blocks of code found in 3 locations. Consider refactoring.

    def download_rxn(self, name):
        address = name.split('.')[0]
        try:
            info = requests.get("http://rest.kegg.jp/get/reaction:{}".format(address))
            info.raise_for_status()
Severity: Major
Found in datanator/data_source/kegg_reaction_class.py and 2 other locations - About 1 day to fix
datanator/data_source/kegg_orthology.py on lines 257..269
datanator/data_source/kegg_reaction_class.py on lines 172..184
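
`download_ko`, `download_rxn_cls`, and `download_rxn` differ only in the KEGG database prefix (`ko`, `rclass`, `reaction`). A sketch of one parameterized replacement; the names `kegg_url` and `fetch_kegg_entry` are hypothetical, and the URL pattern is taken from the flagged code.

```python
KEGG_GET = "http://rest.kegg.jp/get/{}:{}"


def kegg_url(database, name):
    """Build the KEGG REST URL shared by all three download_* methods."""
    address = name.split('.')[0]
    return KEGG_GET.format(database, address)


def fetch_kegg_entry(database, name, get=None):
    """Fetch one KEGG entry; `get` is injectable so tests avoid the network."""
    if get is None:
        import requests  # deferred import: the pure URL helper above needs no requests
        get = requests.get
    info = get(kegg_url(database, name))
    info.raise_for_status()
    return info.text
```

The original `try`/`except` error handling is omitted for brevity; with this shape it would be written once around `fetch_kegg_entry` instead of three times.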

This issue has a mass of 146.

Function get_strain_info has a Cognitive Complexity of 62 (exceeds 5 allowed). Consider refactoring.

def get_strain_info(sample):
    """
    Get information about the reference genome that should be used for a given sample

        Args:
Severity: Minor
Found in datanator/data_source/array_express_tools/ensembl_tools.py - About 1 day to fix


Identical blocks of code found in 2 locations. Consider refactoring.

                if data.iloc[i,4]!="n.d.":
                    values_p.append({"type": "Half-life",
                                     "value": str(float(data.iloc[i,4])*60),
                                     "units": "s"})
                else:
datanator/data_source/protein_half_lives/victoria_parse_yeast_global_proteome_turnover.py on lines 106..114

This issue has a mass of 139.

Identical blocks of code found in 2 locations. Consider refactoring.

        if data.iloc[i,4]!="n.d.":
            values_p.append({"type": "Half-life",
                             "value": str(float(data.iloc[i,4])*60),
                             "units": "s"})
        else:
datanator/data_source/protein_half_lives/victoria_parse_yeast_global_proteome_turnover.py on lines 190..198
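
Both identical blocks convert a half-life cell from minutes to seconds unless it reads "n.d.". A helper sketch; the name is hypothetical, and since the `else` branch is truncated in the excerpts above, the `None` fallback here is an assumption.

```python
def parse_half_life_minutes(cell):
    """Return the seconds record appended by both duplicated blocks,
    or None when the spreadsheet cell is 'n.d.' (assumed fallback;
    the original else branch is not shown)."""
    if cell == "n.d.":
        return None
    return {"type": "Half-life",
            "value": str(float(cell) * 60),
            "units": "s"}
```

Each call site then reads `rec = parse_half_life_minutes(data.iloc[i, 4])`, so the minutes-to-seconds conversion lives in one place.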

This issue has a mass of 139.

Function normalize_parameter_value has a Cognitive Complexity of 57 (exceeds 5 allowed). Consider refactoring.

    def normalize_parameter_value(self, name, type, value, error, units, enzyme_molecular_weight):
        """
        Args:
            name (:obj:`str`): parameter name
            type (:obj:`int`) parameter type (SBO term id)
Severity: Minor
Found in datanator/data_source/sabio_rk.py - About 1 day to fix


Function normalize_parameter_value has a Cognitive Complexity of 56 (exceeds 5 allowed). Consider refactoring.

    def normalize_parameter_value(self, name, type, value, error, units, enzyme_molecular_weight):
        """
        Args:
            name (:obj:`str`): parameter name
            type (:obj:`int`) parameter type (SBO term id)
Severity: Minor
Found in datanator/data_source/sabio_rk_nosql.py - About 1 day to fix


Function load_compounds has a Cognitive Complexity of 56 (exceeds 5 allowed). Consider refactoring.

    def load_compounds(self, compounds=None):
        """ Download information from SABIO-RK about all of the compounds stored in the local sqlite copy of SABIO-RK

        Args:
            compounds (:obj:`list` of :obj:`Compound`): list of compounds to download
Severity: Minor
Found in datanator/data_source/sabio_rk.py - About 1 day to fix


Similar blocks of code found in 2 locations. Consider refactoring.

            elif (gene_name is None and protein_name is not None and
                  protein_name != 'Uncharacterized protein'): # record exists in uniprot collection with non-filler protein_name
                self.rna_hl_collection.update_one({'protein_name': protein_name},
                                                  {'$set': {'modified': datetime.datetime.utcnow(),
                                                            'gene_name': gene_name},
datanator/data_source/rna_halflife/doi_10_1186_gb_2012_13_4_r30.py on lines 109..130
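
This branch and its twin at the cited lines set the same fields on `rna_hl_collection`. A pure builder for the update document keeps the two call sites in sync; the name `gene_name_update` is hypothetical, and the timestamp is injectable so the helper is testable.

```python
import datetime


def gene_name_update(gene_name, modified=None):
    """Update document shared by the duplicated rna_hl_collection branches."""
    if modified is None:
        modified = datetime.datetime.utcnow()
    return {"$set": {"modified": modified, "gene_name": gene_name}}
```

Each branch would then call `self.rna_hl_collection.update_one(filter, gene_name_update(gene_name), ...)` with its own filter.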

This issue has a mass of 124.
