DeveloperCAP/MLCAT

View on GitHub

Showing 109 of 109 total issues

Identical blocks of code found in 2 locations. Consider refactoring.
Open

        if len(trunc_date) > 30 and trunc_date[14] == ':':
            datetime_obj = datetime.datetime.strptime(trunc_date, "%a, %b %d %H:%M:%S %Y %z")
        elif len(trunc_date) == 25 or len(trunc_date) == 26:
            datetime_obj = datetime.datetime.strptime(trunc_date, "%d %b %Y %H:%M:%S %z")
        elif trunc_date[3] == ',' and (len(trunc_date) == 28 or len(trunc_date) == 29) and '+' not in trunc_date and '-' not in trunc_date:
Severity: Major
Found in lib/util/read.py and 1 other location - About 2 days to fix
lib/util/read.py on lines 141..152

Duplicated Code

Duplicated code can lead to software that is hard to understand and difficult to change. The Don't Repeat Yourself (DRY) principle states:

Every piece of knowledge must have a single, unambiguous, authoritative representation within a system.

When you violate DRY, bugs and maintenance problems are sure to follow. Duplicated code has a tendency to both continue to replicate and also to diverge (leaving bugs as two similar implementations differ in subtle ways).

Tuning

This issue has a mass of 271.

We set useful threshold defaults for the languages we support but you may want to adjust these settings based on your project guidelines.

The threshold configuration represents the minimum mass a code block must have to be analyzed for duplication. The lower the threshold, the more fine-grained the comparison.

If the engine is too easily reporting duplication, try raising the threshold. If you suspect that the engine isn't catching enough duplication, try lowering the threshold. The best setting tends to differ from language to language.

See codeclimate-duplication's documentation for more information about tuning the mass threshold in your .codeclimate.yml.

Refactorings

Further Reading

Identical blocks of code found in 2 locations. Consider refactoring.
Open

        if len(trunc_date) > 30 and trunc_date[14] == ':':
            datetime_obj = datetime.datetime.strptime(trunc_date, "%a, %b %d %H:%M:%S %Y %z")
        elif len(trunc_date) == 25 or len(trunc_date) == 26:
            datetime_obj = datetime.datetime.strptime(trunc_date, "%d %b %Y %H:%M:%S %z")
        elif trunc_date[3] == ',' and (len(trunc_date) == 28 or len(trunc_date) == 29) and '+' not in trunc_date and '-' not in trunc_date:
Severity: Major
Found in lib/util/read.py and 1 other location - About 2 days to fix
lib/util/read.py on lines 92..103

Duplicated Code

Duplicated code can lead to software that is hard to understand and difficult to change. The Don't Repeat Yourself (DRY) principle states:

Every piece of knowledge must have a single, unambiguous, authoritative representation within a system.

When you violate DRY, bugs and maintenance problems are sure to follow. Duplicated code has a tendency to both continue to replicate and also to diverge (leaving bugs as two similar implementations differ in subtle ways).

Tuning

This issue has a mass of 271.

We set useful threshold defaults for the languages we support but you may want to adjust these settings based on your project guidelines.

The threshold configuration represents the minimum mass a code block must have to be analyzed for duplication. The lower the threshold, the more fine-grained the comparison.

If the engine is too easily reporting duplication, try raising the threshold. If you suspect that the engine isn't catching enough duplication, try lowering the threshold. The best setting tends to differ from language to language.

See codeclimate-duplication's documentation for more information about tuning the mass threshold in your .codeclimate.yml.

Refactorings

Further Reading

Function generate_keyword_digest has a Cognitive Complexity of 119 (exceeds 5 allowed). Consider refactoring.
Open

def generate_keyword_digest(mbox_filename, output_filename, author_uid_filename, json_filename, top_n = None, console_output=True):
    """
    From the .MBOX file, this function extracts the email content is extracted using two predefined classes
    available in the Python Standard Library: Mailbox and Message. Feature vectors are created for all the authors
    by obtaining meaningful words from the mail content, after removing the stop words, using NLTK libraries.
Severity: Minor
Found in lib/input/mbox/keyword_digest.py - About 2 days to fix

Cognitive Complexity

Cognitive Complexity is a measure of how difficult a unit of code is to intuitively understand. Unlike Cyclomatic Complexity, which determines how difficult your code will be to test, Cognitive Complexity tells you how difficult your code will be to read and comprehend.

A method's cognitive complexity is based on a few simple rules:

  • Code is not considered more complex when it uses shorthand that the language provides for collapsing multiple statements into one
  • Code is considered more complex for each "break in the linear flow of the code"
  • Code is considered more complex when "flow breaking structures are nested"

Further reading

Function conversation_refresh_times has a Cognitive Complexity of 106 (exceeds 5 allowed). Consider refactoring.
Open

def conversation_refresh_times(headers_filename, nodelist_filename, edgelist_filename, foldername, time_ubound = None, time_lbound = None, plot=False, ignore_lat = False):
    """   

    :param headers_filename: The JSON file containing the headers.
    :param nodelist_filename: The csv file containing the nodes.
Severity: Minor
Found in lib/analysis/author/time_statistics.py - About 2 days to fix

Cognitive Complexity

Cognitive Complexity is a measure of how difficult a unit of code is to intuitively understand. Unlike Cyclomatic Complexity, which determines how difficult your code will be to test, Cognitive Complexity tells you how difficult your code will be to read and comprehend.

A method's cognitive complexity is based on a few simple rules:

  • Code is not considered more complex when it uses shorthand that the language provides for collapsing multiple statements into one
  • Code is considered more complex for each "break in the linear flow of the code"
  • Code is considered more complex when "flow breaking structures are nested"

Further reading

Function vertex_clustering has a Cognitive Complexity of 83 (exceeds 5 allowed). Consider refactoring.
Open

def vertex_clustering(json_filename, nodelist_filename, edgelist_filename, foldername, time_limit=None, ignore_lat=False):
    """
    This function performs vertex clustering on the dataset passed in the parameters and saves the dendrogram resulting
    from the vertex clustering as a PDF along with the visualization of the vertex cluster itself. It is recommended to
    limit these graphs to 200 authors as the visualization becomes incompehensible beyond that.
Severity: Minor
Found in lib/analysis/author/community.py - About 1 day to fix

Cognitive Complexity

Cognitive Complexity is a measure of how difficult a unit of code is to intuitively understand. Unlike Cyclomatic Complexity, which determines how difficult your code will be to test, Cognitive Complexity tells you how difficult your code will be to read and comprehend.

A method's cognitive complexity is based on a few simple rules:

  • Code is not considered more complex when it uses shorthand that the language provides for collapsing multiple statements into one
  • Code is considered more complex for each "break in the linear flow of the code"
  • Code is considered more complex when "flow breaking structures are nested"

Further reading

Function get has a Cognitive Complexity of 73 (exceeds 5 allowed). Consider refactoring.
Open

def get(json_filename, output_filename, active_score, passive_score, write_to_file=True):
    """

    :param json_data: The JSON file containing the headers.
    :param output_filename: Stores authors' email address,score and rank.
Severity: Minor
Found in lib/analysis/author/ranking.py - About 1 day to fix

Cognitive Complexity

Cognitive Complexity is a measure of how difficult a unit of code is to intuitively understand. Unlike Cyclomatic Complexity, which determines how difficult your code will be to test, Cognitive Complexity tells you how difficult your code will be to read and comprehend.

A method's cognitive complexity is based on a few simple rules:

  • Code is not considered more complex when it uses shorthand that the language provides for collapsing multiple statements into one
  • Code is considered more complex for each "break in the linear flow of the code"
  • Code is considered more complex when "flow breaking structures are nested"

Further reading

Function generate_wh_table_authors has a Cognitive Complexity of 66 (exceeds 5 allowed). Consider refactoring.
Open

def generate_wh_table_authors(nodelist_filename, edgelist_filename, output_filename, ignore_lat=False, time_limit=None):
    """
    This module is used to generate the author version of the width height table. The width height table for the
    authors is a representation of the number of total and new authors in a thread aggregated at a given generation.
    The table, which itself is temporarily stored in a two dimensional array, is then written into a CSV file. These
Severity: Minor
Found in lib/analysis/author/wh_table.py - About 1 day to fix

Cognitive Complexity

Cognitive Complexity is a measure of how difficult a unit of code is to intuitively understand. Unlike Cyclomatic Complexity, which determines how difficult your code will be to test, Cognitive Complexity tells you how difficult your code will be to read and comprehend.

A method's cognitive complexity is based on a few simple rules:

  • Code is not considered more complex when it uses shorthand that the language provides for collapsing multiple statements into one
  • Code is considered more complex for each "break in the linear flow of the code"
  • Code is considered more complex when "flow breaking structures are nested"

Further reading

Function generate_hyperedge_distribution has a Cognitive Complexity of 65 (exceeds 5 allowed). Consider refactoring.
Open

def generate_hyperedge_distribution(nodelist_filename, edgelist_filename, clean_headers_filename, foldername, time_limit=None, ignore_lat=False):
    """
    Generate the distribution of hyperedges for messages in a certain time limit, stores it as hyperedge_distribution.csv based on edge frequency and generates a diagram stored in plots.

    :param nodelist_filename: The csv file containing the nodes.
Severity: Minor
Found in lib/analysis/thread/hypergraph.py - About 1 day to fix

Cognitive Complexity

Cognitive Complexity is a measure of how difficult a unit of code is to intuitively understand. Unlike Cyclomatic Complexity, which determines how difficult your code will be to test, Cognitive Complexity tells you how difficult your code will be to read and comprehend.

A method's cognitive complexity is based on a few simple rules:

  • Code is not considered more complex when it uses shorthand that the language provides for collapsing multiple statements into one
  • Code is considered more complex for each "break in the linear flow of the code"
  • Code is considered more complex when "flow breaking structures are nested"

Further reading

Function generate_time_stats_threads has a Cognitive Complexity of 55 (exceeds 5 allowed). Consider refactoring.
Open

def generate_time_stats_threads(nodelist_filename, edgelist_filename, clean_headers_filename, foldername, time_lbound=None, time_ubound=None, plot=False):
    """
    Generates and plots statistics for inter-arrival of consecutive messages and distribution of length of each disccussion thread.

    :param nodelist_filename: The csv file containing the nodes.
Severity: Minor
Found in lib/analysis/thread/time_statistics.py - About 1 day to fix

Cognitive Complexity

Cognitive Complexity is a measure of how difficult a unit of code is to intuitively understand. Unlike Cyclomatic Complexity, which determines how difficult your code will be to test, Cognitive Complexity tells you how difficult your code will be to read and comprehend.

A method's cognitive complexity is based on a few simple rules:

  • Code is not considered more complex when it uses shorthand that the language provides for collapsing multiple statements into one
  • Code is considered more complex for each "break in the linear flow of the code"
  • Code is considered more complex when "flow breaking structures are nested"

Further reading

Identical blocks of code found in 2 locations. Consider refactoring.
Open

    with open("graph_edges.csv", "r") as edge_file:
        for pair in edge_file:
            edge = pair.split(';')
            edge[1] = edge[1].strip()
            try:
Severity: Major
Found in lib/analysis/thread/hypergraph.py and 1 other location - About 7 hrs to fix
lib/analysis/thread/graph/generate.py on lines 24..34

Duplicated Code

Duplicated code can lead to software that is hard to understand and difficult to change. The Don't Repeat Yourself (DRY) principle states:

Every piece of knowledge must have a single, unambiguous, authoritative representation within a system.

When you violate DRY, bugs and maintenance problems are sure to follow. Duplicated code has a tendency to both continue to replicate and also to diverge (leaving bugs as two similar implementations differ in subtle ways).

Tuning

This issue has a mass of 119.

We set useful threshold defaults for the languages we support but you may want to adjust these settings based on your project guidelines.

The threshold configuration represents the minimum mass a code block must have to be analyzed for duplication. The lower the threshold, the more fine-grained the comparison.

If the engine is too easily reporting duplication, try raising the threshold. If you suspect that the engine isn't catching enough duplication, try lowering the threshold. The best setting tends to differ from language to language.

See codeclimate-duplication's documentation for more information about tuning the mass threshold in your .codeclimate.yml.

Refactorings

Further Reading

Identical blocks of code found in 2 locations. Consider refactoring.
Open

    with open("graph_edges.csv", "r") as edge_file:
        for pair in edge_file:
            edge = pair.split(';')
            edge[1] = edge[1].strip()
            try:
Severity: Major
Found in lib/analysis/thread/graph/generate.py and 1 other location - About 7 hrs to fix
lib/analysis/thread/hypergraph.py on lines 93..103

Duplicated Code

Duplicated code can lead to software that is hard to understand and difficult to change. The Don't Repeat Yourself (DRY) principle states:

Every piece of knowledge must have a single, unambiguous, authoritative representation within a system.

When you violate DRY, bugs and maintenance problems are sure to follow. Duplicated code has a tendency to both continue to replicate and also to diverge (leaving bugs as two similar implementations differ in subtle ways).

Tuning

This issue has a mass of 119.

We set useful threshold defaults for the languages we support but you may want to adjust these settings based on your project guidelines.

The threshold configuration represents the minimum mass a code block must have to be analyzed for duplication. The lower the threshold, the more fine-grained the comparison.

If the engine is too easily reporting duplication, try raising the threshold. If you suspect that the engine isn't catching enough duplication, try lowering the threshold. The best setting tends to differ from language to language.

See codeclimate-duplication's documentation for more information about tuning the mass threshold in your .codeclimate.yml.

Refactorings

Further Reading

Function generate_wh_table_threads has a Cognitive Complexity of 45 (exceeds 5 allowed). Consider refactoring.
Open

def generate_wh_table_threads(nodelist_filename, edgelist_filename, output_filename, ignore_lat=False, time_limit=None):
    """
    
    Generate the thread width height table, which is a representation of the number of nodes in the graph that have a
    given height and a given number of children in a tabular form. This table provides an aggregate statistical view of
Severity: Minor
Found in lib/analysis/thread/wh_table.py - About 6 hrs to fix

Cognitive Complexity

Cognitive Complexity is a measure of how difficult a unit of code is to intuitively understand. Unlike Cyclomatic Complexity, which determines how difficult your code will be to test, Cognitive Complexity tells you how difficult your code will be to read and comprehend.

A method's cognitive complexity is based on a few simple rules:

  • Code is not considered more complex when it uses shorthand that the language provides for collapsing multiple statements into one
  • Code is considered more complex for each "break in the linear flow of the code"
  • Code is considered more complex when "flow breaking structures are nested"

Further reading

Function generate_edge_list has a Cognitive Complexity of 42 (exceeds 5 allowed). Consider refactoring.
Open

def generate_edge_list(author_nodes, author_edges, graph_nodes,
                       graph_edges, threads_json, author_json, ignore_lat=True):
    """    
    :param author_nodes: The csv file containing the author nodes data.
    :param author_edges: The csv file containing the author edges data.
Severity: Minor
Found in lib/analysis/author/edge_list.py - About 6 hrs to fix

Cognitive Complexity

Cognitive Complexity is a measure of how difficult a unit of code is to intuitively understand. Unlike Cyclomatic Complexity, which determines how difficult your code will be to test, Cognitive Complexity tells you how difficult your code will be to read and comprehend.

A method's cognitive complexity is based on a few simple rules:

  • Code is not considered more complex when it uses shorthand that the language provides for collapsing multiple statements into one
  • Code is considered more complex for each "break in the linear flow of the code"
  • Code is considered more complex when "flow breaking structures are nested"

Further reading

Function generate has a Cognitive Complexity of 42 (exceeds 5 allowed). Consider refactoring.
Open

def generate(ignore_lat=False, time_limit=None):
    """

    This function generate a table containing the number of mails in a thread and the corresponding aggregate count
    of the number of threads that have that number of mails in them, along with the total number of authors who have
Severity: Minor
Found in lib/analysis/thread/ps_table.py - About 6 hrs to fix

Cognitive Complexity

Cognitive Complexity is a measure of how difficult a unit of code is to intuitively understand. Unlike Cyclomatic Complexity, which determines how difficult your code will be to test, Cognitive Complexity tells you how difficult your code will be to read and comprehend.

A method's cognitive complexity is based on a few simple rules:

  • Code is not considered more complex when it uses shorthand that the language provides for collapsing multiple statements into one
  • Code is considered more complex for each "break in the linear flow of the code"
  • Code is considered more complex when "flow breaking structures are nested"

Further reading

Function author_interaction has a Cognitive Complexity of 41 (exceeds 5 allowed). Consider refactoring.
Open

def author_interaction(clean_data, graph_nodes, graph_edges, pajek_file, ignore_lat=True):
    """
    Prints the number of strongly connected components,weekly connected components, number of nodes and edges from the author graph.

    :param clean_data: Path to clean_data.json file
Severity: Minor
Found in lib/analysis/author/graph/generate.py - About 6 hrs to fix

Cognitive Complexity

Cognitive Complexity is a measure of how difficult a unit of code is to intuitively understand. Unlike Cyclomatic Complexity, which determines how difficult your code will be to test, Cognitive Complexity tells you how difficult your code will be to read and comprehend.

A method's cognitive complexity is based on a few simple rules:

  • Code is not considered more complex when it uses shorthand that the language provides for collapsing multiple statements into one
  • Code is considered more complex for each "break in the linear flow of the code"
  • Code is considered more complex when "flow breaking structures are nested"

Further reading

Function weighted_multigraph has a Cognitive Complexity of 40 (exceeds 5 allowed). Consider refactoring.
Open

def weighted_multigraph(graph_nodes, graph_edges, clean_data, output_dir, ignore_lat = False):
    """

    Calls other functions to generate graphs that show the interaction between authors either through multiple edges or
    through edge weights.
Severity: Minor
Found in lib/analysis/author/graph/interaction.py - About 6 hrs to fix

Cognitive Complexity

Cognitive Complexity is a measure of how difficult a unit of code is to intuitively understand. Unlike Cyclomatic Complexity, which determines how difficult your code will be to test, Cognitive Complexity tells you how difficult your code will be to read and comprehend.

A method's cognitive complexity is based on a few simple rules:

  • Code is not considered more complex when it uses shorthand that the language provides for collapsing multiple statements into one
  • Code is considered more complex for each "break in the linear flow of the code"
  • Code is considered more complex when "flow breaking structures are nested"

Further reading

Function remove_invalid_references has a Cognitive Complexity of 40 (exceeds 5 allowed). Consider refactoring.
Open

def remove_invalid_references(input_json_filename, output_json_filename, ref_toggle=False):
    """
    This function is used to remove headers associated with invalid references.
    
    :param input_json_filename: The json file containing all the references.
Severity: Minor
Found in lib/input/data_cleanup.py - About 6 hrs to fix

Cognitive Complexity

Cognitive Complexity is a measure of how difficult a unit of code is to intuitively understand. Unlike Cyclomatic Complexity, which determines how difficult your code will be to test, Cognitive Complexity tells you how difficult your code will be to read and comprehend.

A method's cognitive complexity is based on a few simple rules:

  • Code is not considered more complex when it uses shorthand that the language provides for collapsing multiple statements into one
  • Code is considered more complex for each "break in the linear flow of the code"
  • Code is considered more complex when "flow breaking structures are nested"

Further reading

Function generate_hyperedges has a Cognitive Complexity of 40 (exceeds 5 allowed). Consider refactoring.
Open

def generate_hyperedges():
    """

    Generates hyperedges from the discussion graph obtained from the nodes and edges stored in graph_nodes.csv and graph_edges.csv.
    All email header information can be represented as one hyperedge of a hypergraph.
Severity: Minor
Found in lib/analysis/thread/hypergraph.py - About 6 hrs to fix

Cognitive Complexity

Cognitive Complexity is a measure of how difficult a unit of code is to intuitively understand. Unlike Cyclomatic Complexity, which determines how difficult your code will be to test, Cognitive Complexity tells you how difficult your code will be to read and comprehend.

A method's cognitive complexity is based on a few simple rules:

  • Code is not considered more complex when it uses shorthand that the language provides for collapsing multiple statements into one
  • Code is considered more complex for each "break in the linear flow of the code"
  • Code is considered more complex when "flow breaking structures are nested"

Further reading

Function get_mail_header has a Cognitive Complexity of 39 (exceeds 5 allowed). Consider refactoring.
Open

def get_mail_header(to_get, range_=True, uid_map_filename='thread_uid_map.json'):
    """
    This function fetches the emails from the IMAP server as per the parameters passed.
    
    :param to_get: List of UIDs of the mails to get. Default value is 2000.
Severity: Minor
Found in lib/input/imap/header.py - About 5 hrs to fix

Cognitive Complexity

Cognitive Complexity is a measure of how difficult a unit of code is to intuitively understand. Unlike Cyclomatic Complexity, which determines how difficult your code will be to test, Cognitive Complexity tells you how difficult your code will be to read and comprehend.

A method's cognitive complexity is based on a few simple rules:

  • Code is not considered more complex when it uses shorthand that the language provides for collapsing multiple statements into one
  • Code is considered more complex for each "break in the linear flow of the code"
  • Code is considered more complex when "flow breaking structures are nested"

Further reading

Identical blocks of code found in 2 locations. Consider refactoring.
Open

                if edge[0] in msgs_before_time and edge[1] in msgs_before_time:
                    try:
                        discussion_graph.node[edge[0]]['sender']
                        discussion_graph.node[edge[1]]['sender']
                        discussion_graph.add_edge(*edge)
Severity: Major
Found in lib/analysis/author/wh_table.py and 1 other location - About 5 hrs to fix
lib/analysis/author/wh_table.py on lines 78..84

Duplicated Code

Duplicated code can lead to software that is hard to understand and difficult to change. The Don't Repeat Yourself (DRY) principle states:

Every piece of knowledge must have a single, unambiguous, authoritative representation within a system.

When you violate DRY, bugs and maintenance problems are sure to follow. Duplicated code has a tendency to both continue to replicate and also to diverge (leaving bugs as two similar implementations differ in subtle ways).

Tuning

This issue has a mass of 96.

We set useful threshold defaults for the languages we support but you may want to adjust these settings based on your project guidelines.

The threshold configuration represents the minimum mass a code block must have to be analyzed for duplication. The lower the threshold, the more fine-grained the comparison.

If the engine is too easily reporting duplication, try raising the threshold. If you suspect that the engine isn't catching enough duplication, try lowering the threshold. The best setting tends to differ from language to language.

See codeclimate-duplication's documentation for more information about tuning the mass threshold in your .codeclimate.yml.

Refactorings

Further Reading

Severity
Category
Status
Source
Language