DeveloperCAP/MLCAT


Showing 109 of 109 total issues

Identical blocks of code found in 2 locations. Consider refactoring.
Open

if len(trunc_date) > 30 and trunc_date[14] == ':':
    datetime_obj = datetime.datetime.strptime(trunc_date, "%a, %b %d %H:%M:%S %Y %z")
elif len(trunc_date) == 25 or len(trunc_date) == 26:
    datetime_obj = datetime.datetime.strptime(trunc_date, "%d %b %Y %H:%M:%S %z")
elif trunc_date[3] == ',' and (len(trunc_date) == 28 or len(trunc_date) == 29) and '+' not in trunc_date and '-' not in trunc_date:
Severity: Major
Found in lib/util/read.py and 1 other location - About 2 days to fix
lib/util/read.py on lines 141..152

Identical blocks of code found in 2 locations. Consider refactoring.
Open

if len(trunc_date) > 30 and trunc_date[14] == ':':
    datetime_obj = datetime.datetime.strptime(trunc_date, "%a, %b %d %H:%M:%S %Y %z")
elif len(trunc_date) == 25 or len(trunc_date) == 26:
    datetime_obj = datetime.datetime.strptime(trunc_date, "%d %b %Y %H:%M:%S %z")
elif trunc_date[3] == ',' and (len(trunc_date) == 28 or len(trunc_date) == 29) and '+' not in trunc_date and '-' not in trunc_date:
Severity: Major
Found in lib/util/read.py and 1 other location - About 2 days to fix
lib/util/read.py on lines 92..103
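The duplicated format-detection chain above could be pulled into a single helper that both call sites in lib/util/read.py share. A minimal sketch, assuming the two formats shown in the snippet (the real chain has further `elif` branches, elided here); the name `parse_trunc_date` is hypothetical:

```python
import datetime


def parse_trunc_date(trunc_date):
    """Parse a truncated mbox date header in one of the known formats.

    Hypothetical extraction of the branches duplicated in lib/util/read.py;
    only the two formats visible in the report snippet are handled.
    """
    if len(trunc_date) > 30 and trunc_date[14] == ':':
        return datetime.datetime.strptime(trunc_date, "%a, %b %d %H:%M:%S %Y %z")
    if len(trunc_date) in (25, 26):
        return datetime.datetime.strptime(trunc_date, "%d %b %Y %H:%M:%S %z")
    # The real code has additional branches here for other header layouts.
    raise ValueError("Unrecognized date format: %r" % trunc_date)
```

Both call sites would then reduce to `datetime_obj = parse_trunc_date(trunc_date)`, and any new date layout only needs to be added in one place.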

Function generate_keyword_digest has a Cognitive Complexity of 119 (exceeds 5 allowed). Consider refactoring.
Open

def generate_keyword_digest(mbox_filename, output_filename, author_uid_filename, json_filename, top_n = None, console_output=True):
    """
    From the .MBOX file, this function extracts the email content using two predefined classes
    available in the Python Standard Library: Mailbox and Message. Feature vectors are created for all the authors
    by obtaining meaningful words from the mail content, after removing the stop words, using NLTK libraries.
Severity: Minor
Found in lib/input/mbox/keyword_digest.py - About 2 days to fix
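Cognitive complexity scores of this magnitude usually come from deeply nested loops and conditionals in one function body. A common remedy is to extract each nesting level into a named helper and replace nested else-branches with early `continue`/`return`. A generic sketch of the pattern (all names hypothetical, not taken from keyword_digest.py):

```python
def tokens_for_message(message_body, stop_words):
    """Hypothetical helper: one nesting level pulled out into its own function."""
    words = message_body.lower().split()
    return [w for w in words if w.isalpha() and w not in stop_words]


def build_feature_vectors(messages, stop_words):
    """Hypothetical top-level loop: kept flat, one decision per line."""
    vectors = {}
    for author, body in messages:
        if not body:
            continue  # early continue instead of wrapping the rest in an else
        vectors.setdefault(author, []).extend(tokens_for_message(body, stop_words))
    return vectors
```

Each helper then carries a small, independently testable share of the original branching, which is what drives the cognitive complexity score down.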

Function conversation_refresh_times has a Cognitive Complexity of 106 (exceeds 5 allowed). Consider refactoring.
Open

def conversation_refresh_times(headers_filename, nodelist_filename, edgelist_filename, foldername, time_ubound = None, time_lbound = None, plot=False, ignore_lat = False):
    """
    :param headers_filename: The JSON file containing the headers.
    :param nodelist_filename: The csv file containing the nodes.
Severity: Minor
Found in lib/analysis/author/time_statistics.py - About 2 days to fix

Function vertex_clustering has a Cognitive Complexity of 83 (exceeds 5 allowed). Consider refactoring.
Open

def vertex_clustering(json_filename, nodelist_filename, edgelist_filename, foldername, time_limit=None, ignore_lat=False):
    """
    This function performs vertex clustering on the dataset passed in the parameters and saves the dendrogram resulting
    from the vertex clustering as a PDF along with the visualization of the vertex cluster itself. It is recommended to
    limit these graphs to 200 authors, as the visualization becomes incomprehensible beyond that.
Severity: Minor
Found in lib/analysis/author/community.py - About 1 day to fix

Function get has a Cognitive Complexity of 73 (exceeds 5 allowed). Consider refactoring.
Open

def get(json_filename, output_filename, active_score, passive_score, write_to_file=True):
    """
    :param json_filename: The JSON file containing the headers.
    :param output_filename: Stores authors' email address, score, and rank.
Severity: Minor
Found in lib/analysis/author/ranking.py - About 1 day to fix

Function generate_wh_table_authors has a Cognitive Complexity of 66 (exceeds 5 allowed). Consider refactoring.
Open

def generate_wh_table_authors(nodelist_filename, edgelist_filename, output_filename, ignore_lat=False, time_limit=None):
    """
    This module is used to generate the author version of the width height table. The width height table for the
    authors is a representation of the number of total and new authors in a thread aggregated at a given generation.
    The table, which itself is temporarily stored in a two dimensional array, is then written into a CSV file. These
Severity: Minor
Found in lib/analysis/author/wh_table.py - About 1 day to fix

Function generate_hyperedge_distribution has a Cognitive Complexity of 65 (exceeds 5 allowed). Consider refactoring.
Open

def generate_hyperedge_distribution(nodelist_filename, edgelist_filename, clean_headers_filename, foldername, time_limit=None, ignore_lat=False):
    """
    Generate the distribution of hyperedges for messages in a certain time limit, store it as hyperedge_distribution.csv based on edge frequency, and generate a diagram stored in plots.

    :param nodelist_filename: The csv file containing the nodes.
Severity: Minor
Found in lib/analysis/thread/hypergraph.py - About 1 day to fix

Function generate_time_stats_threads has a Cognitive Complexity of 55 (exceeds 5 allowed). Consider refactoring.
Open

def generate_time_stats_threads(nodelist_filename, edgelist_filename, clean_headers_filename, foldername, time_lbound=None, time_ubound=None, plot=False):
    """
    Generates and plots statistics for the inter-arrival times of consecutive messages and the distribution of the length of each discussion thread.

    :param nodelist_filename: The csv file containing the nodes.
Severity: Minor
Found in lib/analysis/thread/time_statistics.py - About 1 day to fix

Identical blocks of code found in 2 locations. Consider refactoring.
Open

with open("graph_edges.csv", "r") as edge_file:
    for pair in edge_file:
        edge = pair.split(';')
        edge[1] = edge[1].strip()
        try:
Severity: Major
Found in lib/analysis/thread/graph/generate.py and 1 other location - About 7 hrs to fix
lib/analysis/thread/hypergraph.py on lines 93..103

Identical blocks of code found in 2 locations. Consider refactoring.
Open

with open("graph_edges.csv", "r") as edge_file:
    for pair in edge_file:
        edge = pair.split(';')
        edge[1] = edge[1].strip()
        try:
Severity: Major
Found in lib/analysis/thread/hypergraph.py and 1 other location - About 7 hrs to fix
lib/analysis/thread/graph/generate.py on lines 24..34
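The duplicated file-reading loop above is a natural fit for a shared generator that both generate.py and hypergraph.py could consume. A minimal sketch, assuming the semicolon-delimited two-column layout shown in the snippet; the helper name `read_edges` is hypothetical:

```python
def read_edges(filename="graph_edges.csv"):
    """Yield (source, target) pairs from a semicolon-delimited edge list.

    Hypothetical extraction of the loop duplicated in
    lib/analysis/thread/graph/generate.py and lib/analysis/thread/hypergraph.py.
    """
    with open(filename, "r") as edge_file:
        for pair in edge_file:
            edge = pair.split(';')
            edge[1] = edge[1].strip()  # drop the trailing newline on the target
            yield edge[0], edge[1]
```

Each call site's `try` body would then sit inside a plain `for src, dst in read_edges():` loop, and the parsing logic lives in one place.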

Function generate_wh_table_threads has a Cognitive Complexity of 45 (exceeds 5 allowed). Consider refactoring.
Open

def generate_wh_table_threads(nodelist_filename, edgelist_filename, output_filename, ignore_lat=False, time_limit=None):
    """
    Generate the thread width height table, which is a representation of the number of nodes in the graph that have a
    given height and a given number of children in a tabular form. This table provides an aggregate statistical view of
Severity: Minor
Found in lib/analysis/thread/wh_table.py - About 6 hrs to fix

Function generate_edge_list has a Cognitive Complexity of 42 (exceeds 5 allowed). Consider refactoring.
Open

def generate_edge_list(author_nodes, author_edges, graph_nodes,
                       graph_edges, threads_json, author_json, ignore_lat=True):
    """
    :param author_nodes: The csv file containing the author nodes data.
    :param author_edges: The csv file containing the author edges data.
Severity: Minor
Found in lib/analysis/author/edge_list.py - About 6 hrs to fix

Function generate has a Cognitive Complexity of 42 (exceeds 5 allowed). Consider refactoring.
Open

def generate(ignore_lat=False, time_limit=None):
    """
    This function generates a table containing the number of mails in a thread and the corresponding aggregate count
    of the number of threads that have that number of mails in them, along with the total number of authors who have
Severity: Minor
Found in lib/analysis/thread/ps_table.py - About 6 hrs to fix

Function author_interaction has a Cognitive Complexity of 41 (exceeds 5 allowed). Consider refactoring.
Open

def author_interaction(clean_data, graph_nodes, graph_edges, pajek_file, ignore_lat=True):
    """
    Prints the number of strongly connected components, weakly connected components, nodes, and edges from the author graph.

    :param clean_data: Path to the clean_data.json file
Severity: Minor
Found in lib/analysis/author/graph/generate.py - About 6 hrs to fix

Function weighted_multigraph has a Cognitive Complexity of 40 (exceeds 5 allowed). Consider refactoring.
Open

def weighted_multigraph(graph_nodes, graph_edges, clean_data, output_dir, ignore_lat=False):
    """
    Calls other functions to generate graphs that show the interaction between authors either through multiple edges or
    through edge weights.
Severity: Minor
Found in lib/analysis/author/graph/interaction.py - About 6 hrs to fix

Function generate_hyperedges has a Cognitive Complexity of 40 (exceeds 5 allowed). Consider refactoring.
Open

def generate_hyperedges():
    """
    Generates hyperedges from the discussion graph obtained from the nodes and edges stored in graph_nodes.csv and graph_edges.csv.
    All email header information can be represented as one hyperedge of a hypergraph.
Severity: Minor
Found in lib/analysis/thread/hypergraph.py - About 6 hrs to fix

Function remove_invalid_references has a Cognitive Complexity of 40 (exceeds 5 allowed). Consider refactoring.
Open

def remove_invalid_references(input_json_filename, output_json_filename, ref_toggle=False):
    """
    This function is used to remove headers associated with invalid references.

    :param input_json_filename: The json file containing all the references.
Severity: Minor
Found in lib/input/data_cleanup.py - About 6 hrs to fix

Function get_mail_header has a Cognitive Complexity of 39 (exceeds 5 allowed). Consider refactoring.
Open

def get_mail_header(to_get, range_=True, uid_map_filename='thread_uid_map.json'):
    """
    This function fetches the emails from the IMAP server as per the parameters passed.

    :param to_get: List of UIDs of the mails to get. Default value is 2000.
Severity: Minor
Found in lib/input/imap/header.py - About 5 hrs to fix

Identical blocks of code found in 2 locations. Consider refactoring.
Open

if edge[0] in msgs_before_time and edge[1] in msgs_before_time:
    try:
        discussion_graph.node[edge[0]]['sender']
        discussion_graph.node[edge[1]]['sender']
        discussion_graph.add_edge(*edge)
Severity: Major
Found in lib/analysis/author/wh_table.py and 1 other location - About 5 hrs to fix
lib/analysis/author/wh_table.py on lines 78..84
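The duplicated guard above could become one helper in lib/analysis/author/wh_table.py that both sites call. A minimal sketch, assuming `graph` exposes the legacy networkx `.node` attribute mapping and `.add_edge` as in the snippet; the helper name `add_edge_if_valid` is hypothetical:

```python
def add_edge_if_valid(graph, edge, msgs_before_time):
    """Add an edge only when both endpoints fall inside the time window and
    carry 'sender' metadata. Hypothetical extraction of the guard duplicated
    in lib/analysis/author/wh_table.py.
    """
    if edge[0] in msgs_before_time and edge[1] in msgs_before_time:
        try:
            # Looking up 'sender' raises KeyError for nodes without metadata,
            # mirroring the original try-block's filtering behavior.
            graph.node[edge[0]]['sender']
            graph.node[edge[1]]['sender']
            graph.add_edge(*edge)
            return True
        except KeyError:
            pass
    return False
```

Returning a boolean also lets the caller count skipped edges, which the inline duplicates cannot do without further copy-pasting.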