src/triage/component/postmodeling/utils/add_predictions.py

Summary

Maintainability
A
1 hr
Test Coverage

Function add_predictions has a Cognitive Complexity of 10 (exceeds 5 allowed). Consider refactoring.
Open

def add_predictions(db_engine, model_groups, project_path, experiment_hashes=None, train_end_times_range=None, rank_order='worst', replace=True):
    """ For a set of modl_groups generate test predictions and write to DB
        Args:
            db_engine: Sqlalchemy engine
            model_groups (list): The list of model group ids we are interested in (ideally, chosen through audition)
Severity: Minor
Found in src/triage/component/postmodeling/utils/add_predictions.py - About 1 hr to fix

Cognitive Complexity

Cognitive Complexity is a measure of how difficult a unit of code is to intuitively understand. Unlike Cyclomatic Complexity, which determines how difficult your code will be to test, Cognitive Complexity tells you how difficult your code will be to read and comprehend.

A method's cognitive complexity is based on a few simple rules:

  • Code is not considered more complex when it uses shorthand that the language provides for collapsing multiple statements into one
  • Code is considered more complex for each "break in the linear flow of the code"
  • Code is considered more complex when "flow breaking structures are nested"

Further reading

Function add_predictions has 7 arguments (exceeds 5 allowed). Consider refactoring.
Open

def add_predictions(db_engine, model_groups, project_path, experiment_hashes=None, train_end_times_range=None, rank_order='worst', replace=True):
Severity: Major
Found in src/triage/component/postmodeling/utils/add_predictions.py - About 45 mins to fix

    Blank line contains whitespace
    Open

        

    Trailing whitespace is superfluous.

    The warning returned varies on whether the line itself is blank,
    for easier filtering for those who want to indent their blank lines.
    
    Okay: spam(1)\n#
    W291: spam(1) \n#
    W293: class Foo(object):\n    \n    bang = 12

    Line too long (108 > 88 characters)
    Open

                                            This too, helps narrow down model_ids in the model groups specified.

    Limit all lines to a maximum of 79 characters.

    There are still many devices around that are limited to 80 character
    lines; plus, limiting windows to 80 characters makes it possible to
    have several windows side-by-side.  The default wrapping on such
    devices looks ugly.  Therefore, please limit all lines to a maximum
    of 79 characters. For flowing long blocks of text (docstrings or
    comments), limiting the length to 72 characters is recommended.
    
    Reports error E501.

    Line too long (111 > 88 characters)
    Open

        not_fetched_model_grps = [x for x in model_groups if not x in model_matrix_info['model_group_id'].unique()]

    Limit all lines to a maximum of 79 characters.

    There are still many devices around that are limited to 80 character
    lines; plus, limiting windows to 80 characters makes it possible to
    have several windows side-by-side.  The default wrapping on such
    devices looks ugly.  Therefore, please limit all lines to a maximum
    of 79 characters. For flowing long blocks of text (docstrings or
    comments), limiting the length to 72 characters is recommended.
    
    Reports error E501.

    Blank line contains whitespace
    Open

        

    Trailing whitespace is superfluous.

    The warning returned varies on whether the line itself is blank,
    for easier filtering for those who want to indent their blank lines.
    
    Okay: spam(1)\n#
    W291: spam(1) \n#
    W293: class Foo(object):\n    \n    bang = 12

    Line too long (173 > 88 characters)
    Open

                experiment_hashes (List[str]): Optional. hash(es) of the experiments we are interested in. Can be used to narrow down the model_ids in the model groups specified

    Limit all lines to a maximum of 79 characters.

    There are still many devices around that are limited to 80 character
    lines; plus, limiting windows to 80 characters makes it possible to
    have several windows side-by-side.  The default wrapping on such
    devices looks ugly.  Therefore, please limit all lines to a maximum
    of 79 characters. For flowing long blocks of text (docstrings or
    comments), limiting the length to 72 characters is recommended.
    
    Reports error E501.

    Line too long (145 > 88 characters)
    Open

    def add_predictions(db_engine, model_groups, project_path, experiment_hashes=None, train_end_times_range=None, rank_order='worst', replace=True):

    Limit all lines to a maximum of 79 characters.

    There are still many devices around that are limited to 80 character
    lines; plus, limiting windows to 80 characters makes it possible to
    have several windows side-by-side.  The default wrapping on such
    devices looks ugly.  Therefore, please limit all lines to a maximum
    of 79 characters. For flowing long blocks of text (docstrings or
    comments), limiting the length to 72 characters is recommended.
    
    Reports error E501.

    Line too long (144 > 88 characters)
    Open

                                            A dictionary with two possible keys 'range_start_date' and 'range_end_date'. Either or both could be set

    Limit all lines to a maximum of 79 characters.

    There are still many devices around that are limited to 80 character
    lines; plus, limiting windows to 80 characters makes it possible to
    have several windows side-by-side.  The default wrapping on such
    devices looks ugly.  Therefore, please limit all lines to a maximum
    of 79 characters. For flowing long blocks of text (docstrings or
    comments), limiting the length to 72 characters is recommended.
    
    Reports error E501.

    Blank line contains whitespace
    Open

        

    Trailing whitespace is superfluous.

    The warning returned varies on whether the line itself is blank,
    for easier filtering for those who want to indent their blank lines.
    
    Okay: spam(1)\n#
    W291: spam(1) \n#
    W293: class Foo(object):\n    \n    bang = 12

    Line too long (93 > 88 characters)
    Open

            logging.info('Processing {} model_ids for train matrix {} and test matrix {}'.format(

    Limit all lines to a maximum of 79 characters.

    There are still many devices around that are limited to 80 character
    lines; plus, limiting windows to 80 characters makes it possible to
    have several windows side-by-side.  The default wrapping on such
    devices looks ugly.  Therefore, please limit all lines to a maximum
    of 79 characters. For flowing long blocks of text (docstrings or
    comments), limiting the length to 72 characters is recommended.
    
    Reports error E501.

    Trailing whitespace
    Open

        """ For the given experiment and model groups, fetch the model_ids, and match them with their train/test matrix pairs 

    Trailing whitespace is superfluous.

    The warning returned varies on whether the line itself is blank,
    for easier filtering for those who want to indent their blank lines.
    
    Okay: spam(1)\n#
    W291: spam(1) \n#
    W293: class Foo(object):\n    \n    bang = 12

    Trailing whitespace
    Open

                range_train_end_times (Dict): Optional. If provided, only the models with train_end_times that fall in the range are scored. 

    Trailing whitespace is superfluous.

    The warning returned varies on whether the line itself is blank,
    for easier filtering for those who want to indent their blank lines.
    
    Okay: spam(1)\n#
    W291: spam(1) \n#
    W293: class Foo(object):\n    \n    bang = 12

    Line too long (110 > 88 characters)
    Open

                replace (bool) : Whether to overwrite the preditctions for a model_id, if already found in the DB.

    Limit all lines to a maximum of 79 characters.

    There are still many devices around that are limited to 80 character
    lines; plus, limiting windows to 80 characters makes it possible to
    have several windows side-by-side.  The default wrapping on such
    devices looks ugly.  Therefore, please limit all lines to a maximum
    of 79 characters. For flowing long blocks of text (docstrings or
    comments), limiting the length to 72 characters is recommended.
    
    Reports error E501.

    Trailing whitespace
    Open

    import pandas as pd 

    Trailing whitespace is superfluous.

    The warning returned varies on whether the line itself is blank,
    for easier filtering for those who want to indent their blank lines.
    
    Okay: spam(1)\n#
    W291: spam(1) \n#
    W293: class Foo(object):\n    \n    bang = 12

    Trailing whitespace
    Open

            FROM triage_metadata.experiment_models a 

    Trailing whitespace is superfluous.

    The warning returned varies on whether the line itself is blank,
    for easier filtering for those who want to indent their blank lines.
    
    Okay: spam(1)\n#
    W291: spam(1) \n#
    W293: class Foo(object):\n    \n    bang = 12

    Line too long (116 > 88 characters)
    Open

                model_groups (list): The list of model group ids we are interested in (ideally, chosen through audition)

    Limit all lines to a maximum of 79 characters.

    There are still many devices around that are limited to 80 character
    lines; plus, limiting windows to 80 characters makes it possible to
    have several windows side-by-side.  The default wrapping on such
    devices looks ugly.  Therefore, please limit all lines to a maximum
    of 79 characters. For flowing long blocks of text (docstrings or
    comments), limiting the length to 72 characters is recommended.
    
    Reports error E501.

    Line too long (100 > 88 characters)
    Open

                logging.info('Filtering out models with a train_end_time before {}'.format(range_start))

    Limit all lines to a maximum of 79 characters.

    There are still many devices around that are limited to 80 character
    lines; plus, limiting windows to 80 characters makes it possible to
    have several windows side-by-side.  The default wrapping on such
    devices looks ugly.  Therefore, please limit all lines to a maximum
    of 79 characters. For flowing long blocks of text (docstrings or
    comments), limiting the length to 72 characters is recommended.
    
    Reports error E501.

    Test for membership should be 'not in'
    Open

        not_fetched_model_grps = [x for x in model_groups if not x in model_matrix_info['model_group_id'].unique()]

    Negative comparison should be done using "not in" and "is not".

    Okay: if x not in y:\n    pass
    Okay: assert (X in Y or X is Z)
    Okay: if not (X in Y):\n    pass
    Okay: zz = x is not y
    E713: Z = not X in Y
    E713: if not X.B in Y:\n    pass
    E714: if not X is Y:\n    pass
    E714: Z = not X.B is Y

    Blank line contains whitespace
    Open

            

    Trailing whitespace is superfluous.

    The warning returned varies on whether the line itself is blank,
    for easier filtering for those who want to indent their blank lines.
    
    Okay: spam(1)\n#
    W291: spam(1) \n#
    W293: class Foo(object):\n    \n    bang = 12

    Trailing whitespace
    Open

            SELECT 

    Trailing whitespace is superfluous.

    The warning returned varies on whether the line itself is blank,
    for easier filtering for those who want to indent their blank lines.
    
    Okay: spam(1)\n#
    W291: spam(1) \n#
    W293: class Foo(object):\n    \n    bang = 12

    Blank line contains whitespace
    Open

        

    Trailing whitespace is superfluous.

    The warning returned varies on whether the line itself is blank,
    for easier filtering for those who want to indent their blank lines.
    
    Okay: spam(1)\n#
    W291: spam(1) \n#
    W293: class Foo(object):\n    \n    bang = 12

    Line too long (90 > 88 characters)
    Open

        time_cts = models_df.groupby(['train_end_time', 'model_group_id']).count()['model_id']

    Limit all lines to a maximum of 79 characters.

    There are still many devices around that are limited to 80 character
    lines; plus, limiting windows to 80 characters makes it possible to
    have several windows side-by-side.  The default wrapping on such
    devices looks ugly.  Therefore, please limit all lines to a maximum
    of 79 characters. For flowing long blocks of text (docstrings or
    comments), limiting the length to 72 characters is recommended.
    
    Reports error E501.

    Line too long (101 > 88 characters)
    Open

                A DataFrame that contains the relevant model_ids, train_end_times, and matrix information

    Limit all lines to a maximum of 79 characters.

    There are still many devices around that are limited to 80 character
    lines; plus, limiting windows to 80 characters makes it possible to
    have several windows side-by-side.  The default wrapping on such
    devices looks ugly.  Therefore, please limit all lines to a maximum
    of 79 characters. For flowing long blocks of text (docstrings or
    comments), limiting the length to 72 characters is recommended.
    
    Reports error E501.

    Trailing whitespace
    Open

                rank_order (str) : How to deal with ties in the scores. 

    Trailing whitespace is superfluous.

    The warning returned varies on whether the line itself is blank,
    for easier filtering for those who want to indent their blank lines.
    
    Okay: spam(1)\n#
    W291: spam(1) \n#
    W293: class Foo(object):\n    \n    bang = 12

    Line too long (97 > 88 characters)
    Open

                logging.info('Filtering out models with a train_end_time after {}'.format(range_end))

    Limit all lines to a maximum of 79 characters.

    There are still many devices around that are limited to 80 character
    lines; plus, limiting windows to 80 characters makes it possible to
    have several windows side-by-side.  The default wrapping on such
    devices looks ugly.  Therefore, please limit all lines to a maximum
    of 79 characters. For flowing long blocks of text (docstrings or
    comments), limiting the length to 72 characters is recommended.
    
    Reports error E501.

    Blank line contains whitespace
    Open

        

    Trailing whitespace is superfluous.

    The warning returned varies on whether the line itself is blank,
    for easier filtering for those who want to indent their blank lines.
    
    Okay: spam(1)\n#
    W291: spam(1) \n#
    W293: class Foo(object):\n    \n    bang = 12

    Trailing whitespace
    Open

            Return: 

    Trailing whitespace is superfluous.

    The warning returned varies on whether the line itself is blank,
    for easier filtering for those who want to indent their blank lines.
    
    Okay: spam(1)\n#
    W291: spam(1) \n#
    W293: class Foo(object):\n    \n    bang = 12

    Line too long (136 > 88 characters)
    Open

                range_train_end_times (Dict): Optional. If provided, only the models with train_end_times that fall in the range are scored. 

    Limit all lines to a maximum of 79 characters.

    There are still many devices around that are limited to 80 character
    lines; plus, limiting windows to 80 characters makes it possible to
    have several windows side-by-side.  The default wrapping on such
    devices looks ugly.  Therefore, please limit all lines to a maximum
    of 79 characters. For flowing long blocks of text (docstrings or
    comments), limiting the length to 72 characters is recommended.
    
    Reports error E501.

    Trailing whitespace
    Open

        if train_end_times_range is not None: 

    Trailing whitespace is superfluous.

    The warning returned varies on whether the line itself is blank,
    for easier filtering for those who want to indent their blank lines.
    
    Okay: spam(1)\n#
    W291: spam(1) \n#
    W293: class Foo(object):\n    \n    bang = 12

    Blank line contains whitespace
    Open

        

    Trailing whitespace is superfluous.

    The warning returned varies on whether the line itself is blank,
    for easier filtering for those who want to indent their blank lines.
    
    Okay: spam(1)\n#
    W291: spam(1) \n#
    W293: class Foo(object):\n    \n    bang = 12

    Line too long (89 > 88 characters)
    Open

        # summary of the models that we are scoring. To check any special things worth noting

    Limit all lines to a maximum of 79 characters.

    There are still many devices around that are limited to 80 character
    lines; plus, limiting windows to 80 characters makes it possible to
    have several windows side-by-side.  The default wrapping on such
    devices looks ugly.  Therefore, please limit all lines to a maximum
    of 79 characters. For flowing long blocks of text (docstrings or
    comments), limiting the length to 72 characters is recommended.
    
    Reports error E501.

    Line too long (102 > 88 characters)
    Open

                experiment_hashes (List[str]): Optional. A list of experiment hashes we are interested in.

    Limit all lines to a maximum of 79 characters.

    There are still many devices around that are limited to 80 character
    lines; plus, limiting windows to 80 characters makes it possible to
    have several windows side-by-side.  The default wrapping on such
    devices looks ugly.  Therefore, please limit all lines to a maximum
    of 79 characters. For flowing long blocks of text (docstrings or
    comments), limiting the length to 72 characters is recommended.
    
    Reports error E501.

    Line too long (119 > 88 characters)
    Open

                project_path (str): Path where the created matrices and trained model objects are stored for the experiment

    Limit all lines to a maximum of 79 characters.

    There are still many devices around that are limited to 80 character
    lines; plus, limiting windows to 80 characters makes it possible to
    have several windows side-by-side.  The default wrapping on such
    devices looks ugly.  Therefore, please limit all lines to a maximum
    of 79 characters. For flowing long blocks of text (docstrings or
    comments), limiting the length to 72 characters is recommended.
    
    Reports error E501.

    Blank line contains whitespace
    Open

        

    Trailing whitespace is superfluous.

    The warning returned varies on whether the line itself is blank,
    for easier filtering for those who want to indent their blank lines.
    
    Okay: spam(1)\n#
    W291: spam(1) \n#
    W293: class Foo(object):\n    \n    bang = 12

    Line too long (147 > 88 characters)
    Open

            logging.warning('There are model groups with more than one model per train_end_time. See below: \n {}'.format(time_cts[msk].reset_index()))

    Limit all lines to a maximum of 79 characters.

    There are still many devices around that are limited to 80 character
    lines; plus, limiting windows to 80 characters makes it possible to
    have several windows side-by-side.  The default wrapping on such
    devices looks ugly.  Therefore, please limit all lines to a maximum
    of 79 characters. For flowing long blocks of text (docstrings or
    comments), limiting the length to 72 characters is recommended.
    
    Reports error E501.

    Trailing whitespace
    Open

            if 'range_end_date' in train_end_times_range:       

    Trailing whitespace is superfluous.

    The warning returned varies on whether the line itself is blank,
    for easier filtering for those who want to indent their blank lines.
    
    Okay: spam(1)\n#
    W291: spam(1) \n#
    W293: class Foo(object):\n    \n    bang = 12

    Line too long (115 > 88 characters)
    Open

                    If this is provided, only the model_ids that are relevant to the given experiments will be returned

    Limit all lines to a maximum of 79 characters.

    There are still many devices around that are limited to 80 character
    lines; plus, limiting windows to 80 characters makes it possible to
    have several windows side-by-side.  The default wrapping on such
    devices looks ugly.  Therefore, please limit all lines to a maximum
    of 79 characters. For flowing long blocks of text (docstrings or
    comments), limiting the length to 72 characters is recommended.
    
    Reports error E501.

    Missing whitespace around operator
    Open

        if len(model_matrix_info)==0:

    Surround operators with a single space on either side.

    - Always surround these binary operators with a single space on
      either side: assignment (=), augmented assignment (+=, -= etc.),
      comparisons (==, <, >, !=, <=, >=, in, not in, is, is not),
      Booleans (and, or, not).
    
    - If operators with different priorities are used, consider adding
      whitespace around the operators with the lowest priorities.
    
    Okay: i = i + 1
    Okay: submitted += 1
    Okay: x = x * 2 - 1
    Okay: hypot2 = x * x + y * y
    Okay: c = (a + b) * (a - b)
    Okay: foo(bar, key='word', *args, **kwargs)
    Okay: alpha[:-i]
    
    E225: i=i+1
    E225: submitted +=1
    E225: x = x /2 - 1
    E225: z = x **y
    E225: z = 1and 1
    E226: c = (a+b) * (a-b)
    E226: hypot2 = x*x + y*y
    E227: c = a|b
    E228: msg = fmt%(errno, errmsg)

    Line too long (106 > 88 characters)
    Open

            # To ensure that the column order we use for predictions match the order we used in model training

    Limit all lines to a maximum of 79 characters.

    There are still many devices around that are limited to 80 character
    lines; plus, limiting windows to 80 characters makes it possible to
    have several windows side-by-side.  The default wrapping on such
    devices looks ugly.  Therefore, please limit all lines to a maximum
    of 79 characters. For flowing long blocks of text (docstrings or
    comments), limiting the length to 72 characters is recommended.
    
    Reports error E501.

    Line too long (100 > 88 characters)
    Open

        logging.info('Successfully generated predictions for {} models!'.format(len(model_matrix_info)))

    Limit all lines to a maximum of 79 characters.

    There are still many devices around that are limited to 80 character
    lines; plus, limiting windows to 80 characters makes it possible to
    have several windows side-by-side.  The default wrapping on such
    devices looks ugly.  Therefore, please limit all lines to a maximum
    of 79 characters. For flowing long blocks of text (docstrings or
    comments), limiting the length to 72 characters is recommended.
    
    Reports error E501.

    Line too long (121 > 88 characters)
    Open

        """ For the given experiment and model groups, fetch the model_ids, and match them with their train/test matrix pairs 

    Limit all lines to a maximum of 79 characters.

    There are still many devices around that are limited to 80 character
    lines; plus, limiting windows to 80 characters makes it possible to
    have several windows side-by-side.  The default wrapping on such
    devices looks ugly.  Therefore, please limit all lines to a maximum
    of 79 characters. For flowing long blocks of text (docstrings or
    comments), limiting the length to 72 characters is recommended.
    
    Reports error E501.

    Line too long (99 > 88 characters)
    Open

            args_dict['experiment_hashes'] = ', '.join(["'" + str(x) + "'" for x in experiment_hashes])

    Limit all lines to a maximum of 79 characters.

    There are still many devices around that are limited to 80 character
    lines; plus, limiting windows to 80 characters makes it possible to
    have several windows side-by-side.  The default wrapping on such
    devices looks ugly.  Therefore, please limit all lines to a maximum
    of 79 characters. For flowing long blocks of text (docstrings or
    comments), limiting the length to 72 characters is recommended.
    
    Reports error E501.

    Blank line contains whitespace
    Open

            

    Trailing whitespace is superfluous.

    The warning returned varies on whether the line itself is blank,
    for easier filtering for those who want to indent their blank lines.
    
    Okay: spam(1)\n#
    W291: spam(1) \n#
    W293: class Foo(object):\n    \n    bang = 12

    Line too long (118 > 88 characters)
    Open

            logging.warning('No. of models is different between model groups. See below: \n {}'.format(cts.reset_index()))

    Limit all lines to a maximum of 79 characters.

    There are still many devices around that are limited to 80 character
    lines; plus, limiting windows to 80 characters makes it possible to
    have several windows side-by-side.  The default wrapping on such
    devices looks ugly.  Therefore, please limit all lines to a maximum
    of 79 characters. For flowing long blocks of text (docstrings or
    comments), limiting the length to 72 characters is recommended.
    
    Reports error E501.

    Line too long (176 > 88 characters)
    Open

            raise ValueError('The config is not valid. No models were found for the model group(s) {}. All specified model groups should be present'.format(not_fetched_model_grps))

    Limit all lines to a maximum of 79 characters.

    There are still many devices around that are limited to 80 character
    lines; plus, limiting windows to 80 characters makes it possible to
    have several windows side-by-side.  The default wrapping on such
    devices looks ugly.  Therefore, please limit all lines to a maximum
    of 79 characters. For flowing long blocks of text (docstrings or
    comments), limiting the length to 72 characters is recommended.
    
    Reports error E501.

    Line too long (94 > 88 characters)
    Open

        # Sometimes there are more than one model for each model group with different random seeds

    Limit all lines to a maximum of 79 characters.

    There are still many devices around that are limited to 80 character
    lines; plus, limiting windows to 80 characters makes it possible to
    have several windows side-by-side.  The default wrapping on such
    devices looks ugly.  Therefore, please limit all lines to a maximum
    of 79 characters. For flowing long blocks of text (docstrings or
    comments), limiting the length to 72 characters is recommended.
    
    Reports error E501.

    Line too long (133 > 88 characters)
    Open

        # Al the model groups specified in the config file should valid (even if the experiment_hashes and train_end_times are specified)

    Limit all lines to a maximum of 79 characters.

    There are still many devices around that are limited to 80 character
    lines; plus, limiting windows to 80 characters makes it possible to
    have several windows side-by-side.  The default wrapping on such
    devices looks ugly.  Therefore, please limit all lines to a maximum
    of 79 characters. For flowing long blocks of text (docstrings or
    comments), limiting the length to 72 characters is recommended.
    
    Reports error E501.

    There are no issues that match your filters.

    Category
    Status