tensorflow/models

official/nlp/data/squad_lib.py

Summary

Maintainability: F (3 wks)

Function postprocess_output has a Cognitive Complexity of 105 (exceeds 5 allowed). Consider refactoring.

def postprocess_output(all_examples,
                       all_features,
                       all_results,
                       n_best_size,
                       max_answer_length,
Severity: Minor
Found in official/nlp/data/squad_lib.py - About 2 days to fix

Cognitive Complexity

Cognitive Complexity is a measure of how difficult a unit of code is to intuitively understand. Unlike Cyclomatic Complexity, which determines how difficult your code will be to test, Cognitive Complexity tells you how difficult your code will be to read and comprehend.

A method's cognitive complexity is based on a few simple rules:

  • Code is not considered more complex when it uses shorthand that the language provides for collapsing multiple statements into one
  • Code is considered more complex for each "break in the linear flow of the code"
  • Code is considered more complex when "flow breaking structures are nested"
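The three rules can be made concrete with a small scorer built on Python's standard `ast` module. This is a simplified approximation of the published SonarSource rules, written here for illustration only. It is not the engine Code Climate runs and will not reproduce its scores exactly (for example, it cannot tell an `elif` apart from an `else:` whose only statement is an `if`). The function name `cognitive_complexity` is hypothetical.

```python
import ast


def cognitive_complexity(source: str) -> int:
    """Approximate the Cognitive Complexity of the first function in `source`."""
    tree = ast.parse(source)
    func = next(node for node in ast.walk(tree)
                if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)))
    return _block(func.body, 0)


def _block(stmts, nesting):
    # Score a list of statements that all sit at the same nesting level.
    return sum(_score(stmt, nesting) for stmt in stmts)


def _if_tail(orelse, nesting):
    # `elif` and `else` each cost +1 but carry no nesting penalty.
    if not orelse:
        return 0
    if len(orelse) == 1 and isinstance(orelse[0], ast.If):
        elif_node = orelse[0]
        return (1 + _score(elif_node.test, nesting)
                + _block(elif_node.body, nesting + 1)
                + _if_tail(elif_node.orelse, nesting))
    return 1 + _block(orelse, nesting + 1)


def _score(node, nesting):
    if isinstance(node, ast.If):
        # A break in linear flow (+1) plus a penalty for its nesting depth.
        return (1 + nesting + _score(node.test, nesting)
                + _block(node.body, nesting + 1)
                + _if_tail(node.orelse, nesting))
    if isinstance(node, (ast.For, ast.AsyncFor, ast.While)):
        header = node.test if isinstance(node, ast.While) else node.iter
        loop_else = 1 + _block(node.orelse, nesting + 1) if node.orelse else 0
        return (1 + nesting + _score(header, nesting)
                + _block(node.body, nesting + 1) + loop_else)
    if isinstance(node, ast.Try):
        # The `try` body is not nested; each `except` clause is a flow break.
        total = _block(node.body, nesting)
        for handler in node.handlers:
            total += 1 + nesting + _block(handler.body, nesting + 1)
        return total + _block(node.orelse, nesting) + _block(node.finalbody, nesting)
    if isinstance(node, ast.IfExp):
        return 1 + nesting + sum(_score(child, nesting + 1)
                                 for child in ast.iter_child_nodes(node))
    if isinstance(node, ast.BoolOp):
        # One increment per run of identical operators: `a and b and c` is +1.
        return 1 + sum(_score(value, nesting) for value in node.values)
    if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.Lambda)):
        # Nested functions deepen nesting without adding a flow break.
        return sum(_score(child, nesting + 1)
                   for child in ast.iter_child_nodes(node))
    return sum(_score(child, nesting) for child in ast.iter_child_nodes(node))


SAMPLE = """
def first_match(items, strict):
    for item in items:
        if item and strict:
            return item
        elif item:
            continue
    return None
"""

# for (+1), nested if (+1 +1 nesting), `and` (+1), elif (+1)
print(cognitive_complexity(SAMPLE))  # prints 5
```

The shorthand rule shows up in the `BoolOp` case: a chain of the same operator counts once, however long it is.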

File squad_lib.py has 733 lines of code (exceeds 250 allowed). Consider refactoring.

# Copyright 2024 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
Severity: Major
Found in official/nlp/data/squad_lib.py - About 1 day to fix

Function convert_examples_to_features has a Cognitive Complexity of 78 (exceeds 5 allowed). Consider refactoring.

def convert_examples_to_features(examples,
                                 tokenizer,
                                 max_seq_length,
                                 doc_stride,
                                 max_query_length,
Severity: Minor
Found in official/nlp/data/squad_lib.py - About 1 day to fix

Function read_squad_examples has a Cognitive Complexity of 53 (exceeds 5 allowed). Consider refactoring.

def read_squad_examples(input_file, is_training,
                        version_2_with_negative,
                        translated_input_folder=None):
  """Read a SQuAD json file into a list of SquadExample."""
  with tf.io.gfile.GFile(input_file, "r") as reader:
Severity: Minor
Found in official/nlp/data/squad_lib.py - About 1 day to fix

Function get_final_text has a Cognitive Complexity of 24 (exceeds 5 allowed). Consider refactoring.

def get_final_text(pred_text, orig_text, do_lower_case, verbose=False):
  """Project the tokenized prediction back to the original text."""

  # When we created the data, we kept track of the alignment between original
  # (whitespace tokenized) tokens and our WordPiece tokenized tokens. So
Severity: Minor
Found in official/nlp/data/squad_lib.py - About 3 hrs to fix
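The alignment the comment describes depends on comparing strings with spaces removed while remembering where each kept character came from. Below is a minimal sketch of that mapping step, assuming only ASCII spaces are dropped. `strip_spaces` is a hypothetical name; the BERT-derived reference code uses a similar nested helper, but this sketch is not copied from it.

```python
from typing import Dict, Tuple


def strip_spaces(text: str) -> Tuple[str, Dict[int, int]]:
    """Drop ASCII spaces, mapping each kept char's new index to its old index."""
    ns_chars = []
    ns_to_s_map = {}  # index in space-free text -> index in original text
    for i, c in enumerate(text):
        if c == " ":
            continue
        ns_to_s_map[len(ns_chars)] = i
        ns_chars.append(c)
    return "".join(ns_chars), ns_to_s_map


# Map a match found in the space-free text back to original offsets.
orig = "Steve  Smith"
ns_text, ns_to_s = strip_spaces(orig)
start = ns_text.find("Smith")
end = start + len("Smith") - 1
print(orig[ns_to_s[start]:ns_to_s[end] + 1])  # prints Smith
```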

Function _get_best_indexes_and_logits has a Cognitive Complexity of 19 (exceeds 5 allowed). Consider refactoring.

def _get_best_indexes_and_logits(result,
                                 n_best_size,
                                 xlnet_format=False):
  """Generates the n-best indexes and logits from a list."""
  if xlnet_format:
Severity: Minor
Found in official/nlp/data/squad_lib.py - About 2 hrs to fix
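Much of the flagged complexity likely comes from nested loops over the top start and end positions. Below is a sketch of a non-XLNet path with that nesting flattened. It assumes plain sequences of start and end logits rather than the `result` object. `best_span_candidates` is hypothetical, and the XLNet branch is not covered.

```python
import heapq
import itertools
from typing import Iterator, Sequence, Tuple


def best_span_candidates(start_logits: Sequence[float],
                         end_logits: Sequence[float],
                         n_best_size: int
                         ) -> Iterator[Tuple[int, int, float, float]]:
    """Yield (start_index, end_index, start_logit, end_logit) for top-n pairs."""
    # heapq.nlargest returns the n best (index, logit) pairs, highest first.
    top_starts = heapq.nlargest(n_best_size, enumerate(start_logits),
                                key=lambda pair: pair[1])
    top_ends = heapq.nlargest(n_best_size, enumerate(end_logits),
                              key=lambda pair: pair[1])
    # itertools.product replaces two nested loops with one flat loop.
    for (s_idx, s_logit), (e_idx, e_logit) in itertools.product(top_starts, top_ends):
        yield s_idx, e_idx, s_logit, e_logit
```

Validity checks on the pairs (for example, an end before its start) are left to the caller in this sketch.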

Function __init__ has 14 arguments (exceeds 4 allowed). Consider refactoring.

  def __init__(self,
Severity: Major
Found in official/nlp/data/squad_lib.py - About 1 hr to fix

Function write_predictions has 12 arguments (exceeds 4 allowed). Consider refactoring.

def write_predictions(all_examples,
Severity: Major
Found in official/nlp/data/squad_lib.py - About 1 hr to fix

Function postprocess_output has 10 arguments (exceeds 4 allowed). Consider refactoring.

def postprocess_output(all_examples,
Severity: Major
Found in official/nlp/data/squad_lib.py - About 1 hr to fix

Function generate_tf_record_from_json_file has 10 arguments (exceeds 4 allowed). Consider refactoring.

def generate_tf_record_from_json_file(input_file_path,
Severity: Major
Found in official/nlp/data/squad_lib.py - About 1 hr to fix

Function convert_examples_to_features has 9 arguments (exceeds 4 allowed). Consider refactoring.

def convert_examples_to_features(examples,
Severity: Major
Found in official/nlp/data/squad_lib.py - About 1 hr to fix

Function __init__ has 7 arguments (exceeds 4 allowed). Consider refactoring.

  def __init__(self,
Severity: Major
Found in official/nlp/data/squad_lib.py - About 50 mins to fix
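A common remedy for long argument lists is a parameter object. The sketch below groups options whose names appear elsewhere in this report (`n_best_size`, `max_answer_length`, `do_lower_case`, `verbose`, `version_2_with_negative`, `null_score_diff_threshold`, `xlnet_format`) into a frozen dataclass. `PostprocessConfig` is hypothetical, the grouping is an assumption, and the defaults are illustrative rather than upstream values.

```python
import dataclasses


@dataclasses.dataclass(frozen=True)
class PostprocessConfig:
    """Hypothetical bundle of the knobs shared by the SQuAD postprocessing functions."""
    n_best_size: int
    max_answer_length: int
    # Illustrative defaults only; not taken from the upstream code.
    do_lower_case: bool = True
    version_2_with_negative: bool = False
    null_score_diff_threshold: float = 0.0
    xlnet_format: bool = False
    verbose: bool = False


base = PostprocessConfig(n_best_size=20, max_answer_length=30)
# Derive a SQuAD 2.0 variant without mutating the shared base config.
squad_v2 = dataclasses.replace(base, version_2_with_negative=True)
```

A function such as `postprocess_output` could then take the three data arguments plus one config, and freezing the dataclass prevents one caller from silently changing settings another caller relies on.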

Avoid deeply nested control flow statements.

          if (len(qa["answers"]) != 1) and (not is_impossible):
            raise ValueError(
                "For training, each question should have exactly 1 answer.")
          if not is_impossible:
Severity: Major
Found in official/nlp/data/squad_lib.py - About 45 mins to fix

Avoid deeply nested control flow statements.

          if prev_is_whitespace:
            doc_tokens.append(c)
          else:
            doc_tokens[-1] += c
          prev_is_whitespace = False
Severity: Major
Found in official/nlp/data/squad_lib.py - About 45 mins to fix
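The nested branch above belongs to a character-level whitespace tokenizer. Below is a flatter sketch of the same loop. It assumes `str.isspace()` as the whitespace test (the upstream helper may treat some characters differently), and it assumes the loop also records a character-to-token offset, as in the original BERT SQuAD reader. `split_on_whitespace` is a hypothetical name.

```python
from typing import List, Tuple


def split_on_whitespace(text: str) -> Tuple[List[str], List[int]]:
    """Split text into whitespace-delimited tokens, tracking char -> token index."""
    doc_tokens: List[str] = []
    char_to_word_offset: List[int] = []
    prev_is_whitespace = True
    for c in text:
        is_ws = c.isspace()
        if not is_ws and prev_is_whitespace:
            doc_tokens.append(c)  # first character of a new token
        elif not is_ws:
            doc_tokens[-1] += c  # continue the current token
        prev_is_whitespace = is_ws
        # Every character, whitespace included, maps to the most recent token
        # (-1 for leading whitespace before any token).
        char_to_word_offset.append(len(doc_tokens) - 1)
    return doc_tokens, char_to_word_offset
```

Folding the whitespace test into one `if`/`elif` keeps the branch at a single nesting level inside the loop.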

Avoid deeply nested control flow statements.

          if span_contains_answer:
            answer_text = " ".join(tokens[start_position:(end_position + 1)])
            logging.info("start_position: %d", (start_position))
            logging.info("end_position: %d", (end_position))
            logging.info("answer: %s", tokenization.printable_text(answer_text))
Severity: Major
Found in official/nlp/data/squad_lib.py - About 45 mins to fix

Function _check_is_max_context has a Cognitive Complexity of 8 (exceeds 5 allowed). Consider refactoring.

def _check_is_max_context(doc_spans, cur_span_index, position):
  """Check if this is the 'max context' doc span for the token."""

  # Because of the sliding window approach taken to scoring documents, a single
  # token can appear in multiple documents. E.g.
Severity: Minor
Found in official/nlp/data/squad_lib.py - About 45 mins to fix
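The docstring refers to choosing, among overlapping sliding-window spans, the span in which a token has the most context on both sides. Below is a lower-complexity sketch. It assumes the scoring rule of the BERT reference implementation as we understand it: the smaller of the left and right context, plus 0.01 times the span length as a tie-breaker. The excerpt above does not show this formula, and `check_is_max_context` is a hypothetical name.

```python
import collections

DocSpan = collections.namedtuple("DocSpan", ["start", "length"])


def check_is_max_context(doc_spans, cur_span_index, position):
    """True if span `cur_span_index` gives token `position` the most context."""

    def score(span):
        end = span.start + span.length - 1
        # Context is limited by the nearer edge; longer spans win ties.
        return min(position - span.start, end - position) + 0.01 * span.length

    candidates = [(i, span) for i, span in enumerate(doc_spans)
                  if span.start <= position <= span.start + span.length - 1]
    if not candidates:
        return False
    # max() keeps the first span among equal scores.
    best_index, _ = max(candidates, key=lambda pair: score(pair[1]))
    return best_index == cur_span_index
```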

Avoid deeply nested control flow statements.

          if score_diff > null_score_diff_threshold:
            all_predictions[example.qas_id] = ""
          else:
            all_predictions[example.qas_id] = best_non_null_entry.text
      else:
Severity: Major
Found in official/nlp/data/squad_lib.py - About 45 mins to fix
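The nested branch implements the SQuAD 2.0 no-answer decision: predict the empty string when the null score beats the best non-null span by more than `null_score_diff_threshold`. Below is a flattened sketch. It assumes the non-XLNet path computes the score difference as the null score minus the best span's start and end logits (that line is not part of the excerpt). `select_answer` is hypothetical.

```python
from typing import Tuple


def select_answer(score_null: float, best_start_logit: float,
                  best_end_logit: float, best_text: str,
                  null_score_diff_threshold: float = 0.0) -> Tuple[str, float]:
    """Return (prediction, score_diff) for one SQuAD 2.0 question."""
    # How much more the model prefers "no answer" over its best real span.
    score_diff = score_null - best_start_logit - best_end_logit
    # One conditional expression replaces the nested if/else.
    prediction = "" if score_diff > null_score_diff_threshold else best_text
    return prediction, score_diff
```

Raising the threshold makes the model answer more often, because a larger margin is then needed before "no answer" wins.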

Avoid deeply nested control flow statements.

          if version_2_with_negative:
            is_impossible = qa["is_impossible"]
          if (len(qa["answers"]) != 1) and (not is_impossible):
Severity: Major
Found in official/nlp/data/squad_lib.py - About 45 mins to fix

Avoid deeply nested control flow statements.

          if not is_impossible:
            answer = qa["answers"][0]
            orig_answer_text = answer["text"]
            answer_offset = answer["answer_start"]
            answer_length = len(orig_answer_text)
Severity: Major
Found in official/nlp/data/squad_lib.py - About 45 mins to fix
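Three of the nested-control-flow findings in `read_squad_examples` come from one block that validates a training answer, and guard clauses can flatten it. Below is a sketch that preserves the visible logic (raise only when an answerable question does not have exactly one answer). `TrainingAnswer`, `parse_training_answer`, and the `-1` sentinel for unanswerable questions are hypothetical.

```python
from typing import NamedTuple


class TrainingAnswer(NamedTuple):
    """Hypothetical container for one question's training target."""
    text: str
    start_char: int
    is_impossible: bool


def parse_training_answer(qa: dict, version_2_with_negative: bool) -> TrainingAnswer:
    """Flattened version of the nested answer checks, using guard clauses."""
    # Only SQuAD 2.0 data carries the "is_impossible" flag.
    is_impossible = bool(version_2_with_negative and qa["is_impossible"])
    if is_impossible:
        return TrainingAnswer(text="", start_char=-1, is_impossible=True)
    if len(qa["answers"]) != 1:
        raise ValueError(
            "For training, each question should have exactly 1 answer.")
    answer = qa["answers"][0]
    return TrainingAnswer(answer["text"], answer["answer_start"], False)
```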

Function _improve_answer_span has 5 arguments (exceeds 4 allowed). Consider refactoring.

def _improve_answer_span(doc_tokens, input_start, input_end, tokenizer,
Severity: Minor
Found in official/nlp/data/squad_lib.py - About 35 mins to fix

Function _compute_softmax has a Cognitive Complexity of 7 (exceeds 5 allowed). Consider refactoring.

def _compute_softmax(scores):
  """Compute softmax probability over raw logits."""
  if not scores:
    return []

Severity: Minor
Found in official/nlp/data/squad_lib.py - About 35 mins to fix
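Much of a softmax's branching can be replaced with built-ins, and a single shared copy would also address the three-location duplication reported for this function elsewhere in this report. Below is a sketch of a lower-complexity, numerically stable softmax. The upstream body is truncated in the excerpt, so equivalence is assumed rather than verified, and `compute_softmax` is a hypothetical name.

```python
import math
from typing import List, Sequence


def compute_softmax(scores: Sequence[float]) -> List[float]:
    """Softmax over raw logits; returns [] for empty input."""
    if not scores:
        return []
    # Subtracting the max keeps math.exp from overflowing on large logits.
    max_score = max(scores)
    exp_scores = [math.exp(s - max_score) for s in scores]
    total = sum(exp_scores)
    return [e / total for e in exp_scores]
```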

Avoid too many return statements within this function.

  return output_text
Severity: Major
Found in official/nlp/data/squad_lib.py - About 30 mins to fix

Function _improve_answer_span has a Cognitive Complexity of 6 (exceeds 5 allowed). Consider refactoring.

def _improve_answer_span(doc_tokens, input_start, input_end, tokenizer,
                         orig_answer_text):
  """Returns tokenized answer spans that better match the annotated answer."""

  # The SQuAD annotations are character based. We first project them to
Severity: Minor
Found in official/nlp/data/squad_lib.py - About 25 mins to fix
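The comment describes refining a coarse token span so that it matches the tokenized answer exactly. Below is a sketch of that search with the two loops folded into one generator. It assumes the answer has already been tokenized, so no tokenizer is passed, unlike `_improve_answer_span`. `improve_answer_span` is hypothetical.

```python
from typing import List, Tuple


def improve_answer_span(doc_tokens: List[str], input_start: int, input_end: int,
                        answer_tokens: List[str]) -> Tuple[int, int]:
    """Shrink [input_start, input_end] to a sub-span that exactly matches the answer."""
    target = " ".join(answer_tokens)
    # All (start, end) pairs inside the span, widest spans first per start.
    candidates = ((s, e) for s in range(input_start, input_end + 1)
                  for e in range(input_end, s - 1, -1))
    # Return the first exact match, or fall back to the original span.
    return next((span for span in candidates
                 if " ".join(doc_tokens[span[0]:span[1] + 1]) == target),
                (input_start, input_end))
```

For example, if a character-level annotation of "1895" projects onto the whole token run `( 1895 - 1943 ) .`, the search narrows it to the single token `1895`.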

Identical blocks of code found in 2 locations. Consider refactoring.

class FeatureWriter(object):
  """Writes InputFeature to TF example file."""

  def __init__(self, filename, is_training):
    self.filename = filename
Severity: Major
Found in official/nlp/data/squad_lib.py and 1 other location - About 3 days to fix
official/nlp/data/squad_lib_sp.py on lines 891..932

Duplicated Code

Duplicated code can lead to software that is hard to understand and difficult to change. The Don't Repeat Yourself (DRY) principle states:

Every piece of knowledge must have a single, unambiguous, authoritative representation within a system.

When you violate DRY, bugs and maintenance problems are sure to follow. Duplicated code tends both to keep replicating and to diverge (leaving bugs as two similar implementations differ in subtle ways).

Tuning

This issue has a mass of 400.

We set useful threshold defaults for the languages we support, but you may want to adjust these settings based on your project guidelines.

The threshold configuration represents the minimum mass a code block must have to be analyzed for duplication. The lower the threshold, the more fine-grained the comparison.

If the engine reports duplication too readily, try raising the threshold. If you suspect that the engine isn't catching enough duplication, try lowering the threshold. The best setting tends to differ from language to language.

See codeclimate-duplication's documentation for more information about tuning the mass threshold in your .codeclimate.yml.

Identical blocks of code found in 2 locations. Consider refactoring.

def _get_best_indexes_and_logits(result,
                                 n_best_size,
                                 xlnet_format=False):
  """Generates the n-best indexes and logits from a list."""
  if xlnet_format:
Severity: Major
Found in official/nlp/data/squad_lib.py and 1 other location - About 2 days to fix
official/nlp/data/squad_lib_sp.py on lines 843..865

This issue has a mass of 273.

Similar blocks of code found in 3 locations. Consider refactoring.

def _check_is_max_context(doc_spans, cur_span_index, position):
  """Check if this is the 'max context' doc span for the token."""

  # Because of the sliding window approach taken to scoring documents, a single
  # token can appear in multiple documents. E.g.
Severity: Major
Found in official/nlp/data/squad_lib.py and 2 other locations - About 1 day to fix
official/legacy/xlnet/squad_utils.py on lines 845..879
official/nlp/data/squad_lib_sp.py on lines 579..613

This issue has a mass of 167.

Similar blocks of code found in 3 locations. Consider refactoring.

def _compute_softmax(scores):
  """Compute softmax probability over raw logits."""
  if not scores:
    return []

Severity: Major
Found in official/nlp/data/squad_lib.py and 2 other locations - About 1 day to fix
official/legacy/xlnet/squad_utils.py on lines 223..243
official/nlp/data/squad_lib_sp.py on lines 868..888

This issue has a mass of 135.

Identical blocks of code found in 2 locations. Consider refactoring.

def write_predictions(all_examples,
                      all_features,
                      all_results,
                      n_best_size,
                      max_answer_length,
Severity: Major
Found in official/nlp/data/squad_lib.py and 1 other location - About 7 hrs to fix
official/nlp/data/squad_lib_sp.py on lines 616..647

This issue has a mass of 111.

Identical blocks of code found in 2 locations. Consider refactoring.

      if best_non_null_entry is not None:
        if xlnet_format:
          score_diff = score_null
          scores_diff_json[example.qas_id] = score_diff
          all_predictions[example.qas_id] = best_non_null_entry.text
Severity: Major
Found in official/nlp/data/squad_lib.py and 1 other location - About 7 hrs to fix
official/nlp/data/squad_lib_sp.py on lines 819..831

This issue has a mass of 111.

Identical blocks of code found in 3 locations. Consider refactoring.

    while start_offset < len(all_doc_tokens):
      length = len(all_doc_tokens) - start_offset
      if length > max_tokens_for_doc:
        length = max_tokens_for_doc
      doc_spans.append(_DocSpan(start=start_offset, length=length))
Severity: Major
Found in official/nlp/data/squad_lib.py and 2 other locations - About 5 hrs to fix
official/legacy/xlnet/squad_utils.py on lines 674..681
official/nlp/data/squad_lib_sp.py on lines 372..379
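Because this window loop is reported in three files, it is a natural candidate for one shared helper. Below is a sketch. It assumes the loop's remaining lines, which the excerpt does not show, stop once a window reaches the last token and otherwise advance by `min(length, doc_stride)`, as in the BERT reference. `make_doc_spans` is hypothetical.

```python
import collections
from typing import List

DocSpan = collections.namedtuple("DocSpan", ["start", "length"])


def make_doc_spans(num_tokens: int, max_tokens_for_doc: int,
                   doc_stride: int) -> List[DocSpan]:
    """Cover num_tokens with overlapping windows of at most max_tokens_for_doc."""
    if max_tokens_for_doc <= 0 or doc_stride <= 0:
        # A non-positive stride or window would loop forever.
        raise ValueError("max_tokens_for_doc and doc_stride must be positive")
    doc_spans = []
    start_offset = 0
    while start_offset < num_tokens:
        length = min(num_tokens - start_offset, max_tokens_for_doc)
        doc_spans.append(DocSpan(start=start_offset, length=length))
        if start_offset + length == num_tokens:
            break  # this window already reaches the end of the document
        start_offset += min(length, doc_stride)
    return doc_spans
```

With 10 tokens, windows of 4, and a stride of 3, this yields spans starting at 0, 3, and 6, so each token appears in at least one window and neighbouring windows overlap by one token.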

                                Duplicated Code

                                Duplicated code can lead to software that is hard to understand and difficult to change. The Don't Repeat Yourself (DRY) principle states:

                                Every piece of knowledge must have a single, unambiguous, authoritative representation within a system.

                                When you violate DRY, bugs and maintenance problems are sure to follow. Duplicated code has a tendency to both continue to replicate and also to diverge (leaving bugs as two similar implementations differ in subtle ways).

                                Tuning

                                This issue has a mass of 95.

                                We set useful threshold defaults for the languages we support, but you may want to adjust these settings based on your project guidelines.

                                The threshold configuration represents the minimum mass a code block must have to be analyzed for duplication. The lower the threshold, the more fine-grained the comparison.

                                If the engine is too easily reporting duplication, try raising the threshold. If you suspect that the engine isn't catching enough duplication, try lowering the threshold. The best setting tends to differ from language to language.

                                See codeclimate-duplication's documentation for more information about tuning the mass threshold in your .codeclimate.yml.

                                Identical blocks of code found in 2 locations. Consider refactoring.

                                      if version_2_with_negative:
                                        if xlnet_format:
                                          feature_null_score = result.class_logits
                                        else:
                                          feature_null_score = result.start_logits[0] + result.end_logits[0]
                                Severity: Major
                                Found in official/nlp/data/squad_lib.py and 1 other location - About 5 hrs to fix
                                official/nlp/data/squad_lib_sp.py on lines 695..704
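The branch above only decides how a feature's "no answer" score is computed, so it can be isolated in one small function. This is a sketch that assumes `result` exposes `class_logits`, `start_logits` and `end_logits`, as the excerpt reads them. `compute_null_score` and the `RawResult` stand-in are hypothetical names, and the truncated remainder of the block is not reproduced.

```python
import collections

# Hypothetical stand-in for the model's raw result type.
RawResult = collections.namedtuple(
    "RawResult", ["start_logits", "end_logits", "class_logits"])


def compute_null_score(result, xlnet_format: bool) -> float:
  """Returns one feature's "no answer" score.

  XLNet-format results carry a dedicated class logit; otherwise the null
  answer is scored at position 0 of the start and end logits.
  """
  if xlnet_format:
    return result.class_logits
  return result.start_logits[0] + result.end_logits[0]
```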

                                Tuning

                                This issue has a mass of 89.

                                Similar blocks of code found in 3 locations. Consider refactoring.

                                    for (i, entry) in enumerate(nbest):
                                      output = collections.OrderedDict()
                                      output["text"] = entry.text
                                      output["probability"] = probs[i]
                                      output["start_logit"] = entry.start_logit
                                Severity: Major
                                Found in official/nlp/data/squad_lib.py and 2 other locations - About 5 hrs to fix
                                official/legacy/xlnet/squad_utils.py on lines 397..403
                                official/nlp/data/squad_lib_sp.py on lines 805..811
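The per-entry dictionary built in this loop could come from one shared function. The sketch assumes each n-best entry has `text`, `start_logit` and `end_logit` attributes. It also assumes that the truncated part of the loop records `end_logit` (as BERT's `run_squad.py` does) and that `probs` is a softmax over `start_logit + end_logit`. `compute_softmax`, `nbest_to_json` and `Entry` are hypothetical names.

```python
import collections
import math

# Hypothetical n-best entry type with the fields the excerpt reads.
Entry = collections.namedtuple("Entry", ["text", "start_logit", "end_logit"])


def compute_softmax(scores):
  """Numerically stable softmax: subtract the max score before exponentiating."""
  if not scores:
    return []
  max_score = max(scores)
  exp_scores = [math.exp(s - max_score) for s in scores]
  total = sum(exp_scores)
  return [e / total for e in exp_scores]


def nbest_to_json(nbest):
  """Converts n-best entries into JSON-ready OrderedDicts with probabilities."""
  probs = compute_softmax([e.start_logit + e.end_logit for e in nbest])
  nbest_json = []
  for entry, prob in zip(nbest, probs):
    output = collections.OrderedDict()
    output["text"] = entry.text
    output["probability"] = prob
    output["start_logit"] = entry.start_logit
    output["end_logit"] = entry.end_logit  # Assumed key; the excerpt is truncated here.
    nbest_json.append(output)
  return nbest_json
```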

                                Tuning

                                This issue has a mass of 87.

                                Identical blocks of code found in 2 locations. Consider refactoring.

                                  if translated_input_folder is not None:
                                    translated_files = tf.io.gfile.glob(
                                        os.path.join(translated_input_folder, "*.json"))
                                    for file in translated_files:
                                      with tf.io.gfile.GFile(file, "r") as reader:
                                Severity: Major
                                Found in official/nlp/data/squad_lib.py and 1 other location - About 4 hrs to fix
                                official/nlp/data/squad_lib_sp.py on lines 121..126
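The folder-loading logic could live in one helper. This is a sketch with a hypothetical name, `load_translated_data`, and it assumes each translated file is SQuAD-shaped JSON with a top-level `"data"` list. It defaults to the standard library's `glob.glob` and `open` so that it runs without TensorFlow; passing `tf.io.gfile.glob` and `tf.io.gfile.GFile` instead would reproduce the excerpt's I/O.

```python
import glob
import json
import os


def load_translated_data(translated_input_folder, glob_fn=glob.glob, open_fn=open):
  """Collects the "data" lists from every *.json file in a folder (hypothetical helper)."""
  data = []
  if translated_input_folder is None:
    return data
  pattern = os.path.join(translated_input_folder, "*.json")
  # Sorted so the merged order does not depend on the filesystem's listing order.
  for path in sorted(glob_fn(pattern)):
    with open_fn(path, "r") as reader:
      data.extend(json.load(reader)["data"])  # Assumes SQuAD-style {"data": [...]} files.
  return data
```

Sorting is a deliberate choice in this sketch; the excerpt iterates in whatever order the glob returns, so its merged order may differ.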

                                Tuning

                                This issue has a mass of 78.

                                Similar blocks of code found in 2 locations. Consider refactoring.

                                  def __init__(self,
                                               qas_id,
                                               question_text,
                                               doc_tokens,
                                               orig_answer_text=None,
                                Severity: Major
                                Found in official/nlp/data/squad_lib.py and 1 other location - About 3 hrs to fix
                                official/nlp/data/squad_lib_sp.py on lines 40..54
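Because the two constructors are only similar, one option is a shared base that declares the common fields once, plus a subclass per module for the rest. The sketch below uses `dataclasses.dataclass` with only the fields visible in the excerpt. The class names and `extra_field` are hypothetical placeholders, and the remaining constructor arguments, which the report truncates, are deliberately left out.

```python
import dataclasses
from typing import List, Optional


@dataclasses.dataclass
class SquadExampleBase:
  """Hypothetical shared base holding the fields visible in the excerpt."""
  qas_id: str
  question_text: str
  doc_tokens: List[str]
  orig_answer_text: Optional[str] = None


@dataclasses.dataclass
class SpSquadExample(SquadExampleBase):
  """Hypothetical module-specific subclass; extra_field stands in for its own fields."""
  extra_field: Optional[str] = None
```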

                                Tuning

                                This issue has a mass of 67.

                                Identical blocks of code found in 2 locations. Consider refactoring.

                                      while len(input_ids) < max_seq_length:
                                        input_ids.append(0)
                                        input_mask.append(0)
                                        segment_ids.append(seg_pad)
                                        paragraph_mask.append(0)
                                Severity: Major
                                Found in official/nlp/data/squad_lib.py and 1 other location - About 2 hrs to fix
                                official/nlp/data/squad_lib_sp.py on lines 449..453
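The four parallel padding calls reduce to one helper applied to each sequence. This is a minimal sketch; `pad_in_place` is a hypothetical name, and `seg_pad` is whatever padding value the caller already computes.

```python
from typing import List


def pad_in_place(seq: List[int], max_seq_length: int, pad_value: int = 0) -> None:
  """Appends pad_value to seq until it holds max_seq_length items."""
  # A negative repeat count yields [], so over-long sequences are left as-is,
  # matching the behaviour of the original while loop.
  seq.extend([pad_value] * (max_seq_length - len(seq)))


# Hypothetical call site mirroring the excerpt (seg_pad comes from the caller):
# for seq, pad in ((input_ids, 0), (input_mask, 0),
#                  (segment_ids, seg_pad), (paragraph_mask, 0)):
#   pad_in_place(seq, max_seq_length, pad)
```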

                                Tuning

                                This issue has a mass of 50.

                                Identical blocks of code found in 2 locations. Consider refactoring.

                                def write_to_json_files(json_records, json_file):
                                  with tf.io.gfile.GFile(json_file, "w") as writer:
                                    writer.write(json.dumps(json_records, indent=4) + "\n")
                                Severity: Major
                                Found in official/nlp/data/squad_lib.py and 1 other location - About 1 hr to fix
                                official/nlp/data/squad_lib_sp.py on lines 838..840

                                Tuning

                                This issue has a mass of 49.

                                Identical blocks of code found in 2 locations. Consider refactoring.

                                    if version_2_with_negative and not xlnet_format:
                                      if "" not in seen_predictions:
                                        nbest.append(
                                            _NbestPrediction(
                                Severity: Major
                                Found in official/nlp/data/squad_lib.py and 1 other location - About 1 hr to fix
                                official/nlp/data/squad_lib_sp.py on lines 780..783
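The empty-answer fallback could be one helper shared by both modules, with the caller keeping the `version_2_with_negative and not xlnet_format` check. This sketch assumes that `seen_predictions` supports `in` lookups and that n-best entries have `text`, `start_logit` and `end_logit` fields. `NbestPrediction` mirrors `_NbestPrediction`, and the function and argument names are hypothetical.

```python
import collections

# Hypothetical mirror of the module's _NbestPrediction namedtuple.
NbestPrediction = collections.namedtuple(
    "NbestPrediction", ["text", "start_logit", "end_logit"])


def append_null_prediction(nbest, seen_predictions, null_start_logit,
                           null_end_logit):
  """Adds the empty-string ("no answer") candidate unless it was already seen."""
  if "" not in seen_predictions:
    nbest.append(
        NbestPrediction(
            text="", start_logit=null_start_logit, end_logit=null_end_logit))
```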

                                Tuning

                                This issue has a mass of 44.

                                Identical blocks of code found in 2 locations. Consider refactoring.

                                    for entry in nbest:
                                      total_scores.append(entry.start_logit + entry.end_logit)
                                      if not best_non_null_entry:
                                        if entry.text:
                                          best_non_null_entry = entry
                                Severity: Major
                                Found in official/nlp/data/squad_lib.py and 1 other location - About 1 hr to fix
                                official/nlp/data/squad_lib_sp.py on lines 796..800
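Scoring the n-best list and picking the best non-empty answer is a self-contained step that both modules could share. This is a sketch that assumes `nbest` is ordered best-first and that its entries expose `text`, `start_logit` and `end_logit`. `score_nbest` and `Entry` are hypothetical names.

```python
import collections

# Hypothetical n-best entry type with the fields the excerpt reads.
Entry = collections.namedtuple("Entry", ["text", "start_logit", "end_logit"])


def score_nbest(nbest):
  """Returns (total_scores, best_non_null_entry) for a best-first n-best list."""
  total_scores = []
  best_non_null_entry = None
  for entry in nbest:
    total_scores.append(entry.start_logit + entry.end_logit)
    # Keep the first (highest-ranked) entry whose text is non-empty.
    if best_non_null_entry is None and entry.text:
      best_non_null_entry = entry
  return total_scores, best_non_null_entry
```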

                                Tuning

                                This issue has a mass of 44.

                                Similar blocks of code found in 3 locations. Consider refactoring.

                                        logging.info(
                                            "token_is_max_context: %s", " ".join([
                                                "%d:%s" % (x, y)
                                                for (x, y) in six.iteritems(token_is_max_context)
                                Severity: Major
                                Found in official/nlp/data/squad_lib.py and 2 other locations - About 1 hr to fix
                                official/legacy/xlnet/squad_utils.py on lines 783..786
                                official/nlp/data/squad_lib.py on lines 411..413
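The `token_is_max_context` and `token_to_orig_map` logging blocks differ only in the mapping and the value format, so one formatting helper would cover both. `six.iteritems` is a Python 2 compatibility shim; on Python 3, `dict.items()` yields the same pairs. `format_mapping` and `log_mapping` are hypothetical names, and the sketch uses the standard `logging` module in place of whatever logger the modules import.

```python
import logging


def format_mapping(mapping, value_fmt="%d"):
  """Renders an {int: value} dict as space-separated "key:value" pairs."""
  return " ".join(("%d:" + value_fmt) % (key, value)
                  for key, value in mapping.items())


def log_mapping(name, mapping, value_fmt="%d"):
  """Logs a mapping under a label, formatting lazily via logging's % args."""
  logging.info("%s: %s", name, format_mapping(mapping, value_fmt))


# Hypothetical call sites replacing the two duplicated blocks:
# log_mapping("token_to_orig_map", token_to_orig_map)
# log_mapping("token_is_max_context", token_is_max_context, "%s")
```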

                                Tuning

                                This issue has a mass of 42.

                                Similar blocks of code found in 3 locations. Consider refactoring.

                                        logging.info(
                                            "token_to_orig_map: %s", " ".join([
                                                "%d:%d" % (x, y) for (x, y) in six.iteritems(token_to_orig_map)
                                Severity: Major
                                Found in official/nlp/data/squad_lib.py and 2 other locations - About 1 hr to fix
                                official/legacy/xlnet/squad_utils.py on lines 783..786
                                official/nlp/data/squad_lib.py on lines 415..418

                                Tuning

                                This issue has a mass of 42.

                                Similar blocks of code found in 2 locations. Consider refactoring.

                                  if end_position in tok_s_to_ns_map:
                                    ns_end_position = tok_s_to_ns_map[end_position]
                                    if ns_end_position in orig_ns_to_s_map:
                                      orig_end_position = orig_ns_to_s_map[ns_end_position]
                                Severity: Major
                                Found in official/nlp/data/squad_lib.py and 1 other location - About 1 hr to fix
                                official/nlp/data/squad_lib.py on lines 863..866
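The `end_position` and `start_position` blocks run the same two-step lookup, so one function can serve both. This is a sketch that assumes both maps are dicts from int to int. `map_position` is a hypothetical name; it returns `None` when either lookup misses and leaves any fallback to the caller.

```python
def map_position(position, tok_s_to_ns_map, orig_ns_to_s_map):
  """Maps a token-text position to the original text via two lookup tables.

  Returns None if either lookup misses, mirroring the nested `in` checks.
  """
  if position not in tok_s_to_ns_map:
    return None
  ns_position = tok_s_to_ns_map[position]
  if ns_position not in orig_ns_to_s_map:
    return None
  return orig_ns_to_s_map[ns_position]


# Hypothetical call sites:
# orig_start_position = map_position(start_position, tok_s_to_ns_map, orig_ns_to_s_map)
# orig_end_position = map_position(end_position, tok_s_to_ns_map, orig_ns_to_s_map)
```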

                                Tuning

                                This issue has a mass of 42.

                                Similar blocks of code found in 2 locations. Consider refactoring.

                                  if start_position in tok_s_to_ns_map:
                                    ns_start_position = tok_s_to_ns_map[start_position]
                                    if ns_start_position in orig_ns_to_s_map:
                                      orig_start_position = orig_ns_to_s_map[ns_start_position]
                                Severity: Major
                                Found in official/nlp/data/squad_lib.py and 1 other location - About 1 hr to fix
                                official/nlp/data/squad_lib.py on lines 874..877

                                Tuning

                                This issue has a mass of 42.

                                Similar blocks of code found in 2 locations. Consider refactoring.

                                    if version_2_with_negative and not xlnet_format:
                                      prelim_predictions.append(
                                          _PrelimPrediction(
                                Severity: Major
                                Found in official/nlp/data/squad_lib.py and 1 other location - About 1 hr to fix
                                official/nlp/data/squad_lib_sp.py on lines 735..737

                                Tuning

                                This issue has a mass of 40.

                                Identical blocks of code found in 3 locations. Consider refactoring.

                                          if (len(qa["answers"]) != 1) and (not is_impossible):
                                            raise ValueError(
                                Severity: Major
                                Found in official/nlp/data/squad_lib.py and 2 other locations - About 50 mins to fix
                                official/legacy/xlnet/squad_utils.py on lines 453..454
                                official/nlp/data/squad_lib_sp.py on lines 142..143
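The same validation appears in three modules, so it could be one function. This is a sketch: `check_single_answer` and its error message are hypothetical, since the report truncates the original message.

```python
def check_single_answer(qa, is_impossible):
  """Raises ValueError if an answerable question does not have exactly one answer."""
  if len(qa["answers"]) != 1 and not is_impossible:
    # Hypothetical message; the original text is truncated in the report.
    raise ValueError("Question %r should have exactly 1 answer, found %d." %
                     (qa.get("id"), len(qa["answers"])))
```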

                                Tuning

                                This issue has a mass of 36.

                                Identical blocks of code found in 2 locations. Consider refactoring.

                                    for _ in range(num_padding):
                                      dummy_feature.unique_id = unique_id
                                
                                      # Run callback
                                      output_fn(feature, is_padding=True)
                                Severity: Minor
                                Found in official/nlp/data/squad_lib.py and 1 other location - About 45 mins to fix
                                official/nlp/data/squad_lib_sp.py on lines 567..572
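
A shared helper could own this padding loop in both files. In the sketch below, `emit_padding` is a hypothetical name, and incrementing `unique_id` once per padding example is an assumption, since the excerpt stops before any increment. The excerpt sets `dummy_feature.unique_id` but passes `feature` to `output_fn`; the sketch emits the re-numbered copy instead, on the assumption that this is the intent.

```python
import copy
from typing import Any, Callable


def emit_padding(feature: Any, unique_id: int, num_padding: int,
                 output_fn: Callable[..., None]) -> int:
  """Hypothetical helper: emits num_padding dummy copies of feature.

  Each copy gets the next consecutive unique_id (an assumption; the excerpt
  does not show the increment). Returns the next unused id.
  """
  for _ in range(num_padding):
    # Copy per iteration so every emitted padding example keeps its own id.
    dummy_feature = copy.deepcopy(feature)
    dummy_feature.unique_id = unique_id
    # Run callback on the padded copy, not on the original feature.
    output_fn(dummy_feature, is_padding=True)
    unique_id += 1
  return unique_id
```

Copying on every iteration matters if `output_fn` keeps references to the objects it receives: reusing one mutated copy would leave all of them carrying the last id.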

                                This issue has a mass of 35.

                                Identical blocks of code found in 3 locations. Consider refactoring.

                                  with tf.io.gfile.GFile(input_file, "r") as reader:
                                    input_data = json.load(reader)["data"]
                                Severity: Major
                                Found in official/nlp/data/squad_lib.py and 2 other locations - About 40 mins to fix
                                official/legacy/xlnet/squad_utils.py on lines 436..437
                                official/nlp/data/squad_lib_sp.py on lines 118..119
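
The JSON loading could move into one helper shared by all three modules. This is a sketch: `load_squad_entries` is a hypothetical name, and the `open_fn` parameter is an assumption added so the example runs without TensorFlow. In the TensorFlow modules one would pass `tf.io.gfile.GFile`, which, like the built-in `open`, is constructed with a path and a mode.

```python
import json
from typing import IO, Any, Callable, List


def load_squad_entries(input_file: str,
                       open_fn: Callable[[str, str], IO[str]] = open
                       ) -> List[Any]:
  """Hypothetical helper: returns the top-level "data" list of a SQuAD file.

  open_fn defaults to the built-in open (an assumption for portability);
  the quoted code opens the file with tf.io.gfile.GFile(input_file, "r").
  """
  with open_fn(input_file, "r") as reader:
    return json.load(reader)["data"]
```

With TensorFlow available, a call site would read `load_squad_entries(input_file, tf.io.gfile.GFile)`.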

                                This issue has a mass of 34.

                                Identical blocks of code found in 2 locations. Consider refactoring.

                                    if unique_id % batch_size != 0:
                                      num_padding = batch_size - (num_examples % batch_size)
                                Severity: Minor
                                Found in official/nlp/data/squad_lib.py and 1 other location - About 35 mins to fix
                                official/nlp/data/squad_lib_sp.py on lines 564..565
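
The padding count can be computed without a separate divisibility check. The excerpt tests `unique_id % batch_size` but computes the padding from `num_examples % batch_size`; the two agree only when the first id is a multiple of `batch_size`. The hypothetical helper below assumes the intent is to round `num_examples` up to a multiple of `batch_size`.

```python
def num_padding_examples(num_examples: int, batch_size: int) -> int:
  """Hypothetical helper: dummy examples needed to fill the last batch.

  In Python, (-n) % b equals b - (n % b) when n is not a multiple of b,
  and 0 when it is, so no explicit "!= 0" branch is needed.
  """
  if batch_size <= 0:
    raise ValueError("batch_size must be positive")
  return (-num_examples) % batch_size
```

For example, `num_padding_examples(10, 4)` is 2 and `num_padding_examples(8, 4)` is 0.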

                                This issue has a mass of 33.

                                Identical blocks of code found in 2 locations. Consider refactoring.

                                        for token in query_tokens:
                                          tokens.append(token)
                                          segment_ids.append(seg_q)
                                          paragraph_mask.append(0)
                                Severity: Minor
                                Found in official/nlp/data/squad_lib.py and 1 other location - About 35 mins to fix
                                official/nlp/data/squad_lib_sp.py on lines 396..399
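
The token-appending loop could become a helper that appends one segment at a time. `append_segment` is a hypothetical name; the caller supplies the segment id and the paragraph-mask value, so the query case from the excerpt becomes `append_segment(tokens, segment_ids, paragraph_mask, query_tokens, seg_q, 0)`.

```python
from typing import List, Sequence


def append_segment(tokens: List[str], segment_ids: List[int],
                   paragraph_mask: List[int], new_tokens: Sequence[str],
                   segment_id: int, mask_value: int) -> None:
  """Hypothetical helper: appends new_tokens to the three lists in place.

  Every appended token gets the same segment id and paragraph-mask value,
  matching the per-token appends in the duplicated loop.
  """
  tokens.extend(new_tokens)
  segment_ids.extend([segment_id] * len(new_tokens))
  paragraph_mask.extend([mask_value] * len(new_tokens))
```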

                                This issue has a mass of 33.

                                Similar blocks of code found in 3 locations. Consider refactoring.

                                    prelim_predictions = sorted(
                                        prelim_predictions,
                                        key=lambda x: (x.start_logit + x.end_logit),
                                Severity: Minor
                                Found in official/nlp/data/squad_lib.py and 2 other locations - About 30 mins to fix
                                official/legacy/xlnet/squad_utils.py on lines 350..352
                                official/nlp/data/squad_lib_sp.py on lines 743..745
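
The ranking step could be shared the same way. The excerpt is cut off after the `key` argument, so the descending order used here (highest `start_logit + end_logit` first, the usual n-best ranking) is an assumption, as is the hypothetical `PrelimPrediction` stand-in, which keeps only the fields this sketch needs.

```python
import collections
from typing import Iterable, List

# Hypothetical stand-in record; only start_logit and end_logit matter here.
PrelimPrediction = collections.namedtuple(
    "PrelimPrediction", ["start_index", "end_index", "start_logit", "end_logit"])


def sort_by_span_score(
    predictions: Iterable[PrelimPrediction]) -> List[PrelimPrediction]:
  """Hypothetical helper: orders spans by start_logit + end_logit, best first."""
  return sorted(predictions,
                key=lambda x: x.start_logit + x.end_logit,
                reverse=True)
```

Any record exposing `start_logit` and `end_logit` attributes works with this helper, so each module could keep its own prediction type.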

                                This issue has a mass of 32.
