tensorflow/models

official/nlp/data/squad_lib_sp.py

Summary

Maintainability
F
1 mo
Test Coverage

Function convert_examples_to_features has a Cognitive Complexity of 147 (exceeds 5 allowed). Consider refactoring.
Open

def convert_examples_to_features(examples,
                                 tokenizer,
                                 max_seq_length,
                                 doc_stride,
                                 max_query_length,
Severity: Minor
Found in official/nlp/data/squad_lib_sp.py - About 2 days to fix

Cognitive Complexity

Cognitive Complexity is a measure of how difficult a unit of code is to intuitively understand. Unlike Cyclomatic Complexity, which determines how difficult your code will be to test, Cognitive Complexity tells you how difficult your code will be to read and comprehend.

A method's cognitive complexity is based on a few simple rules:

  • Code is not considered more complex when it uses shorthand that the language provides for collapsing multiple statements into one
  • Code is considered more complex for each "break in the linear flow of the code"
  • Code is considered more complex when "flow breaking structures are nested"
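
As a rough illustration of these rules, here is a hypothetical pair of functions (not taken from squad_lib_sp.py). Both return the same count. The first places each loop and branch inside another, so every flow break is nested one level deeper. The second moves the None check into a small helper and uses a generator expression, which keeps nesting shallow. All names here are assumptions made for the example, and no particular complexity score is claimed for either version.

```python
def count_long_answers_nested(examples, min_len):
  """Nested version: each loop and branch sits inside the previous one."""
  count = 0
  for example in examples:
    if example is not None:
      for answer in example.get("answers", []):
        if len(answer) >= min_len:
          count += 1
  return count


def _answers(example):
  """Returns the answers of an example, or an empty list for None."""
  if example is None:
    return []
  return example.get("answers", [])


def count_long_answers_flat(examples, min_len):
  """Flatter version: one generator expression over a small helper."""
  return sum(
      1
      for example in examples
      for answer in _answers(example)
      if len(answer) >= min_len)
```

Splitting a deeply nested body into small helpers like this is one common way to bring a function's reported complexity down without changing its behavior.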

Function postprocess_output has a Cognitive Complexity of 94 (exceeds 5 allowed). Consider refactoring.
Open

def postprocess_output(all_examples,
                       all_features,
                       all_results,
                       n_best_size,
                       max_answer_length,
Severity: Minor
Found in official/nlp/data/squad_lib_sp.py - About 1 day to fix

File squad_lib_sp.py has 773 lines of code (exceeds 250 allowed). Consider refactoring.
Open

# Copyright 2024 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
Severity: Major
Found in official/nlp/data/squad_lib_sp.py - About 1 day to fix

Function _convert_index has a Cognitive Complexity of 28 (exceeds 5 allowed). Consider refactoring.
Open

def _convert_index(index, pos, m=None, is_start=True):
  """Converts index."""
  if index[pos] is not None:
    return index[pos]
  n = len(index)
Severity: Minor
Found in official/nlp/data/squad_lib_sp.py - About 4 hrs to fix

Function read_squad_examples has a Cognitive Complexity of 25 (exceeds 5 allowed). Consider refactoring.
Open

def read_squad_examples(input_file,
                        is_training,
                        version_2_with_negative,
                        translated_input_folder=None):
  """Read a SQuAD json file into a list of SquadExample."""
Severity: Minor
Found in official/nlp/data/squad_lib_sp.py - About 3 hrs to fix

Function _get_best_indexes_and_logits has a Cognitive Complexity of 19 (exceeds 5 allowed). Consider refactoring.
Open

def _get_best_indexes_and_logits(result,
                                 n_best_size,
                                 xlnet_format=False):
  """Generates the n-best indexes and logits from a list."""
  if xlnet_format:
Severity: Minor
Found in official/nlp/data/squad_lib_sp.py - About 2 hrs to fix

Function __init__ has 16 arguments (exceeds 4 allowed). Consider refactoring.
Open

  def __init__(self,
Severity: Major
Found in official/nlp/data/squad_lib_sp.py - About 2 hrs to fix

Function write_predictions has 12 arguments (exceeds 4 allowed). Consider refactoring.
Open

def write_predictions(all_examples,
Severity: Major
Found in official/nlp/data/squad_lib_sp.py - About 1 hr to fix

Function convert_examples_to_features has 10 arguments (exceeds 4 allowed). Consider refactoring.
Open

def convert_examples_to_features(examples,
Severity: Major
Found in official/nlp/data/squad_lib_sp.py - About 1 hr to fix

Function generate_tf_record_from_json_file has 10 arguments (exceeds 4 allowed). Consider refactoring.
Open

def generate_tf_record_from_json_file(input_file_path,
Severity: Major
Found in official/nlp/data/squad_lib_sp.py - About 1 hr to fix

Function postprocess_output has 10 arguments (exceeds 4 allowed). Consider refactoring.
Open

def postprocess_output(all_examples,
Severity: Major
Found in official/nlp/data/squad_lib_sp.py - About 1 hr to fix

Function __init__ has 7 arguments (exceeds 4 allowed). Consider refactoring.
Open

  def __init__(self,
Severity: Major
Found in official/nlp/data/squad_lib_sp.py - About 50 mins to fix
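
One common way to address argument-count findings like the ones above is to group related settings into a single parameter object. The sketch below is only an illustration under assumptions: `FeatureConfig` and `convert_examples_to_features_sketch` are hypothetical names, the function body is a stand-in rather than the real conversion logic, and only the three field names (`max_seq_length`, `doc_stride`, `max_query_length`) come from the signature excerpt shown earlier in this report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureConfig:
  """Hypothetical bundle of the length-related conversion settings."""
  max_seq_length: int
  doc_stride: int
  max_query_length: int


def convert_examples_to_features_sketch(examples, tokenizer, config):
  """Stand-in body: tokenizes each query and truncates it.

  The real function does far more; this only shows that three positional
  arguments collapse into one `config` argument.
  """
  return [tokenizer(query)[:config.max_query_length] for query in examples]
```

A call site then passes one object, for example `convert_examples_to_features_sketch(queries, str.split, FeatureConfig(384, 128, 64))`, where the numeric values are illustrative only.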

Avoid deeply nested control flow statements.
Open

          if (len(qa["answers"]) != 1) and (not is_impossible):
            raise ValueError(
                "For training, each question should have exactly 1 answer.")
          if not is_impossible:
Severity: Major
Found in official/nlp/data/squad_lib_sp.py - About 45 mins to fix

Function _check_is_max_context has a Cognitive Complexity of 8 (exceeds 5 allowed). Consider refactoring.
Open

def _check_is_max_context(doc_spans, cur_span_index, position):
  """Check if this is the 'max context' doc span for the token."""

  # Because of the sliding window approach taken to scoring documents, a single
  # token can appear in multiple documents. E.g.
Severity: Minor
Found in official/nlp/data/squad_lib_sp.py - About 45 mins to fix

Avoid deeply nested control flow statements.
Open

          if not is_impossible:
            answer = qa["answers"][0]
            orig_answer_text = answer["text"]
            start_position = answer["answer_start"]
          else:
Severity: Major
Found in official/nlp/data/squad_lib_sp.py - About 45 mins to fix
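
A typical remedy for deep nesting is to lift the inner block into a helper that uses early returns. The sketch below does that for the two excerpts above. It is a minimal illustration, not the file's actual code: `extract_training_answer` is a hypothetical name, and because the `else` branch is cut off in the excerpt, the helper simply returns `None` for impossible questions as a placeholder.

```python
def extract_training_answer(qa, is_impossible):
  """Returns (answer_text, answer_start) for a training question.

  Returns None for impossible questions. That value is a placeholder,
  since the original else branch is not visible in the report.
  """
  if is_impossible:
    return None
  if len(qa["answers"]) != 1:
    raise ValueError(
        "For training, each question should have exactly 1 answer.")
  answer = qa["answers"][0]
  return answer["text"], answer["answer_start"]
```

The caller then needs a single call at its current nesting depth instead of two further levels of `if`.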

Function _compute_softmax has a Cognitive Complexity of 7 (exceeds 5 allowed). Consider refactoring.
Open

def _compute_softmax(scores):
  """Compute softmax probability over raw logits."""
  if not scores:
    return []
Severity: Minor
Found in official/nlp/data/squad_lib_sp.py - About 35 mins to fix
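
Only the first lines of `_compute_softmax` are visible here. Assuming it computes a numerically stable softmax over a plain list of floats, a lower-complexity version could lean on built-ins and comprehensions instead of explicit loops and branches. The name `compute_softmax_sketch` is hypothetical, and this is a sketch under that assumption rather than a drop-in replacement.

```python
import math


def compute_softmax_sketch(scores):
  """Softmax over a list of raw logits; returns [] for empty input."""
  if not scores:
    return []
  # Subtracting the maximum keeps exp() from overflowing on large logits.
  max_score = max(scores)
  exp_scores = [math.exp(score - max_score) for score in scores]
  total = sum(exp_scores)
  return [value / total for value in exp_scores]
```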

Avoid too many return statements within this function.
Open

        return index[front] + 1
Severity: Major
Found in official/nlp/data/squad_lib_sp.py - About 30 mins to fix

Avoid too many return statements within this function.
Open

        return m - 1
Severity: Major
Found in official/nlp/data/squad_lib_sp.py - About 30 mins to fix

Avoid too many return statements within this function.
Open

      return index[rear]
Severity: Major
Found in official/nlp/data/squad_lib_sp.py - About 30 mins to fix

Avoid too many return statements within this function.
Open

      return index[front]
Severity: Major
Found in official/nlp/data/squad_lib_sp.py - About 30 mins to fix

Avoid too many return statements within this function.
Open

      return index[front] + 1
Severity: Major
Found in official/nlp/data/squad_lib_sp.py - About 30 mins to fix

Avoid too many return statements within this function.
Open

      return index[rear] - 1
Severity: Major
Found in official/nlp/data/squad_lib_sp.py - About 30 mins to fix

Avoid too many return statements within this function.
Open

    return index[front]
Severity: Major
Found in official/nlp/data/squad_lib_sp.py - About 30 mins to fix

Similar blocks of code found in 2 locations. Consider refactoring.
Open

def _convert_index(index, pos, m=None, is_start=True):
  """Converts index."""
  if index[pos] is not None:
    return index[pos]
  n = len(index)
Severity: Major
Found in official/nlp/data/squad_lib_sp.py and 1 other location - About 4 days to fix
official/legacy/xlnet/squad_utils.py on lines 477..512

Duplicated Code

Duplicated code can lead to software that is hard to understand and difficult to change. The Don't Repeat Yourself (DRY) principle states:

Every piece of knowledge must have a single, unambiguous, authoritative representation within a system.

When you violate DRY, bugs and maintenance problems are sure to follow. Duplicated code has a tendency to both continue to replicate and also to diverge (leaving bugs as two similar implementations differ in subtle ways).

Tuning

This issue has a mass of 427.

We set useful threshold defaults for the languages we support but you may want to adjust these settings based on your project guidelines.

The threshold configuration represents the minimum mass a code block must have to be analyzed for duplication. The lower the threshold, the more fine-grained the comparison.

If the engine is too easily reporting duplication, try raising the threshold. If you suspect that the engine isn't catching enough duplication, try lowering the threshold. The best setting tends to differ from language to language.

See codeclimate-duplication's documentation for more information about tuning the mass threshold in your .codeclimate.yml.
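
This page does not describe how the engine computes mass. Purely to illustrate how a minimum-mass threshold filters out small blocks, the sketch below uses the number of Python AST nodes as a hypothetical proxy for mass. The functions `approximate_mass` and `is_duplication_candidate` are assumptions made for this example and are not part of codeclimate-duplication.

```python
import ast


def approximate_mass(source):
  """Counts AST nodes in `source` as a rough stand-in for block mass."""
  return sum(1 for _ in ast.walk(ast.parse(source)))


def is_duplication_candidate(source, mass_threshold):
  """A block is only compared for duplication at or above the threshold."""
  return approximate_mass(source) >= mass_threshold
```

Under this proxy, raising `mass_threshold` excludes more short blocks from comparison, and lowering it admits smaller ones, which mirrors the tuning advice above.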

Identical blocks of code found in 2 locations. Consider refactoring.
Open

class FeatureWriter(object):
  """Writes InputFeature to TF example file."""

  def __init__(self, filename, is_training):
    self.filename = filename
Severity: Major
Found in official/nlp/data/squad_lib_sp.py and 1 other location - About 3 days to fix
official/nlp/data/squad_lib.py on lines 116..158

This issue has a mass of 400.

Similar blocks of code found in 2 locations. Consider refactoring.
Open

    def _lcs_match(max_dist, n=n, m=m):
      """Longest-common-substring algorithm."""
      f.fill(0)
      g.clear()
Severity: Major
Found in official/nlp/data/squad_lib_sp.py and 1 other location - About 3 days to fix
official/legacy/xlnet/squad_utils.py on lines 565..595

This issue has a mass of 360.

Identical blocks of code found in 2 locations. Consider refactoring.
Open

def _get_best_indexes_and_logits(result,
                                 n_best_size,
                                 xlnet_format=False):
  """Generates the n-best indexes and logits from a list."""
  if xlnet_format:
Severity: Major
Found in official/nlp/data/squad_lib_sp.py and 1 other location - About 2 days to fix
official/nlp/data/squad_lib.py on lines 888..910

This issue has a mass of 273.

Similar blocks of code found in 3 locations. Consider refactoring.
Open

def _check_is_max_context(doc_spans, cur_span_index, position):
  """Check if this is the 'max context' doc span for the token."""

  # Because of the sliding window approach taken to scoring documents, a single
  # token can appear in multiple documents. E.g.
Severity: Major
Found in official/nlp/data/squad_lib_sp.py and 2 other locations - About 1 day to fix
official/legacy/xlnet/squad_utils.py on lines 845..879
official/nlp/data/squad_lib.py on lines 514..548

This issue has a mass of 167.

Identical blocks of code found in 2 locations. Consider refactoring.
Open

    while i >= 0 and j >= 0:
      if (i, j) not in g:
        break
      if g[(i, j)] == 2:
        orig_to_chartok_index[i] = j
Severity: Major
Found in official/nlp/data/squad_lib_sp.py and 1 other location - About 1 day to fix
official/legacy/xlnet/squad_utils.py on lines 607..617

This issue has a mass of 142.

Similar blocks of code found in 3 locations. Consider refactoring.
Open

def _compute_softmax(scores):
  """Compute softmax probability over raw logits."""
  if not scores:
    return []
Severity: Major
Found in official/nlp/data/squad_lib_sp.py and 2 other locations - About 1 day to fix
official/legacy/xlnet/squad_utils.py on lines 223..243
official/nlp/data/squad_lib.py on lines 913..933

This issue has a mass of 135.

Identical blocks of code found in 2 locations. Consider refactoring.
Open

      if xlnet_format:
        score_diff = score_null
        scores_diff_json[example.qas_id] = score_diff
        all_predictions[example.qas_id] = best_non_null_entry.text
      else:
Severity: Major
Found in official/nlp/data/squad_lib_sp.py and 1 other location - About 7 hrs to fix
official/nlp/data/squad_lib.py on lines 764..776

This issue has a mass of 111.

                                  Identical blocks of code found in 2 locations. Consider refactoring.
                                  Open

                                  def write_predictions(all_examples,
                                                        all_features,
                                                        all_results,
                                                        n_best_size,
                                                        max_answer_length,
                                  Severity: Major
                                  Found in official/nlp/data/squad_lib_sp.py and 1 other location - About 7 hrs to fix
                                  official/nlp/data/squad_lib.py on lines 551..582

                                  This issue has a mass of 111.

                                  Identical blocks of code found in 2 locations. Consider refactoring.
                                  Open

                                      if is_training and not example.is_impossible:
                                        start_position = example.start_position
                                        end_position = start_position + len(example.orig_answer_text) - 1
                                  
                                        start_chartok_pos = _convert_index(
                                  Severity: Major
                                  Found in official/nlp/data/squad_lib_sp.py and 1 other location - About 6 hrs to fix
                                  official/legacy/xlnet/squad_utils.py on lines 644..655

                                  This issue has a mass of 104.

                                  Identical blocks of code found in 3 locations. Consider refactoring.
                                  Open

                                      while start_offset < len(all_doc_tokens):
                                        length = len(all_doc_tokens) - start_offset
                                        if length > max_tokens_for_doc:
                                          length = max_tokens_for_doc
                                        doc_spans.append(_DocSpan(start=start_offset, length=length))
                                  Severity: Major
                                  Found in official/nlp/data/squad_lib_sp.py and 2 other locations - About 5 hrs to fix
                                  official/legacy/xlnet/squad_utils.py on lines 674..681
                                  official/nlp/data/squad_lib.py on lines 307..314

                                  This issue has a mass of 95.
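One way to resolve this three-way duplication would be a single shared helper that all three modules call. The following is a minimal sketch, not the repository's code: the name `compute_doc_spans` and the `DocSpan` tuple are hypothetical, and the loop tail (the `break` and the stride advance) is an assumption, since the excerpt above stops at the `append` call.

```python
import collections

# Hypothetical stand-in for the _DocSpan(start=..., length=...) tuple
# that appears in the flagged snippet.
DocSpan = collections.namedtuple("DocSpan", ["start", "length"])


def compute_doc_spans(num_tokens, max_tokens_for_doc, doc_stride):
  """Splits a document of num_tokens tokens into overlapping windows."""
  if max_tokens_for_doc <= 0 or doc_stride <= 0:
    # Guard added for this sketch: non-positive values would loop forever.
    raise ValueError("max_tokens_for_doc and doc_stride must be positive")
  doc_spans = []
  start_offset = 0
  while start_offset < num_tokens:
    # Same effect as the if/assignment pair in the flagged excerpt.
    length = min(num_tokens - start_offset, max_tokens_for_doc)
    doc_spans.append(DocSpan(start=start_offset, length=length))
    # Assumed loop tail (not shown in the excerpt): stop once a window
    # reaches the end of the document, otherwise slide by at most doc_stride.
    if start_offset + length == num_tokens:
      break
    start_offset += min(length, doc_stride)
  return doc_spans
```

Each call site would then pass `len(all_doc_tokens)` instead of carrying its own copy of the loop.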

                                  Identical blocks of code found in 2 locations. Consider refactoring.
                                  Open

                                        if version_2_with_negative:
                                          if xlnet_format:
                                            feature_null_score = result.class_logits
                                          else:
                                            feature_null_score = result.start_logits[0] + result.end_logits[0]
                                  Severity: Major
                                  Found in official/nlp/data/squad_lib_sp.py and 1 other location - About 5 hrs to fix
                                  official/nlp/data/squad_lib.py on lines 628..637

                                  This issue has a mass of 89.
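The branch flagged here is small enough to live in one shared function. A minimal sketch follows, assuming the result object exposes `start_logits`, `end_logits`, and `class_logits` as in the excerpt; the `RawResult` tuple and the helper name `feature_null_score_of` are hypothetical and exist only to make the example self-contained.

```python
import collections

# Hypothetical stand-in for the model output object used in the excerpt.
RawResult = collections.namedtuple(
    "RawResult", ["start_logits", "end_logits", "class_logits"])


def feature_null_score_of(result, xlnet_format):
  """Returns the no-answer score for one feature, as in the flagged branch."""
  if xlnet_format:
    # XLNet-style heads emit a dedicated answerability logit.
    return result.class_logits
  # Otherwise the score is the sum of the logits at position 0.
  return result.start_logits[0] + result.end_logits[0]
```

Both `squad_lib.py` and `squad_lib_sp.py` could call this inside their `if version_2_with_negative:` block, which leaves only one place to change if the scoring rule is revised.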

                                  Similar blocks of code found in 3 locations. Consider refactoring.
                                  Open

                                      for (i, entry) in enumerate(nbest):
                                        output = collections.OrderedDict()
                                        output["text"] = entry.text
                                        output["probability"] = probs[i]
                                        output["start_logit"] = entry.start_logit
                                  Severity: Major
                                  Found in official/nlp/data/squad_lib_sp.py and 2 other locations - About 5 hrs to fix
                                  official/legacy/xlnet/squad_utils.py on lines 397..403
                                  official/nlp/data/squad_lib.py on lines 749..755

                                  This issue has a mass of 87.
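Because the three blocks are reported as similar rather than identical, a shared helper would need to cover whatever fields differ between them. The sketch below handles only the fields visible in the excerpt plus an assumed `end_logit` field; `NbestEntry` and `nbest_to_json` are hypothetical names, not part of the flagged files.

```python
import collections

# Hypothetical stand-in for the n-best prediction entries in the excerpt.
NbestEntry = collections.namedtuple(
    "NbestEntry", ["text", "start_logit", "end_logit"])


def nbest_to_json(nbest, probs):
  """Converts n-best entries and their probabilities to ordered dicts."""
  nbest_json = []
  for i, entry in enumerate(nbest):
    output = collections.OrderedDict()
    output["text"] = entry.text
    output["probability"] = probs[i]
    output["start_logit"] = entry.start_logit
    # Assumed continuation: the excerpt is cut off after start_logit.
    output["end_logit"] = entry.end_logit
    nbest_json.append(output)
  return nbest_json
```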

                                  Similar blocks of code found in 2 locations. Consider refactoring.
                                  Open

                                      for i in range(len(para_tokens)):
                                        start_chartok_pos = tok_start_to_chartok_index[i]
                                        end_chartok_pos = tok_end_to_chartok_index[i]
                                        start_orig_pos = _convert_index(
                                            chartok_to_orig_index, start_chartok_pos, n, is_start=True)
                                  Severity: Major
                                  Found in official/nlp/data/squad_lib_sp.py and 1 other location - About 4 hrs to fix
                                  official/legacy/xlnet/squad_utils.py on lines 626..635

                                  This issue has a mass of 84.

                                  Identical blocks of code found in 2 locations. Consider refactoring.
                                  Open

                                    if translated_input_folder is not None:
                                      translated_files = tf.io.gfile.glob(
                                          os.path.join(translated_input_folder, "*.json"))
                                      for file in translated_files:
                                        with tf.io.gfile.GFile(file, "r") as reader:
                                  Severity: Major
                                  Found in official/nlp/data/squad_lib_sp.py and 1 other location - About 4 hrs to fix
                                  official/nlp/data/squad_lib.py on lines 168..173

                                  This issue has a mass of 78.

                                  Similar blocks of code found in 2 locations. Consider refactoring.
                                  Open

                                    def __init__(self,
                                                 qas_id,
                                                 question_text,
                                                 paragraph_text,
                                                 orig_answer_text=None,
                                  Severity: Major
                                  Found in official/nlp/data/squad_lib_sp.py and 1 other location - About 3 hrs to fix
                                  official/nlp/data/squad_lib.py on lines 48..62

                                  This issue has a mass of 67.

                                  Similar blocks of code found in 2 locations. Consider refactoring.
                                  Open

                                      for _ in range(2):
                                        _lcs_match(max_dist)
                                        if f[n - 1, m - 1] > 0.8 * n:
                                          break
                                        max_dist *= 2
                                  Severity: Major
                                  Found in official/nlp/data/squad_lib_sp.py and 1 other location - About 3 hrs to fix
                                  official/legacy/xlnet/squad_utils.py on lines 598..602

                                  This issue has a mass of 62.

                                  Identical blocks of code found in 2 locations. Consider refactoring.
                                  Open

                                            if not is_impossible:
                                              answer = qa["answers"][0]
                                              orig_answer_text = answer["text"]
                                              start_position = answer["answer_start"]
                                            else:
                                  Severity: Major
                                  Found in official/nlp/data/squad_lib_sp.py and 1 other location - About 2 hrs to fix
                                  official/legacy/xlnet/squad_utils.py on lines 456..462

                                  This issue has a mass of 58.
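The answer-extraction branch could be pulled into a small function shared by both readers. This is only a sketch: `extract_answer` is a hypothetical name, and the values returned for unanswerable questions are an assumption, because the excerpt ends at `else:` without showing that branch.

```python
def extract_answer(qa, is_impossible):
  """Returns (orig_answer_text, start_position) for one SQuAD question."""
  if not is_impossible:
    # As in the flagged excerpt: only the first listed answer is used.
    answer = qa["answers"][0]
    return answer["text"], answer["answer_start"]
  # Assumed placeholders for unanswerable questions; the else branch is
  # not visible in the excerpt, so adjust these to match the real code.
  return "", -1
```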

                                  Similar blocks of code found in 2 locations. Consider refactoring.
                                  Open

                                      if (all(v is None for v in orig_to_chartok_index) or
                                          f[n - 1, m - 1] < 0.8 * n):
                                  Severity: Major
                                  Found in official/nlp/data/squad_lib_sp.py and 1 other location - About 2 hrs to fix
                                  official/legacy/xlnet/squad_utils.py on lines 619..620

                                  This issue has a mass of 57.

                                  Identical blocks of code found in 2 locations. Consider refactoring.
                                  Open

                                        while len(input_ids) < max_seq_length:
                                          input_ids.append(0)
                                          input_mask.append(0)
                                          segment_ids.append(seg_pad)
                                          paragraph_mask.append(0)
                                  Severity: Major
                                  Found in official/nlp/data/squad_lib_sp.py and 1 other location - About 2 hrs to fix
                                  official/nlp/data/squad_lib.py on lines 377..381

                                  This issue has a mass of 50.
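The padding loop is a natural candidate for a shared utility. A minimal sketch, assuming the four lists always have equal length as they do in the excerpt; the function name `pad_features` is hypothetical.

```python
def pad_features(input_ids, input_mask, segment_ids, paragraph_mask,
                 max_seq_length, seg_pad):
  """Pads the four feature lists in place up to max_seq_length."""
  # Like the flagged loop, the pad count is driven by input_ids alone.
  pad_len = max_seq_length - len(input_ids)
  if pad_len <= 0:
    return
  input_ids.extend([0] * pad_len)
  input_mask.extend([0] * pad_len)
  segment_ids.extend([seg_pad] * pad_len)
  paragraph_mask.extend([0] * pad_len)
```

Mutating the lists in place keeps the call sites close to the original code, which appends to the same list objects.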

                                  Identical blocks of code found in 2 locations. Consider refactoring.
                                  Open

                                  def write_to_json_files(json_records, json_file):
                                    with tf.io.gfile.GFile(json_file, "w") as writer:
                                      writer.write(json.dumps(json_records, indent=4) + "\n")
                                  Severity: Major
                                  Found in official/nlp/data/squad_lib_sp.py and 1 other location - About 1 hr to fix
                                  official/nlp/data/squad_lib.py on lines 788..790

                                  This issue has a mass of 49.
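Since this function is identical in both files, the simplest fix is probably to keep one definition and import it from the other module. The sketch below shows a single definition that runs without TensorFlow; the `open_fn` parameter is a hypothetical addition for this example, and in the repository one would presumably pass `tf.io.gfile.GFile` to keep the current behavior.

```python
import json
import os
import tempfile


def write_to_json_files(json_records, json_file, open_fn=open):
  """Writes json_records to json_file as indented JSON plus a newline."""
  # open_fn is an assumption of this sketch; the flagged code hard-codes
  # tf.io.gfile.GFile here.
  with open_fn(json_file, "w") as writer:
    writer.write(json.dumps(json_records, indent=4) + "\n")


# Usage: write a small record to a temporary file and read it back.
path = os.path.join(tempfile.mkdtemp(), "predictions.json")
write_to_json_files({"q1": "answer"}, path)
with open(path) as reader:
  written = reader.read()
```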

                                  Identical blocks of code found in 2 locations. Consider refactoring.
                                  Open

                                      for entry in nbest:
                                        total_scores.append(entry.start_logit + entry.end_logit)
                                        if not best_non_null_entry:
                                          if entry.text:
                                            best_non_null_entry = entry
                                  Severity: Major
                                  Found in official/nlp/data/squad_lib_sp.py and 1 other location - About 1 hr to fix
                                  official/nlp/data/squad_lib.py on lines 740..744
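
One way the loop above could be shared between the two files is a small helper that both call. This is only a sketch: `summarize_nbest` and the `NbestEntry` stand-in are hypothetical names, and the entry type is assumed to expose `text`, `start_logit`, and `end_logit` as in the excerpt.

```python
from collections import namedtuple

# Hypothetical stand-in for the n-best entry type; only the fields
# that the excerpt reads are modeled here.
NbestEntry = namedtuple("NbestEntry", ["text", "start_logit", "end_logit"])


def summarize_nbest(nbest):
    """Return (total_scores, best_non_null_entry) for an n-best list."""
    # One combined score per entry, in the same order as the input.
    total_scores = [entry.start_logit + entry.end_logit for entry in nbest]
    # First entry whose text is non-empty, or None if there is none.
    best_non_null_entry = next((entry for entry in nbest if entry.text), None)
    return total_scores, best_non_null_entry


nbest = [
    NbestEntry("", 1.0, 2.0),
    NbestEntry("Paris", 0.5, 0.25),
    NbestEntry("Lyon", 0.125, 0.125),
]
total_scores, best_non_null_entry = summarize_nbest(nbest)
```

Each call site would then reduce to a single assignment, so the two copies could no longer drift apart.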

                                  Duplicated Code

                                  This issue has a mass of 44.

                                  Identical blocks of code found in 2 locations. Consider refactoring.
                                  Open

                                      if version_2_with_negative and not xlnet_format:
                                        if "" not in seen_predictions:
                                          nbest.append(
                                              _NbestPrediction(
                                  Severity: Major
                                  Found in official/nlp/data/squad_lib_sp.py and 1 other location - About 1 hr to fix
                                  official/nlp/data/squad_lib.py on lines 724..727

                                  Duplicated Code

                                  This issue has a mass of 44.

                                  Similar blocks of code found in 2 locations. Consider refactoring.
                                  Open

                                      if version_2_with_negative and not xlnet_format:
                                        prelim_predictions.append(
                                            _PrelimPrediction(
                                  Severity: Major
                                  Found in official/nlp/data/squad_lib_sp.py and 1 other location - About 1 hr to fix
                                  official/nlp/data/squad_lib.py on lines 669..671

                                  Duplicated Code

                                  This issue has a mass of 40.

                                  Identical blocks of code found in 3 locations. Consider refactoring.
                                  Open

                                            if (len(qa["answers"]) != 1) and (not is_impossible):
                                              raise ValueError(
                                  Severity: Major
                                  Found in official/nlp/data/squad_lib_sp.py and 2 other locations - About 50 mins to fix
                                  official/legacy/xlnet/squad_utils.py on lines 453..454
                                  official/nlp/data/squad_lib.py on lines 209..210

                                  Duplicated Code

                                  This issue has a mass of 36.

                                  Identical blocks of code found in 2 locations. Consider refactoring.
                                  Open

                                      for _ in range(num_padding):
                                        dummy_feature.unique_id = unique_id
                                  
                                        # Run callback
                                        output_fn(feature, is_padding=True)
                                  Severity: Minor
                                  Found in official/nlp/data/squad_lib_sp.py and 1 other location - About 45 mins to fix
                                  official/nlp/data/squad_lib.py on lines 468..473

                                  Duplicated Code

                                  This issue has a mass of 35.

                                  Identical blocks of code found in 3 locations. Consider refactoring.
                                  Open

                                    with tf.io.gfile.GFile(input_file, "r") as reader:
                                      input_data = json.load(reader)["data"]
                                  Severity: Major
                                  Found in official/nlp/data/squad_lib_sp.py and 2 other locations - About 40 mins to fix
                                  official/legacy/xlnet/squad_utils.py on lines 436..437
                                  official/nlp/data/squad_lib.py on lines 165..166
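
The three copies of this read could be collapsed into one function. A minimal sketch follows, assuming a hypothetical helper name `read_squad_data` and a hypothetical `open_fn` parameter; it defaults to the built-in `open` so the example runs without TensorFlow, and a caller in the repository could pass `tf.io.gfile.GFile` instead.

```python
import json
import os
import tempfile


def read_squad_data(input_file, open_fn=open):
    """Load a SQuAD-style JSON file and return its top-level "data" list.

    open_fn is a hypothetical injection point: any callable taking
    (path, mode) and returning a readable context manager will do.
    """
    with open_fn(input_file, "r") as reader:
        return json.load(reader)["data"]


# Usage with a throwaway file standing in for a real dataset.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample.json")
    with open(sample_path, "w") as writer:
        json.dump({"version": "v2.0", "data": [{"title": "t"}]}, writer)
    loaded = read_squad_data(sample_path)
```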

                                  Duplicated Code

                                  This issue has a mass of 34.

                                  Identical blocks of code found in 2 locations. Consider refactoring.
                                  Open

                                          for token in query_tokens:
                                            tokens.append(token)
                                            segment_ids.append(seg_q)
                                            paragraph_mask.append(0)
                                  Severity: Minor
                                  Found in official/nlp/data/squad_lib_sp.py and 1 other location - About 35 mins to fix
                                  official/nlp/data/squad_lib.py on lines 329..332
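
The per-token loop above appends to three parallel lists. A shared helper could do the same with `extend`; the sketch below uses a hypothetical name, `append_segment`, and assumes the three lists are plain Python lists as the excerpt suggests.

```python
def append_segment(tokens, segment_ids, paragraph_mask,
                   new_tokens, segment_id, mask_value):
    """Append new_tokens plus matching segment ids and mask values in place."""
    new_tokens = list(new_tokens)  # Allow any iterable, and take its length once.
    tokens.extend(new_tokens)
    segment_ids.extend([segment_id] * len(new_tokens))
    paragraph_mask.extend([mask_value] * len(new_tokens))


# Usage mirroring the excerpt: query tokens get segment id seg_q and mask 0.
tokens, segment_ids, paragraph_mask = ["[CLS]"], [0], [0]
seg_q = 1  # Hypothetical value, chosen only for this example.
append_segment(tokens, segment_ids, paragraph_mask, ["who", "won"], seg_q, 0)
```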

                                  Duplicated Code

                                  This issue has a mass of 33.

                                  Identical blocks of code found in 2 locations. Consider refactoring.
                                  Open

                                      if unique_id % batch_size != 0:
                                        num_padding = batch_size - (num_examples % batch_size)
                                  Severity: Minor
                                  Found in official/nlp/data/squad_lib_sp.py and 1 other location - About 35 mins to fix
                                  official/nlp/data/squad_lib.py on lines 463..464
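
The padding arithmetic could live in one place. The sketch below is an assumption-laden simplification: the excerpt tests `unique_id` but computes from `num_examples`, whereas this hypothetical helper, `num_padding_needed`, takes a single count. Whether the two variables always agree in the source is not visible in the excerpt.

```python
def num_padding_needed(num_examples, batch_size):
    """Return how many padding items fill the last batch completely."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    remainder = num_examples % batch_size
    # No padding is needed when the examples already fill whole batches.
    return 0 if remainder == 0 else batch_size - remainder


padding_for_ten = num_padding_needed(10, 4)   # 10 examples, batches of 4
padding_for_eight = num_padding_needed(8, 4)  # already a multiple of 4
```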

                                  Duplicated Code

                                  This issue has a mass of 33.

                                  Similar blocks of code found in 3 locations. Consider refactoring.
                                  Open

                                      prelim_predictions = sorted(
                                          prelim_predictions,
                                          key=lambda x: (x.start_logit + x.end_logit),
                                  Severity: Minor
                                  Found in official/nlp/data/squad_lib_sp.py and 2 other locations - About 30 mins to fix
                                  official/legacy/xlnet/squad_utils.py on lines 350..352
                                  official/nlp/data/squad_lib.py on lines 677..679
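
The sort key could be factored into one shared function. The excerpt is cut off after the `key` argument, so the descending order used below is an assumption, and `sort_by_span_score` and the `Prediction` stand-in are hypothetical names.

```python
from collections import namedtuple

# Hypothetical stand-in carrying only the two fields the sort key reads.
Prediction = namedtuple("Prediction", ["start_logit", "end_logit"])


def sort_by_span_score(predictions):
    """Return predictions ordered by start_logit + end_logit."""
    return sorted(
        predictions,
        key=lambda p: p.start_logit + p.end_logit,
        reverse=True,  # Assumption: highest combined score first.
    )


ranked = sort_by_span_score(
    [Prediction(0.5, 0.5), Prediction(2.0, 1.0), Prediction(-1.0, 0.5)]
)
```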

                                  Duplicated Code

                                  This issue has a mass of 32.
