diasks2/pragmatic_segmenter

Showing 24 of 24 total issues

Class List has 21 methods (exceeds 20 allowed). Consider refactoring.

  class List
    ROMAN_NUMERALS = %w(i ii iii iv v vi vii viii ix x xi xii xiii xiv x xi xii xiii xv xvi xvii xviii xix xx)
    LATIN_NUMERALS = ('a'..'z').to_a

    # Rubular: http://rubular.com/r/XcpaJKH0sz
Severity: Minor
Found in lib/pragmatic_segmenter/list.rb - About 2 hrs to fix
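
As quoted, ROMAN_NUMERALS runs i through xiv, then repeats x, xi, xii and xiii before continuing with xv through xx, so it holds 24 entries instead of 20. The excerpt does not show how List uses the constant, so the effect on matching is unclear. A minimal sketch that generates the sequence instead of typing it by hand; to_roman is a hypothetical helper, not part of the library:

```ruby
# Hypothetical helper: lowercase Roman numeral for 1..39
# (x, ix, v, iv and i are enough for that range).
def to_roman(number)
  symbols = [[10, 'x'], [9, 'ix'], [5, 'v'], [4, 'iv'], [1, 'i']]
  symbols.map do |value, glyph|
    # Take as many copies of this glyph as fit, keep the remainder for the next one.
    count, number = number.divmod(value)
    glyph * count
  end.join
end

# 1..20 generated in order, so no value can be repeated or skipped by a typo.
ROMAN_NUMERALS = (1..20).map { |n| to_roman(n) }.freeze
```

With this version ROMAN_NUMERALS[13] is "xiv" and ROMAN_NUMERALS[14] is "xv".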

    Method scan_lists has a Cognitive Complexity of 8 (exceeds 5 allowed). Consider refactoring.

        def scan_lists(regex1, regex2, replacement, strip: false)
          list_array = @text.scan(regex1).map(&:to_i)
          list_array.each_with_index do |a, i|
            next unless (a + 1).eql?(list_array[i + 1]) ||
                        (a - 1).eql?(list_array[i - 1]) ||
    Severity: Minor
    Found in lib/pragmatic_segmenter/list.rb - About 45 mins to fix

    Cognitive Complexity

    Cognitive Complexity is a measure of how difficult a unit of code is to intuitively understand. Unlike Cyclomatic Complexity, which measures how difficult your code will be to test, Cognitive Complexity tells you how difficult your code will be to read and comprehend.

    A method's cognitive complexity is based on a few simple rules:

    • Code is not considered more complex when it uses shorthand that the language provides for collapsing multiple statements into one
    • Code is considered more complex for each "break in the linear flow of the code"
    • Code is considered more complex when "flow breaking structures are nested"
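
A small illustration of the nesting rule, using two hypothetical methods that behave identically. The comments describe the rules above qualitatively; the exact scores an analyzer assigns are not claimed here.

```ruby
# Nested form: the inner if/else sits inside another if/else, so under the
# nesting rule it adds more to the score than a top-level branch would.
def classify_nested(token)
  if token
    if token.empty?
      :empty
    else
      :word
    end
  else
    :missing
  end
end

# Flat form: guard clauses keep every break in the flow at the top level,
# so none of them is nested.
def classify_flat(token)
  return :missing unless token
  return :empty if token.empty?

  :word
end
```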

    Method replace_alphabet_list_parens has a Cognitive Complexity of 8 (exceeds 5 allowed). Consider refactoring.

        def replace_alphabet_list_parens(a)
          @text.gsub!(EXTRACT_ALPHABETICAL_LIST_LETTERS_REGEX).with_index do |m|
            if m.include?('(')
              a.eql?(Unicode::downcase(m.dup).gsub!(/\(/, '')) ? "\r&✂&#{Regexp.escape(m.gsub!(/\(/, ''))}" : "#{m}"
            else
    Severity: Minor
    Found in lib/pragmatic_segmenter/list.rb - About 45 mins to fix
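
In the quoted branch, Unicode::downcase(m.dup).gsub!(/\(/, '') relies on String#gsub! returning the modified string. gsub! returns nil when nothing is replaced, which is safe here only because the include?('(') check runs first; the later m.gsub!(/\(/, '') also mutates the match in place. String#delete returns a new string and never nil, which can make such a branch easier to follow. A sketch of the difference; without_open_paren is a hypothetical helper:

```ruby
# String#gsub! edits the receiver in place and returns nil when nothing matched.
first = (+'(b').gsub!(/\(/, '')   # "b"
second = (+'b').gsub!(/\(/, '')   # nil

# Hypothetical helper: String#delete never mutates and always returns a String.
def without_open_paren(text)
  text.delete('(')
end
```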

    Similar blocks of code found in 2 locations. Consider refactoring.

          class BetweenPunctuation < PragmaticSegmenter::BetweenPunctuation
            BETWEEN_DOUBLE_ANGLE_QUOTATION_MARK_REGEX = /《(?>[^》\\]+|\\{2}|\\.)*》/
            BETWEEN_L_BRACKET_REGEX = /「(?>[^」\\]+|\\{2}|\\.)*」/
            private
    
    
    Severity: Minor
    Found in lib/pragmatic_segmenter/languages/chinese.rb and 1 other location - About 45 mins to fix
    lib/pragmatic_segmenter/languages/japanese.rb on lines 28..53

    Duplicated Code

    Duplicated code can lead to software that is hard to understand and difficult to change. The Don't Repeat Yourself (DRY) principle states:

    Every piece of knowledge must have a single, unambiguous, authoritative representation within a system.

    When you violate DRY, bugs and maintenance problems are sure to follow. Duplicated code tends both to keep replicating and to diverge, leaving bugs where two similar implementations differ in subtle ways.

    Tuning

    This issue has a mass of 39.

    We set useful threshold defaults for the languages we support, but you may want to adjust these settings to match your project's guidelines.

    The threshold configuration represents the minimum mass a code block must have to be analyzed for duplication. The lower the threshold, the more fine-grained the comparison.

    If the engine reports duplication too readily, try raising the threshold. If you suspect that it isn't catching enough duplication, try lowering the threshold. The best setting tends to differ from language to language.

    See codeclimate-duplication's documentation for more information about tuning the mass threshold in your .codeclimate.yml.

    Similar blocks of code found in 2 locations. Consider refactoring.

          class BetweenPunctuation < PragmaticSegmenter::BetweenPunctuation
            # Rubular: http://rubular.com/r/GnjOmry5Z2
            BETWEEN_QUOTE_JA_REGEX = /\u{300c}(?>[^\u{300c}\u{300d}\\]+|\\{2}|\\.)*\u{300d}/
    
            # Rubular: http://rubular.com/r/EjHcZn5ZSG
    Severity: Minor
    Found in lib/pragmatic_segmenter/languages/japanese.rb and 1 other location - About 45 mins to fix
    lib/pragmatic_segmenter/languages/chinese.rb on lines 12..34

    This issue has a mass of 39.
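
The Chinese BETWEEN_L_BRACKET_REGEX and the Japanese BETWEEN_QUOTE_JA_REGEX both match text inside 「 」 (\u{300c} and \u{300d}), and all the quoted patterns share one shape: an opening mark, an atomic group that consumes non-delimiter characters or backslash escapes, and a closing mark. Building each pattern from its delimiter pair is one way to remove the duplication. A minimal sketch with a hypothetical between_regex helper; it follows the Chinese form, whereas the Japanese pattern also excludes the opening 「 from the character class.

```ruby
# Hypothetical helper: build a "text between paired marks" pattern with the
# same structure as the quoted constants.
def between_regex(open_mark, close_mark)
  open_re = Regexp.escape(open_mark)
  close_re = Regexp.escape(close_mark)
  # (?>...) is atomic: runs of characters other than the closing mark or a
  # backslash, a doubled backslash, or a backslash-escaped character.
  /#{open_re}(?>[^#{close_re}\\]+|\\{2}|\\.)*#{close_re}/
end

SKETCH_BETWEEN_DOUBLE_ANGLE = between_regex('《', '》')
SKETCH_BETWEEN_L_BRACKET = between_regex('「', '」')
```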

    Consider simplifying this complex logical expression.

            next unless (a + 1).eql?(list_array[i + 1]) ||
                        (a - 1).eql?(list_array[i - 1]) ||
                        (a.eql?(0) && list_array[i - 1].eql?(9)) ||
                        (a.eql?(9) && list_array[i + 1].eql?(0))
    Severity: Major
    Found in lib/pragmatic_segmenter/list.rb - About 40 mins to fix
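
The four-way condition checks whether an item follows its next neighbour or is followed by its previous one, with 9 to 0 counting as a step. In Ruby, list_array[i - 1] with i == 0 reads list_array[-1], the last element, so for the first item the quoted condition compares against the end of the list. The sketch below names the step check and adds an explicit guard for the first item, which is a behavior change; whether the wrap-around is intended cannot be told from the excerpt. successor? and adjacent_list_item? are hypothetical names.

```ruby
# Hypothetical helper: true when b directly follows a, including the 9 -> 0 wrap.
def successor?(a, b)
  b.eql?(a + 1) || (a.eql?(9) && b.eql?(0))
end

# Hypothetical predicate equivalent to the quoted condition, except that the
# first item has no previous neighbour (no wrap-around to list_array[-1]).
def adjacent_list_item?(list_array, i)
  a = list_array[i]
  prev_item = i.positive? ? list_array[i - 1] : nil
  next_item = list_array[i + 1]
  successor?(a, next_item) || (!prev_item.nil? && successor?(prev_item, a))
end
```

For example, adjacent_list_item?([3, 9, 2], 0) returns false, whereas the quoted condition accepts that item because (3 - 1).eql?(list_array[-1]) is true.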

      Method other_items_replacement has 5 arguments (exceeds 4 allowed). Consider refactoring.

          def other_items_replacement(a, i, alphabet, list_array, parens)
      Severity: Minor
      Found in lib/pragmatic_segmenter/list.rb - About 35 mins to fix

        Method last_array_item_replacement has 5 arguments (exceeds 4 allowed). Consider refactoring.

            def last_array_item_replacement(a, i, alphabet, list_array, parens)
        Severity: Minor
        Found in lib/pragmatic_segmenter/list.rb - About 35 mins to fix
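
other_items_replacement and last_array_item_replacement take the same five arguments, so a single parameter object would bring both under the limit. A minimal sketch using Struct with keyword_init (Ruby 2.5 or later); ListItemContext, last_item? and item_replacement are hypothetical names, and the method body is a placeholder because the report does not show the real logic.

```ruby
# Hypothetical parameter object bundling the five shared arguments.
ListItemContext = Struct.new(:a, :i, :alphabet, :list_array, :parens, keyword_init: true) do
  def last_item?
    i == list_array.length - 1
  end
end

# Hypothetical method shape: one argument instead of five.
# Placeholder body; the real replacement logic is not shown in the report.
def item_replacement(context)
  context.last_item? ? :last : :other
end
```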

          Method search_for_abbreviations_in_string has a Cognitive Complexity of 7 (exceeds 5 allowed). Consider refactoring.

              def search_for_abbreviations_in_string(txt)
                original = txt.dup
                downcased = Unicode::downcase(txt)
                @language::Abbreviation::ABBREVIATIONS.each do |abbreviation|
                  stripped = abbreviation.strip
          Severity: Minor
          Found in lib/pragmatic_segmenter/abbreviation_replacer.rb - About 35 mins to fix

          Method substitute_found_list_items has a Cognitive Complexity of 7 (exceeds 5 allowed). Consider refactoring.

              def substitute_found_list_items(regex, a, strip, replacement)
                @text.gsub!(regex).with_index do |m|
                  if a.to_s.eql?(strip ? m.strip.chop : m)
                    "#{Regexp.escape(a.to_s)}" + replacement
                  else
          Severity: Minor
          Found in lib/pragmatic_segmenter/list.rb - About 35 mins to fix
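
The comparison inside the quoted gsub! block can be named as a predicate, which moves the strip ? ... : ... decision out of the block. Separately, "#{Regexp.escape(a.to_s)}" + replacement is equivalent to Regexp.escape(a.to_s) + replacement, because Regexp.escape already returns a String. A sketch with a hypothetical list_item_matches? helper and a hypothetical replacement string:

```ruby
# Hypothetical predicate for the condition in substitute_found_list_items:
# with strip, surrounding whitespace and the final character (e.g. "." in " 1.")
# are removed before comparing against the list number.
def list_item_matches?(number, match, strip)
  candidate = strip ? match.strip.chop : match
  number.to_s.eql?(candidate)
end

# The interpolation adds nothing: Regexp.escape already returns a String.
replacement = '♨' # hypothetical replacement string
interpolated = "#{Regexp.escape(1.to_s)}" + replacement
plain = Regexp.escape(1.to_s) + replacement
```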

          Similar blocks of code found in 3 locations. Consider refactoring.

                module ReinsertEllipsisRules
                  SubThreeConsecutivePeriod = Rule.new(/ƪ/, '...')
                  SubThreeSpacePeriod = Rule.new(/♟/, ' . . . ')
                  SubFourSpacePeriod = Rule.new(/♝/, '. . . .')
                  SubTwoConsecutivePeriod = Rule.new(/☏/, '..')
          Severity: Minor
          Found in lib/pragmatic_segmenter/languages/common.rb and 2 other locations - About 30 mins to fix
          lib/pragmatic_segmenter/languages/common/ellipsis.rb on lines 13..34
          lib/pragmatic_segmenter/languages/common/numbers.rb on lines 7..29

          This issue has a mass of 33.
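
The three flagged blocks share one shape: a module of constants, each a Rule.new(pattern, replacement) pair. A table of pairs mapped into rule objects is one way to state such a list once. A minimal sketch; SketchRule is a stand-in struct, not pragmatic_segmenter's Rule class, and only the four ReinsertEllipsisRules visible in the truncated excerpt are included.

```ruby
# Stand-in for the library's Rule; not pragmatic_segmenter's actual class.
SketchRule = Struct.new(:pattern, :replacement) do
  # Returns a new string with every match of pattern replaced.
  def apply(text)
    text.gsub(pattern, replacement)
  end
end

# Placeholder => ellipsis pairs copied from the quoted ReinsertEllipsisRules.
REINSERT_ELLIPSIS_SKETCH = {
  /ƪ/ => '...',
  /♟/ => ' . . . ',
  /♝/ => '. . . .',
  /☏/ => '..'
}.map { |pattern, replacement| SketchRule.new(pattern, replacement) }.freeze

# Applies each rule in order to the output of the previous one.
def apply_rules(text, rules)
  rules.reduce(text) { |acc, rule| rule.apply(acc) }
end
```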

          Similar blocks of code found in 3 locations. Consider refactoring.

                module Numbers
                  # Rubular: http://rubular.com/r/oNyxBOqbyy
                  PeriodBeforeNumberRule = Rule.new(/\.(?=\d)/, '∯')
          
                  # Rubular: http://rubular.com/r/EMk5MpiUzt
          Severity: Minor
          Found in lib/pragmatic_segmenter/languages/common/numbers.rb and 2 other locations - About 30 mins to fix
          lib/pragmatic_segmenter/languages/common.rb on lines 89..98
          lib/pragmatic_segmenter/languages/common/ellipsis.rb on lines 13..34

          This issue has a mass of 33.
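
PeriodBeforeNumberRule pairs a lookahead with a placeholder: /\.(?=\d)/ matches a period only when a digit follows, and the lookahead consumes nothing, so only the period becomes ∯. Applied with plain String#gsub (an assumption; the excerpt does not show how Rule objects are applied), it presumably keeps decimal points from being read as sentence boundaries:

```ruby
# Pattern and placeholder copied from PeriodBeforeNumberRule above.
PERIOD_BEFORE_NUMBER = /\.(?=\d)/
NUMBER_PLACEHOLDER = '∯'

protected_text = 'Pi is 3.14. Next sentence.'.gsub(PERIOD_BEFORE_NUMBER, NUMBER_PLACEHOLDER)
# "Pi is 3∯14. Next sentence." (the two sentence-ending periods are untouched)
```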

          Similar blocks of code found in 2 locations. Consider refactoring.

                module SubEscapedRegexReservedCharacters
                  SubLeftParen = Rule.new('\\(', '(')
                  SubRightParen = Rule.new('\\)', ')')
                  SubLeftBracket = Rule.new('\\[', '[')
                  SubRightBracket = Rule.new('\\]', ']')
          Severity: Minor
          Found in lib/pragmatic_segmenter/punctuation_replacer.rb and 1 other location - About 30 mins to fix
          lib/pragmatic_segmenter/punctuation_replacer.rb on lines 9..17

          This issue has a mass of 33.

          Similar blocks of code found in 3 locations. Consider refactoring.

                module EllipsisRules
                  # Rubular: http://rubular.com/r/i60hCK81fz
                  ThreeConsecutiveRule = Rule.new(/\.\.\.(?=\s+[A-Z])/, '☏.')
          
                  # Rubular: http://rubular.com/r/Hdqpd90owl
          Severity: Minor
          Found in lib/pragmatic_segmenter/languages/common/ellipsis.rb and 2 other locations - About 30 mins to fix
          lib/pragmatic_segmenter/languages/common.rb on lines 89..98
          lib/pragmatic_segmenter/languages/common/numbers.rb on lines 7..29

          This issue has a mass of 33.
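
ThreeConsecutiveRule and the SubTwoConsecutivePeriod rule in ReinsertEllipsisRules (quoted earlier in this report) appear to form a round trip: the first turns an ellipsis followed by whitespace and a capital letter into "☏." (one ordinary period that can end the sentence), and the second later turns ☏ back into "..", restoring "...". A sketch with plain String#gsub, assuming the rules are applied that way:

```ruby
# Patterns and replacements copied from ThreeConsecutiveRule and SubTwoConsecutivePeriod.
THREE_CONSECUTIVE = /\.\.\.(?=\s+[A-Z])/
SUB_TWO_CONSECUTIVE = /☏/

text = 'I waited... Then it came.'
marked = text.gsub(THREE_CONSECUTIVE, '☏.')       # "I waited☏. Then it came."
restored = marked.gsub(SUB_TWO_CONSECUTIVE, '..') # back to the original text

# No capital letter after the ellipsis: the lookahead fails and nothing changes.
lowercase = 'I waited... then it came.'.gsub(THREE_CONSECUTIVE, '☏.')
```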

          Similar blocks of code found in 2 locations. Consider refactoring.

                module EscapeRegexReservedCharacters
                  LeftParen = Rule.new('(', '\\(')
                  RightParen = Rule.new(')', '\\)')
                  LeftBracket = Rule.new('[', '\\[')
                  RightBracket = Rule.new(']', '\\]')
          Severity: Minor
          Found in lib/pragmatic_segmenter/punctuation_replacer.rb and 1 other location - About 30 mins to fix
          lib/pragmatic_segmenter/punctuation_replacer.rb on lines 20..28

          This issue has a mass of 33.
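
EscapeRegexReservedCharacters and SubEscapedRegexReservedCharacters are inverse mappings, so both could be derived from a single list of characters. A sketch using Regexp.union and String#gsub with a hash replacement; only the four characters visible in the excerpts are included, and the constant and method names are hypothetical.

```ruby
# Characters visible in the two excerpts; the real modules may list more.
RESERVED_CHARACTERS = ['(', ')', '[', ']'].freeze

# Each character maps to its backslash-escaped form, and back again.
ESCAPE_MAP = RESERVED_CHARACTERS.map { |c| [c, "\\#{c}"] }.to_h.freeze
UNESCAPE_MAP = ESCAPE_MAP.invert.freeze

# Regexp.union escapes each key, so "(" is matched literally.
def escape_reserved(text)
  text.gsub(Regexp.union(ESCAPE_MAP.keys), ESCAPE_MAP)
end

def unescape_reserved(text)
  text.gsub(Regexp.union(UNESCAPE_MAP.keys), UNESCAPE_MAP)
end
```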

          Method post_process_segments has a Cognitive Complexity of 6 (exceeds 5 allowed). Consider refactoring.

              def post_process_segments(txt)
                return txt if txt.length < 2 && txt =~ /\A[a-zA-Z]*\Z/
                return if consecutive_underscore?(txt) || txt.length < 2
                Rule.apply(
                  txt,
          Severity: Minor
          Found in lib/pragmatic_segmenter/processor.rb - About 25 mins to fix

          Method scan_for_replacements has a Cognitive Complexity of 6 (exceeds 5 allowed). Consider refactoring.

              def scan_for_replacements(txt, am, index, character_array)
                character = character_array[index]
                prepositive = @language::Abbreviation::PREPOSITIVE_ABBREVIATIONS
                number_abbr = @language::Abbreviation::NUMBER_ABBREVIATIONS
                upper = /[[:upper:]]/.match(character.to_s)
          Severity: Minor
          Found in lib/pragmatic_segmenter/abbreviation_replacer.rb - About 25 mins to fix
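
scan_for_replacements tests the character with /[[:upper:]]/. In Ruby the POSIX bracket [[:upper:]] matches Unicode uppercase letters in UTF-8 strings, not only A-Z, and character.to_s turns a missing character (for example nil when index runs past the end of character_array) into an empty string that simply fails to match. A sketch with a hypothetical upper? helper (Regexp#match? needs Ruby 2.4 or later):

```ruby
# Hypothetical helper mirroring the quoted check; match? returns a boolean
# without building MatchData.
def upper?(character)
  /[[:upper:]]/.match?(character.to_s)
end
```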

          Identical blocks of code found in 2 locations. Consider refactoring.

                class AbbreviationReplacer < AbbreviationReplacer
                  SENTENCE_STARTERS = %w(
                    A Being Did For He How However I In It Millions More She That The
                    There They We What When Where Who Why
          Severity: Minor
          Found in lib/pragmatic_segmenter/languages/common.rb and 1 other location - About 20 mins to fix
          lib/pragmatic_segmenter/languages/english.rb on lines 25..28

          This issue has a mass of 28.

          Identical blocks of code found in 2 locations. Consider refactoring.

                class AbbreviationReplacer < AbbreviationReplacer
                  SENTENCE_STARTERS = %w(
                    A Being Did For He How However I In It Millions More She That The
                    There They We What When Where Who Why
          Severity: Minor
          Found in lib/pragmatic_segmenter/languages/english.rb and 1 other location - About 20 mins to fix
          lib/pragmatic_segmenter/languages/common.rb on lines 105..108

          This issue has a mass of 28.
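
The two SENTENCE_STARTERS blocks are identical. In Ruby, a constant referenced through a subclass (Subclass::CONST, or self.class::CONST from an instance) is looked up through its ancestors, so a subclass that does not redefine the constant shares the parent's. Whether the English replacer inherits from the common one cannot be confirmed from the excerpt, so the sketch uses hypothetical class names and only the first few words of the truncated list:

```ruby
module SegmenterSketch
  class BaseReplacer
    # First words of the quoted list only; the full list is truncated in the report.
    SENTENCE_STARTERS = %w(A Being Did For He How However).freeze

    def sentence_starter?(word)
      self.class::SENTENCE_STARTERS.include?(word)
    end
  end

  class EnglishReplacer < BaseReplacer
    # No SENTENCE_STARTERS here: the constant resolves through BaseReplacer.
  end
end
```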

          Similar blocks of code found in 3 locations. Consider refactoring.

              def replace_period_of_abbr(txt, abbr)
                txt.gsub!(/(?<=\s#{abbr.strip})\.(?=((\.|\:|-|\?)|(\s([a-z]|I\s|I'm|I'll|\d|\())))|(?<=^#{abbr.strip})\.(?=((\.|\:|\?)|(\s([a-z]|I\s|I'm|I'll|\d))))/, '∯')
                txt.gsub!(/(?<=\s#{abbr.strip})\.(?=,)|(?<=^#{abbr.strip})\.(?=,)/, '∯')
                txt
          Severity: Minor
          Found in lib/pragmatic_segmenter/abbreviation_replacer.rb and 2 other locations - About 20 mins to fix
          lib/pragmatic_segmenter/abbreviation_replacer.rb on lines 99..102
          lib/pragmatic_segmenter/abbreviation_replacer.rb on lines 105..108

          This issue has a mass of 27.
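
Both gsub! calls in replace_period_of_abbr build the same prefix, "after whitespace or at the start of the text, following the abbreviation", and differ only in what must come after the period. A shared pattern builder is one way to state that prefix once across the three similar methods. The sketch also applies Regexp.escape, which the quoted code does not, so an abbreviation containing a metacharacter such as "." is matched literally. abbr_period_regex and replace_period_before_comma are hypothetical names, and only the comma case is shown:

```ruby
ABBREVIATION_PLACEHOLDER = '∯'

# Hypothetical builder: the period right after `abbr`, at the start of the
# text or after whitespace, when followed by `lookahead`.
def abbr_period_regex(abbr, lookahead)
  escaped = Regexp.escape(abbr.strip)
  /(?<=\s#{escaped})\.(?=#{lookahead})|(?<=^#{escaped})\.(?=#{lookahead})/
end

# Non-mutating counterpart of the second gsub! in the quoted method.
def replace_period_before_comma(txt, abbr)
  txt.gsub(abbr_period_regex(abbr, ','), ABBREVIATION_PLACEHOLDER)
end
```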
