nlpodyssey/gotokenizers

Showing 18 of 92 total issues

File normalizedstring.go has 568 lines of code (exceeds 500 allowed). Consider refactoring.
Open

// Copyright (c) 2020, NLP Odyssey Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.

package normalizedstring
Severity: Minor
Found in normalizedstring/normalizedstring.go - About 3 hrs to fix

    NormalizedString has 27 methods (exceeds 20 allowed). Consider refactoring.
    Open

    type NormalizedString struct {
        // The original version of the string, before any modification.
        original string
        // The normalized version of the string, after all modifications.
        normalized string
    Severity: Minor
    Found in normalizedstring/normalizedstring.go - About 3 hrs to fix

      Method NormalizedString.Split has a Cognitive Complexity of 34 (exceeds 20 allowed). Consider refactoring.
      Open

      func (ns *NormalizedString) Split(
          pattern splitpattern.SplitPattern,
          behaviour SplitDelimiterBehavior,
      ) ([]*NormalizedString, error) {
          captures, err := pattern.FindMatches(ns.normalized)
      Severity: Minor
      Found in normalizedstring/normalizedstring.go - About 2 hrs to fix

      Cognitive Complexity

      Cognitive Complexity is a measure of how difficult a unit of code is to intuitively understand. Unlike Cyclomatic Complexity, which determines how difficult your code will be to test, Cognitive Complexity tells you how difficult your code will be to read and comprehend.

      A method's cognitive complexity is based on a few simple rules:

      • Code is not considered more complex when it uses shorthand that the language provides for collapsing multiple statements into one
      • Code is considered more complex for each "break in the linear flow of the code"
      • Code is considered more complex when "flow breaking structures are nested"
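
The rules above can be illustrated with a minimal Go sketch. Both functions are hypothetical and do not come from the repository. They return the same results, but the first nests its branches while the second uses guard clauses, so a cognitive-complexity checker would be expected to score the second lower.

```go
package main

import "fmt"

// classifyNested is a hypothetical example. Every branch sits inside
// another one, so each level adds a nesting penalty on top of the
// penalty for the branch itself.
func classifyNested(n int) string {
	if n >= 0 {
		if n > 0 {
			if n%2 == 0 {
				return "positive even"
			} else {
				return "positive odd"
			}
		} else {
			return "zero"
		}
	} else {
		return "negative"
	}
}

// classifyFlat returns the same results with guard clauses. Each check
// still breaks the linear flow once, but nothing is nested.
func classifyFlat(n int) string {
	if n < 0 {
		return "negative"
	}
	if n == 0 {
		return "zero"
	}
	if n%2 == 0 {
		return "positive even"
	}
	return "positive odd"
}

func main() {
	for _, n := range []int{-3, 0, 4, 7} {
		fmt.Println(n, classifyNested(n), classifyFlat(n))
	}
}
```

The same idea (return early, keep the main path unindented) is one common way to bring a flagged method back under a complexity threshold, though the right refactoring depends on the method.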

      Method NormalizedString.Split has 83 lines of code (exceeds 50 allowed). Consider refactoring.
      Open

      func (ns *NormalizedString) Split(
          pattern splitpattern.SplitPattern,
          behaviour SplitDelimiterBehavior,
      ) ([]*NormalizedString, error) {
          captures, err := pattern.FindMatches(ns.normalized)
      Severity: Major
      Found in normalizedstring/normalizedstring.go - About 2 hrs to fix

        Method Word.MergeAll has a Cognitive Complexity of 32 (exceeds 20 allowed). Consider refactoring.
        Open

        func (w *Word) MergeAll(merges *MergeMap, dropout float64) {
            symbolsLen := w.Len()
            queue := make(WordMergeHeap, 0, symbolsLen)
            skip := make([]WordMerge, 0, symbolsLen)
        
        
        Severity: Minor
        Found in models/bpemodel/word.go - About 2 hrs to fix

        Method WordPieceModel.Tokenize has 57 lines of code (exceeds 50 allowed). Consider refactoring.
        Open

        func (m *WordPieceModel) Tokenize(sequence string) ([]models.Token, error) {
            if len([]rune(sequence)) > m.maxInputCharsPerWord {
                unkTokenID, unkTokenExists := m.vocab.GetID(m.unknownToken)
                if !unkTokenExists {
                    return nil, ErrUnknownTokenOutOfVocabulary
        Severity: Minor
        Found in models/wordpiecemodel/wordpiecemodel.go - About 1 hr to fix

          Method BPEModel.mergeWord has a Cognitive Complexity of 26 (exceeds 20 allowed). Consider refactoring.
          Open

          func (m *BPEModel) mergeWord(w string) (*Word, error) {
              word := NewWordWithCapacity(len(w))
          
              var unkTokenID int
          
          
          Severity: Minor
          Found in models/bpemodel/bpemodel.go - About 1 hr to fix

          Method Word.MergeAll has 54 lines of code (exceeds 50 allowed). Consider refactoring.
          Open

          func (w *Word) MergeAll(merges *MergeMap, dropout float64) {
              symbolsLen := w.Len()
              queue := make(WordMergeHeap, 0, symbolsLen)
              skip := make([]WordMerge, 0, symbolsLen)
          
          
          Severity: Minor
          Found in models/bpemodel/word.go - About 1 hr to fix

            Method NormalizedString.OriginalAlignments has 54 lines of code (exceeds 50 allowed). Consider refactoring.
            Open

            func (ns *NormalizedString) OriginalAlignments() []AlignmentRange {
                // (start, end) are in alignments
                // (offset, length) are in originalAlignments
                originalAlignments := make([]AlignmentRange, 0, len(ns.original))
            
            
            Severity: Minor
            Found in normalizedstring/normalizedstring.go - About 1 hr to fix

              Method NormalizedString.TransformRange has 51 lines of code (exceeds 50 allowed). Consider refactoring.
              Open

              func (ns *NormalizedString) TransformRange(
                  rng Range,
                  dest []RuneChange,
                  initialOffset int,
              ) {
              Severity: Minor
              Found in normalizedstring/normalizedstring.go - About 1 hr to fix

                Function New has 8 arguments (exceeds 4 allowed). Consider refactoring.
                Open

                    vocab *vocabulary.Vocabulary,
                    merges *MergeMap,
                    cacheCapacity int,
                    dropout float64,
                    unknownToken string,
                Severity: Major
                Found in models/bpemodel/bpemodel.go - About 1 hr to fix
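
One common way to address an argument-count finding like this is to group the parameters into a single configuration struct. The sketch below is an assumption, not the repository's API: `Config`, `Model`, and `NewFromConfig` are hypothetical names, the field types are simplified stand-ins, only some of the arguments visible in the snippet are modelled, and the validation rules are invented for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// Config groups constructor parameters into one value. The field names
// mirror some of the arguments visible in the truncated snippet; the
// types are simplified stand-ins (assumption).
type Config struct {
	Vocab         map[string]int
	CacheCapacity int
	Dropout       float64
	UnknownToken  string
}

// Model is a hypothetical stand-in for the real model type.
type Model struct {
	cfg Config
}

// NewFromConfig is a hypothetical constructor that takes one argument
// instead of eight. The checks below are illustrative assumptions.
func NewFromConfig(cfg Config) (*Model, error) {
	if cfg.Dropout < 0 || cfg.Dropout > 1 {
		return nil, errors.New("dropout must be in [0, 1]")
	}
	if cfg.CacheCapacity < 0 {
		return nil, errors.New("cache capacity must not be negative")
	}
	return &Model{cfg: cfg}, nil
}

func main() {
	m, err := NewFromConfig(Config{
		Vocab:         map[string]int{"[UNK]": 0},
		CacheCapacity: 100,
		UnknownToken:  "[UNK]",
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(m.cfg.UnknownToken, m.cfg.CacheCapacity)
}
```

Call sites then name each field explicitly, and fields left out fall back to their zero values. Functional options would be another reasonable choice.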

                  Function NewEncoding has 8 arguments (exceeds 4 allowed). Consider refactoring.
                  Open

                      ids []int,
                      typeIDs []int,
                      tokens []string,
                      words []int,
                      offsets []strutils.ByteOffsets,
                  Severity: Major
                  Found in encodings/encodings.go - About 1 hr to fix

                    Function MergeMapFromFile has 7 return statements (exceeds 4 allowed).
                    Open

                    func MergeMapFromFile(
                        filename string,
                        vocab *vocabulary.Vocabulary,
                        prefixLength int,
                    ) (m *MergeMap, err error) {
                    Severity: Major
                    Found in models/bpemodel/mergemap.go - About 45 mins to fix

                      Consider simplifying this complex logical expression.
                      Open

                              if (i >= '!' && i <= '~') || (i >= 0xA1 && i <= 0xAC) || (i >= 0xAE && i <= 0xFF) {
                      Severity: Major
                      Found in pretokenizers/bytelevelpretokenizer/bytelevelpretokenizer.go - About 40 mins to fix
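
A possible simplification, sketched under assumptions: the three range checks can be moved behind a small helper so the condition reads as a single named predicate. The names `inRange` and `matchesByteRanges` are hypothetical, and `i` is assumed to be an `int`, since the snippet does not show its type.

```go
package main

import "fmt"

// inRange reports whether lo <= i <= hi.
func inRange(i, lo, hi int) bool {
	return i >= lo && i <= hi
}

// matchesByteRanges is a hypothetical name for the flagged condition.
// It covers the same three inclusive ranges as the original expression:
// '!'..'~', 0xA1..0xAC, and 0xAE..0xFF.
func matchesByteRanges(i int) bool {
	return inRange(i, '!', '~') ||
		inRange(i, 0xA1, 0xAC) ||
		inRange(i, 0xAE, 0xFF)
}

func main() {
	for _, i := range []int{' ', 'A', 0xAD, 0xFF} {
		fmt.Printf("%#x %v\n", i, matchesByteRanges(i))
	}
}
```

The call site then becomes `if matchesByteRanges(i) {`, which keeps the behaviour while removing the compound expression from the flagged line.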

                        Method BertPreTokenizer.PreTokenize has 6 return statements (exceeds 4 allowed).
                        Open

                        func (b *BertPreTokenizer) PreTokenize(pts *pretokenizedstring.PreTokenizedString) error {
                            isWhitespacePattern := splitpattern.FromFunc(func(r rune) bool {
                                return unicode.In(r, unicode.White_Space)
                            })
                            isBertPunctuationPattern := splitpattern.FromFunc(func(r rune) bool {
                        Severity: Major
                        Found in pretokenizers/bertpretokenizer/bertpretokenizer.go - About 40 mins to fix

                          Method NormalizedString.CoerceRangeToOriginal has 5 return statements (exceeds 4 allowed).
                          Open

                          func (ns *NormalizedString) CoerceRangeToOriginal(r Range) (OriginalRange, bool) {
                              // If the string range is already in the original referential, return it as it is
                              if or, isOriginal := r.(OriginalRange); isOriginal {
                                  return or, true
                              }
                          Severity: Major
                          Found in normalizedstring/normalizedstring.go - About 35 mins to fix

                            Method WordPieceModel.Tokenize has 5 return statements (exceeds 4 allowed).
                            Open

                            func (m *WordPieceModel) Tokenize(sequence string) ([]models.Token, error) {
                                if len([]rune(sequence)) > m.maxInputCharsPerWord {
                                    unkTokenID, unkTokenExists := m.vocab.GetID(m.unknownToken)
                                    if !unkTokenExists {
                                        return nil, ErrUnknownTokenOutOfVocabulary
                            Severity: Major
                            Found in models/wordpiecemodel/wordpiecemodel.go - About 35 mins to fix

                              Method WordPieceModel.Tokenize has a Cognitive Complexity of 21 (exceeds 20 allowed). Consider refactoring.
                              Open

                              func (m *WordPieceModel) Tokenize(sequence string) ([]models.Token, error) {
                                  if len([]rune(sequence)) > m.maxInputCharsPerWord {
                                      unkTokenID, unkTokenExists := m.vocab.GetID(m.unknownToken)
                                      if !unkTokenExists {
                                          return nil, ErrUnknownTokenOutOfVocabulary
                              Severity: Minor
                              Found in models/wordpiecemodel/wordpiecemodel.go - About 25 mins to fix
