HazyResearch/fonduer

Showing 224 of 224 total issues

Function extract_textual_features has a Cognitive Complexity of 79 (exceeds 5 allowed). Consider refactoring.
Open

def extract_textual_features(
    candidates: Union[Candidate, List[Candidate]],
) -> Iterator[Tuple[int, str, int]]:
    """Extract textual features.

Severity: Minor
Found in src/fonduer/features/feature_libs/textual_features.py - About 1 day to fix

Cognitive Complexity

Cognitive Complexity is a measure of how difficult a unit of code is to intuitively understand. Unlike Cyclomatic Complexity, which determines how difficult your code will be to test, Cognitive Complexity tells you how difficult your code will be to read and comprehend.

A method's cognitive complexity is based on a few simple rules:

  • Code is not considered more complex when it uses shorthand that the language provides for collapsing multiple statements into one
  • Code is considered more complex for each "break in the linear flow of the code"
  • Code is considered more complex when "flow breaking structures are nested"
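
As a rough illustration of the nesting rule above, the sketch below shows the same logic written twice. The function names are hypothetical and are not taken from fonduer. Under the rules listed, the flattened version would be expected to score lower, because it has fewer nested flow-breaking structures. No exact scores are claimed here.

```python
from typing import List, Optional


def first_long_word_nested(words: Optional[List[str]], min_len: int) -> Optional[str]:
    """Nested version: every extra level of nesting adds to the reading burden."""
    if words is not None:
        for word in words:
            if word:
                if len(word) >= min_len:
                    return word
    return None


def first_long_word_flat(words: Optional[List[str]], min_len: int) -> Optional[str]:
    """Flattened version: a guard clause and a combined condition remove two nesting levels."""
    if words is None:
        return None
    for word in words:
        if word and len(word) >= min_len:
            return word
    return None


if __name__ == "__main__":
    sample = ["a", "", "hello", "world"]
    # Both versions behave identically; only the structure differs.
    print(first_long_word_nested(sample, 3))
    print(first_long_word_flat(sample, 3))
```

Guard clauses are one common way to reduce nesting. Extracting the inner loop body into a helper function is another option when the body is long.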

File parser.py has 675 lines of code (exceeds 250 allowed). Consider refactoring.
Open

"""Fonduer parser."""
import itertools
import logging
import re
import warnings
Severity: Major
Found in src/fonduer/parser/parser.py - About 1 day to fix

    Function _parse_file has a Cognitive Complexity of 72 (exceeds 5 allowed). Consider refactoring.
    Open

        def _parse_file(self, fp: str, file_name: str) -> Iterator[Document]:
            # Adapted from https://github.com/ocropus/hocr-tools/blob/v1.3.0/hocr-check
            def get_prop(node: Tag, name: str) -> Optional[str]:
                title = node["title"]
                if not title:
    Severity: Minor
    Found in src/fonduer/parser/preprocessors/hocr_doc_preprocessor.py - About 1 day to fix

    Function get_neighbor_cell_ngrams has a Cognitive Complexity of 52 (exceeds 5 allowed). Consider refactoring.
    Open

    def get_neighbor_cell_ngrams(
        mention: Union[Candidate, Mention, TemporarySpanMention],
        dist: int = 1,
        directions: bool = False,
        attrib: str = "words",
    Severity: Minor
    Found in src/fonduer/utils/data_model_utils/tabular.py - About 1 day to fix

    Function parse has a Cognitive Complexity of 50 (exceeds 5 allowed). Consider refactoring.
    Open

        def parse(
            self, document_name: str, sentences: Iterable[Sentence]
        ) -> Iterator[Sentence]:
            """Parse visual information embedded in sentence's html_attrs.
    
    
    Severity: Minor
    Found in src/fonduer/parser/visual_parser/hocr_visual_parser.py - About 7 hrs to fix

    Function apply has a Cognitive Complexity of 49 (exceeds 5 allowed). Consider refactoring.
    Open

        def apply(  # type: ignore
            self, doc: Document, split: int, **kwargs: Any
        ) -> Document:
            """Extract candidates from the given Context.
    
    
    Severity: Minor
    Found in src/fonduer/candidates/candidates.py - About 7 hrs to fix

    File tabular.py has 475 lines of code (exceeds 250 allowed). Consider refactoring.
    Open

    """Fonduer tabular modality utilities."""
    from builtins import range
    from collections import defaultdict
    from functools import lru_cache
    from itertools import chain
    Severity: Minor
    Found in src/fonduer/utils/data_model_utils/tabular.py - About 7 hrs to fix

      Function _link_lists has a Cognitive Complexity of 47 (exceeds 5 allowed). Consider refactoring.
      Open

          def _link_lists(
              self, search_max: int = 100, edit_cost: int = 20, offset_cost: int = 1
          ) -> None:
              # NOTE: there are probably some inefficiencies here from rehashing words
              # multiple times, but we're not going to worry about that for now
      Severity: Minor
      Found in src/fonduer/parser/visual_parser/pdf_visual_parser.py - About 7 hrs to fix

      File mentions.py has 463 lines of code (exceeds 250 allowed). Consider refactoring.
      Open

      """Fonduer mention."""
      import logging
      import re
      from builtins import map, range
      from typing import Any, Collection, Dict, Iterable, Iterator, List, Optional, Set, Union
      Severity: Minor
      Found in src/fonduer/candidates/mentions.py - About 7 hrs to fix

        Function extract_structural_features has a Cognitive Complexity of 46 (exceeds 5 allowed). Consider refactoring.
        Open

        def extract_structural_features(
            candidates: Union[Candidate, List[Candidate]],
        ) -> Iterator[Tuple[int, str, int]]:
            """Extract structural features.
        
        
        Severity: Minor
        Found in src/fonduer/features/feature_libs/structural_features.py - About 7 hrs to fix

        Function _get_window_features has a Cognitive Complexity of 46 (exceeds 5 allowed). Consider refactoring.
        Open

        def _get_window_features(
            context: Dict[str, Any],
            idxs: List[int],
            window: int = settings["featurization"]["textual"]["window_feature"]["size"],
            combinations: bool = settings["featurization"]["textual"]["window_feature"][
        Severity: Minor
        Found in src/fonduer/features/feature_libs/textual_features.py - About 7 hrs to fix

        Function extract_visual_features has a Cognitive Complexity of 46 (exceeds 5 allowed). Consider refactoring.
        Open

        def extract_visual_features(
            candidates: Union[Candidate, List[Candidate]],
        ) -> Iterator[Tuple[int, str, int]]:
            """Extract visual features.
        
        
        Severity: Minor
        Found in src/fonduer/features/feature_libs/visual_features.py - About 7 hrs to fix

        Function extract_tabular_features has a Cognitive Complexity of 41 (exceeds 5 allowed). Consider refactoring.
        Open

        def extract_tabular_features(
            candidates: Union[Candidate, List[Candidate]],
        ) -> Iterator[Tuple[int, str, int]]:
            """Extract tabular features.
        
        
        Severity: Minor
        Found in src/fonduer/features/feature_libs/tabular_features.py - About 6 hrs to fix

        File labeler.py has 406 lines of code (exceeds 250 allowed). Consider refactoring.
        Open

        """Fonduer labeler."""
        import logging
        from collections import defaultdict
        from typing import (
            Any,
        Severity: Minor
        Found in src/fonduer/supervision/labeler.py - About 5 hrs to fix

          File matchers.py has 401 lines of code (exceeds 250 allowed). Consider refactoring.
          Open

          """Fonduer matcher."""
          import re
          from typing import Iterator, Set
          
          from fonduer.candidates.models.figure_mention import TemporaryFigureMention
          Severity: Minor
          Found in src/fonduer/candidates/matchers.py - About 5 hrs to fix

            Function apply has a Cognitive Complexity of 36 (exceeds 5 allowed). Consider refactoring.
            Open

                def apply(self, context: Sentence) -> Iterator[TemporarySpanMention]:
                    """Apply function takes a Sentence and return a mention generator.
            
                    :param x: The input Sentence.
                    :yield: The mention generator.
            Severity: Minor
            Found in src/fonduer/candidates/mentions.py - About 5 hrs to fix

            Cyclomatic complexity is too high in function extract_textual_features. (26)
            Open

            def extract_textual_features(
                candidates: Union[Candidate, List[Candidate]],
            ) -> Iterator[Tuple[int, str, int]]:
                """Extract textual features.
            
            

            Cyclomatic Complexity

            Cyclomatic Complexity corresponds to the number of decisions a block of code contains plus 1. This number (also called McCabe number) is equal to the number of linearly independent paths through the code. This number can be used as a guide when testing conditional logic in blocks.

            Radon analyzes the AST of a Python program to compute Cyclomatic Complexity. Statements have the following effects on Cyclomatic Complexity:

            Construct         Effect on CC  Reasoning
            if                +1            An if statement is a single decision.
            elif              +1            The elif statement adds another decision.
            else              +0            The else statement does not cause a new decision. The decision is at the if.
            for               +1            There is a decision at the start of the loop.
            while             +1            There is a decision at the while statement.
            except            +1            Each except branch adds a new conditional path of execution.
            finally           +0            The finally block is unconditionally executed.
            with              +1            The with statement roughly corresponds to a try/except block (see PEP 343 for details).
            assert            +1            The assert statement internally roughly equals a conditional statement.
            Comprehension     +1            A list/set/dict comprehension or generator expression is equivalent to a for loop.
            Boolean Operator  +1            Every boolean operator (and, or) adds a decision point.

            Source: http://radon.readthedocs.org/en/latest/intro.html
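
The counting rules in the table can be sketched with Python's standard `ast` module. This is a simplified approximation and not Radon's implementation: it scores a whole snippet as a single block, counts one point per `for` clause of a comprehension, and ignores constructs the table does not list. The names `approximate_cyclomatic_complexity` and `SAMPLE` are hypothetical.

```python
import ast

# Node types that add exactly one decision each, following the table above.
_SINGLE_DECISION_NODES = (
    ast.If,             # also covers elif, which parses as a nested ast.If
    ast.For,
    ast.While,
    ast.ExceptHandler,  # one per except branch
    ast.With,
    ast.Assert,
    ast.comprehension,  # one per `for` clause in a comprehension or generator
)


def approximate_cyclomatic_complexity(source: str) -> int:
    """Approximate the cyclomatic complexity of a snippet (decisions plus 1)."""
    complexity = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, _SINGLE_DECISION_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` is one BoolOp with three values, i.e. two operators.
            complexity += len(node.values) - 1
    return complexity


SAMPLE = '''
def classify(values):
    result = []
    for v in values:
        if v < 0 and v % 2:
            result.append("negative odd")
        elif v == 0:
            result.append("zero")
        else:
            result.append("other")
    return result
'''

if __name__ == "__main__":
    # 1 (base) + for + if + and + elif = 5; else and return add nothing.
    print(approximate_cyclomatic_complexity(SAMPLE))  # 5
```

For real measurements, running Radon itself is preferable, since it reports per-function scores and handles cases this sketch skips.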

            Function _get_direction_ngrams has a Cognitive Complexity of 34 (exceeds 5 allowed). Consider refactoring.
            Open

            def _get_direction_ngrams(
                direction: str,
                c: Union[Candidate, Mention, TemporarySpanMention],
                attrib: str,
                n_min: int,
            Severity: Minor
            Found in src/fonduer/utils/data_model_utils/visual.py - About 5 hrs to fix

            File visual.py has 376 lines of code (exceeds 250 allowed). Consider refactoring.
            Open

            """Fonduer visual modality utilities."""
            from builtins import range
            from collections import defaultdict
            from functools import lru_cache
            from typing import Any, DefaultDict, Iterator, List, Set, Union
            Severity: Minor
            Found in src/fonduer/utils/data_model_utils/visual.py - About 5 hrs to fix

              Cyclomatic complexity is too high in method apply. (25)
              Open

                  def apply(  # type: ignore
                      self, doc: Document, split: int, **kwargs: Any
                  ) -> Document:
                      """Extract candidates from the given Context.
              
              
              Severity: Minor
              Found in src/fonduer/candidates/candidates.py by radon
