HazyResearch/fonduer

Showing 134 of 224 total issues

Function extract_textual_features has a Cognitive Complexity of 79 (exceeds 5 allowed). Consider refactoring.
Open

def extract_textual_features(
    candidates: Union[Candidate, List[Candidate]],
) -> Iterator[Tuple[int, str, int]]:
    """Extract textual features.

Severity: Minor
Found in src/fonduer/features/feature_libs/textual_features.py - About 1 day to fix

Cognitive Complexity

Cognitive Complexity is a measure of how difficult a unit of code is to intuitively understand. Unlike Cyclomatic Complexity, which determines how difficult your code will be to test, Cognitive Complexity tells you how difficult your code will be to read and comprehend.

A method's cognitive complexity is based on a few simple rules:

  • Code is not considered more complex when it uses shorthand that the language provides for collapsing multiple statements into one
  • Code is considered more complex for each "break in the linear flow of the code"
  • Code is considered more complex when "flow breaking structures are nested"
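
The three rules above can be illustrated with a rough scorer. This is a minimal sketch, not Code Climate's actual algorithm: the function name `rough_complexity`, the choice of node types in `FLOW_BREAKS`, and the sample function `positives` are all assumptions made for illustration. It adds 1 for each flow-breaking structure plus 1 for every flow-breaking structure enclosing it, and it ignores shorthand such as generator expressions.

```python
import ast

# Node types treated as "breaks in the linear flow" in this sketch.
# This set is an assumption, not Code Climate's exact rule set; note that
# an `elif` is parsed as a nested `if` and would be scored as nesting here.
FLOW_BREAKS = (ast.If, ast.For, ast.While, ast.ExceptHandler)


def rough_complexity(source: str) -> int:
    """Score +1 per flow break, plus +1 for each enclosing flow break."""

    def visit(node: ast.AST, depth: int) -> int:
        total = 0
        for child in ast.iter_child_nodes(node):
            if isinstance(child, FLOW_BREAKS):
                total += 1 + depth  # base increment plus nesting penalty
                total += visit(child, depth + 1)
            else:
                total += visit(child, depth)
        return total

    return visit(ast.parse(source), 0)


# Hypothetical function with four nested flow-breaking structures.
NESTED = '''
def positives(rows):
    out = []
    for row in rows:
        if row is not None:
            for value in row:
                if value > 0:
                    out.append(value)
    return out
'''

# Same behavior, flattened with a guard clause and a generator expression.
FLAT = '''
def positives(rows):
    out = []
    for row in rows:
        if row is None:
            continue
        out.extend(v for v in row if v > 0)
    return out
'''

print(rough_complexity(NESTED))  # 10 = 1 + 2 + 3 + 4
print(rough_complexity(FLAT))    # 3 = 1 + 2
```

Under these assumptions, the guard clause removes two levels of nesting and the generator expression counts as shorthand, which is the kind of change a refactoring aimed at this metric would typically make.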

File parser.py has 675 lines of code (exceeds 250 allowed). Consider refactoring.
Open

"""Fonduer parser."""
import itertools
import logging
import re
import warnings
Severity: Major
Found in src/fonduer/parser/parser.py - About 1 day to fix

Function _parse_file has a Cognitive Complexity of 72 (exceeds 5 allowed). Consider refactoring.
Open

    def _parse_file(self, fp: str, file_name: str) -> Iterator[Document]:
        # Adapted from https://github.com/ocropus/hocr-tools/blob/v1.3.0/hocr-check
        def get_prop(node: Tag, name: str) -> Optional[str]:
            title = node["title"]
            if not title:
Severity: Minor
Found in src/fonduer/parser/preprocessors/hocr_doc_preprocessor.py - About 1 day to fix

Function get_neighbor_cell_ngrams has a Cognitive Complexity of 52 (exceeds 5 allowed). Consider refactoring.
Open

def get_neighbor_cell_ngrams(
    mention: Union[Candidate, Mention, TemporarySpanMention],
    dist: int = 1,
    directions: bool = False,
    attrib: str = "words",
Severity: Minor
Found in src/fonduer/utils/data_model_utils/tabular.py - About 1 day to fix

Function parse has a Cognitive Complexity of 50 (exceeds 5 allowed). Consider refactoring.
Open

    def parse(
        self, document_name: str, sentences: Iterable[Sentence]
    ) -> Iterator[Sentence]:
        """Parse visual information embedded in sentence's html_attrs.

Severity: Minor
Found in src/fonduer/parser/visual_parser/hocr_visual_parser.py - About 7 hrs to fix

Function apply has a Cognitive Complexity of 49 (exceeds 5 allowed). Consider refactoring.
Open

    def apply(  # type: ignore
        self, doc: Document, split: int, **kwargs: Any
    ) -> Document:
        """Extract candidates from the given Context.

Severity: Minor
Found in src/fonduer/candidates/candidates.py - About 7 hrs to fix

File tabular.py has 475 lines of code (exceeds 250 allowed). Consider refactoring.
Open

"""Fonduer tabular modality utilities."""
from builtins import range
from collections import defaultdict
from functools import lru_cache
from itertools import chain
Severity: Minor
Found in src/fonduer/utils/data_model_utils/tabular.py - About 7 hrs to fix

Function _link_lists has a Cognitive Complexity of 47 (exceeds 5 allowed). Consider refactoring.
Open

    def _link_lists(
        self, search_max: int = 100, edit_cost: int = 20, offset_cost: int = 1
    ) -> None:
        # NOTE: there are probably some inefficiencies here from rehashing words
        # multiple times, but we're not going to worry about that for now
Severity: Minor
Found in src/fonduer/parser/visual_parser/pdf_visual_parser.py - About 7 hrs to fix

File mentions.py has 463 lines of code (exceeds 250 allowed). Consider refactoring.
Open

"""Fonduer mention."""
import logging
import re
from builtins import map, range
from typing import Any, Collection, Dict, Iterable, Iterator, List, Optional, Set, Union
Severity: Minor
Found in src/fonduer/candidates/mentions.py - About 7 hrs to fix

Function _get_window_features has a Cognitive Complexity of 46 (exceeds 5 allowed). Consider refactoring.
Open

def _get_window_features(
    context: Dict[str, Any],
    idxs: List[int],
    window: int = settings["featurization"]["textual"]["window_feature"]["size"],
    combinations: bool = settings["featurization"]["textual"]["window_feature"][
Severity: Minor
Found in src/fonduer/features/feature_libs/textual_features.py - About 7 hrs to fix

Function extract_visual_features has a Cognitive Complexity of 46 (exceeds 5 allowed). Consider refactoring.
Open

def extract_visual_features(
    candidates: Union[Candidate, List[Candidate]],
) -> Iterator[Tuple[int, str, int]]:
    """Extract visual features.

Severity: Minor
Found in src/fonduer/features/feature_libs/visual_features.py - About 7 hrs to fix

Function extract_structural_features has a Cognitive Complexity of 46 (exceeds 5 allowed). Consider refactoring.
Open

def extract_structural_features(
    candidates: Union[Candidate, List[Candidate]],
) -> Iterator[Tuple[int, str, int]]:
    """Extract structural features.

Severity: Minor
Found in src/fonduer/features/feature_libs/structural_features.py - About 7 hrs to fix

Function extract_tabular_features has a Cognitive Complexity of 41 (exceeds 5 allowed). Consider refactoring.
Open

def extract_tabular_features(
    candidates: Union[Candidate, List[Candidate]],
) -> Iterator[Tuple[int, str, int]]:
    """Extract tabular features.

Severity: Minor
Found in src/fonduer/features/feature_libs/tabular_features.py - About 6 hrs to fix

File labeler.py has 406 lines of code (exceeds 250 allowed). Consider refactoring.
Open

"""Fonduer labeler."""
import logging
from collections import defaultdict
from typing import (
    Any,
Severity: Minor
Found in src/fonduer/supervision/labeler.py - About 5 hrs to fix

File matchers.py has 401 lines of code (exceeds 250 allowed). Consider refactoring.
Open

"""Fonduer matcher."""
import re
from typing import Iterator, Set

from fonduer.candidates.models.figure_mention import TemporaryFigureMention
Severity: Minor
Found in src/fonduer/candidates/matchers.py - About 5 hrs to fix

Function apply has a Cognitive Complexity of 36 (exceeds 5 allowed). Consider refactoring.
Open

    def apply(self, context: Sentence) -> Iterator[TemporarySpanMention]:
        """Apply function takes a Sentence and return a mention generator.

        :param x: The input Sentence.
        :yield: The mention generator.
Severity: Minor
Found in src/fonduer/candidates/mentions.py - About 5 hrs to fix

Function _get_direction_ngrams has a Cognitive Complexity of 34 (exceeds 5 allowed). Consider refactoring.
Open

def _get_direction_ngrams(
    direction: str,
    c: Union[Candidate, Mention, TemporarySpanMention],
    attrib: str,
    n_min: int,
Severity: Minor
Found in src/fonduer/utils/data_model_utils/visual.py - About 5 hrs to fix

File visual.py has 376 lines of code (exceeds 250 allowed). Consider refactoring.
Open

"""Fonduer visual modality utilities."""
from builtins import range
from collections import defaultdict
from functools import lru_cache
from typing import Any, DefaultDict, Iterator, List, Set, Union
Severity: Minor
Found in src/fonduer/utils/data_model_utils/visual.py - About 5 hrs to fix

File fonduer_model.py has 358 lines of code (exceeds 250 allowed). Consider refactoring.
Open

"""Customized MLflow model for Fonduer."""
import logging
import os
import sys
from io import BytesIO
Severity: Minor
Found in src/fonduer/packaging/fonduer_model.py - About 4 hrs to fix

File utils_udf.py has 320 lines of code (exceeds 250 allowed). Consider refactoring.
Open

"""Fonduer UDF utils."""
import logging
from typing import (
    Any,
    Callable,
Severity: Minor
Found in src/fonduer/utils/utils_udf.py - About 3 hrs to fix