UnB-KnEDLe/DODFMiner

Showing 67 of 67 total issues

Function segment has a Cognitive Complexity of 51 (exceeds 5 allowed). Consider refactoring.
Open

  def segment(self, file):
    atos_contrato_convenio = {
      'numero_dodf':[],
      'titulo':[],
      'texto':[]
Severity: Minor
Found in dodfminer/extract/polished/acts/contrato_convenio.py - About 7 hrs to fix

Cognitive Complexity

Cognitive Complexity is a measure of how difficult a unit of code is to intuitively understand. Unlike Cyclomatic Complexity, which determines how difficult your code will be to test, Cognitive Complexity tells you how difficult your code will be to read and comprehend.

A method's cognitive complexity is based on a few simple rules:

  • Code is not considered more complex when it uses shorthand that the language provides for collapsing multiple statements into one
  • Code is considered more complex for each "break in the linear flow of the code"
  • Code is considered more complex when "flow breaking structures are nested"
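The rules above can be illustrated with a minimal sketch. Both functions below are hypothetical (they are not taken from DODFMiner) and return the same result; the first stacks loops and conditionals inside each other, while the second collapses them into generator shorthand. Exact scores depend on the analyzer, so none are claimed here.

```python
def count_long_words_nested(lines, min_len=5):
    """Nested version: every loop and branch sits inside another one."""
    total = 0
    for line in lines:                    # break in the linear flow
        if line:                          # nested one level deeper
            for word in line.split():     # nested two levels deeper
                if len(word) >= min_len:  # nested three levels deeper
                    total += 1
    return total


def count_long_words_flat(lines, min_len=5):
    """Flatter version: generator shorthand replaces the nested blocks."""
    words = (word for line in lines for word in line.split())
    return sum(1 for word in words if len(word) >= min_len)
```

The flat version needs no explicit check for empty lines, because splitting an empty string yields no words.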

File title_extractor.py has 487 lines of code (exceeds 250 allowed). Consider refactoring.
Open

"""
    Extract Title and Subtitles.
"""

# TODO: Improve documentation
Severity: Minor
Found in dodfminer/extract/pure/utils/title_extractor.py - About 7 hrs to fix

    Function _get_special_acts has a Cognitive Complexity of 38 (exceeds 5 allowed). Consider refactoring.
    Open

        def _get_special_acts(self, lis_dict):
            for i, match in enumerate(self._raw_matches):
                act = match.group()
                curr_dict = lis_dict[i]
                data_dodf = curr_dict['data_dodf']
    Severity: Minor
    Found in dodfminer/extract/polished/acts/sem_efeito_aposentadoria.py - About 5 hrs to fix

    Function extract_structure has a Cognitive Complexity of 33 (exceeds 5 allowed). Consider refactoring.
    Open

        def extract_structure(cls, file, single=False, norm='NFKD'): # pylint: disable=too-many-locals
            """Extract boxes of text with their respective titles.
    
            Args:
                file: The DODF file to extract titles from.
    Severity: Minor
    Found in dodfminer/extract/pure/core.py - About 4 hrs to fix

    File core.py has 347 lines of code (exceeds 250 allowed). Consider refactoring.
    Open

    # coding=utf-8
    
    """Extract content from DODFS and export to JSON.
    
    Contains class ContentExtractor, which has two public functions
    Severity: Minor
    Found in dodfminer/extract/pure/core.py - About 4 hrs to fix

      Function post_process has a Cognitive Complexity of 27 (exceeds 5 allowed). Consider refactoring.
      Open

        def post_process(self):
          for IOB, text, numdodf, titulo in zip(self.predicted, self.atos_encontrados['texto'], self.atos_encontrados['numero_dodf'], self.atos_encontrados['titulo']):
            ent_dict = {
              'numero_dodf': '',
              'titulo': '',
      Severity: Minor
      Found in dodfminer/extract/polished/acts/base_contratos.py - About 3 hrs to fix

      Function extract_text has a Cognitive Complexity of 25 (exceeds 5 allowed). Consider refactoring.
      Open

          def extract_text(cls, file, single=False, block=False, is_json=True, sep=' ', norm='NFKD'):
              """Extract block of text from file
      
              Args:
                  file: The DODF to extract titles from.
      Severity: Minor
      Found in dodfminer/extract/pure/core.py - About 3 hrs to fix

      Function segment has a Cognitive Complexity of 23 (exceeds 5 allowed). Consider refactoring.
      Open

        def segment(self, file):
          atos_aditamento = {
            'numero_dodf':[],
            'titulo':[],
            'texto':[]
      Severity: Minor
      Found in dodfminer/extract/polished/acts/aditamento.py - About 3 hrs to fix

      Function _get_titles_subtitles has a Cognitive Complexity of 21 (exceeds 5 allowed). Consider refactoring.
      Open

      def _get_titles_subtitles(elements, width_lis):
          """Extracts titles and subtitles from list. WARNING: Based on font size and heuristic.
      
          Args:
              titles_subtitles: a list of dict all of them having the keys:
      Severity: Minor
      Found in dodfminer/extract/pure/utils/title_extractor.py - About 2 hrs to fix

      Function highlight_dataframe has a Cognitive Complexity of 21 (exceeds 5 allowed). Consider refactoring.
      Open

        def highlight_dataframe(self):
          if len(self.atos_encontrados) == 0:
            return
          self.data_frame = []
          for IOB, text, _, titulo in zip(self.predicted, self.atos_encontrados['texto'], self.atos_encontrados['numero_dodf'], self.atos_encontrados['titulo']):
      Severity: Minor
      Found in dodfminer/extract/polished/acts/base_contratos.py - About 2 hrs to fix

      File helper.py has 286 lines of code (exceeds 250 allowed). Consider refactoring.
      Open

      """ Polished extraction helper functions.
      
      Functions in this file can be used inside, or outside, the ActsExtractor
      class. Their purpose is to make some tasks easier for the user,
      like creating txts, searching through files, and print dataframes.
      Severity: Minor
      Found in dodfminer/extract/polished/helper.py - About 2 hrs to fix

        Function highlight_dataframe has a Cognitive Complexity of 18 (exceeds 5 allowed). Consider refactoring.
        Open

            def highlight_dataframe(self):
                if self._preds is None:
                    return
                self._data_frame = []
                for IOB, text in zip(self._preds, self._acts_str):
        Severity: Minor
        Found in dodfminer/extract/polished/acts/base.py - About 2 hrs to fix

        Function extract_content has a Cognitive Complexity of 17 (exceeds 5 allowed). Consider refactoring.
        Open

            def extract_content(self):
                """Extract Content from PDFs."""
                if self.args.single_file is None:
                    if self.args.type_of_extr is not None:
                        if self.args.type_of_extr == 'pure-text':
        Severity: Minor
        Found in dodfminer/run.py - About 2 hrs to fix

        Function get_features has a Cognitive Complexity of 17 (exceeds 5 allowed). Consider refactoring.
        Open

          def get_features(self, sentence):
            sent_features = []
            for i in range(len(sentence)):
              word_feat = {
                # Palavra atual
        Severity: Minor
        Found in dodfminer/extract/polished/backend/pipeline.py - About 2 hrs to fix

        Function _create_passage has a Cognitive Complexity of 15 (exceeds 5 allowed). Consider refactoring.
        Open

            def _create_passage(self, offset, text, an_dict):
                root_passage = etree.Element('passage')
        
                child_offset = etree.Element('offset')
                child_offset.text = str(offset)
        Severity: Minor
        Found in dodfminer/extract/polished/create_xml.py - About 1 hr to fix

        Function extract_multiple_acts_with_committee has a Cognitive Complexity of 15 (exceeds 5 allowed). Consider refactoring.
        Open

        def extract_multiple_acts_with_committee(path, types, backend):
            """Extract multiple Acts from Multiple DODFs to act named CSVs.
            Uses committee_classification to find act types.
        
            Args:
        Severity: Minor
        Found in dodfminer/extract/polished/helper.py - About 1 hr to fix

        Function extract_multiple_acts_parallel has a Cognitive Complexity of 14 (exceeds 5 allowed). Consider refactoring.
        Open

        def extract_multiple_acts_parallel(path: str, types: List[str], backend: str, processes = 4):
            """Extract multiple Acts from Multiple DODFs to act named CSVs in parallel.
        
            Args:
                path (str): Folder where the Dodfs are.
        Severity: Minor
        Found in dodfminer/extract/polished/helper.py - About 1 hr to fix

        Function _predictions_dict has a Cognitive Complexity of 14 (exceeds 5 allowed). Consider refactoring.
        Open

            def _predictions_dict(self, sentence, prediction):
                """Create dictionary of properties.
        
                Create dictionary of tags to save predicted entities.
        
        
        Severity: Minor
        Found in dodfminer/extract/polished/backend/ner.py - About 1 hr to fix

        Identical blocks of code found in 2 locations. Consider refactoring.
        Open

              if self.useDefault:
                text_split = nltk.word_tokenize(text)
              else:
                text_split = self.pipeline['pre-processing'].transform([text])[0]
        Severity: Major
        Found in dodfminer/extract/polished/acts/base_contratos.py and 1 other location - About 1 hr to fix
        dodfminer/extract/polished/acts/base_contratos.py on lines 90..93

        Duplicated Code

        Duplicated code can lead to software that is hard to understand and difficult to change. The Don't Repeat Yourself (DRY) principle states:

        Every piece of knowledge must have a single, unambiguous, authoritative representation within a system.

        Violating DRY invites bugs and maintenance problems. Duplicated code tends both to keep replicating and to diverge, leaving bugs where two similar implementations differ in subtle ways.
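One common way to remove this kind of duplication is to move the repeated branch into a single helper that both call sites use. The sketch below is only an illustration under assumptions: the class name `ActTokenizerMixin`, the `use_default` and `default_tokenizer` attributes, and the `_tokenize` method are hypothetical and not part of DODFMiner, and `str.split` stands in for `nltk.word_tokenize` so the example runs without extra dependencies.

```python
class ActTokenizerMixin:
    """Hypothetical mixin; names and attributes are illustrative only."""

    def __init__(self, pipeline=None, use_default=True,
                 default_tokenizer=str.split):
        self.pipeline = pipeline
        self.use_default = use_default
        # Assumption: the flagged code calls nltk.word_tokenize here;
        # str.split is a dependency-free stand-in for this sketch.
        self.default_tokenizer = default_tokenizer

    def _tokenize(self, text):
        """Single home for the branch that was repeated in two places."""
        if self.use_default:
            return self.default_tokenizer(text)
        return self.pipeline['pre-processing'].transform([text])[0]
```

Each of the two flagged locations could then be reduced to one call such as `text_split = self._tokenize(text)`, so a later change to the tokenization logic happens in one place.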

        Tuning

        This issue has a mass of 45.

        We set useful threshold defaults for the languages we support but you may want to adjust these settings based on your project guidelines.

        The threshold configuration represents the minimum mass a code block must have to be analyzed for duplication. The lower the threshold, the more fine-grained the comparison.

        If the engine is too easily reporting duplication, try raising the threshold. If you suspect that the engine isn't catching enough duplication, try lowering the threshold. The best setting tends to differ from language to language.

        See codeclimate-duplication's documentation for more information about tuning the mass threshold in your .codeclimate.yml.

        Function _get_dodfs has a Cognitive Complexity of 13 (exceeds 5 allowed). Consider refactoring.
        Open

            def _get_dodfs(self, _links_for_each_dodf, month_path):
                """Create folder and store the DODF PDFs.
        
                Args:
                    _links_for_each_dodf (dict): a dict with links for each DODF.
        Severity: Minor
        Found in dodfminer/downloader/core.py - About 1 hr to fix
