IlyaGusev/rulm

Showing 204 of 260 total issues

Function create_pairs has a Cognitive Complexity of 23 (exceeds 5 allowed). Consider refactoring.
Open

def create_pairs(config_path, output_path):
    with open(config_path) as r:
        config = json.load(r)
    files = config["files"]
    pairs_to_compare = config["pairs_to_compare"]
Severity: Minor
Found in self_instruct/src/sbs/create_pairs.py - About 3 hrs to fix

Cognitive Complexity

Cognitive Complexity is a measure of how difficult a unit of code is to intuitively understand. Unlike Cyclomatic Complexity, which determines how difficult your code will be to test, Cognitive Complexity tells you how difficult your code will be to read and comprehend.

A method's cognitive complexity is based on a few simple rules:

  • Code is not considered more complex when it uses shorthand that the language provides for collapsing multiple statements into one
  • Code is considered more complex for each "break in the linear flow of the code"
  • Code is considered more complex when "flow breaking structures are nested"
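The nesting rule above can be illustrated with a small sketch. The two functions below are hypothetical and are not taken from the repository; they compute the same result, but the first stacks flow-breaking structures inside each other while the second moves the inner check into a helper. Exact scores depend on the analyzer, so none are quoted here.

```python
def count_long_words_nested(lines, min_len=5):
    """Nested version: a loop, a condition, a loop and another condition, each one level deeper."""
    total = 0
    for line in lines:
        if line:
            for word in line.split():
                if len(word) >= min_len:
                    total += 1
    return total


def is_long(word, min_len):
    """Helper that isolates the innermost condition."""
    return len(word) >= min_len


def count_long_words_flat(lines, min_len=5):
    """Flatter version: one loop, with the inner work delegated to a helper."""
    total = 0
    for line in lines:
        # An empty line splits into an empty list, so no explicit check is needed.
        total += sum(is_long(word, min_len) for word in line.split())
    return total
```

Extracting a helper like this is one common way to lower a nesting-driven score without changing behavior; both versions should agree on any input list of strings.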

Function generate_chars has a Cognitive Complexity of 23 (exceeds 5 allowed). Consider refactoring.
Open

def generate_chars(
    output_path: str,
    seed_chars_path: str,
    template_path: str,
    num_chars_to_generate: int = 200,
Severity: Minor
Found in self_instruct/src/data_processing/generate_chars.py - About 3 hrs to fix

Function __call__ has a Cognitive Complexity of 23 (exceeds 5 allowed). Consider refactoring.
Open

    def __call__(self):
        desc = "Parsing users XML file: {}".format(self.users_path)
        for event, elem in tqdm(etree.iterparse(self.users_path, events=('end',)), desc=desc):
            if elem.tag != "row":
                continue
Severity: Minor
Found in data_processing/create_stackoverflow.py - About 3 hrs to fix

Function preprocess_text has a Cognitive Complexity of 23 (exceeds 5 allowed). Consider refactoring.
Open

def preprocess_text(text):
    for _ in range(10):
        text = text.replace("::", " ")

    text = TEXT_PROCESSOR(text)
Severity: Minor
Found in data_processing/convert_wiki.py - About 3 hrs to fix

Function generate_instructions has a Cognitive Complexity of 22 (exceeds 5 allowed). Consider refactoring.
Open

def generate_instructions(
    output_path: str,
    seed_tasks_path: str,
    settings_path: str,
    template_path: str,
Severity: Minor
Found in self_instruct/src/data_processing/generate_instructions.py - About 3 hrs to fix

Function main has a Cognitive Complexity of 20 (exceeds 5 allowed). Consider refactoring.
Open

def main(output_path, langdetect_threshold: float = 0.8, sim_threshold: float = 0.93):
    lang_detector = FasttextLanguageDetector()
    embedder = Embedder("intfloat/multilingual-e5-base")

    existing_queries = list()
Severity: Minor
Found in self_instruct/src/data_processing/fetch_new_multiturn_queries.py - About 2 hrs to fix

Function fix_tokenizer has a Cognitive Complexity of 20 (exceeds 5 allowed). Consider refactoring.
Open

def fix_tokenizer(tokenizer, model_config):
    bad_ids = (None, tokenizer.vocab_size)

    special_tokens = dict()
    guessed_pad_token_id = None
Severity: Minor
Found in self_instruct/src/util/dl.py - About 2 hrs to fix

Function main has a Cognitive Complexity of 19 (exceeds 5 allowed). Consider refactoring.
Open

def main(
    output_path,
    news_output_path
):
    output_archive = PlainArchive(output_path)
Severity: Minor
Found in data_processing/save_mc4.py - About 2 hrs to fix

File create_stackoverflow.py has 269 lines of code (exceeds 250 allowed). Consider refactoring.
Open

# Based on https://github.com/EleutherAI/stackexchange-dataset/blob/master/pairer.py

import argparse
import os
import re
Severity: Minor
Found in data_processing/create_stackoverflow.py - About 2 hrs to fix

Function fetch_tagengo has a Cognitive Complexity of 18 (exceeds 5 allowed). Consider refactoring.
Open

def fetch_tagengo():
    mapping = {
        "gpt": "bot",
        "human": "user"
    }
Severity: Minor
Found in self_instruct/src/data_processing/fetch_tagengo.py - About 2 hrs to fix

Function translate_state_dict_key has a Cognitive Complexity of 18 (exceeds 5 allowed). Consider refactoring.
Open

def translate_state_dict_key(k):  # noqa: C901
    k = k.replace('base_model.model.', '')
    if k == 'model.embed_tokens.weight':
        return 'tok_embeddings.weight'
    elif k == 'model.norm.weight':
Severity: Minor
Found in self_instruct/src/tools/convert_to_native.py - About 2 hrs to fix
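One possible way to shorten a long if/elif chain of exact string comparisons is a lookup table. The sketch below is an assumption about how such a refactoring might look, not the repository's code: only the first table entry comes from the excerpt above, the second entry is a made-up placeholder, and the pass-through behavior for unknown keys is also assumed.

```python
# Exact-match renames. Only the first entry appears in the excerpt;
# the second is a hypothetical placeholder for illustration.
EXACT_RENAMES = {
    "model.embed_tokens.weight": "tok_embeddings.weight",
    "hypothetical.source.key": "hypothetical.target.key",
}


def translate_key_sketch(k):
    """Translate a state-dict key using a table instead of an if/elif chain."""
    k = k.replace("base_model.model.", "")
    if k in EXACT_RENAMES:
        return EXACT_RENAMES[k]
    # Assumption: keys without a table entry are returned unchanged.
    return k
```

A table keeps each rename on one line, so adding a mapping no longer adds a branch. Keys that need pattern matching rather than exact comparison would still require separate handling.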

File create_chat_set.py has 264 lines of code (exceeds 250 allowed). Consider refactoring.
Open

import json
import sys
import re
import random
from itertools import tee
Severity: Minor
Found in self_instruct/src/data_processing/create_chat_set.py - About 2 hrs to fix

Function infer_saiga_vllm has a Cognitive Complexity of 17 (exceeds 5 allowed). Consider refactoring.
Open

def infer_saiga_vllm(
    model_name: str,
    input_path: str,
    output_path: str,
    temperature: float = 0.6,
Severity: Minor
Found in self_instruct/src/infer_saiga_vllm.py - About 2 hrs to fix

Function infer has a Cognitive Complexity of 16 (exceeds 5 allowed). Consider refactoring.
Open

def infer(
    model_name: str,
    input_path: str,
    output_path: str,
    n_ctx: int = 2000,
Severity: Minor
Found in self_instruct/src/infer_saiga_llamacpp.py - About 2 hrs to fix

Function train has 50 lines of code (exceeds 25 allowed). Consider refactoring.
Open

def train(
    config_file: str,
    train_file: str,
    val_file: str,
    output_dir: str,
Severity: Minor
Found in self_instruct/src/train.py - About 2 hrs to fix

Function main has 50 lines of code (exceeds 25 allowed). Consider refactoring.
Open

def main(train_path, val_path):
    random.seed(42)

    instruct_records = []
    for row in tqdm(load_dataset("lksy/ru_instruct_gpt4", split="train")):
Severity: Minor
Found in self_instruct/src/data_processing/create_chat_set.py - About 2 hrs to fix

Function fetch_tagengo has a Cognitive Complexity of 15 (exceeds 5 allowed). Consider refactoring.
Open

def fetch_tagengo():
    for row in chain(
        load_dataset("allenai/WildChat", split="train")
    ):
        language = row["language"]
Severity: Minor
Found in self_instruct/src/data_processing/fetch_wildchat.py - About 1 hr to fix

Function parse_section has a Cognitive Complexity of 15 (exceeds 5 allowed). Consider refactoring.
Open

    def parse_section(self, section):
        # sectionType
        # https://github.com/gribuser/fb2/blob/master/FictionBook.xsd#L396
        title = section.find("./fb:title", NS)
        title_str = self.parse_content(title) if title is not None else None
Severity: Minor
Found in data_processing/parse_fb2.py - About 1 hr to fix

Function improve_instructions has 15 arguments (exceeds 4 allowed). Consider refactoring.
Open

def improve_instructions(
Severity: Major
Found in self_instruct/src/data_processing/improve_instructions.py - About 1 hr to fix
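A common response to an argument-count issue like this one is to group related parameters into small configuration objects. The sketch below shows the general idea only. The excerpt does not list the function's actual parameters, so every class, field, and function name here is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class IOPaths:
    """Hypothetical group for file-related arguments."""
    input_path: str
    output_path: str


@dataclass
class GenerationSettings:
    """Hypothetical group for generation-related arguments."""
    model_name: str = "hypothetical-model"
    temperature: float = 1.0
    max_tokens: int = 256


def improve_instructions_sketch(paths: IOPaths, settings: GenerationSettings) -> str:
    """Takes two grouped arguments instead of many positional ones."""
    return (
        f"{paths.input_path} -> {paths.output_path} "
        f"using {settings.model_name} at T={settings.temperature}"
    )
```

Grouping keeps defaults in one place and lets callers override a single field, for example `GenerationSettings(temperature=0.5)`, without restating the rest.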

Function main has 44 lines of code (exceeds 25 allowed). Consider refactoring.
Open

def main(train_path, val_path):
    random.seed(42)

    instruct_records = []
    for row in tqdm(load_dataset("lksy/ru_instruct_gpt4", split="train")):
Severity: Minor
Found in self_instruct/src/data_processing/create_short_chat_set.py - About 1 hr to fix