IlyaGusev/rulm

Showing 260 of 260 total issues

File eval_zs_rsg.py has 698 lines of code (exceeds 250 allowed). Consider refactoring.

from typing import Tuple, Callable
import re
import copy
from pathlib import Path
from tqdm import tqdm
Severity: Major
Found in self_instruct/src/benchmarks/eval_zs_rsg.py - About 1 day to fix
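
The file-length check counts lines of code per file against a limit of 250. This report does not state Code Climate's exact counting rules. The sketch below therefore assumes that blank lines and comment-only lines are excluded and that docstrings count as code, so its numbers may not match the report exactly. The helper names `count_code_lines` and `oversized_files` are hypothetical.

```python
import io
import tokenize
from pathlib import Path
from typing import Iterator, Tuple

# Token types that do not by themselves make a line count as code (assumed rule).
NON_CODE_TOKENS = {
    tokenize.COMMENT,
    tokenize.NL,
    tokenize.NEWLINE,
    tokenize.INDENT,
    tokenize.DEDENT,
    tokenize.ENCODING,
    tokenize.ENDMARKER,
}


def count_code_lines(source: str) -> int:
    """Count lines holding at least one real token (a rough LOC estimate)."""
    code_lines = set()
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type not in NON_CODE_TOKENS:
            # Multi-line tokens such as triple-quoted strings cover every line they span.
            code_lines.update(range(tok.start[0], tok.end[0] + 1))
    return len(code_lines)


def oversized_files(root: str, limit: int = 250) -> Iterator[Tuple[Path, int]]:
    """Yield (path, loc) for each .py file under `root` whose estimate exceeds `limit`."""
    for path in sorted(Path(root).rglob("*.py")):
        loc = count_code_lines(path.read_text(encoding="utf-8"))
        if loc > limit:
            yield path, loc


sample = "import re\n\n# comment\nx = 1  # trailing\n'''doc\nstring'''\n"
print(count_code_lines(sample))  # 4
```

Running `for path, loc in oversized_files("self_instruct/src"): print(path, loc)` from the repository root would list candidate files for splitting under these assumptions.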

    Function main has a Cognitive Complexity of 66 (exceeds 5 allowed). Consider refactoring.

    def main(train_path, val_path):
        random.seed(42)
    
        instruct_records = []
        for row in tqdm(load_dataset("lksy/ru_instruct_gpt4", split="train")):
    Severity: Minor
    Found in self_instruct/src/data_processing/create_chat_set.py - About 1 day to fix

    Cognitive Complexity

    Cognitive Complexity is a measure of how difficult a unit of code is to intuitively understand. Unlike Cyclomatic Complexity, which determines how difficult your code will be to test, Cognitive Complexity tells you how difficult your code will be to read and comprehend.

    A method's cognitive complexity is based on a few simple rules:

    • Code is not considered more complex when it uses shorthand that the language provides for collapsing multiple statements into one
    • Code is considered more complex for each "break in the linear flow of the code"
    • Code is considered more complex when "flow breaking structures are nested"
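
The rules above can be illustrated with a minimal Python sketch that scores one function using the standard `ast` module. It is a simplified approximation of SonarSource's published Cognitive Complexity rules, not Code Climate's engine. The sketch makes these assumptions: `if`, loops, `except` clauses and conditional expressions cost +1 plus the current nesting depth; `elif` and `else` cost a flat +1; each run of `and`/`or` costs +1; comprehensions, `try`, `with`, `break` and `continue` are free; and recursion is not detected. The function names are hypothetical.

```python
import ast


def _block(stmts, nesting):
    """Score a list of statements that all sit at the same nesting depth."""
    return sum(_walk(stmt, nesting) for stmt in stmts)


def _children(node, nesting):
    """Score every direct child of `node` at the given nesting depth."""
    return sum(_walk(child, nesting) for child in ast.iter_child_nodes(node))


def _walk(node, nesting):
    if isinstance(node, ast.If):
        # 'if': +1 for breaking linear flow, +nesting for how deep it sits.
        score = 1 + nesting + _walk(node.test, nesting) + _block(node.body, nesting + 1)
        orelse = node.orelse
        while orelse:
            if len(orelse) == 1 and isinstance(orelse[0], ast.If):
                # Treated as 'elif': flat +1, no nesting penalty.
                branch = orelse[0]
                score += 1 + _walk(branch.test, nesting) + _block(branch.body, nesting + 1)
                orelse = branch.orelse
            else:
                # 'else': flat +1, no nesting penalty.
                score += 1 + _block(orelse, nesting + 1)
                orelse = []
        return score
    if isinstance(node, (ast.For, ast.AsyncFor, ast.While)):
        header = node.test if isinstance(node, ast.While) else node.iter
        score = 1 + nesting + _walk(header, nesting) + _block(node.body, nesting + 1)
        if node.orelse:  # loop 'else' clause: flat +1
            score += 1 + _block(node.orelse, nesting + 1)
        return score
    if isinstance(node, ast.ExceptHandler):
        # Each 'except' clause counts like an 'if'; 'try' itself adds nothing.
        return 1 + nesting + _block(node.body, nesting + 1)
    if isinstance(node, ast.IfExp):
        # Conditional expression 'a if c else b': +1 plus nesting.
        return 1 + nesting + _children(node, nesting + 1)
    if isinstance(node, ast.BoolOp):
        # 'a and b and c' is a single BoolOp node, so one run costs +1.
        return 1 + _children(node, nesting)
    if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.Lambda)):
        # Nested functions and lambdas deepen nesting without an increment of their own.
        return _children(node, nesting + 1)
    # Everything else (assignments, calls, comprehensions, 'try', 'with', ...) is free.
    return _children(node, nesting)


def cognitive_complexity(source):
    """Approximate cognitive complexity of the first function defined in `source`."""
    tree = ast.parse(source)
    func = next(
        node for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    )
    return _block(func.body, 0)


NESTED = """
def keep(rows):
    out = []
    for row in rows:          # +1 (loop at depth 0)
        if row:               # +2 (if at depth 1)
            if row > 0:       # +3 (if at depth 2)
                out.append(row)
    return out
"""

GUARDED = """
def keep(rows):
    out = []
    for row in rows:              # +1
        if not row or row <= 0:   # +2 for the if, +1 for the 'or'
            continue
        out.append(row)
    return out
"""

FLAT = """
def keep(rows):
    return [row for row in rows if row and row > 0]   # +1 for the 'and'
"""

print(cognitive_complexity(NESTED), cognitive_complexity(GUARDED), cognitive_complexity(FLAT))  # 6 4 1
```

All three versions keep the same rows. The nested version scores highest because each extra level of nesting raises the cost of the next branch. Guard clauses and the comprehension shorthand flatten the flow, so they score lower under these simplified rules.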

    Function main has a Cognitive Complexity of 56 (exceeds 5 allowed). Consider refactoring.

    def main(train_path, val_path):
        random.seed(42)
    
        instruct_records = []
        for row in tqdm(load_dataset("lksy/ru_instruct_gpt4", split="train")):
    Severity: Minor
    Found in self_instruct/src/data_processing/create_short_chat_set.py - About 1 day to fix

    Function main has a Cognitive Complexity of 53 (exceeds 5 allowed). Consider refactoring.

    def main(
        input_path,
        output_path
    ):
        output_archive = PlainArchive(output_path)
    Severity: Minor
    Found in data_processing/create_stihi.py - About 1 day to fix

    Function main has a Cognitive Complexity of 49 (exceeds 5 allowed). Consider refactoring.

    def main(
        output_path
    ):
        output_archive = PlainArchive(output_path)
        text_processor = TextProcessor(min_chars=200)
    Severity: Minor
    Found in data_processing/convert_mc4.py - About 7 hrs to fix

    Function clean_text has a Cognitive Complexity of 47 (exceeds 5 allowed). Consider refactoring.

    def clean_text(text, text_processor):
        text = text_processor(text)
        if not text:
            return
    
    
    Severity: Minor
    Found in data_processing/save_mc4.py - About 7 hrs to fix

    Function dump_pikabu has a Cognitive Complexity of 47 (exceeds 5 allowed). Consider refactoring.

    def dump_pikabu(archive):
        post_text_processor = TextProcessor(
            min_chars=50,
            min_text_part=0.7,
            fix_punct=False,
    Severity: Minor
    Found in data_processing/save_hf.py - About 7 hrs to fix

    Function main has a Cognitive Complexity of 46 (exceeds 5 allowed). Consider refactoring.

    def main(
        buriy_files,
        fontanka_path,
        lenta_path,
        tass_path,
    Severity: Minor
    Found in data_processing/create_ru_news.py - About 7 hrs to fix

    Function get_fic has a Cognitive Complexity of 33 (exceeds 5 allowed). Consider refactoring.

    def get_fic(url, sleep_time):
        parsed_uri = urlparse(url)
        host = "{uri.scheme}://{uri.netloc}".format(uri=parsed_uri)
    
        try:
    Severity: Minor
    Found in data_processing/create_ficbook.py - About 4 hrs to fix

    Function post_process has a Cognitive Complexity of 32 (exceeds 5 allowed). Consider refactoring.

    def post_process(response, settings):
        if not response:
            return []
        raw_instructions = response["message"]["content"]
        if raw_instructions.count("###") < 2:
    Severity: Minor
    Found in self_instruct/src/data_processing/generate_instructions.py - About 4 hrs to fix

    Function convert_habr has a Cognitive Complexity of 32 (exceeds 5 allowed). Consider refactoring.

    def convert_habr(archive):
        habr = load_dataset('IlyaGusev/habr', split="train", streaming=True)
        for row in tqdm(habr):
            if row["language"] != "ru":
                continue
    Severity: Minor
    Found in data_processing/hf_to_instruct.py - About 4 hrs to fix
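
One common way to lower the score of a loop like the one in `convert_habr` is to move the filtering `if ... continue` out of the loop body into a small generator. The main loop then carries one less nested branch. The sketch assumes rows are dict-like with a `"language"` key, as in the excerpt. `russian_rows` is a hypothetical helper name, and the rest of `convert_habr` is not shown in this report, so it is left unchanged.

```python
from typing import Any, Dict, Iterable, Iterator


def russian_rows(rows: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Yield only rows whose "language" field is "ru" (hypothetical helper)."""
    for row in rows:
        if row["language"] == "ru":
            yield row


# Inside convert_habr the loop could then become, for example:
#     for row in tqdm(russian_rows(habr)):
#         ...  # the rest of the loop body, now without the language check

sample = [{"language": "ru", "id": 1}, {"language": "en", "id": 2}]
print([row["id"] for row in russian_rows(sample)])  # [1]
```

Because the helper is a generator, it stays lazy and should work with a `streaming=True` dataset without loading it into memory. The `datasets` library's `IterableDataset.filter` may be an alternative worth checking.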

    Function main has a Cognitive Complexity of 30 (exceeds 5 allowed). Consider refactoring.

    def main(
        chars_path,
        output_path,
        template_path,
        request_batch_size=4
    Severity: Minor
    Found in self_instruct/src/data_processing/generate_char_chats.py - About 4 hrs to fix

    Function process_annotations has a Cognitive Complexity of 30 (exceeds 5 allowed). Consider refactoring.

    def process_annotations(input_path, output_path):
        records = read_jsonl(input_path)
        keys = set()
        new_records = []
        counts = Counter()
    Severity: Minor
    Found in self_instruct/src/data_processing/process_annotations.py - About 4 hrs to fix

    Function main has a Cognitive Complexity of 28 (exceeds 5 allowed). Consider refactoring.

    def main(input_dir, output_dir):
        parser = FB2Parser()
        processor = TextProcessor(
            min_chars=3,
            min_text_part=0.0,
    Severity: Minor
    Found in data_processing/parse_zip_fb2.py - About 4 hrs to fix

    File save_hf.py has 333 lines of code (exceeds 250 allowed). Consider refactoring.

    import argparse
    import json
    import random
    from collections import defaultdict
    
    
    Severity: Minor
    Found in data_processing/save_hf.py - About 4 hrs to fix

      Function dump_stackoverflow has a Cognitive Complexity of 27 (exceeds 5 allowed). Consider refactoring.

      def dump_stackoverflow(archive):
          text_processor = TextProcessor(
              min_chars=100,
              min_text_part=0.0,
              fix_punct=False,
      Severity: Minor
      Found in data_processing/save_hf.py - About 3 hrs to fix

      Function dump_habr has a Cognitive Complexity of 26 (exceeds 5 allowed). Consider refactoring.

      def dump_habr(archive):
          text_processor = TextProcessor()
          habr = load_dataset('IlyaGusev/habr', split="train", streaming=True)
          text_processor = TextProcessor(
              min_chars=100,
      Severity: Minor
      Found in data_processing/save_hf.py - About 3 hrs to fix

      Function main has a Cognitive Complexity of 25 (exceeds 5 allowed). Consider refactoring.

      def main(
          model_name,
          nrows: int = None,
          template_path: str = "internal_prompts/saiga_v2.json",
          split: str = "test",
      Severity: Minor
      Found in self_instruct/src/benchmarks/eval_zs_rsg.py - About 3 hrs to fix

      Function check_complete has a Cognitive Complexity of 25 (exceeds 5 allowed). Consider refactoring.

          def check_complete(self, a_attribs):
              assert a_attribs is not None
              parent_id = a_attribs["ParentId"]
              parent = self.questions[parent_id]
              if parent is None:
      Severity: Minor
      Found in data_processing/create_stackoverflow.py - About 3 hrs to fix

      Function generate_chars has a Cognitive Complexity of 23 (exceeds 5 allowed). Consider refactoring.

      def generate_chars(
          output_path: str,
          seed_chars_path: str,
          template_path: str,
          num_chars_to_generate: int = 200,
      Severity: Minor
      Found in self_instruct/src/data_processing/generate_chars.py - About 3 hrs to fix
