IlyaGusev/rulm

Showing 204 of 260 total issues

Function main has a Cognitive Complexity of 14 (exceeds 5 allowed). Consider refactoring.
Open

def main(existing_path, output_path, langdetect_threshold: float = 0.8, sim_threshold: float = 0.93):
    lang_detector = FasttextLanguageDetector()
    embedder = Embedder("intfloat/multilingual-e5-base")

    existing_quries = list()
Severity: Minor
Found in self_instruct/src/data_processing/fetch_new_queries.py - About 1 hr to fix

Cognitive Complexity

Cognitive Complexity is a measure of how difficult a unit of code is to intuitively understand. Unlike Cyclomatic Complexity, which determines how difficult your code will be to test, Cognitive Complexity tells you how difficult your code will be to read and comprehend.

A method's cognitive complexity is based on a few simple rules:

  • Code is not considered more complex when it uses shorthand that the language provides for collapsing multiple statements into one
  • Code is considered more complex for each "break in the linear flow of the code"
  • Code is considered more complex when "flow breaking structures are nested"
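
To make the rules above concrete, here is a minimal sketch that is not taken from the repository: the function names and the record layout are hypothetical. Both versions return the same result, but the second replaces the stacked conditionals with a single guard in a helper, which is the kind of change these reports usually call for.

```python
def collect_nested(records, min_len):
    """Nested version: a loop that contains two stacked conditionals."""
    kept = []
    for record in records:              # break in the linear flow
        if record is not None:          # flow break nested inside the loop
            text = record.get("text", "")
            if len(text) >= min_len:    # flow break nested two levels deep
                kept.append(text)
    return kept


def is_long_enough(record, min_len):
    """Hypothetical helper: one flat condition instead of stacked ifs."""
    return record is not None and len(record.get("text", "")) >= min_len


def collect_flat(records, min_len):
    """Flat version: one guard per record, no nested branches."""
    return [r.get("text", "") for r in records if is_long_enough(r, min_len)]
```

The exact score each version receives depends on the analyzer; the point is only that the flat version has fewer nested breaks in flow.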

Function predict_muserc has a Cognitive Complexity of 14 (exceeds 5 allowed). Consider refactoring.
Open

def predict_muserc(
    split,
    predict_func,
    output_path,
    batch_size: int = 2,
Severity: Minor
Found in self_instruct/src/benchmarks/eval_zs_rsg.py - About 1 hr to fix

Function predict_rucos has a Cognitive Complexity of 14 (exceeds 5 allowed). Consider refactoring.
Open

def predict_rucos(
    split,
    predict_func,
    output_path,
    batch_size: int = 4,
Severity: Minor
Found in self_instruct/src/benchmarks/eval_zs_rsg.py - About 1 hr to fix

Function convert_rsg has a Cognitive Complexity of 14 (exceeds 5 allowed). Consider refactoring.
Open

def convert_rsg(split, output_path, tasks: List[str] = ALL_TASKS, use_short: bool = True):
    functions = []
    if "danetqa" in tasks:
        functions.append(get_danetqa(split))
    if "muserc" in tasks:
Severity: Minor
Found in self_instruct/src/data_processing/convert_rsg.py - About 1 hr to fix
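
One common way to flatten a chain of `if "<task>" in tasks:` checks like the one quoted above is a lookup table from task name to loader. The sketch below only illustrates that idea: `load_danetqa`, `load_muserc`, `TASK_LOADERS`, and `collect_records` are hypothetical stand-ins, not the repository's functions.

```python
from typing import Callable, Dict, Iterable, List


def load_danetqa(split: str) -> Iterable[dict]:
    """Hypothetical stand-in for a task loader."""
    yield {"task": "danetqa", "split": split}


def load_muserc(split: str) -> Iterable[dict]:
    """Hypothetical stand-in for a task loader."""
    yield {"task": "muserc", "split": split}


# Maps a task name to the function that loads its records.
TASK_LOADERS: Dict[str, Callable[[str], Iterable[dict]]] = {
    "danetqa": load_danetqa,
    "muserc": load_muserc,
}


def collect_records(split: str, tasks: List[str]) -> List[dict]:
    """Load records for the requested tasks without one branch per task."""
    unknown = [task for task in tasks if task not in TASK_LOADERS]
    if unknown:
        raise ValueError(f"Unknown tasks: {unknown}")
    records: List[dict] = []
    for task in tasks:
        records.extend(TASK_LOADERS[task](split))
    return records
```

Note that this sketch processes tasks in the order the caller lists them, whereas a fixed chain of `if` statements always runs in the order the checks are written.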

Function parse_chat has a Cognitive Complexity of 14 (exceeds 5 allowed). Consider refactoring.
Open

def parse_chat(result):
    try:
        chat = json.loads(result)
    except Exception:
        print("Incorrect JSON:", result)
Severity: Minor
Found in self_instruct/src/data_processing/generate_char_chats.py - About 1 hr to fix

Function undup_alpaca has a Cognitive Complexity of 14 (exceeds 5 allowed). Consider refactoring.
Open

def undup_alpaca(alpaca_records, num_perm: int = 32, threshold: float = 0.3, debug: bool = False):
    for record in tqdm(alpaca_records, desc="Fingerprinting"):
        record["minhash"] = calc_fingerprint(record["messages"][0]["content"], num_perm=num_perm)

    lsh = MinHashLSH(
Severity: Minor
Found in self_instruct/src/data_processing/create_chat_set.py - About 1 hr to fix

Function main has a Cognitive Complexity of 14 (exceeds 5 allowed). Consider refactoring.
Open

def main(
    token,
    agg_output,
    raw_output,
    pools_file,
Severity: Minor
Found in self_instruct/crowd/aggregate.py - About 1 hr to fix

Function __init__ has 14 arguments (exceeds 4 allowed). Consider refactoring.
Open

    def __init__(
Severity: Major
Found in data_processing/util.py - About 1 hr to fix
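
A usual remedy for an argument-count warning like this one is to group related settings into a single configuration object. The following is a minimal sketch under that assumption: `ProcessorConfig`, `Processor`, and all field names are hypothetical and do not reflect the actual signature in `data_processing/util.py`.

```python
import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessorConfig:
    """Hypothetical settings bundle; the field names are illustrative only."""
    normalization: str = "NFKC"
    min_chars: int = 0
    lowercase: bool = False


class Processor:
    """Hypothetical class whose constructor takes one config object."""

    def __init__(self, config: ProcessorConfig = ProcessorConfig()):
        # The config is frozen, so sharing the default instance is safe.
        self.config = config

    def __call__(self, text: str):
        text = unicodedata.normalize(self.config.normalization, text)
        if self.config.lowercase:
            text = text.lower()
        # Return None for texts shorter than the configured minimum.
        return text if len(text) >= self.config.min_chars else None
```

Callers then override only the fields they need, for example `Processor(ProcessorConfig(min_chars=10))`, and new settings can be added without growing the constructor signature.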

Function generate_answers has a Cognitive Complexity of 13 (exceeds 5 allowed). Consider refactoring.
Open

def generate_answers(
    model_name: str,
    template_path: str,
    input_path: str,
    output_path: str,
Severity: Minor
Found in self_instruct/src/infer_saiga.py - About 1 hr to fix

Function convert_to_native has a Cognitive Complexity of 13 (exceeds 5 allowed). Consider refactoring.
Open

def convert_to_native(
    model_name: str,
    output_path: str,
    device: str = "cpu",
    enable_offloading: bool = False
Severity: Minor
Found in self_instruct/src/tools/convert_to_native.py - About 1 hr to fix

Function train has a Cognitive Complexity of 13 (exceeds 5 allowed). Consider refactoring.
Open

def train(
    config_file: str,
    train_file: str,
    val_file: str,
    output_dir: str,
Severity: Minor
Found in self_instruct/src/train.py - About 1 hr to fix

Function main has a Cognitive Complexity of 13 (exceeds 5 allowed). Consider refactoring.
Open

def main(input_path, output_path):
    processor = TextProcessor(
        normalization="NFKC",
        min_chars=0,
        min_text_part=0,
Severity: Minor
Found in data_processing/clean_ficbook.py - About 1 hr to fix

Function train has 12 arguments (exceeds 4 allowed). Consider refactoring.
Open

def train(
Severity: Major
Found in rulm/train.py - About 1 hr to fix

Function undup_by_ngrams has a Cognitive Complexity of 12 (exceeds 5 allowed). Consider refactoring.
Open

def undup_by_ngrams(records, n: int = 8):
    existing_ngrams = dict()
    new_records = []
    records.sort(key=lambda x: x["opus_score"])
    for r in records:
Severity: Minor
Found in self_instruct/src/data_processing/to_parquet.py - About 1 hr to fix

Function compose_sft_dataset has a Cognitive Complexity of 12 (exceeds 5 allowed). Consider refactoring.
Open

def compose_sft_dataset(config_path: str, train_path: str, val_path: str):
    with open(config_path) as r:
        config = json.load(r)

    records = []
Severity: Minor
Found in self_instruct/src/data_processing/compose_sft_dataset.py - About 1 hr to fix

Function generate_answers has a Cognitive Complexity of 12 (exceeds 5 allowed). Consider refactoring.
Open

def generate_answers(
    model_name: str,
    input_path: str,
    output_path: str,
    batch_size: int = 1
Severity: Minor
Found in self_instruct/src/infer_fred.py - About 1 hr to fix

Function dump_librusec has a Cognitive Complexity of 12 (exceeds 5 allowed). Consider refactoring.
Open

def dump_librusec(archive, sample_rate=0.15):
    max_sentences_count = 100
    text_processor = TextProcessor()
    librusec = load_dataset("IlyaGusev/librusec", split="train", streaming=True)
    for row in tqdm(librusec):
Severity: Minor
Found in data_processing/save_hf.py - About 1 hr to fix

Function parse_post has a Cognitive Complexity of 12 (exceeds 5 allowed). Consider refactoring.
Open

def parse_post(post_id):
    api_url = "https://habr.com/kek/v2/articles/{}".format(post_id)
    post_url = "https://habr.com/ru/post/{}/".format(post_id)

    try:
Severity: Minor
Found in data_processing/create_habr.py - About 1 hr to fix

Function process_batch has 10 arguments (exceeds 4 allowed). Consider refactoring.
Open

def process_batch(
Severity: Major
Found in self_instruct/src/data_processing/improve_instructions.py - About 1 hr to fix

Function __init__ has 10 arguments (exceeds 4 allowed). Consider refactoring.
Open

    def __init__(
Severity: Major
Found in self_instruct/src/dataset.py - About 1 hr to fix