IlyaGusev/rulm

self_instruct/src/data_processing/create_chat_set.py

Summary

Maintainability: D (2 days)
Test Coverage

Function main has a Cognitive Complexity of 66 (exceeds 5 allowed). Consider refactoring.
Open

def main(train_path, val_path):
    random.seed(42)

    instruct_records = []
    for row in tqdm(load_dataset("lksy/ru_instruct_gpt4", split="train")):
Severity: Minor
Found in self_instruct/src/data_processing/create_chat_set.py - About 1 day to fix

Cognitive Complexity

Cognitive Complexity is a measure of how difficult a unit of code is to intuitively understand. Unlike Cyclomatic Complexity, which determines how difficult your code will be to test, Cognitive Complexity tells you how difficult your code will be to read and comprehend.

A method's cognitive complexity is based on a few simple rules:

  • Code is not considered more complex when it uses shorthand that the language provides for collapsing multiple statements into one
  • Code is considered more complex for each "break in the linear flow of the code"
  • Code is considered more complex when "flow breaking structures are nested"
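
As a rough illustration of these rules, the sketch below writes the same filter twice: once with nested conditionals and once with guard clauses. The function names and the record layout are hypothetical and are not taken from create_chat_set.py. The comments mark flow breaks and nesting depth by hand, so an analyzer may score the functions somewhat differently.

```python
def collect_nested(records):
    """Collect non-empty "text" values using nested conditionals."""
    result = []
    for record in records:                      # flow break (loop)
        if record is not None:                  # flow break, nested 1 level deep
            if "text" in record:                # flow break, nested 2 levels deep
                if record["text"].strip():      # flow break, nested 3 levels deep
                    result.append(record["text"].strip())
    return result


def collect_flat(records):
    """Same behavior, but guard clauses keep every check at one nesting level."""
    result = []
    for record in records:                      # flow break (loop)
        if record is None:                      # flow break, nested 1 level deep
            continue
        text = record.get("text", "").strip()   # a missing key is treated as empty text
        if not text:                            # flow break, nested 1 level deep
            continue
        result.append(text)
    return result


# Hypothetical sample input covering each branch.
sample = [None, {"text": " hello "}, {"id": 1}, {"text": "   "}, {"text": "world"}]
assert collect_nested(sample) == collect_flat(sample)
```

Both versions return the same list, but the second has one fewer conditional and never nests deeper than one level, which is the kind of change the "Consider refactoring" hints point toward.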


File create_chat_set.py has 264 lines of code (exceeds 250 allowed). Consider refactoring.
Open

import json
import sys
import re
import random
from itertools import tee
Severity: Minor
Found in self_instruct/src/data_processing/create_chat_set.py - About 2 hrs to fix

Function main has 50 lines of code (exceeds 25 allowed). Consider refactoring.
Open

def main(train_path, val_path):
    random.seed(42)

    instruct_records = []
    for row in tqdm(load_dataset("lksy/ru_instruct_gpt4", split="train")):
Severity: Minor
Found in self_instruct/src/data_processing/create_chat_set.py - About 2 hrs to fix

Function undup_alpaca has a Cognitive Complexity of 14 (exceeds 5 allowed). Consider refactoring.
Open

def undup_alpaca(alpaca_records, num_perm: int = 32, threshold: float = 0.3, debug: bool = False):
    for record in tqdm(alpaca_records, desc="Fingerprinting"):
        record["minhash"] = calc_fingerprint(record["messages"][0]["content"], num_perm=num_perm)

    lsh = MinHashLSH(
Severity: Minor
Found in self_instruct/src/data_processing/create_chat_set.py - About 1 hr to fix
