IlyaGusev/rulm

Showing 260 of 260 total issues

Similar blocks of code found in 3 locations. Consider refactoring.
Open

            question = {question_mapping[k]: v for k, v in record.items() if k in question_mapping}
Severity: Major
Found in data_processing/convert_yandex_q.py and 2 other locations - About 50 mins to fix
data_processing/convert_pikabu.py on lines 132..132
data_processing/convert_pikabu.py on lines 142..142

Duplicated Code

Duplicated code can lead to software that is hard to understand and difficult to change. The Don't Repeat Yourself (DRY) principle states:

Every piece of knowledge must have a single, unambiguous, authoritative representation within a system.

When you violate DRY, bugs and maintenance problems are sure to follow. Duplicated code has a tendency to both continue to replicate and also to diverge (leaving bugs as two similar implementations differ in subtle ways).
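
The flagged comprehension renames the keys listed in a mapping and drops every other key. A minimal sketch of moving that logic into one shared helper follows; the name `rename_keys` and the sample mapping are assumptions made for illustration, not code from the repository.

```python
from typing import Any, Dict


def rename_keys(record: Dict[str, Any], mapping: Dict[str, str]) -> Dict[str, Any]:
    """Keep only the keys present in `mapping`, renamed to their mapped names."""
    return {mapping[k]: v for k, v in record.items() if k in mapping}


# Illustrative data only; the real mappings live in the conversion scripts.
question_mapping = {"q_id": "id", "q_text": "text"}
record = {"q_id": 7, "q_text": "Why?", "unused": True}
question = rename_keys(record, question_mapping)
```

Each of the three reported locations could then call the same helper with its own mapping, so a later change to the filtering rule would be made in one place.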

Tuning

This issue has a mass of 36.

We set useful threshold defaults for the languages we support but you may want to adjust these settings based on your project guidelines.

The threshold configuration represents the minimum mass a code block must have to be analyzed for duplication. The lower the threshold, the more fine-grained the comparison.

If the engine is too easily reporting duplication, try raising the threshold. If you suspect that the engine isn't catching enough duplication, try lowering the threshold. The best setting tends to differ from language to language.

See codeclimate-duplication's documentation for more information about tuning the mass threshold in your .codeclimate.yml.

Similar blocks of code found in 3 locations. Consider refactoring.
Open

                            comment = {comments_mapping[k]: v for k, v in comment.items() if k in comments_mapping}
Severity: Major
Found in data_processing/convert_pikabu.py and 2 other locations - About 50 mins to fix
data_processing/convert_pikabu.py on lines 132..132
data_processing/convert_yandex_q.py on lines 65..65

This issue has a mass of 36.

Function custom_prepare_model_for_int8_training has a Cognitive Complexity of 8 (exceeds 5 allowed). Consider refactoring.
Open

def custom_prepare_model_for_int8_training(
    model,
    output_embedding_layer_name="lm_head",
    layer_norm_names=["layer_norm"]
):
Severity: Minor
Found in self_instruct/src/train.py - About 45 mins to fix

Cognitive Complexity

Cognitive Complexity is a measure of how difficult a unit of code is to intuitively understand. Unlike Cyclomatic Complexity, which determines how difficult your code will be to test, Cognitive Complexity tells you how difficult your code will be to read and comprehend.

A method's cognitive complexity is based on a few simple rules:

  • Code is not considered more complex when it uses shorthand that the language provides for collapsing multiple statements into one
  • Code is considered more complex for each "break in the linear flow of the code"
  • Code is considered more complex when "flow breaking structures are nested"
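
To illustrate the last two rules, the two functions below return the same result, but the second replaces nested conditionals with a guard clause, so its flow breaks are no longer nested. Both function names and the record layout are hypothetical, and no exact complexity scores are claimed for either version.

```python
from typing import Dict, List


def collect_nested(records: List[Dict]) -> List[str]:
    # Each additional `if` sits one nesting level deeper than the previous one.
    texts = []
    for record in records:
        if "text" in record:
            if record["text"]:
                if not record.get("skip"):
                    texts.append(record["text"].strip())
    return texts


def collect_flat(records: List[Dict]) -> List[str]:
    # A guard clause keeps every check at the same nesting level.
    texts = []
    for record in records:
        text = record.get("text")
        if not text or record.get("skip"):
            continue
        texts.append(text.strip())
    return texts
```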

Function generate_answers has a Cognitive Complexity of 8 (exceeds 5 allowed). Consider refactoring.
Open

def generate_answers(
    model_name: str,
    input_path: str,
    output_path: str,
    batch_size: int = 1,
Severity: Minor
Found in self_instruct/src/infer_llama3.py - About 45 mins to fix

Function to_alpaca_eval has a Cognitive Complexity of 8 (exceeds 5 allowed). Consider refactoring.
Open

def to_alpaca_eval(
    input_files: str,
    output_path: str,
):
    input_files = input_files.split(",")
Severity: Minor
Found in self_instruct/src/to_alpaca_eval.py - About 45 mins to fix

Function interact has 6 arguments (exceeds 4 allowed). Consider refactoring.
Open

def interact(
Severity: Minor
Found in self_instruct/src/interact_mistral_llamacpp.py - About 45 mins to fix
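
One common way to bring an argument count under such a limit is to group related parameters into a single object. The sketch below is only an assumption about what that could look like: the report does not show the signature of `interact`, so every parameter name and default here is hypothetical, as is the function `describe_run`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GenerationConfig:
    # Hypothetical sampling settings, grouped so they travel as one argument.
    top_k: int = 30
    top_p: float = 0.9
    temperature: float = 0.2
    repeat_penalty: float = 1.1


def describe_run(model_path: str, n_ctx: int = 2000,
                 config: Optional[GenerationConfig] = None) -> str:
    # Three parameters instead of six; sampling settings come from `config`.
    config = config or GenerationConfig()
    return f"{model_path} n_ctx={n_ctx} top_k={config.top_k} temperature={config.temperature}"
```

A caller that needs non-default sampling would pass, for example, `config=GenerationConfig(top_k=5)` and leave the remaining fields at their defaults.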

Avoid deeply nested control flow statements.
Open

                    for dialogue in record["dialogues"]:
                        topic = dialogue["topic"]
                        existing_keys.add(get_dialogue_key(record, topic))
            output_records = {get_char_key(char): char for char in output_records}
Severity: Major
Found in self_instruct/src/data_processing/generate_char_chats.py - About 45 mins to fix
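
Nesting of this kind can often be reduced by moving the inner loops into a small helper. The sketch below assumes a simplified record layout and uses a stand-in for `get_dialogue_key`, whose real behavior is not shown in this report; `iter_dialogue_keys` is a hypothetical name.

```python
from typing import Dict, Iterable, Iterator, Set, Tuple


def get_dialogue_key(record: Dict, topic: str) -> Tuple[str, str]:
    # Stand-in for the repository's helper; the real key format is an assumption.
    return (record["name"], topic)


def iter_dialogue_keys(records: Iterable[Dict]) -> Iterator[Tuple[str, str]]:
    # The nested loops live in one generator instead of inside the caller.
    for record in records:
        for dialogue in record.get("dialogues", []):
            yield get_dialogue_key(record, dialogue["topic"])


records = [
    {"name": "Alice", "dialogues": [{"topic": "tea"}, {"topic": "chess"}]},
    {"name": "Bob"},
]
existing_keys: Set[Tuple[str, str]] = set(iter_dialogue_keys(records))
```

The caller then builds the set in a single expression at its current nesting level, whatever surrounding conditions it already has.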

Function interact has 6 arguments (exceeds 4 allowed). Consider refactoring.
Open

def interact(
Severity: Minor
Found in self_instruct/src/interact_llama3_llamacpp.py - About 45 mins to fix

Function anthropic_completion has 6 arguments (exceeds 4 allowed). Consider refactoring.
Open

def anthropic_completion(
Severity: Minor
Found in self_instruct/src/anthropic_wrapper.py - About 45 mins to fix

Function main has a Cognitive Complexity of 8 (exceeds 5 allowed). Consider refactoring.
Open

def main(input_path, output_path):
    with open(input_path, "r") as r, open(output_path, "w") as w:
        def flush(text_id, fragments):
            text = " ".join(fragments)
            text = preprocess_text(text, text_id)
Severity: Minor
Found in data_processing/create_librusec.py - About 45 mins to fix

Function get_parus has a Cognitive Complexity of 8 (exceeds 5 allowed). Consider refactoring.
Open

def get_parus(split):
    dataset = load_dataset(HF_DATASET, "parus", split=split)
    for row in dataset:
        is_cause = row["question"] == "cause"
        c1 = row["choice1"].rstrip(".").lower()
Severity: Minor
Found in self_instruct/src/data_processing/convert_rsg.py - About 45 mins to fix

Avoid deeply nested control flow statements.
Open

                if current_agent and current_message:
                    if current_agent != "bot":
                        is_bad_record = True
                        break
                    messages.append({
Severity: Major
Found in self_instruct/src/data_processing/postprocess_chat.py - About 45 mins to fix

Function interact has 6 arguments (exceeds 4 allowed). Consider refactoring.
Open

def interact(
Severity: Minor
Found in self_instruct/src/interact_llamacpp.py - About 45 mins to fix

Function predict has a Cognitive Complexity of 8 (exceeds 5 allowed). Consider refactoring.
Open

def predict(k_shots: pd.DataFrame, test_data: pd.DataFrame, task_name: str, predict_func, batch_size):
    if task_name in ['ru_worldtree', 'ru_openbook']:
        k_shots_pairs = [(OPENBOOK_PROMPT, "B")]
        for row in k_shots.to_dict(orient="records"):
            question = row["question"]
Severity: Minor
Found in self_instruct/src/benchmarks/eval_zs_tape.py - About 45 mins to fix

Function predict_danetqa has a Cognitive Complexity of 8 (exceeds 5 allowed). Consider refactoring.
Open

def predict_danetqa(
    split,
    predict_func,
    output_path,
    batch_size: int = 4,
Severity: Minor
Found in self_instruct/src/benchmarks/eval_zs_rsg.py - About 45 mins to fix

Function get_russe has a Cognitive Complexity of 8 (exceeds 5 allowed). Consider refactoring.
Open

def get_russe(split, sample_rate: float = 0.1):
    dataset = load_dataset(HF_DATASET, "russe", split=split)
    for row in dataset:
        if split != "test" and random.random() > sample_rate:
            continue
Severity: Minor
Found in self_instruct/src/data_processing/convert_rsg.py - About 45 mins to fix

Avoid deeply nested control flow statements.
Open

                if current_agent and current_message:
                    if current_agent != "user":
                        is_bad_record = True
                        break
                    messages.append({
Severity: Major
Found in self_instruct/src/data_processing/postprocess_chat.py - About 45 mins to fix

Function generate has 6 arguments (exceeds 4 allowed). Consider refactoring.
Open

def generate(
Severity: Minor
Found in self_instruct/src/util/generate.py - About 45 mins to fix

Function predict_lidirus has 6 arguments (exceeds 4 allowed). Consider refactoring.
Open

def predict_lidirus(
Severity: Minor
Found in self_instruct/src/benchmarks/eval_zs_rsg.py - About 45 mins to fix

Function get_pool has a Cognitive Complexity of 8 (exceeds 5 allowed). Consider refactoring.
Open

def get_pool(pool_id, toloka_client):
    records = []
    for assignment in toloka_client.get_assignments(pool_id=pool_id):
        solutions = assignment.solutions
        if not solutions:
Severity: Minor
Found in self_instruct/crowd/aggregate.py - About 45 mins to fix
