IlyaGusev/rulm

Showing 260 of 260 total issues

Function load_saiga has a Cognitive Complexity of 11 (exceeds 5 allowed). Consider refactoring.

def load_saiga(
    model_name: str,
    use_8bit: bool = False,
    use_4bit: bool = False,
    torch_compile: bool = False,
Severity: Minor
Found in self_instruct/src/util/load.py - About 1 hr to fix

Cognitive Complexity

Cognitive Complexity is a measure of how difficult a unit of code is to intuitively understand. Unlike Cyclomatic Complexity, which determines how difficult your code will be to test, Cognitive Complexity tells you how difficult your code will be to read and comprehend.

A method's cognitive complexity is based on a few simple rules:

  • Code is not considered more complex when it uses shorthand that the language provides for collapsing multiple statements into one
  • Code is considered more complex for each "break in the linear flow of the code"
  • Code is considered more complex when "flow breaking structures are nested"
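To illustrate the nesting rule, here is a minimal sketch with two hypothetical functions (`classify_nested`, `classify_flat`) that behave identically. The first nests its conditions, so each inner `if` is charged extra for its depth. The second uses guard clauses that return early, which keeps the flow linear. The `"messages"` key echoes `convert_record` below, but both functions are illustrative and are not code from the repository.

```python
def classify_nested(record):
    # Nesting: each inner `if` sits one level deeper, which the rules above penalize.
    if record is not None:
        if "messages" in record:
            if record["messages"]:
                return "ok"
            else:
                return "empty"
        else:
            return "missing"
    else:
        return "none"


def classify_flat(record):
    # Guard clauses: each check exits early, so nothing is nested.
    if record is None:
        return "none"
    if "messages" not in record:
        return "missing"
    if not record["messages"]:
        return "empty"
    return "ok"


if __name__ == "__main__":
    for rec in [None, {}, {"messages": []}, {"messages": ["hi"]}]:
        print(classify_nested(rec), classify_flat(rec))
```

Both functions return the same label for every input; only the shape of the control flow differs, which is exactly what the metric measures.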

Function convert_record has a Cognitive Complexity of 11 (exceeds 5 allowed). Consider refactoring.

    def convert_record(self, record):
        conversation = Conversation.from_template(self.templates_path)
        conversation.expand(record["messages"])

        input_ids, labels = [], []
Severity: Minor
Found in self_instruct/src/dataset.py - About 1 hr to fix

Similar blocks of code found in 3 locations. Consider refactoring.

TEXT_PROCESSOR = TextProcessor(
Severity: Major
Found in data_processing/create_habr.py and 2 other locations - About 1 hr to fix
data_processing/convert_pikabu.py on lines 10..10
data_processing/create_librusec.py on lines 12..12

Duplicated Code

Duplicated code can lead to software that is hard to understand and difficult to change. The Don't Repeat Yourself (DRY) principle states:

Every piece of knowledge must have a single, unambiguous, authoritative representation within a system.

When you violate DRY, bugs and maintenance problems are sure to follow. Duplicated code has a tendency to both continue to replicate and also to diverge (leaving bugs as two similar implementations differ in subtle ways).

Tuning

This issue has a mass of 41.

We set useful threshold defaults for the languages we support, but you may want to adjust these settings to suit your project guidelines.

The threshold configuration represents the minimum mass a code block must have to be analyzed for duplication. The lower the threshold, the more fine-grained the comparison.

If the engine reports duplication too eagerly, try raising the threshold. If you suspect it is missing real duplication, try lowering it. The best setting tends to differ from language to language.

See codeclimate-duplication's documentation for more information about tuning the mass threshold in your .codeclimate.yml.
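The engine's exact mass computation is not shown here. As a rough proxy, assuming mass tracks the number of nodes in a fragment's syntax tree, Python's standard `ast` module can show why the two `author = ...` lines reported below in data_processing/create_stackoverflow.py are flagged as duplicates: their trees have the same shape and differ only in identifiers and string constants. The helpers `node_shape` and `rough_mass` are hypothetical.

```python
import ast

# The two duplicated author lines reported for data_processing/create_stackoverflow.py.
LINE_A = 'author = self.users[int(attribs["UserId"])] if attribs["UserId"] else attribs["UserDisplayName"]'
LINE_B = 'author = self.users[int(parent["OwnerUserId"])] if parent["OwnerUserId"] else parent["OwnerDisplayName"]'


def node_shape(source):
    """Return the AST node type names in walk order, ignoring names and literal values."""
    return [type(node).__name__ for node in ast.walk(ast.parse(source))]


def rough_mass(source):
    """Count AST nodes as a crude stand-in for the engine's mass metric."""
    return len(node_shape(source))


if __name__ == "__main__":
    print(node_shape(LINE_A) == node_shape(LINE_B))
    print(rough_mass(LINE_A), rough_mass(LINE_B))
```

The node count from this sketch is unlikely to equal the reported mass of 41, since the engine uses its own parser and node accounting; the point is only that structurally identical code yields identical counts.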

Similar blocks of code found in 2 locations. Consider refactoring.

                author = self.users[int(attribs["UserId"])] if attribs["UserId"] else attribs["UserDisplayName"]
Severity: Major
Found in data_processing/create_stackoverflow.py and 1 other location - About 1 hr to fix
data_processing/create_stackoverflow.py on lines 242..242

This issue has a mass of 41.
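The two flagged `author = ...` lines in create_stackoverflow.py differ only in which keys they read, so a small helper can take the keys as arguments. A minimal sketch, assuming `self.users` is a mapping keyed by integer user id (as the `int(...)` call suggests); the name `resolve_author` is hypothetical.

```python
def resolve_author(users, attribs, id_key, name_key):
    """Return the known user for attribs[id_key], falling back to the display name.

    users: mapping from integer user id to the stored user (assumed shape of self.users).
    attribs: row attributes, e.g. a post or its parent question.
    """
    user_id = attribs[id_key]
    # A falsy id (e.g. an empty string) means there is no registered user to look up.
    if user_id:
        return users[int(user_id)]
    return attribs[name_key]


# The two call sites would then become:
#   author = resolve_author(self.users, attribs, "UserId", "UserDisplayName")
#   author = resolve_author(self.users, parent, "OwnerUserId", "OwnerDisplayName")
```

The helper also reads the id field once instead of twice, but otherwise keeps the original conditional-expression semantics.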

Function parse_comments has a Cognitive Complexity of 11 (exceeds 5 allowed). Consider refactoring.

def parse_comments(post_id):
    api_url = "https://habr.com/kek/v2/articles/{}/comments".format(post_id)
    try:
        r = requests.get(api_url)
        if r.status_code == 503:
Severity: Minor
Found in data_processing/create_habr.py - About 1 hr to fix

Similar blocks of code found in 2 locations. Consider refactoring.

        author = self.users[int(parent["OwnerUserId"])] if parent["OwnerUserId"] else parent["OwnerDisplayName"]
Severity: Major
Found in data_processing/create_stackoverflow.py and 1 other location - About 1 hr to fix
data_processing/create_stackoverflow.py on lines 154..154

This issue has a mass of 41.

Similar blocks of code found in 3 locations. Consider refactoring.

TEXT_PROCESSOR = TextProcessor(
Severity: Major
Found in data_processing/create_librusec.py and 2 other locations - About 1 hr to fix
data_processing/convert_pikabu.py on lines 10..10
data_processing/create_habr.py on lines 15..15

This issue has a mass of 41.

Similar blocks of code found in 3 locations. Consider refactoring.

TEXT_PROCESSOR = TextProcessor(
Severity: Major
Found in data_processing/convert_pikabu.py and 2 other locations - About 1 hr to fix
data_processing/create_habr.py on lines 15..15
data_processing/create_librusec.py on lines 12..12

This issue has a mass of 41.

Function aggregate has 31 lines of code (exceeds 25 allowed). Consider refactoring.

def aggregate(records, overlap=5, min_agreement=0.0):
    results = defaultdict(list)
    records.sort(key=lambda x: x["assignment_id"])
    for r in records:
        results[get_key(r)].append(r["result"])
Severity: Minor
Found in self_instruct/crowd/aggregate.py - About 1 hr to fix
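One way to bring `aggregate` under the 25-line limit is to extract its grouping step. A minimal sketch, assuming the excerpt above is representative: `get_key` is defined elsewhere in aggregate.py and not shown, so the hypothetical helper `group_results` takes it as a parameter. It uses `sorted()` instead of `records.sort()`, a deliberate behavior change: the caller's list is no longer reordered in place.

```python
from collections import defaultdict


def group_results(records, get_key):
    """Collect each record's "result" under get_key(record), in assignment_id order."""
    results = defaultdict(list)
    # sorted() returns a new list, so the caller's records keep their original order.
    for record in sorted(records, key=lambda x: x["assignment_id"]):
        results[get_key(record)].append(record["result"])
    return results


if __name__ == "__main__":
    demo = [
        {"assignment_id": 2, "task": "a", "result": "y"},
        {"assignment_id": 1, "task": "a", "result": "x"},
        {"assignment_id": 3, "task": "b", "result": "z"},
    ]
    print(dict(group_results(demo, lambda r: r["task"])))
```

`aggregate` could then begin with `results = group_results(records, get_key)` and keep only the overlap and agreement logic in its own body.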

Function infer has 9 arguments (exceeds 4 allowed). Consider refactoring.

def infer(
Severity: Major
Found in self_instruct/src/infer_saiga_llamacpp.py - About 1 hr to fix

Function train has 9 arguments (exceeds 4 allowed). Consider refactoring.

def train(
Severity: Major
Found in self_instruct/src/train.py - About 1 hr to fix

Function infer_saiga_vllm has 9 arguments (exceeds 4 allowed). Consider refactoring.

def infer_saiga_vllm(
Severity: Major
Found in self_instruct/src/infer_saiga_vllm.py - About 1 hr to fix

Function evolve_batch has 9 arguments (exceeds 4 allowed). Consider refactoring.

def evolve_batch(
Severity: Major
Found in self_instruct/src/data_processing/improve_instructions.py - About 1 hr to fix

Function generate_answers has 9 arguments (exceeds 4 allowed). Consider refactoring.

def generate_answers(
Severity: Major
Found in self_instruct/src/infer_saiga.py - About 1 hr to fix
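The five functions above each take 9 arguments. A common remedy is a parameter object that groups related settings. A minimal sketch, assuming the settings are generation options; the class `GenerationConfig`, the function `run_generation`, and every field name and value are hypothetical, since the report does not show the real parameter lists.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class GenerationConfig:
    """Hypothetical bundle of generation settings; field names are illustrative only."""
    model_name: str
    template_path: str
    batch_size: int = 1
    max_new_tokens: int = 256
    temperature: float = 0.7


def run_generation(input_path: str, output_path: str, config: GenerationConfig) -> str:
    """Hypothetical entry point that takes 3 arguments instead of 9."""
    # A real implementation would load the model and write outputs; this sketch
    # only reports which settings it received.
    return (
        f"{config.model_name} ({config.template_path}): "
        f"{input_path} -> {output_path}, batch_size={config.batch_size}"
    )


base = GenerationConfig(model_name="my-model", template_path="template.json")
# dataclasses.replace derives a variant without restating every field.
large_batch = replace(base, batch_size=8)
print(run_generation("input.jsonl", "output.jsonl", large_batch))
```

The trade-off is that callers now construct a config object. If these functions also serve as command-line entry points, folding their parameters into an object changes that interface, and raising the argument-count threshold may be the simpler fix.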

Function check_complete has 28 lines of code (exceeds 25 allowed). Consider refactoring.

    def check_complete(self, a_attribs):
        assert a_attribs is not None
        parent_id = a_attribs["ParentId"]
        parent = self.questions[parent_id]
        if parent is None:
Severity: Minor
Found in data_processing/create_stackoverflow.py - About 1 hr to fix

Function main has a Cognitive Complexity of 10 (exceeds 5 allowed). Consider refactoring.

def main(
    seeds_path,
    output_path,
    template_path,
    model_name="gpt-3.5-turbo",
Severity: Minor
Found in self_instruct/src/data_processing/generate_chat.py - About 1 hr to fix

Function main has a Cognitive Complexity of 10 (exceeds 5 allowed). Consider refactoring.

def main(
    input_path,
    output_path,
    template_path,
    model_name="gpt-3.5-turbo",
Severity: Minor
Found in self_instruct/src/data_processing/exec_instructions.py - About 1 hr to fix

Function _save_checkpoint has a Cognitive Complexity of 10 (exceeds 5 allowed). Consider refactoring.

    def _save_checkpoint(self, model, trial, metrics=None):
        print("Running custom _save_checkpoint")
        checkpoint_folder = f"{PREFIX_CHECKPOINT_DIR}-{self.state.global_step}"
        run_dir = self._get_output_dir(trial=trial)
        output_dir = os.path.join(run_dir, checkpoint_folder)
Severity: Minor
Found in self_instruct/src/train.py - About 1 hr to fix

Function fix_output_records has a Cognitive Complexity of 10 (exceeds 5 allowed). Consider refactoring.

def fix_output_records(records):
    for char in records:
        unique_dialogues = dict()
        topics = char["topics"]
        if "dialogues" in char:
Severity: Minor
Found in self_instruct/src/data_processing/generate_char_chats.py - About 1 hr to fix

Function improve_instructions has a Cognitive Complexity of 10 (exceeds 5 allowed). Consider refactoring.

def improve_instructions(
    original_tasks_path,
    output_path,
    depth_template_path,
    depth_methods_path,
Severity: Minor
Found in self_instruct/src/data_processing/improve_instructions.py - About 1 hr to fix
