Unbabel/OpenKiwi

Showing 161 of 264 total issues

Similar blocks of code found in 30 locations. Consider refactoring.
Open

    group.add_argument(
        '--predict-source',
        type=lambda x: bool(strtobool(x)),
Severity: Major
Found in kiwi/cli/models/quetch.py and 29 other locations - About 1 hr to fix
kiwi/cli/models/nuqe.py on lines 59..61
kiwi/cli/models/predictor_estimator.py on lines 144..146
kiwi/cli/models/predictor_estimator.py on lines 276..278
kiwi/cli/models/predictor_estimator.py on lines 285..287
kiwi/cli/models/predictor_estimator.py on lines 322..324
kiwi/cli/models/predictor_estimator.py on lines 330..332
kiwi/cli/models/predictor_estimator.py on lines 339..341
kiwi/cli/models/predictor_estimator.py on lines 348..350
kiwi/cli/models/predictor_estimator.py on lines 382..384
kiwi/cli/models/predictor_estimator.py on lines 391..393
kiwi/cli/models/predictor_estimator.py on lines 400..402
kiwi/cli/models/predictor_estimator.py on lines 415..417
kiwi/cli/models/predictor_estimator.py on lines 428..430
kiwi/cli/models/predictor_estimator.py on lines 437..439
kiwi/cli/models/predictor_estimator.py on lines 446..448
kiwi/cli/models/predictor_estimator.py on lines 484..486
kiwi/cli/models/predictor_estimator.py on lines 493..495
kiwi/cli/models/predictor_estimator.py on lines 501..503
kiwi/cli/models/predictor_estimator.py on lines 512..514
kiwi/cli/models/quetch.py on lines 129..131
kiwi/cli/models/quetch.py on lines 137..139
kiwi/cli/models/quetch.py on lines 153..155
kiwi/cli/models/quetch.py on lines 217..219
kiwi/cli/models/quetch.py on lines 226..228
kiwi/cli/models/quetch.py on lines 244..246
kiwi/cli/models/quetch.py on lines 295..297
kiwi/cli/opts.py on lines 126..128
kiwi/cli/pipelines/train.py on lines 86..88
kiwi/cli/pipelines/train.py on lines 109..111
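
One possible way to collapse these repeated option definitions is a small helper that registers a string-to-bool flag. This is a minimal sketch, not the project's code: it assumes plain `argparse`, the names `str_to_bool` and `add_bool_argument` are hypothetical, and the defaults shown are placeholders because the excerpt above is truncated before any `default` or `help` arguments.

```python
import argparse


def str_to_bool(value):
    """Parse common true/false spellings (stand-in for strtobool)."""
    text = str(value).strip().lower()
    if text in {"y", "yes", "t", "true", "on", "1"}:
        return True
    if text in {"n", "no", "f", "false", "off", "0"}:
        return False
    raise argparse.ArgumentTypeError("invalid boolean value: %r" % value)


def add_bool_argument(group, name, default=False, help=None):
    """Register one string-to-bool option (hypothetical helper)."""
    return group.add_argument(
        name, type=str_to_bool, default=default, help=help
    )


parser = argparse.ArgumentParser()
group = parser.add_argument_group("model")
# Placeholder defaults: the real defaults are not visible in the report.
add_bool_argument(group, "--predict-source", default=False)
add_bool_argument(group, "--freeze-embeddings", default=False)

options = parser.parse_args(["--predict-source", "true"])
```

With a helper like this, each of the flagged call sites would shrink to a single line, and the parsing rule would live in one place.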

Duplicated Code

Duplicated code can lead to software that is hard to understand and difficult to change. The Don't Repeat Yourself (DRY) principle states:

Every piece of knowledge must have a single, unambiguous, authoritative representation within a system.

When you violate DRY, bugs and maintenance problems are sure to follow. Duplicated code tends both to keep replicating and to diverge (leaving bugs as two similar implementations differ in subtle ways).

Tuning

This issue has a mass of 39.

We set useful threshold defaults for the languages we support, but you may want to adjust these settings based on your project guidelines.

The threshold configuration represents the minimum mass a code block must have to be analyzed for duplication. The lower the threshold, the more fine-grained the comparison.

If the engine reports duplication too readily, try raising the threshold. If you suspect that the engine isn't catching enough duplication, try lowering the threshold. The best setting tends to differ from language to language.

See codeclimate-duplication's documentation for more information about tuning the mass threshold in your .codeclimate.yml.

Similar blocks of code found in 30 locations. Consider refactoring.
Open

    group.add_argument(
        '--freeze-embeddings',
        type=lambda x: bool(strtobool(x)),
Severity: Major
Found in kiwi/cli/models/nuqe.py and 29 other locations - About 1 hr to fix
kiwi/cli/models/predictor_estimator.py on lines 144..146
kiwi/cli/models/predictor_estimator.py on lines 276..278
kiwi/cli/models/predictor_estimator.py on lines 285..287
kiwi/cli/models/predictor_estimator.py on lines 322..324
kiwi/cli/models/predictor_estimator.py on lines 330..332
kiwi/cli/models/predictor_estimator.py on lines 339..341
kiwi/cli/models/predictor_estimator.py on lines 348..350
kiwi/cli/models/predictor_estimator.py on lines 382..384
kiwi/cli/models/predictor_estimator.py on lines 391..393
kiwi/cli/models/predictor_estimator.py on lines 400..402
kiwi/cli/models/predictor_estimator.py on lines 415..417
kiwi/cli/models/predictor_estimator.py on lines 428..430
kiwi/cli/models/predictor_estimator.py on lines 437..439
kiwi/cli/models/predictor_estimator.py on lines 446..448
kiwi/cli/models/predictor_estimator.py on lines 484..486
kiwi/cli/models/predictor_estimator.py on lines 493..495
kiwi/cli/models/predictor_estimator.py on lines 501..503
kiwi/cli/models/predictor_estimator.py on lines 512..514
kiwi/cli/models/quetch.py on lines 129..131
kiwi/cli/models/quetch.py on lines 137..139
kiwi/cli/models/quetch.py on lines 145..147
kiwi/cli/models/quetch.py on lines 153..155
kiwi/cli/models/quetch.py on lines 217..219
kiwi/cli/models/quetch.py on lines 226..228
kiwi/cli/models/quetch.py on lines 244..246
kiwi/cli/models/quetch.py on lines 295..297
kiwi/cli/opts.py on lines 126..128
kiwi/cli/pipelines/train.py on lines 86..88
kiwi/cli/pipelines/train.py on lines 109..111

This issue has a mass of 39.

Identical blocks of code found in 2 locations. Consider refactoring.
Open

def build_text_field():
    return data.Field(
        tokenize=tokenizer,
        init_token=const.START,
        batch_first=True,
Severity: Major
Found in kiwi/data/fieldsets/predictor_estimator.py and 1 other location - About 1 hr to fix
kiwi/data/fieldsets/predictor.py on lines 26..33

This issue has a mass of 38.

Identical blocks of code found in 2 locations. Consider refactoring.
Open

def build_text_field():
    return data.Field(
        tokenize=tokenizer,
        init_token=const.START,
        batch_first=True,
Severity: Major
Found in kiwi/data/fieldsets/predictor.py and 1 other location - About 1 hr to fix
kiwi/data/fieldsets/predictor_estimator.py on lines 28..35

This issue has a mass of 38.

Similar blocks of code found in 3 locations. Consider refactoring.
Open

def main(argv=None):
    parser = build_parser()
    options = parser.parse(args=argv)
    train.train_from_options(options)
Severity: Major
Found in kiwi/cli/pipelines/train.py and 2 other locations - About 55 mins to fix
kiwi/cli/pipelines/evaluate.py on lines 152..155
kiwi/cli/pipelines/jackknife.py on lines 56..59
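
The three `main` functions differ only in which pipeline they invoke, so one option is a small factory that takes the parser builder and the runner as arguments. The sketch below is illustrative only: `make_main` is a hypothetical name, the parser builder and runner are stand-ins, and it calls `argparse`'s `parse_args` where the excerpt calls the project's own `parser.parse`.

```python
import argparse


def make_main(build_parser, run_from_options):
    """Return a main(argv=None) that parses options and runs one pipeline."""

    def main(argv=None):
        parser = build_parser()
        options = parser.parse_args(args=argv)
        return run_from_options(options)

    return main


# Hypothetical stand-ins for one pipeline's parser builder and runner.
def build_parser():
    parser = argparse.ArgumentParser(prog="train")
    parser.add_argument("--epochs", type=int, default=1)
    return parser


def train_from_options(options):
    return "training for %d epoch(s)" % options.epochs


main = make_main(build_parser, train_from_options)
result = main(["--epochs", "3"])
```

Each pipeline module would then define `main` in one line, and the parse-then-run sequence would exist only once.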

This issue has a mass of 37.

Similar blocks of code found in 3 locations. Consider refactoring.
Open

def main(argv=None):
    parser = build_parser()
    options = parser.parse(args=argv)
    evaluate.evaluate_from_options(options)
Severity: Major
Found in kiwi/cli/pipelines/evaluate.py and 2 other locations - About 55 mins to fix
kiwi/cli/pipelines/jackknife.py on lines 56..59
kiwi/cli/pipelines/train.py on lines 139..142

This issue has a mass of 37.

Similar blocks of code found in 3 locations. Consider refactoring.
Open

def main(argv=None):
    parser = build_parser()
    options = parser.parse(args=argv)
    jackknife.run_from_options(options)
Severity: Major
Found in kiwi/cli/pipelines/jackknife.py and 2 other locations - About 55 mins to fix
kiwi/cli/pipelines/evaluate.py on lines 152..155
kiwi/cli/pipelines/train.py on lines 139..142

This issue has a mass of 37.

Similar blocks of code found in 2 locations. Consider refactoring.
Open

    def summarize(self):
        summary = {self.metric_name: self.nll / self.tokens}
        return self._prefix_keys(summary)
Severity: Minor
Found in kiwi/metrics/metrics.py and 1 other location - About 50 mins to fix
kiwi/metrics/metrics.py on lines 245..247
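
Both `summarize` methods divide an accumulated total by `self.tokens`, so the shared part could move into a base class that only needs to know which attribute holds the total. This is a sketch under assumptions: `RatioMetric`, `total_attr`, and the two subclass names are hypothetical, and `_prefix_keys` is a guess at the behaviour of the method named in the excerpt.

```python
class RatioMetric:
    """Hypothetical base class: reports an accumulated total per token."""

    total_attr = None  # name of the attribute holding the numerator

    def __init__(self, metric_name, prefix=None):
        self.metric_name = metric_name
        self.prefix = prefix
        self.tokens = 0

    def _prefix_keys(self, summary):
        # Assumed behaviour: prepend the prefix to every key when one is set.
        if not self.prefix:
            return summary
        return {
            "%s_%s" % (self.prefix, key): value
            for key, value in summary.items()
        }

    def summarize(self):
        total = getattr(self, self.total_attr)
        summary = {self.metric_name: total / self.tokens}
        return self._prefix_keys(summary)


class NllPerTokenMetric(RatioMetric):
    total_attr = "nll"

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.nll = 0.0


class ExpectedErrorPerTokenMetric(RatioMetric):
    total_attr = "expected_error"

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.expected_error = 0.0


metric = NllPerTokenMetric(metric_name="nll", prefix="target")
metric.nll, metric.tokens = 6.0, 3
summary = metric.summarize()
```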

This issue has a mass of 36.

Similar blocks of code found in 3 locations. Consider refactoring.
Open

        target_contexts = torch.cat(
            [forward_contexts[:, :-2], backward_contexts[:, 2:]], dim=-1
Severity: Major
Found in kiwi/models/predictor.py and 2 other locations - About 50 mins to fix
kiwi/models/predictor.py on lines 265..266
kiwi/models/predictor_estimator.py on lines 421..421

This issue has a mass of 36.

Similar blocks of code found in 2 locations. Consider refactoring.
Open

def load_vocabularies_to_datasets(vocab_path, *datasets):
    fields = {}
    for dataset in datasets:
        fields.update(dataset.fields)
    return load_vocabularies_to_fields(vocab_path, fields)
Severity: Minor
Found in kiwi/data/utils.py and 1 other location - About 50 mins to fix
kiwi/data/utils.py on lines 144..148
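
The load and save wrappers share the loop that merges every dataset's `fields` into one dictionary, so that loop could be extracted. In the sketch below, `collect_fields` is a hypothetical helper, and the two `*_fields` functions are stand-ins that only echo their inputs so the example runs without the real implementations.

```python
from types import SimpleNamespace


def collect_fields(*datasets):
    """Merge the `fields` dicts of all datasets (later datasets win)."""
    fields = {}
    for dataset in datasets:
        fields.update(dataset.fields)
    return fields


# Stand-ins (assumptions) for the real field-level functions.
def load_vocabularies_to_fields(vocab_path, fields):
    return ("load", vocab_path, sorted(fields))


def save_vocabularies_from_fields(directory, fields):
    return ("save", directory, sorted(fields))


def load_vocabularies_to_datasets(vocab_path, *datasets):
    return load_vocabularies_to_fields(vocab_path, collect_fields(*datasets))


def save_vocabularies_from_datasets(directory, *datasets):
    return save_vocabularies_from_fields(directory, collect_fields(*datasets))


train = SimpleNamespace(fields={"source": object(), "target": object()})
valid = SimpleNamespace(fields={"target": object(), "tags": object()})
loaded = load_vocabularies_to_datasets("vocab.pt", train, valid)
```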

This issue has a mass of 36.

Similar blocks of code found in 3 locations. Consider refactoring.
Open

        target_embeddings = torch.cat(
            [target_embeddings[:, :-2], target_embeddings[:, 2:]], dim=-1
Severity: Major
Found in kiwi/models/predictor.py and 2 other locations - About 50 mins to fix
kiwi/models/predictor.py on lines 261..262
kiwi/models/predictor_estimator.py on lines 421..421

This issue has a mass of 36.

Similar blocks of code found in 2 locations. Consider refactoring.
Open

def save_vocabularies_from_datasets(directory, *datasets):
    fields = {}
    for dataset in datasets:
        fields.update(dataset.fields)
    return save_vocabularies_from_fields(directory, fields)
Severity: Minor
Found in kiwi/data/utils.py and 1 other location - About 50 mins to fix
kiwi/data/utils.py on lines 133..137

This issue has a mass of 36.

Similar blocks of code found in 2 locations. Consider refactoring.
Open

    def summarize(self):
        summary = {self.metric_name: self.expected_error / self.tokens}
        return self._prefix_keys(summary)
Severity: Minor
Found in kiwi/metrics/metrics.py and 1 other location - About 50 mins to fix
kiwi/metrics/metrics.py on lines 119..121

This issue has a mass of 36.

Similar blocks of code found in 2 locations. Consider refactoring.
Open

        metrics.append(
            ExpectedErrorMetric(
                prefix=self.config.target_side,
                target_name=self.config.target_side,
                PAD=const.PAD_ID,
Severity: Minor
Found in kiwi/models/predictor.py and 1 other location - About 50 mins to fix
kiwi/models/predictor.py on lines 343..348
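
Since the two `metrics.append(...)` calls pass the same keyword arguments to different metric classes, one option is to build the shared arguments once and loop over the classes. This is only a sketch: the metric classes here are empty stand-ins, `PAD_ID` replaces `const.PAD_ID`, `build_target_metrics` is a hypothetical name, and the excerpt is truncated, so the real calls may take further arguments.

```python
class _MetricStub:
    """Stand-in that just records the constructor arguments."""

    def __init__(self, prefix, target_name, PAD):
        self.prefix = prefix
        self.target_name = target_name
        self.PAD = PAD


class CorrectMetric(_MetricStub):
    pass


class ExpectedErrorMetric(_MetricStub):
    pass


PAD_ID = 1  # placeholder value standing in for const.PAD_ID


def build_target_metrics(target_side):
    """Instantiate every target-side metric with the same arguments."""
    shared = dict(prefix=target_side, target_name=target_side, PAD=PAD_ID)
    metric_classes = (CorrectMetric, ExpectedErrorMetric)
    return [metric_cls(**shared) for metric_cls in metric_classes]


metrics = build_target_metrics("target")
```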

This issue has a mass of 36.

Similar blocks of code found in 2 locations. Consider refactoring.
Open

        metrics.append(
            CorrectMetric(
                prefix=self.config.target_side,
                target_name=self.config.target_side,
                PAD=const.PAD_ID,
Severity: Minor
Found in kiwi/models/predictor.py and 1 other location - About 50 mins to fix
kiwi/models/predictor.py on lines 351..356

This issue has a mass of 36.

Similar blocks of code found in 3 locations. Consider refactoring.
Open

        contexts = torch.cat((contexts[:, :-1], contexts[:, 1:]), dim=-1)
Severity: Major
Found in kiwi/models/predictor_estimator.py and 2 other locations - About 50 mins to fix
kiwi/models/predictor.py on lines 261..262
kiwi/models/predictor.py on lines 265..266

This issue has a mass of 36.

Similar blocks of code found in 5 locations. Consider refactoring.
Open

    fs.add(
        name=const.TARGET_NGRAM_LEFT,
        field=target_ngram_left,
        file_option_suffix='_target_ngram',
        file_reader=partial(Corpus.read_tabular_file, extract_column=1),
Severity: Major
Found in kiwi/data/fieldsets/linear.py and 4 other locations - About 45 mins to fix
kiwi/data/fieldsets/linear.py on lines 89..93
kiwi/data/fieldsets/linear.py on lines 99..103
kiwi/data/fieldsets/linear.py on lines 106..110
kiwi/data/fieldsets/linear.py on lines 123..127
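
The repeated `fs.add(...)` calls vary only in the field name, the file option suffix, and the extracted column, which suggests a table-driven loop. The sketch below is hedged throughout: `RecordingFieldset` and `read_tabular_file` are stand-ins for the real fieldset and `Corpus.read_tabular_file`, the string names replace the `const.*` values, only the three entries visible in this report are listed, and the truncated excerpts may pass more arguments than shown.

```python
from functools import partial


def read_tabular_file(path, extract_column=0):
    """Stand-in reader (assumption): just echoes what it was asked to read."""
    return (path, extract_column)


class RecordingFieldset:
    """Stand-in fieldset that records what was added."""

    def __init__(self):
        self.entries = {}

    def add(self, name, field, file_option_suffix, file_reader):
        self.entries[name] = (field, file_option_suffix, file_reader)


# (name, file option suffix, column) taken from the excerpts in this report.
TABULAR_FIELDS = [
    ("target_ngram_left", "_target_ngram", 1),
    ("target_ngram_right", "_target_ngram", 2),
    ("target_stacked", "_target_stacked", 1),
]


def add_tabular_fields(fs, fields_by_name):
    """Register every tabular field from the table above."""
    for name, suffix, column in TABULAR_FIELDS:
        fs.add(
            name=name,
            field=fields_by_name[name],
            file_option_suffix=suffix,
            file_reader=partial(read_tabular_file, extract_column=column),
        )


fs = RecordingFieldset()
add_tabular_fields(fs, {name: object() for name, _, _ in TABULAR_FIELDS})
```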

This issue has a mass of 35.

Similar blocks of code found in 5 locations. Consider refactoring.
Open

    fs.add(
        name=const.TARGET_NGRAM_RIGHT,
        field=target_ngram_right,
        file_option_suffix='_target_ngram',
        file_reader=partial(Corpus.read_tabular_file, extract_column=2),
Severity: Major
Found in kiwi/data/fieldsets/linear.py and 4 other locations - About 45 mins to fix
kiwi/data/fieldsets/linear.py on lines 89..93
kiwi/data/fieldsets/linear.py on lines 99..103
kiwi/data/fieldsets/linear.py on lines 106..110
kiwi/data/fieldsets/linear.py on lines 116..120

This issue has a mass of 35.

Identical blocks of code found in 2 locations. Consider refactoring.
Open

            try:
                self.checkpointer(self, valid_iterator, step=self._step)
            except EarlyStopException as e:
                logger.info(e)
                break
Severity: Minor
Found in kiwi/trainers/trainer.py and 1 other location - About 45 mins to fix
kiwi/trainers/trainer.py on lines 121..125
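
Because the same try/except appears twice in the trainer, it could be wrapped in one method that reports whether training should continue, leaving each call site to decide when to `break`. The following is a sketch, not the project's trainer: `TrainerSketch`, `_checkpoint`, and `run` are hypothetical, and `EarlyStopException` is redefined locally as a stand-in.

```python
import logging

logger = logging.getLogger(__name__)


class EarlyStopException(Exception):
    """Stand-in for the exception type named in the excerpt."""


class TrainerSketch:
    def __init__(self, checkpointer):
        self.checkpointer = checkpointer
        self._step = 0

    def _checkpoint(self, valid_iterator):
        """Run the checkpointer; return False when training should stop."""
        try:
            self.checkpointer(self, valid_iterator, step=self._step)
        except EarlyStopException as e:
            logger.info(e)
            return False
        return True

    def run(self, valid_iterator, max_steps):
        for _ in range(max_steps):
            self._step += 1
            if not self._checkpoint(valid_iterator):
                break
        return self._step


def stop_at_three(trainer, valid_iterator, step):
    # Hypothetical checkpointer that requests early stopping at step 3.
    if step >= 3:
        raise EarlyStopException("no improvement, stopping")


last_step = TrainerSketch(stop_at_three).run(valid_iterator=[], max_steps=10)
```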

This issue has a mass of 35.

Similar blocks of code found in 5 locations. Consider refactoring.
Open

    fs.add(
        name=const.TARGET_STACKED,
        field=target_stacked,
        file_option_suffix='_target_stacked',
        file_reader=partial(Corpus.read_tabular_file, extract_column=1),
Severity: Major
Found in kiwi/data/fieldsets/linear.py and 4 other locations - About 45 mins to fix
kiwi/data/fieldsets/linear.py on lines 99..103
kiwi/data/fieldsets/linear.py on lines 106..110
kiwi/data/fieldsets/linear.py on lines 116..120
kiwi/data/fieldsets/linear.py on lines 123..127

This issue has a mass of 35.
