File tokenizer.py has 474 lines of code (exceeds 250 allowed). Consider refactoring.
# Copyright 2024 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
Function _count_tokens has a Cognitive Complexity of 20 (exceeds 5 allowed). Consider refactoring.
def _count_tokens(files,
                  file_byte_limit=1e6,
                  correct_strip=True,
                  master_char_set=None):
"""Return token counts of words in the files.
Cognitive Complexity
Cognitive Complexity is a measure of how difficult a unit of code is to intuitively understand. Unlike Cyclomatic Complexity, which determines how difficult your code will be to test, Cognitive Complexity tells you how difficult your code will be to read and comprehend.
A method's cognitive complexity is based on a few simple rules:
- Code is not considered more complex when it uses shorthand that the language provides for collapsing multiple statements into one
- Code is considered more complex for each "break in the linear flow of the code"
- Code is considered more complex when "flow breaking structures are nested"
Further reading
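The three rules above can be made concrete with a toy pair of functions. This is an illustrative sketch, not code from tokenizer.py; the increment counts in the comments are approximate, following SonarSource's published scoring.

```python
# Nested form: each flow-breaking structure adds an increment, and
# nesting makes each subsequent increment larger.
def scan_nested(rows):
    total = 0
    for row in rows:          # +1 (flow break)
        if row:               # +2 (nested)
            for x in row:     # +3 (nested deeper)
                if x > 0:     # +4
                    total += x
    return total

# Flattened form: same behavior. A comprehension is "shorthand that the
# language provides", so it is not penalized the way the nesting above is.
def scan_flat(rows):
    return sum(x for row in rows for x in row if x > 0)
```

Both return the same result; only the flattened form keeps the cognitive complexity low.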
Function _gen_new_subtoken_list has a Cognitive Complexity of 14 (exceeds 5 allowed). Consider refactoring.
def _gen_new_subtoken_list(subtoken_counts,
                           min_count,
                           alphabet,
                           reserved_tokens=None):
"""Generate candidate subtokens ordered by count, and new max subtoken length.
Function _generate_subtokens_with_target_vocab_size has a Cognitive Complexity of 10 (exceeds 5 allowed). Consider refactoring.
def _generate_subtokens_with_target_vocab_size(token_counts,
                                               alphabet,
                                               target_size,
                                               threshold,
                                               min_count=None,
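Judging by its name and parameters, _generate_subtokens_with_target_vocab_size searches for a min_count cutoff whose resulting vocabulary lands within threshold of target_size. A hedged sketch of that search in isolation, where vocab_size_for is a hypothetical stand-in for one round of subtoken generation (the actual function's recursion details are not shown in this report):

```python
def search_min_count(vocab_size_for, target_size, threshold, low=1, high=1000):
    """Bisect a count cutoff until the vocab size lands within threshold.

    `vocab_size_for(min_count)` stands in for one generation round; raising
    the cutoff admits fewer subtokens, so the vocabulary shrinks.
    """
    while low < high:
        mid = (low + high) // 2
        size = vocab_size_for(mid)
        if abs(size - target_size) <= threshold:
            return mid
        if size > target_size:
            low = mid + 1   # vocab too large: prune harder
        else:
            high = mid      # vocab too small: relax the cutoff
    return low
```

With a toy size model such as `lambda c: 1000 // c`, searching for a target of 100 with threshold 5 converges on a cutoff of 10.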
Function init_from_files has 8 arguments (exceeds 4 allowed). Consider refactoring.
def init_from_files(vocab_file,
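A standard remedy for a long parameter list is bundling the options that travel together into a config object. A hypothetical sketch: SubtokenizerConfig and its field names are illustrative guesses at the loose parameters, not the module's actual API.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class SubtokenizerConfig:
    """Bundles the knobs that travel together through vocab construction.

    Field names are illustrative, not the module's real parameter list.
    """
    target_vocab_size: int = 32768
    threshold: int = 327
    min_count: Optional[int] = None
    file_byte_limit: float = 1e6
    reserved_tokens: Optional[List[str]] = None
    correct_strip: bool = True

def init_from_files(vocab_file, files, config=None):
    # Three parameters instead of eight; the body reads config.* fields.
    config = config or SubtokenizerConfig()
    return vocab_file, files, config
```

Callers override only what they need, e.g. `SubtokenizerConfig(target_vocab_size=8192)`, and the remaining fields keep their defaults.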
Avoid deeply nested control flow statements.
if file_byte_budget < 0:
  break
if correct_strip:
Avoid deeply nested control flow statements.
if correct_strip:
  line = native_to_unicode(line)
line = line.strip()
Avoid deeply nested control flow statements.
for token in _split_string_to_tokens(
    native_to_unicode(line), master_char_set):
  token_counts[token] += 1
return token_counts
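The usual fix for nesting like the excerpts above is a pair of moves: guard clauses (break or continue early) plus extracting the per-line work into a helper, so each function keeps one indent level per idea. A simplified, hypothetical sketch, iterating a list of lines rather than open files, with `line.split()` standing in for _split_string_to_tokens:

```python
import collections

def _count_line(line, token_counts, correct_strip=True):
    """Helper: all per-line work lives at a single indent level."""
    if correct_strip:
        line = line.strip()
    for token in line.split():  # stand-in for _split_string_to_tokens
        token_counts[token] += 1

def count_tokens(lines, file_byte_limit=1e6):
    token_counts = collections.defaultdict(int)
    budget = file_byte_limit
    for line in lines:
        if budget < 0:
            break               # guard clause replaces a nested break
        budget -= len(line)
        _count_line(line, token_counts)
    return token_counts
```

The behavior is unchanged; the nesting the checker flags is gone because the budget check is a guard and the token loop lives in the helper.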
Function _split_string_to_tokens has a Cognitive Complexity of 8 (exceeds 5 allowed). Consider refactoring.
def _split_string_to_tokens(text, master_char_set):
  """Splits text to a list of string tokens."""
  if not text:
    return []
  ret = []
Function _generate_subtokens_with_target_vocab_size has 6 arguments (exceeds 4 allowed). Consider refactoring.
def _generate_subtokens_with_target_vocab_size(token_counts,
Function _generate_subtokens has 5 arguments (exceeds 4 allowed). Consider refactoring.
def _generate_subtokens(token_counts,
Function _split_token_to_subtokens has a Cognitive Complexity of 7 (exceeds 5 allowed). Consider refactoring.
def _split_token_to_subtokens(token, subtoken_dict, max_subtoken_length):
  """Splits a token into subtokens defined in the subtoken dict."""
  ret = []
  start = 0
  token_len = len(token)
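The docstring describes the classic greedy longest-match split. A self-contained sketch of that algorithm under stated assumptions (the error-handling detail is an assumption; the report only shows the function's opening lines):

```python
def split_token_to_subtokens(token, subtoken_dict, max_subtoken_length):
    """Greedy longest-match split of `token` against a subtoken vocab.

    Tries the longest window first and shrinks until a dictionary hit;
    assumes every single character appears in subtoken_dict so the scan
    always advances.
    """
    ret, start, token_len = [], 0, len(token)
    while start < token_len:
        for end in range(min(token_len, start + max_subtoken_length), start, -1):
            subtoken = token[start:end]
            if subtoken in subtoken_dict:
                ret.append(subtoken)
                start = end
                break
        else:
            # No match even at length 1: the alphabet is not fully covered.
            # Raise rather than loop forever (assumed behavior).
            raise ValueError("token substring not found in subtoken vocab")
    return ret
```

For example, with the vocab `{"he", "ll", "o"}` and a max length of 2, `"hello"` splits into `["he", "ll", "o"]`.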
Function _unescape_token has a Cognitive Complexity of 7 (exceeds 5 allowed). Consider refactoring.
def _unescape_token(token):
  r"""Replaces escaped characters in the token with their unescaped versions.

  Applies inverse transformations as _escape_token():
    1. Replace "\u" with "_", and "\\" with "\".
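Rule 1 of the docstring maps onto a single regex substitution. A sketch under one assumption: escape schemes like this usually also decode numeric escapes of the form "\<digits>;" back to code points, and the docstring excerpt is truncated, so that second rule is assumed rather than quoted.

```python
import re

# "\u", "\\", or "\<digits>;" -- the digit run is captured for decoding.
_UNESCAPE_REGEX = re.compile(r"\\u|\\\\|\\([0-9]+);")

def unescape_token(token):
    """Inverse of the escaping above: "\\u" -> "_", "\\\\" -> "\\",
    and (assumed) "\\<digits>;" -> the character with that ordinal."""
    def match(m):
        if m.group(1) is None:
            return "_" if m.group(0) == "\\u" else "\\"
        try:
            return chr(int(m.group(1)))
        except (ValueError, OverflowError):
            return "\ufffd"  # replacement char for malformed escapes
    return _UNESCAPE_REGEX.sub(match, token)
```

Dispatching to a small replacement function keeps each escape rule on its own line, which is exactly the kind of restructuring that lowers the flagged complexity.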
Function _count_and_gen_subtokens has a Cognitive Complexity of 6 (exceeds 5 allowed). Consider refactoring.
def _count_and_gen_subtokens(token_counts, alphabet, subtoken_dict,
                             max_subtoken_length):
  """Count number of times subtokens appear, and generate new subtokens.

  Args: