whylabs/whylogs-python

python/whylogs/api/pyspark/experimental/segmented_profiler.py

Summary

Maintainability: A (1 hr)

Function `collect_segmented_results` has a Cognitive Complexity of 9 (exceeds 5 allowed). Consider refactoring.
Open

def collect_segmented_results(
    input_df: SparkDataFrame,
    schema: DatasetSchema,
    dataset_timestamp: Optional[datetime] = None,
    creation_timestamp: Optional[datetime] = None,

Found in python/whylogs/api/pyspark/experimental/segmented_profiler.py - About 55 mins to fix

Cognitive Complexity

Cognitive Complexity is a measure of how difficult a unit of code is to intuitively understand. Unlike Cyclomatic Complexity, which determines how difficult your code will be to test, Cognitive Complexity tells you how difficult your code will be to read and comprehend.

A method's cognitive complexity is based on a few simple rules:

- Code is not considered more complex when it uses shorthand that the language provides for collapsing multiple statements into one
- Code is considered more complex for each "break in the linear flow of the code"
- Code is considered more complex when "flow breaking structures are nested"
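As an illustration of these rules, here is a hypothetical pair of functions (not taken from whylogs) that behave identically. The comments tally increments following SonarSource's published Cognitive Complexity rules as we read them; Code Climate's own counter may score them slightly differently. Replacing nested branches with an early-return guard clause is a common way to bring a flagged function back under the threshold of 5.

```python
from typing import Dict, Optional


# Cognitive complexity: 5 (increments tallied per line)
def lookup_nested(table: Dict[str, int], key: str) -> Optional[int]:
    if table:  # +1
        if key in table:  # +2 (+1 for the if, +1 for nesting)
            return table[key]
        else:  # +1
            return None
    else:  # +1
        return None


# Cognitive complexity: 2
def lookup_flat(table: Dict[str, int], key: str) -> Optional[int]:
    # Guard clause: +1 for the if, +1 for the `or` sequence.
    if not table or key not in table:
        return None
    return table[key]
```

Both versions return the same results; only the shape of the control flow changes.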

Function `whylogs_pandas_segmented_profiler` has a Cognitive Complexity of 7 (exceeds 5 allowed). Consider refactoring.
Open

def whylogs_pandas_segmented_profiler(
    pdf_iterator: Iterable[pd.DataFrame], schema: Optional[DatasetSchema] = None
) -> Iterable[pd.DataFrame]:
    if schema is None or not schema.segments:
        raise ValueError(

Found in python/whylogs/api/pyspark/experimental/segmented_profiler.py - About 35 mins to fix

TODO found
Open

        # TODO: optimize this so we split the dataframe by segment first rather than split in the pandas dataframe

Found in python/whylogs/api/pyspark/experimental/segmented_profiler.py by fixme

Line too long (101 > 79 characters)
Open

        raise ValueError("Segment key is missing required key values or parent_id: {segment_params}")

Found in python/whylogs/api/pyspark/experimental/segmented_profiler.py by pep8

Limit all lines to a maximum of 79 characters.

There are still many devices around that are limited to 80 character
lines; plus, limiting windows to 80 characters makes it possible to
have several windows side-by-side.  The default wrapping on such
devices looks ugly.  Therefore, please limit all lines to a maximum
of 79 characters. For flowing long blocks of text (docstrings or
comments), limiting the length to 72 characters is recommended.

Reports error E501.
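For long string literals such as the message flagged at the top of this issue, adjacent string literals inside parentheses are joined at compile time, so the message can be split across lines without `+` or backslashes. Note that the quoted line has no `f` prefix, so `{segment_params}` would appear literally in the error text; the sketch below adds it. The function name and its validation check are hypothetical stand-ins; only the message layout is the point.

```python
def check_segment_params(segment_params: dict) -> None:
    # Hypothetical validation stand-in for the flagged raise statement.
    if "parent_id" not in segment_params:
        raise ValueError(
            # Adjacent literals are concatenated; the second is an f-string.
            "Segment key is missing required key values or parent_id: "
            f"{segment_params}"
        )
```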

Line too long (88 > 79 characters)
Open

    segmented_column_profile_views: Dict[Segment, Dict[str, ColumnProfileView]] = dict()

Found in python/whylogs/api/pyspark/experimental/segmented_profiler.py by pep8
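One way to shorten a long variable annotation like this is a module-level type alias. The sketch uses hypothetical stand-in classes in place of whylogs' `Segment` and `ColumnProfileView` so it runs on its own; the alias name `SegmentColumnViews` is also an assumption.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Segment:  # stand-in for whylogs' Segment; frozen makes it hashable
    key: Tuple[str, ...]


class ColumnProfileView:  # stand-in; the real class holds column metrics
    pass


# The alias keeps the annotation at the use site well under 79 columns.
SegmentColumnViews = Dict[Segment, Dict[str, ColumnProfileView]]

segmented_column_profile_views: SegmentColumnViews = dict()
```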

Line too long (116 > 79 characters)
Open

        segmented_column_profile_views[segment_key][column_name] = collected_segment_column_profile_views[key_tuple]

Found in python/whylogs/api/pyspark/experimental/segmented_profiler.py by pep8

Line too long (104 > 79 characters)
Open

    segment_column_views_dict = collect_segmented_column_profile_views(input_df=input_df, schema=schema)

Found in python/whylogs/api/pyspark/experimental/segmented_profiler.py by pep8
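When a call's arguments push the line past the limit, the line can be broken inside the call's parentheses, with the arguments on an indented continuation line. The function below is a hypothetical stand-in for `collect_segmented_column_profile_views` that simply counts rows per segment value, and the list-of-dicts "dataframe" is an assumption made so the sketch runs without Spark.

```python
from typing import Dict, List


def collect_segmented_column_profile_views(
    input_df: List[dict], schema: Dict[str, str]
) -> Dict[str, int]:
    # Hypothetical stand-in: counts rows per value of the segment column.
    counts: Dict[str, int] = {}
    for row in input_df:
        value = row[schema["segment_column"]]
        counts[value] = counts.get(value, 0) + 1
    return counts


input_df = [{"color": "red"}, {"color": "blue"}, {"color": "red"}]
schema = {"segment_column": "color"}

# Break inside the parentheses; the arguments sit one indent level deeper.
segment_column_views_dict = collect_segmented_column_profile_views(
    input_df=input_df, schema=schema
)
```

The same parenthesized layout also works for long `def` signatures, as the stand-in's own definition shows.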

Line too long (80 > 79 characters)
Open

    pdf_iterator: Iterable[pd.DataFrame], schema: Optional[DatasetSchema] = None

Found in python/whylogs/api/pyspark/experimental/segmented_profiler.py by pep8

Line too long (87 > 79 characters)
Open

from whylogs.api.pyspark.experimental.profiler import COL_NAME_FIELD, COL_PROFILE_FIELD

Found in python/whylogs/api/pyspark/experimental/segmented_profiler.py by pep8
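Long `from ... import` lines can be wrapped in parentheses (introduced by PEP 328), which avoids backslash continuations. The sketch imports standard-library names so it runs without whylogs installed; for the flagged line, the same shape would put `COL_NAME_FIELD,` and `COL_PROFILE_FIELD,` on their own lines.

```python
# Parentheses let an import list span several lines (PEP 328).
from collections import (
    OrderedDict,
    defaultdict,
)

counts = defaultdict(int)
counts["a"] += 1
ordered = OrderedDict(counts)
```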

Line too long (117 > 79 characters)
Open

            "Cannot profile segments without segmentation defined in the specified DatasetSchema: no segments found."

Found in python/whylogs/api/pyspark/experimental/segmented_profiler.py by pep8

Line too long (110 > 79 characters)
Open

    logger.warning(f"Skipping segment: could not find partition with id {partition_id} in: {schema.segments}")

Found in python/whylogs/api/pyspark/experimental/segmented_profiler.py by pep8
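For log calls, passing the values as arguments to the logger instead of building an f-string both shortens the line and defers string formatting until the record is actually handled (it is skipped entirely if the level is disabled). The logger name, partition id, and segment mapping below are made-up values, and the list-collecting handler exists only so the result can be checked.

```python
import logging
from typing import List

logger = logging.getLogger("segmented_profiler_sketch")
logger.setLevel(logging.WARNING)


class ListHandler(logging.Handler):
    """Collects formatted messages so the example can inspect them."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: List[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


handler = ListHandler()
logger.addHandler(handler)

partition_id = "p-123"
segments = {"p-1": "color", "p-2": "size"}

# %-style arguments are interpolated only when the record is handled.
logger.warning(
    "Skipping segment: could not find partition with id %s in: %s",
    partition_id,
    segments,
)
```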

Line too long (82 > 79 characters)
Open

    logger.info(f"Processing segmented profiling in spark with {schema.segments}")

Found in python/whylogs/api/pyspark/experimental/segmented_profiler.py by pep8

Line too long (101 > 79 characters)
Open

            for col_name, col_profile in segmented_results.view(segmented_key).get_columns().items():

Found in python/whylogs/api/pyspark/experimental/segmented_profiler.py by pep8
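A long `for` header built from a chained call can be shortened by binding the intermediate result to a local name first. `ProfileResults`, `View`, and their `view()` and `get_columns()` methods are hypothetical stand-ins modeled on the names in the flagged line, not the whylogs classes.

```python
from typing import Dict, List


class View:  # hypothetical stand-in for a per-segment profile view
    def __init__(self, columns: Dict[str, int]) -> None:
        self._columns = columns

    def get_columns(self) -> Dict[str, int]:
        return self._columns


class ProfileResults:  # hypothetical stand-in for the segmented results
    def __init__(self, views: Dict[str, View]) -> None:
        self._views = views

    def view(self, key: str) -> View:
        return self._views[key]


segmented_results = ProfileResults({"red": View({"a": 1, "b": 2})})
segmented_key = "red"

rows: List[str] = []
# Naming the intermediate mapping keeps the loop header short.
columns = segmented_results.view(segmented_key).get_columns()
for col_name, col_profile in columns.items():
    rows.append(f"{col_name}={col_profile}")
```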

Line too long (106 > 79 characters)
Open

def column_profile_bytes_aggregator(group_by_cols: Tuple[str], profiles_df: pd.DataFrame) -> pd.DataFrame:

Found in python/whylogs/api/pyspark/experimental/segmented_profiler.py by pep8

Line too long (115 > 79 characters)
Open

        col_name for col_name in input_df.schema.names if isinstance(input_df.schema[col_name].dataType, VectorUDT)

Found in python/whylogs/api/pyspark/experimental/segmented_profiler.py by pep8
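A comprehension can be split at its `for` and `if` clauses, because the enclosing brackets allow implicit line continuation. The sketch replaces the Spark schema lookup with a plain dictionary of column types; `VectorType` is a hypothetical stand-in for Spark's `VectorUDT`.

```python
from typing import Dict, List


class VectorType:  # hypothetical stand-in for Spark's VectorUDT
    pass


class StringType:  # hypothetical stand-in for a non-vector column type
    pass


column_types: Dict[str, object] = {
    "features": VectorType(),
    "label": StringType(),
    "embedding": VectorType(),
}

# One clause per line inside the brackets.
vector_columns: List[str] = [
    col_name
    for col_name in column_types
    if isinstance(column_types[col_name], VectorType)
]
```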

Line too long (93 > 79 characters)
Open

        res_df = pd.DataFrame(columns=[SEGMENT_KEY_FIELD, COL_NAME_FIELD, COL_PROFILE_FIELD])

Found in python/whylogs/api/pyspark/experimental/segmented_profiler.py by pep8

Line too long (87 > 79 characters)
Open

                    COL_PROFILE_FIELD: [col_profile.to_protobuf().SerializeToString()],

Found in python/whylogs/api/pyspark/experimental/segmented_profiler.py by pep8

Line too long (106 > 79 characters)
Open

        (_string_to_segment(row.segment_key), row.col_name): ColumnProfileView.from_bytes(row.col_profile)

Found in python/whylogs/api/pyspark/experimental/segmented_profiler.py by pep8

Line too long (103 > 79 characters)
Open

    whylogs_pandas_map_profiler_with_schema = partial(whylogs_pandas_segmented_profiler, schema=schema)

Found in python/whylogs/api/pyspark/experimental/segmented_profiler.py by pep8

Line too long (90 > 79 characters)
Open

            raise ValueError(f"Collision when collecting profiles for segment {segment}!")

Found in python/whylogs/api/pyspark/experimental/segmented_profiler.py by pep8

Line too long (103 > 79 characters)
Open

        lambda acc, x: acc.merge(x), profiles_df[COL_PROFILE_FIELD].apply(ColumnProfileView.from_bytes)

Found in python/whylogs/api/pyspark/experimental/segmented_profiler.py by pep8
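The flagged line appears to be the argument list of a fold over deserialized profiles; the enclosing call is not shown, so `functools.reduce` is an assumption here. Deserializing into a named list first and giving the merge step a name keeps every line short. `Counter` addition stands in for `ColumnProfileView.merge`, and the JSON decoding is a placeholder for `ColumnProfileView.from_bytes`.

```python
import json
from collections import Counter
from functools import reduce
from typing import List


def from_bytes(data: bytes) -> Counter:
    # Placeholder deserializer standing in for ColumnProfileView.from_bytes.
    return Counter(json.loads(data.decode("utf-8")))


def merge(acc: Counter, other: Counter) -> Counter:
    # Stand-in for ColumnProfileView.merge: sums the counts.
    return acc + other


serialized: List[bytes] = [b'{"null": 1, "ok": 3}', b'{"ok": 2}']

views = [from_bytes(blob) for blob in serialized]
merged = reduce(merge, views)
```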

Line too long (116 > 79 characters)
Open

        raise ValueError("Cannot collect segmented results without segments defined in the passed in DatasetSchema")

Found in python/whylogs/api/pyspark/experimental/segmented_profiler.py by pep8

Line too long (91 > 79 characters)
Open

    cp = f"{SEGMENT_KEY_FIELD} string, {COL_NAME_FIELD} string, {COL_PROFILE_FIELD} binary"

Found in python/whylogs/api/pyspark/experimental/segmented_profiler.py by pep8
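The DDL-style schema string can be assembled from a list of `name type` pieces with `str.join`, which keeps each element short. The values assigned to `SEGMENT_KEY_FIELD`, `COL_NAME_FIELD`, and `COL_PROFILE_FIELD` below are assumptions, since this report shows only the constant names.

```python
# Assumed values; the report only shows the constant names.
SEGMENT_KEY_FIELD = "segment_key"
COL_NAME_FIELD = "col_name"
COL_PROFILE_FIELD = "col_profile"

# Each list element fits comfortably on its own line.
cp = ", ".join(
    [
        f"{SEGMENT_KEY_FIELD} string",
        f"{COL_NAME_FIELD} string",
        f"{COL_PROFILE_FIELD} binary",
    ]
)
```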

Line too long (119 > 79 characters)
Open

                columns=column_views_dict, dataset_timestamp=_dataset_timestamp, creation_timestamp=_creation_timestamp

Found in python/whylogs/api/pyspark/experimental/segmented_profiler.py by pep8

Line too long (114 > 79 characters)
Open

        # TODO: optimize this so we split the dataframe by segment first rather than split in the pandas dataframe

Found in python/whylogs/api/pyspark/experimental/segmented_profiler.py by pep8

Line too long (106 > 79 characters)
Open

        input_df_arrays = input_df_arrays.withColumn(col_name, vector_to_array(input_df_arrays[col_name]))

Found in python/whylogs/api/pyspark/experimental/segmented_profiler.py by pep8

Line too long (128 > 79 characters)
Open

    segmented_profile_bytes_df = input_df_arrays.mapInPandas(whylogs_pandas_map_profiler_with_schema, schema=cp)  # type: ignore

Found in python/whylogs/api/pyspark/experimental/segmented_profiler.py by pep8

Line too long (92 > 79 characters)
Open

    collected_segment_column_profile_views: Dict[Tuple[Segment, str], ColumnProfileView] = {

Found in python/whylogs/api/pyspark/experimental/segmented_profiler.py by pep8

Line too long (120 > 79 characters)
Open

        logger.warning("Unable to load pyspark; install pyspark to get whylogs profiling support in spark environment.")

Found in python/whylogs/api/pyspark/experimental/segmented_profiler.py by pep8

Line too long (113 > 79 characters)
Open

def _lookup_segment_partition_by_id(schema: DatasetSchema, partition_id: str) -> Optional[SegmentationPartition]:

Found in python/whylogs/api/pyspark/experimental/segmented_profiler.py by pep8

Line too long (160 > 79 characters)
Open

                f"Skipping segment: could not collect profiles for segment {segment} because the schema has no matching partition ids: {schema.segments.keys()}"

Found in python/whylogs/api/pyspark/experimental/segmented_profiler.py by pep8
