# Comparison of DataFrames

This document compares the basic features of RedAmber with those of [Python pandas](https://pandas.pydata.org/),
[R Tidyverse](https://www.tidyverse.org/) and
[Julia DataFrames](https://dataframes.juliadata.org/stable/).

## Select columns (variables)

| Features                       | RedAmber       | Tidyverse                       | pandas                                 | DataFrames.jl     |
|---                             |---             |---                              |---                                     |---                |
| Select columns as a dataframe  | pick, drop, [] | dplyr::select, dplyr::select_if | [], loc[], iloc[], drop, select_dtypes | [], select        |
| Select a column as a vector    | [], v          | dplyr::pull, [, x]              | [], loc[], iloc[]                      | [!, :x]           |
| Move columns to a new position | pick, []       | relocate                        | [], reindex, loc[], iloc[]             | select, transform |
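
To make the pandas column of this table concrete, here is a minimal sketch in Python. The dataframe `df` and its column names `x`, `y`, `z` are hypothetical sample data, not part of the original comparison, and the sketch assumes pandas is installed.

```python
import pandas as pd

# Hypothetical sample data (an assumption for illustration only).
df = pd.DataFrame({"x": [1, 2, 3], "y": ["a", "b", "c"], "z": [1.5, 2.5, 3.5]})

# Select columns as a dataframe.
picked = df[["x", "z"]]                       # by a list of labels
dropped = df.drop(columns=["y"])              # by exclusion
numeric = df.select_dtypes(include="number")  # by dtype

# Select a column as a vector (a pandas Series).
x = df["x"]

# Move columns to a new position by listing them in the desired order.
reordered = df[["z", "x", "y"]]
```

Passing a list of labels to `[]` returns a dataframe, while a single label returns a Series, which is the distinction the first two rows of the table draw.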

## Select rows (records, observations)

| Features                                              | RedAmber          | Tidyverse                   | pandas                   | DataFrames.jl |
|---                                                    |---                |---                          |---                       |---            |
| Select rows that meet logical criteria as a dataframe | slice, remove, [] | dplyr::filter               | [], filter, query, loc[] | filter        |
| Select rows by position as a dataframe                | slice, remove, [] | dplyr::slice                | iloc[], drop             | subset        |
| Move rows to a new position                           | slice, []         | dplyr::filter, dplyr::slice | reindex, loc[], iloc[]   | permute       |
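
As a rough illustration of the pandas entries in this table, the sketch below selects and reorders rows. The dataframe `df` and its contents are hypothetical, and the default integer index (0 to 3) is an assumption that `drop(index=0)` relies on.

```python
import pandas as pd

# Hypothetical sample data (an assumption for illustration only).
df = pd.DataFrame({"x": [1, 2, 3, 4], "y": ["a", "b", "c", "d"]})

# Select rows that meet a logical criterion.
by_mask = df[df["x"] > 2]      # boolean mask with []
by_query = df.query("x > 2")   # the same filter as a query string

# Select rows by position.
first_two = df.iloc[:2]
without_first = df.drop(index=0)  # label 0 is the first row with the default index

# Move rows to a new position by listing positions in the desired order.
reversed_rows = df.iloc[[3, 2, 1, 0]]
```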

## Update columns / create new columns

| Features                         | RedAmber            | Tidyverse                                           | pandas            | DataFrames.jl                  |
|---                               |---                  |---                                                  |---                |---                             |
| Update existing columns          | assign              | dplyr::mutate                                       | assign, []=       | mapcols                        |
| Create new columns               | assign, assign_left | dplyr::mutate                                       | apply             | insertcols, .+                 |
| Compute new columns, drop others | new                 | transmute                                           | (dfply:)transmute | transform, insertcols, mapcols |
| Rename columns                   | rename              | dplyr::rename, dplyr::rename_with, purrr::set_names | rename, set_axis  | rename                         |
| Sort dataframe                   | sort                | dplyr::arrange                                      | sort_values       | sort                           |
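
The following sketch shows how a few of the pandas entries in this table might look in practice. The dataframe `df` and the new column names `total` and `value` are hypothetical. The table lists `apply` for creating columns; this sketch uses `assign` instead because it handles both the update and the create case.

```python
import pandas as pd

# Hypothetical sample data (an assumption for illustration only).
df = pd.DataFrame({"x": [3, 1, 2], "y": [10, 20, 30]})

# Update an existing column; assign returns a new dataframe and leaves df unchanged.
updated = df.assign(x=df["x"] * 2)

# Create a new column from existing ones.
created = df.assign(total=df["x"] + df["y"])

# Rename a column.
renamed = df.rename(columns={"y": "value"})

# Sort the dataframe by a column.
ordered = df.sort_values("x")
```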

## Reshape dataframe

| Features                                             | RedAmber  | Tidyverse           | pandas       | DataFrames.jl |
|---                                                   |---        |---                  |---           |---            |
| Gather columns into rows (create a longer dataframe) | to_long   | tidyr::pivot_longer | melt         | stack         |
| Spread rows into columns (create a wider dataframe)  | to_wide   | tidyr::pivot_wider  | pivot        | unstack       |
| Transpose a wide dataframe                           | transpose | transpose, t        | transpose, T | permutedims   |
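
To illustrate the longer/wider round trip in the table, here is a small pandas sketch. The dataframe `wide` and the names `quarter` and `value` are hypothetical choices made for this example.

```python
import pandas as pd

# Hypothetical sample data (an assumption for illustration only).
wide = pd.DataFrame({"name": ["a", "b"], "q1": [1, 2], "q2": [3, 4]})

# Gather columns into rows (create a longer dataframe).
longer = wide.melt(id_vars="name", var_name="quarter", value_name="value")

# Spread rows into columns (create a wider dataframe).
wider = longer.pivot(index="name", columns="quarter", values="value")

# Transpose: the former row labels become column labels.
transposed = wide.set_index("name").T
```

Note that `pivot` returns a dataframe indexed by `name`, so it has the same values as `wide` but not an identical layout.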

## Grouping

| Features | RedAmber               | Tidyverse                            | pandas      | DataFrames.jl    |
|---       |---                     |---                                   |---          |---               |
| Grouping | group, group.summarize | dplyr::group_by %>% dplyr::summarise | groupby.agg | combine, groupby |
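
A minimal sketch of the `groupby.agg` entry in pandas follows. The dataframe `df` and the output column names `total` and `mean` are hypothetical.

```python
import pandas as pd

# Hypothetical sample data (an assumption for illustration only).
df = pd.DataFrame({"key": ["a", "b", "a", "b"], "value": [1, 2, 3, 4]})

# Group by "key" and summarize "value" with named aggregations.
summary = df.groupby("key").agg(
    total=("value", "sum"),
    mean=("value", "mean"),
)
```

The result has one row per group, indexed by `key`, with one column per named aggregation.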

## Combine dataframes or tables

| Features                                              | RedAmber                       | Tidyverse         | pandas | DataFrames.jl |
|---                                                    |---                             |---                |---     |---            |
| Combine additional columns                            | merge, bind_cols               | dplyr::bind_cols  | concat | combine       |
| Combine additional rows                               | concatenate, concat, bind_rows | dplyr::bind_rows  | concat | transform     |
| Join right to left, leaving only the matching rows    | inner_join, join               | dplyr::inner_join | merge  | innerjoin     |
| Join right to left, leaving all rows                  | full_join, outer_join, join    | dplyr::full_join  | merge  | outerjoin     |
| Join matching values to left from right               | left_join, join                | dplyr::left_join  | merge  | leftjoin      |
| Join matching values from left to right               | right_join, join               | dplyr::right_join | merge  | rightjoin     |
| Return rows of left that have a match in right        | semi_join, join                | dplyr::semi_join  | [isin] | semijoin      |
| Return rows of left that do not have a match in right | anti_join, join                | dplyr::anti_join  | [isin] | antijoin      |
| Collect rows that appear in left or right             | union                          | dplyr::union      | merge  |               |
| Collect rows that appear in both left and right       | intersect                      | dplyr::intersect  | merge  |               |
| Collect rows that appear in left but not right        | difference, setdiff            | dplyr::setdiff    | merge  |               |
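
The sketch below illustrates how the pandas entries for binding and joining might be used. The dataframes `left` and `right` and their columns are hypothetical. The semi and anti joins are written with `isin` on a single key column, which is one possible reading of the `[isin]` entry rather than a dedicated pandas join type.

```python
import pandas as pd

# Hypothetical sample data (an assumption for illustration only).
left = pd.DataFrame({"key": [1, 2, 3], "l": ["a", "b", "c"]})
right = pd.DataFrame({"key": [2, 3, 4], "r": ["x", "y", "z"]})

# Combine additional rows and columns.
more_rows = pd.concat([left, left], ignore_index=True)
more_cols = pd.concat([left, right[["r"]]], axis=1)  # aligned on the row index

# Mutating joins: the `how` argument chooses which rows are kept.
inner = left.merge(right, on="key", how="inner")
outer = left.merge(right, on="key", how="outer")
left_joined = left.merge(right, on="key", how="left")

# Filtering joins, expressed with isin on the key column.
semi = left[left["key"].isin(right["key"])]
anti = left[~left["key"].isin(right["key"])]
```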