doc/DataFrame_Comparison.md
# Comparison of DataFrames
Compare basic features of RedAmber with [Python pandas](https://pandas.pydata.org/),
[R Tidyverse](https://www.tidyverse.org/) and
[Julia DataFrames](https://dataframes.juliadata.org/stable/).
## Select columns (variables)
| Features | RedAmber | Tidyverse | pandas | DataFrames.jl |
|--- |--- |--- |--- |--- |
| Select columns as a dataframe | pick, drop, [] | dplyr::select, dplyr::select_if | [], loc[], iloc[], drop, select_dtypes | [], select |
| Select a column as a vector | [], v | dplyr::pull, [, x] | [], loc[], iloc[] | [!, :x] |
| Move columns to a new position | pick, [] | relocate | [], reindex, loc[], iloc[] | select,transform |
## Select rows (records, observations)
| Features | RedAmber | Tidyverse | pandas | DataFrames.jl |
|--- |--- |--- |--- |--- |
| Select rows that meet logical criteria as a dataframe | slice, remove, [] | dplyr::filter | [], filter, query, loc[] | filter |
| Select rows by position as a dataframe | slice, remove, [] | dplyr::slice | iloc[], drop | subset |
| Move rows to a new position | slice, [] | dplyr::filter, dplyr::slice | reindex, loc[], iloc[] | permute |
## Update columns / create new columns
|Features | RedAmber | Tidyverse | pandas | DataFrames.jl |
|--- |--- |--- |--- |--- |
| Update existing columns | assign | dplyr::mutate | assign, []= | mapcols |
| Create new columns | assign, assign_left | dplyr::mutate | apply | insertcols,.+ |
| Compute new columns, drop others | new | transmute | (dfply:)transmute | transform,insertcols,mapcols |
| Rename columns | rename | dplyr::rename, dplyr::rename_with, purrr::set_names | rename, set_axis | rename |
| Sort dataframe | sort | dplyr::arrange | sort_values | sort |
## Reshape dataframe
| Features | RedAmber | Tidyverse | pandas | DataFrames.jl |
|--- |--- |--- |--- |--- |
| Gather columns into rows (create a longer dataframe) | to_long | tidyr::pivot_longer | melt | stack |
| Spread rows into columns (create a wider dataframe) | to_wide | tidyr::pivot_wider | pivot | unstack |
| transpose a wide dataframe | transpose | transpose, t | transpose, T | permutedims |
## Grouping
| Features | RedAmber | Tidyverse | pandas | DataFrames.jl |
|--- |--- |--- |--- |--- |
|Grouping | group, group.summarize | dplyr::group_by %>% dplyr::summarise | groupby.agg | combine,groupby |
## Combine dataframes or tables
| Features | RedAmber | Tidyverse | pandas | DataFrames.jl |
|--- |--- |--- |--- |--- |
| Combine additional columns | merge, bind_cols | dplyr::bind_cols | concat | combine |
| Combine additional rows | concatenate, concat, bind_rows | dplyr::bind_rows | concat | transform |
| Join right to left, leaving only the matching rows| inner_join, join | dplyr::inner_join | merge | innerjoin |
| Join right to left, leaving all rows | full_join, outer_join, join | dplyr::full_join | merge | outerjoin |
| Join matching values to left from right | left_join, join | dplyr::left_join | merge | leftjoin |
| Join matching values from left to right | right_join, join | dplyr::right_join | merge | rightjoin |
| Return rows of left that have a match in right | semi_join, join | dplyr::semi_join | [isin] | semijoin |
| Return rows of left that do not have a match in right | anti_join, join | dplyr::anti_join | [isin] | antijoin |
| Collect rows that appear in left or right | union | dplyr::union | merge | |
| Collect rows that appear in both left and right | intersect | dplyr::intersect | merge | |
| Collect rows that appear in left but not right | difference, setdiff | dplyr::setdiff | merge | |