# DataFrame

Class `RedAmber::DataFrame` represents 2D data. A `DataFrame` consists of:
- A collection of data of the same data type. We call it a `Vector`.
- A label attached to a `Vector`. We call it a `key`.
- A `Vector` and its associated `key` are grouped as a `variable`.
- `variable`s with the same vector length are aligned and arranged to form a `DataFrame`.
  - Each `key` in a `DataFrame` must be unique.
- The values at the same position across the `Vector`s of a `DataFrame` form a set of related data. We call it a `record` or an `observation`.

![dataframe model image](doc/../image/dataframe_model.png)
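
To make these terms concrete, the model can be mimicked in plain Ruby. The sketch below does not use RedAmber at all; the `variables` Hash and the `record_at` helper are hypothetical names chosen only for illustration.

```ruby
# Plain-Ruby model of the DataFrame concepts (illustration only; RedAmber is not used).
# Each key labels one "vector": here an Array whose values share a type.
variables = {
  x: [1, 2, 3],
  y: %w[A B C]
}

# Vectors must have the same length to be aligned into a table.
lengths = variables.values.map(&:size).uniq
raise ArgumentError, 'vectors must have the same length' unless lengths.size == 1

# Hypothetical helper: a record (observation) is the set of values
# found at the same position in every vector.
def record_at(variables, index)
  variables.transform_values { |vector| vector[index] }
end

first_record = record_at(variables, 0)
puts first_record[:x] # prints 1
puts first_record[:y] # prints A
```

Keys are unique by construction here because a Ruby Hash cannot hold the same key twice.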

## Constructors and saving

### `new` from a Hash

  ```ruby
  df = RedAmber::DataFrame.new(x: [1, 2, 3], y: %w[A B C])
  ```

### `new` from a schema (by Hash) and data (by Array)

  ```ruby
  RedAmber::DataFrame.new({x: :uint8, y: :string}, [[1, "A"], [2, "B"], [3, "C"]])
  ```

### `new` from an Arrow::Table


  ```ruby
  table = Arrow::Table.new(x: [1, 2, 3], y: %w[A B C])
  RedAmber::DataFrame.new(table)
  ```

### `new` from an Object which responds to `to_arrow`

  ```ruby
  require "datasets-arrow"
  dataset = Datasets::Penguins.new
  RedAmber::DataFrame.new(dataset)
  ```

### `new` from a Rover::DataFrame


  ```ruby
  require 'rover'

  rover = Rover::DataFrame.new(x: [1, 2, 3], y: %w[A B C])
  RedAmber::DataFrame.new(rover)
  ```

### `load` (class method)

- from a `.arrow`, `.arrows`, `.csv`, `.csv.gz` or `.tsv` file
       
  ```ruby
  RedAmber::DataFrame.load("test/entity/with_header.csv")
  ```
  
  ```ruby
  RedAmber::DataFrame.load("test/entity/without_header.csv", headers: [:x, :y, :z])
  ```

- from a string buffer

- from a URI

  ```ruby
  uri = URI("https://raw.githubusercontent.com/mwaskom/seaborn-data/master/penguins.csv")
  RedAmber::DataFrame.load(uri)
  ```

- from a Parquet file

  ```ruby
  require 'parquet'

  df = RedAmber::DataFrame.load("file.parquet")
  ```

### `save` (instance method)

- to a `.arrow`, `.arrows`, `.csv`, `.csv.gz` or `.tsv` file

- to a string buffer

- to a URI

- to a Parquet file

  ```ruby
  require 'parquet'

  df.save("file.parquet")
  ```

## Properties

### `table`, `to_arrow`

- Returns Arrow::Table object in the DataFrame.

### `size`, `n_records`, `n_obs`, `n_rows`
  
- Returns the size of each Vector (the number of records).

### `n_keys`, `n_variables`, `n_vars`, `n_cols`

- Returns the number of keys (the number of variables).

### `shape`

- Returns the shape as an Array `[n_rows, n_cols]`.

### `variables`

- Returns pairs of key names and Vectors in a Hash.

  It is convenient in a block when both the key and the vector are required. We can write:

  ```ruby
    # update numeric variables
    df.assign do
      variables.select.with_object({}) do |(key, vector), assigner|
        assigner[key] = vector * -1 if vector.numeric?
      end
    end
  ```

  Instead of:
  ```ruby
    df.assign do
      assigner = {}
      vectors.each_with_index do |vector, i|
        assigner[keys[i]] = vector * -1 if vector.numeric?
      end
      assigner
    end
  ```

### `keys`, `var_names`, `column_names`
  
- Returns key names in an Array.

  Each key must be unique in the DataFrame.

### `types`
  
- Returns types of vectors in an Array of Symbols.

### `type_classes`

- Returns the types of the vectors in an Array of `Arrow::DataType`.

### `vectors`

- Returns an Array of Vectors.

  When using it, `Vector#key` is handy for getting each Vector's key in the DataFrame.

  ```ruby
    # update numeric variables, another solution
    df.assign do
      vectors.each_with_object({}) do |vector, assigner|
        assigner[vector.key] = vector * -1 if vector.numeric?
      end
    end
  ```

### `indices`, `indexes`

- Returns the indices as a Vector.
  Accepts an optional `start` argument as the first index.

  ```ruby
  df = RedAmber::DataFrame.new(x: [1, 2, 3, 4, 5])
  df.indices

  # =>
  #<RedAmber::Vector(:uint8, size=5):0x0000000000013ed4>
  [0, 1, 2, 3, 4]

  df.indices(1)

  # =>
  #<RedAmber::Vector(:uint8, size=5):0x0000000000018fd8>
  [1, 2, 3, 4, 5]

  df.indices(:a)

  # =>
  #<RedAmber::Vector(:dictionary, size=5):0x000000000001bd50>
  [:a, :b, :c, :d, :e]
  ```

### `to_h`

- Returns column-oriented data in a Hash.

### `to_a`, `raw_records`

- Returns an Array of row-oriented data without a header.

  If you need a column-oriented full Array, use `.to_h.to_a`.
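
The difference between the two orientations can be illustrated in plain Ruby. This is only a sketch: `columns` is a hand-written Hash shaped like the documented `to_h` result, and no RedAmber call is made.

```ruby
# Plain-Ruby illustration of column-oriented versus row-oriented data.
# `columns` is a hand-written Hash shaped like the documented `to_h` result;
# RedAmber is not used here.
columns = { x: [1, 2, 3], y: %w[A B C] }

# Row-oriented data without a header, the shape described for `to_a`.
rows = columns.values.transpose

# Column-oriented array that keeps the keys, the shape of `.to_h.to_a`.
column_pairs = columns.to_a

puts rows.size         # prints 3 (one entry per record)
puts column_pairs.size # prints 2 (one entry per variable)
```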

### `each_row`

- Yields each row as a `{ key => row }` Hash, where `row` is that row's value for `key`.
  Returns an Enumerator if no block is given.
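
As a rough picture of what a row-wise Hash looks like, here is a plain-Ruby sketch. It assumes that each yielded Hash pairs every key with the value of that row; `columns` and `row_hashes` are illustrative names, and RedAmber is not used.

```ruby
# Plain-Ruby model of row-wise iteration (illustration only; RedAmber is not used).
columns = { x: [1, 2, 3], y: %w[A B C] }
keys = columns.keys

# Build one Hash per row, pairing every key with that row's value.
row_hashes = columns.values.transpose.map { |values| keys.zip(values).to_h }

row_hashes.each do |row|
  puts "#{row[:x]}: #{row[:y]}"
end
```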

### `schema`

- Returns column names and data types in a Hash.

### `==`
 
### `empty?`

## Output

### `to_s`

`to_s` returns a preview of the Table.

```ruby
puts penguins.to_s

# =>
    species  island    bill_length_mm bill_depth_mm flipper_length_mm ...     year
    <string> <string>        <double>      <double>           <uint8> ... <uint16>
  0 Adelie   Torgersen           39.1          18.7               181 ...     2007
  1 Adelie   Torgersen           39.5          17.4               186 ...     2007
  2 Adelie   Torgersen           40.3          18.0               195 ...     2007
  3 Adelie   Torgersen          (nil)         (nil)             (nil) ...     2007
  4 Adelie   Torgersen           36.7          19.3               193 ...     2007
  : :        :                      :             :                 : ...        :
341 Gentoo   Biscoe              50.4          15.7               222 ...     2009
342 Gentoo   Biscoe              45.2          14.8               212 ...     2009
343 Gentoo   Biscoe              49.9          16.1               213 ...     2009
```

### `inspect`

`inspect` uses `to_s` output and also shows shape and object_id.


### `summary`, `describe`

`DataFrame#summary` or `DataFrame#describe` shows summary statistics in a DataFrame.

```ruby
puts penguins.summary.to_s(width: 82) # needs more width to show all stats in this example

# =>
  variables            count     mean      std      min      25%   median      75%      max
  <dictionary>      <uint16> <double> <double> <double> <double> <double> <double> <double>
0 bill_length_mm         342    43.92     5.46     32.1    39.23    44.38     48.5     59.6
1 bill_depth_mm          342    17.15     1.97     13.1     15.6    17.32     18.7     21.5
2 flipper_length_mm      342   200.92    14.06    172.0    190.0    197.0    213.0    231.0
3 body_mass_g            342  4201.75   801.95   2700.0   3550.0   4031.5   4750.0   6300.0
4 year                   344  2008.03     0.82   2007.0   2007.0   2008.0   2009.0   2009.0
```

### `to_rover`

- Returns a `Rover::DataFrame`.

```ruby
require 'rover'

penguins.to_rover
```

### `to_iruby`

- Shows the DataFrame as a Table in Jupyter Notebook or Jupyter Lab with IRuby.

### `tdr(limit = 10, tally: 5, elements: 5)`

  - Shows some information about self in a transposed style.
  - `tdr_str` returns the same info as a String.
  - `glimpse` is an alias. It is similar to dplyr's (or Polars's) `glimpse()`.

  ```ruby
  require 'red_amber'
  require 'datasets-arrow'

  dataset = Datasets::Penguins.new
  # (Since 0.2.2) `new` accepts any object that responds to `to_arrow`.
  # With older versions, pass `dataset.to_arrow` in the parentheses instead.
  RedAmber::DataFrame.new(dataset).tdr

  # =>
  RedAmber::DataFrame : 344 x 8 Vectors
  Vectors : 5 numeric, 3 strings
  # key                type   level data_preview
  0 :species           string     3 {"Adelie"=>152, "Chinstrap"=>68, "Gentoo"=>124}
  1 :island            string     3 {"Torgersen"=>52, "Biscoe"=>168, "Dream"=>124}
  2 :bill_length_mm    double   165 [39.1, 39.5, 40.3, nil, 36.7, ... ], 2 nils
  3 :bill_depth_mm     double    81 [18.7, 17.4, 18.0, nil, 19.3, ... ], 2 nils
  4 :flipper_length_mm uint8     56 [181, 186, 195, nil, 193, ... ], 2 nils
  5 :body_mass_g       uint16    95 [3750, 3800, 3250, nil, 3450, ... ], 2 nils
  6 :sex               string     3 {"male"=>168, "female"=>165, nil=>11}
  7 :year              uint16     3 {2007=>110, 2008=>114, 2009=>120}
  ```
  
  Options:
  - limit: limit of variables to show. Default value is 10.
  - tally: max level to use tally mode. Default value is 5.
  - elements: maximum number of elements shown in each data preview. Default value is 5.

## Selecting

### Select variables (columns in a table) by `[]` as `[key]`, `[keys]`, `[keys[index]]`
- Key in a Symbol: `df[:symbol]`
- Key in a String: `df["string"]`
- Keys in an Array: `df[:symbol1, "string", :symbol2]`
- Keys by indices: `df[df.keys[0]]`, `df[df.keys[1,2]]`, `df[df.keys[1..]]`

  Key indices should be used via `keys[i]` because numbers are used to select records (rows). See the next section.

- Keys by a Range:

  If the keys can be represented by a Range, the Range can be included in the arguments. See the example below.

- You can also exchange the order of variables (columns).
 
  ```ruby
  hash = {a: [1, 2, 3], b: %w[A B C], c: [1.0, 2, 3]}
  df = RedAmber::DataFrame.new(hash)
  df[:b..:c, "a"]

  # =>
  #<RedAmber::DataFrame : 3 x 3 Vectors, 0x00000000000328fc>
    b               c       a
    <string> <double> <uint8>
  0 A             1.0       1
  1 B             2.0       2
  2 C             3.0       3
  ```

  If `#[]` selects a single variable (column), it returns a Vector object.

  ```ruby
  df[:a]

  # =>
  #<RedAmber::Vector(:uint8, size=3):0x000000000000f140>
  [1, 2, 3]
  ```

  The `#v` method also returns a Vector for a key.

  ```ruby
  df.v(:a)

  # =>
  #<RedAmber::Vector(:uint8, size=3):0x000000000000f140>
  [1, 2, 3]
  ```

  This method can be useful in a block of DataFrame manipulation verbs. We can write `v(:a)` rather than `self[:a]` or `df[:a]`.

### Select records (rows in a table) by `[]` as `[index]`, `[range]`, `[array]`

- Select a record by index: `df[0]`

- Select records by indices in an Array: `df[1, 2]`

- Select records by indices in a Range: `df[1..2]`

  An endless or a beginless Range can be used to represent indices.

- You can use Float indices.

- Mixed case: `df[2, 0..]`

  ```ruby
  hash = {a: [1, 2, 3], b: %w[A B C], c: [1.0, 2, 3]}
  df = RedAmber::DataFrame.new(hash)
  df[2, 0..]

  # =>
  #<RedAmber::DataFrame : 4 x 3 Vectors, 0x0000000000033270>
          a b               c
    <uint8> <string> <double>
  0       3 C             3.0
  1       1 A             1.0
  2       2 B             2.0
  3       3 C             3.0
  ```

- Select records by a boolean Array or a boolean `RedAmber::Vector` of the same size as self.

  It returns a sub DataFrame with the records where the boolean is true.

    ```ruby
    # with the same dataframe `df` above
    df[true, false, nil] # or
    df[[true, false, nil]] # or
    df[RedAmber::Vector.new([true, false, nil])]

    # =>
    #<RedAmber::DataFrame : 1 x 3 Vectors, 0x00000000000353e0>
            a b               c
      <uint8> <string> <double>
    0       1 A             1.0
    ```
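
The mixed case `df[2, 0..]` returns four rows because every argument is expanded into row positions and the results are concatenated. The plain-Ruby sketch below models that expansion; `expand_row_indices` is a hypothetical helper written for illustration, not RedAmber's implementation, and Float indices are left out.

```ruby
# Plain-Ruby model of how index arguments expand into row positions
# (hypothetical helper for illustration; not RedAmber's implementation).
def expand_row_indices(args, n_rows)
  positions = (0...n_rows).to_a
  args.flat_map do |arg|
    case arg
    when Integer then [positions.fetch(arg)] # negative values count from the tail
    when Range   then positions[arg] || raise(IndexError, "#{arg} is out of range")
    else raise ArgumentError, "unsupported index: #{arg.inspect}"
    end
  end
end

mixed = expand_row_indices([2, 0..], 3)
puts mixed.inspect # prints [2, 0, 1, 2]
```

The same expansion explains why a position may appear more than once in the result.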

### Select records (rows) from top or from bottom

  `head(n=5)`, `tail(n=5)`, `first(n=1)`, `last(n=1)`

## Sub DataFrame manipulations

### `pick` - pick up variables -

  Pick up some variables (columns) to create a sub DataFrame.

  ![pick method image](doc/../image/dataframe/pick.png)

- Keys as arguments

  `pick(keys)` accepts keys as arguments in an Array or a Range.

    ```ruby
    penguins.pick(:species, :bill_length_mm)

    # =>
    #<RedAmber::DataFrame : 344 x 2 Vectors, 0x0000000000035ebc>
        species  bill_length_mm
        <string>       <double>
      0 Adelie             39.1
      1 Adelie             39.5
      2 Adelie             40.3
      3 Adelie            (nil)
      4 Adelie             36.7
      : :                     :
    341 Gentoo             50.4
    342 Gentoo             45.2
    343 Gentoo             49.9
    ```

- Indices as arguments

  `pick(indices)` accepts indices as arguments. Indices should be Integers, Floats or Ranges of Integers.

    ```ruby
    penguins.pick(0..2, -1)
    
    # =>
    #<RedAmber::DataFrame : 344 x 4 Vectors, 0x0000000000055ce4>
        species  island    bill_length_mm     year
        <string> <string>        <double> <uint16>
      0 Adelie   Torgersen           39.1     2007
      1 Adelie   Torgersen           39.5     2007
      2 Adelie   Torgersen           40.3     2007
      3 Adelie   Torgersen          (nil)     2007
      4 Adelie   Torgersen           36.7     2007
      : :        :                      :        :
    341 Gentoo   Biscoe              50.4     2009
    342 Gentoo   Biscoe              45.2     2009
    343 Gentoo   Biscoe              49.9     2009
    ```

- Booleans as arguments

  `pick(booleans)` accepts booleans as arguments in an Array. Booleans must be the same length as `n_keys`.

    ```ruby
    penguins.pick(penguins.vectors.map(&:string?))
    
    # =>
    #<RedAmber::DataFrame : 344 x 3 Vectors, 0x00000000000387ac>
        species  island    sex
        <string> <string>  <string>
      0 Adelie   Torgersen male
      1 Adelie   Torgersen female
      2 Adelie   Torgersen female
      3 Adelie   Torgersen (nil)
      4 Adelie   Torgersen female
      : :        :         :
    341 Gentoo   Biscoe    male
    342 Gentoo   Biscoe    female
    343 Gentoo   Biscoe    male
    ```

- Keys or booleans by a block

    `pick {block}` is also acceptable. We can't use both arguments and a block at the same time. The block should return keys, indices or a boolean Array with the same length as `n_keys`. The block is called in the context of self.

    ```ruby
    penguins.pick { keys.map { |key| key.end_with?('mm') } }

    # =>
    #<RedAmber::DataFrame : 344 x 3 Vectors, 0x000000000003dd4c>
        bill_length_mm bill_depth_mm flipper_length_mm
              <double>      <double>           <uint8>
      0           39.1          18.7               181
      1           39.5          17.4               186
      2           40.3          18.0               195
      3          (nil)         (nil)             (nil)
      4           36.7          19.3               193
      :              :             :                 :
    341           50.4          15.7               222
    342           45.2          14.8               212
    343           49.9          16.1               213
    ```

### `drop` - counterpart of pick -

  Drop some variables (columns) to create a DataFrame of the remaining variables.

  ![drop method image](doc/../image/dataframe/drop.png)

- Keys as arguments

  `drop(keys)` accepts keys as arguments in an Array or a Range.

- Indices as arguments

  `drop(indices)` accepts indices as arguments. Indices should be Integers, Floats or Ranges of Integers.

- Booleans as arguments

  `drop(booleans)` accepts booleans as an argument in an Array. Booleans must be the same length as `n_keys`.

- Keys or booleans by a block

  `drop {block}` is also acceptable. We can't use both arguments and a block at the same time. The block should return keys, indices or a boolean Array with the same length as `n_keys`. The block is called in the context of self.
  
- Notice for nil

  When used with booleans, nil in the booleans is treated as false. This behavior is aligned with Ruby's `nil#!`.

  ```ruby
  booleans = [true, false, nil]
  booleans_invert = booleans.map(&:!) # => [false, true, true]
  df.pick(booleans) == df.drop(booleans_invert) # => true
  ```
- Difference between `pick`/`drop` and `[]`

  If `pick` or `drop` selects a single variable (column), it returns a `DataFrame` with one variable. In contrast, `[]` returns a `Vector`. This behavior can be useful in a block of DataFrame manipulations.

  ```ruby
  df = RedAmber::DataFrame.new(a: [1, 2, 3], b: %w[A B C], c: [1.0, 2, 3])
  df.pick(:a) # or
  df.drop(:b, :c)

  # =>
  #<RedAmber::DataFrame : 3 x 1 Vector, 0x000000000003f4bc>
          a
    <uint8>
  0       1
  1       2
  2       3

  df[:a]

  # =>
  #<RedAmber::Vector(:uint8, size=3):0x000000000000f258>
  [1, 2, 3]
  ```

  A simple key name can be used as a method of the DataFrame if the key name is valid as a method name.
  It returns the same Vector as `[]`.

  ```ruby
  df.a

  # =>
  #<RedAmber::Vector(:uint8, size=3):0x000000000000f258>
  [1, 2, 3]
  ```
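
The difference between `pick`/`drop` and `[]` resembles the difference between `Hash#slice` and `Hash#[]` in plain Ruby. The sketch below is only an analogy on a hand-written `columns` Hash; it does not call RedAmber.

```ruby
# Plain-Ruby analogy for `pick`, `drop` and `[]` (illustration only; RedAmber is not used).
columns = { a: [1, 2, 3], b: %w[A B C], c: [1.0, 2.0, 3.0] }

# Like `pick(:a)`: the result is still a table (Hash) with a single variable.
picked = columns.slice(:a)

# Like `drop(:b, :c)`: reject the named keys and keep the rest.
dropped = columns.reject { |key, _| %i[b c].include?(key) }

# Like `[:a]`: the result is the bare vector (Array), not a table.
vector = columns[:a]

puts picked == dropped # prints true
```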

### `slice` - cut into slices of records -

  Slice and select records (rows) to create a sub DataFrame.

  ![slice method image](doc/../image/dataframe/slice.png)

- Indices as arguments

    `slice(indices)` accepts indices as arguments. Indices should be Integers, Floats or Ranges of Integers.

    A negative index counting from the tail, like in Ruby's Array, is also acceptable.

    ```ruby
    # returns the first 5 records and the last 5 records
    penguins.slice(0...5, -5..-1)

    # =>
    #<RedAmber::DataFrame : 10 x 8 Vectors, 0x0000000000042be4>
      species  island    bill_length_mm bill_depth_mm flipper_length_mm ...     year
      <string> <string>        <double>      <double>           <uint8> ... <uint16>
    0 Adelie   Torgersen           39.1          18.7               181 ...     2007
    1 Adelie   Torgersen           39.5          17.4               186 ...     2007
    2 Adelie   Torgersen           40.3          18.0               195 ...     2007
    3 Adelie   Torgersen          (nil)         (nil)             (nil) ...     2007
    4 Adelie   Torgersen           36.7          19.3               193 ...     2007
    : :        :                      :             :                 : ...        :
    7 Gentoo   Biscoe              50.4          15.7               222 ...     2009
    8 Gentoo   Biscoe              45.2          14.8               212 ...     2009
    9 Gentoo   Biscoe              49.9          16.1               213 ...     2009
    ```

- Booleans as an argument

  `filter(booleans)` or `slice(booleans)` accepts booleans as an argument in an Array, a Vector or an `Arrow::BooleanArray`. Booleans must be the same length as `size`.

  Note: `slice(booleans)` is accepted so that `slice` and `remove` stay orthogonal.

    ```ruby
    vector = penguins[:bill_length_mm]
    penguins.filter(vector >= 40)
    # penguins.slice(vector >= 40) is also acceptable

    # =>
    #<RedAmber::DataFrame : 242 x 8 Vectors, 0x0000000000043d3c>
        species  island    bill_length_mm bill_depth_mm flipper_length_mm ...     year
        <string> <string>        <double>      <double>           <uint8> ... <uint16>
      0 Adelie   Torgersen           40.3          18.0               195 ...     2007
      1 Adelie   Torgersen           42.0          20.2               190 ...     2007
      2 Adelie   Torgersen           41.1          17.6               182 ...     2007
      3 Adelie   Torgersen           42.5          20.7               197 ...     2007
      4 Adelie   Torgersen           46.0          21.5               194 ...     2007
      : :        :                      :             :                 : ...        :
    239 Gentoo   Biscoe              50.4          15.7               222 ...     2009
    240 Gentoo   Biscoe              45.2          14.8               212 ...     2009
    241 Gentoo   Biscoe              49.9          16.1               213 ...     2009
    ```

- Indices or booleans by a block

    `slice {block}` is also acceptable. We can't use both arguments and a block at the same time. The block should return indices or a boolean Array with the same length as `size`. The block is called in the context of self.

    ```ruby
    # return a DataFrame whose bill_length_mm is within a 2*std-wide range around the mean
    penguins.slice do
      vector = self[:bill_length_mm]
      min = vector.mean - vector.std
      max = vector.mean + vector.std
      vector.to_a.map { |e| (min..max).include? e }
    end

    # =>
    #<RedAmber::DataFrame : 204 x 8 Vectors, 0x0000000000047a40>
        species  island    bill_length_mm bill_depth_mm flipper_length_mm ...     year
        <string> <string>        <double>      <double>           <uint8> ... <uint16>
      0 Adelie   Torgersen           39.1          18.7               181 ...     2007
      1 Adelie   Torgersen           39.5          17.4               186 ...     2007
      2 Adelie   Torgersen           40.3          18.0               195 ...     2007
      3 Adelie   Torgersen           39.3          20.6               190 ...     2007
      4 Adelie   Torgersen           38.9          17.8               181 ...     2007
      : :        :                      :             :                 : ...        :
    201 Gentoo   Biscoe              47.2          13.7               214 ...     2009
    202 Gentoo   Biscoe              46.8          14.3               215 ...     2009
    203 Gentoo   Biscoe              45.2          14.8               212 ...     2009
    ```

- Notice: nil option
  - `Arrow::Table#slice` uses the `filter` method with the option `Arrow::FilterOptions.null_selection_behavior = :emit_null`. This propagates nil to the same row.
    
    ```ruby
    hash = { a: [1, 2, 3], b: %w[A B C], c: [1.0, 2, 3] }
    table = Arrow::Table.new(hash)
    table.slice([true, false, nil])

    # =>
    #<Arrow::Table:0x7fdfe44b9e18 ptr=0x555e9fe744d0>
             a    b                c
    0         1  A      1.000000
    1    (null)    (null)   (null)
    ```

  - In RedAmber, by contrast, nil in the booleans passed to `DataFrame#slice` is treated as false. This behavior comes from `Arrow::FilterOptions.null_selection_behavior = :drop`, which is the default value for the `Arrow::Table#filter` method.

    ```ruby
    RedAmber::DataFrame.new(table).slice([true, false, nil]).table

    # =>
    #<Arrow::Table:0x7fdfe44981c8 ptr=0x555e9febc330>
        a    b             c
    0    1    A      1.000000
    ``` 
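
The two null-selection behaviors can be modeled in plain Ruby. This sketch only imitates the outcomes shown in the two examples with Arrays; it uses neither Arrow nor RedAmber, and the variable names are illustrative.

```ruby
# Plain-Ruby model of the two null-selection behaviors
# (illustration only; neither Arrow nor RedAmber is used).
rows = [[1, 'A', 1.0], [2, 'B', 2.0], [3, 'C', 3.0]]
mask = [true, false, nil]

# :drop - a nil in the mask is treated like false, so the row is dropped.
dropped = rows.zip(mask).select { |_row, flag| flag }.map(&:first)

# :emit_null - a nil in the mask emits a row filled with nils.
emitted = rows.zip(mask).filter_map do |row, flag|
  if flag.nil?
    Array.new(row.size)
  elsif flag
    row
  end
end

puts dropped.size # prints 1
puts emitted.size # prints 2
```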

### `remove` - counterpart of slice -

  Slice and reject records (rows) to create a DataFrame of the remaining records.

  ![remove method image](doc/../image/dataframe/remove.png)

- Indices as arguments

    `remove(indices)` accepts indices as arguments. Indices should be Integers or Ranges of Integers.

    ```ruby
    # returns 6th to 339th records
    penguins.remove(0...5, -5..-1)

    # =>
    #<RedAmber::DataFrame : 334 x 8 Vectors, 0x00000000000487c4>
        species  island    bill_length_mm bill_depth_mm flipper_length_mm ...     year
        <string> <string>        <double>      <double>           <uint8> ... <uint16>
      0 Adelie   Torgersen           39.3          20.6               190 ...     2007
      1 Adelie   Torgersen           38.9          17.8               181 ...     2007
      2 Adelie   Torgersen           39.2          19.6               195 ...     2007
      3 Adelie   Torgersen           34.1          18.1               193 ...     2007
      4 Adelie   Torgersen           42.0          20.2               190 ...     2007
      : :        :                      :             :                 : ...        :
    331 Gentoo   Biscoe              44.5          15.7               217 ...     2009
    332 Gentoo   Biscoe              48.8          16.2               222 ...     2009
    333 Gentoo   Biscoe              47.2          13.7               214 ...     2009
    ```

- Booleans as an argument

  `remove(booleans)` accepts booleans as an argument in an Array, a Vector or an `Arrow::BooleanArray`. Booleans must be the same length as `size`.

    ```ruby
    # remove all records containing nil
    removed = penguins.remove { vectors.map(&:is_nil).reduce(&:|) }
    removed

    # =>
    #<RedAmber::DataFrame : 333 x 8 Vectors, 0x0000000000049fac>
        species  island    bill_length_mm bill_depth_mm flipper_length_mm ...     year
        <string> <string>        <double>      <double>           <uint8> ... <uint16>
      0 Adelie   Torgersen           39.1          18.7               181 ...     2007
      1 Adelie   Torgersen           39.5          17.4               186 ...     2007
      2 Adelie   Torgersen           40.3          18.0               195 ...     2007
      3 Adelie   Torgersen           36.7          19.3               193 ...     2007
      4 Adelie   Torgersen           39.3          20.6               190 ...     2007
      : :        :                      :             :                 : ...        :
    330 Gentoo   Biscoe              50.4          15.7               222 ...     2009
    331 Gentoo   Biscoe              45.2          14.8               212 ...     2009
    332 Gentoo   Biscoe              49.9          16.1               213 ...     2009
    ```

- Indices or booleans by a block

    `remove {block}` is also acceptable. We can't use both arguments and a block at the same time. The block should return indices or a boolean Array with the same length as `size`. The block is called in the context of self.

    ```ruby
    penguins.remove do
      # This uses a different style from the one shown in slice:
      # self.bill_length_mm returns a Vector
      mean = bill_length_mm.mean
      min = mean - bill_length_mm.std
      max = mean + bill_length_mm.std
      bill_length_mm.to_a.map { |e| (min..max).include? e }
    end

    # =>
    #<RedAmber::DataFrame : 140 x 8 Vectors, 0x000000000004de40>
        species  island    bill_length_mm bill_depth_mm flipper_length_mm ...     year
        <string> <string>        <double>      <double>           <uint8> ... <uint16>
      0 Adelie   Torgersen          (nil)         (nil)             (nil) ...     2007
      1 Adelie   Torgersen           36.7          19.3               193 ...     2007
      2 Adelie   Torgersen           34.1          18.1               193 ...     2007
      3 Adelie   Torgersen           37.8          17.1               186 ...     2007
      4 Adelie   Torgersen           37.8          17.3               180 ...     2007
      : :        :                      :             :                 : ...        :
    137 Gentoo   Biscoe             (nil)         (nil)             (nil) ...     2009
    138 Gentoo   Biscoe              50.4          15.7               222 ...     2009
    139 Gentoo   Biscoe              49.9          16.1               213 ...     2009
    ```

- Notice for nil
  - When `remove` is used with booleans, nil in the booleans is treated as false. This behavior is aligned with Ruby's `nil#!`.

    ```ruby
    df = RedAmber::DataFrame.new(a: [1, 2, nil], b: %w[A B C], c: [1.0, 2, 3])
    booleans = df[:a] < 2
    booleans

    # =>
    #<RedAmber::Vector(:boolean, size=3):0x000000000000f410>
    [true, false, nil]

    booleans_invert = booleans.to_a.map(&:!) # => [false, true, true]
    
    df.slice(booleans) == df.remove(booleans_invert) # => true
    ```

  - In contrast, `Vector#invert` returns nil for nil elements. This leads to a different result.

    ```ruby
    booleans.invert

    # =>
    #<RedAmber::Vector(:boolean, size=3):0x000000000000f488>
    [false, true, nil]

    df.remove(booleans.invert)

    # =>
    #<RedAmber::DataFrame : 2 x 3 Vectors, 0x000000000005df98>
            a b               c
      <uint8> <string> <double>
    0       1 A             1.0
    1   (nil) C             3.0
    ```
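
Why the two inversions give different results can be traced in plain Ruby. The sketch below models `slice` and `remove` with lambdas over Arrays; `slice_model` and `remove_model` are hypothetical stand-ins, not RedAmber code.

```ruby
# Plain-Ruby model of the two inversions (illustration only; RedAmber is not used).
rows = [[1, 'A'], [2, 'B'], [nil, 'C']]
booleans = [true, false, nil]

# Hypothetical stand-ins: `slice` keeps rows whose flag is true,
# `remove` keeps rows whose flag is not true (false or nil).
slice_model  = ->(flags) { rows.zip(flags).select { |_row, flag| flag == true }.map(&:first) }
remove_model = ->(flags) { rows.zip(flags).reject { |_row, flag| flag == true }.map(&:first) }

ruby_invert = booleans.map(&:!)                               # nil becomes true
nil_keeping = booleans.map { |flag| flag.nil? ? nil : !flag } # nil stays nil

puts slice_model.call(booleans) == remove_model.call(ruby_invert) # prints true
puts remove_model.call(nil_keeping).size                          # prints 2
```

With the nil-keeping inversion the nil flag removes nothing, so the third row survives alongside the first one.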

### `rename`

  Rename keys (variable/column names) to create an updated DataFrame.

  ![rename method image](doc/../image/dataframe/rename.png)

- Key pairs as arguments

    `rename(key_pairs)` accepts key_pairs as arguments. key_pairs should be a Hash of `{existing_key => new_key}` or an Array of Arrays like `[[existing_key, new_key], ... ]`.

    ```ruby
    df = RedAmber::DataFrame.new( 'name' => %w[Yasuko Rui Hinata], 'age' => [68, 49, 28] )
    df.rename(:age => :age_in_1993)

    # =>
    #<RedAmber::DataFrame : 3 x 2 Vectors, 0x0000000000060838>
      name     age_in_1993
      <string>     <uint8>
    0 Yasuko            68
    1 Rui               49
    2 Hinata            28
    ```

- Key pairs by a block

    `rename {block}` is also acceptable. We can't use both arguments and a block at the same time. The block should return key_pairs as a Hash of `{existing_key => new_key}` or an Array of Arrays like `[[existing_key, new_key], ... ]`. The block is called in the context of self.

- Not existing keys

    If a specified `existing_key` does not exist, a `DataFrameArgumentError` is raised.

- Key type

  Symbol keys and String keys are distinguished.
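
The two accepted shapes of `key_pairs` and the error for a missing key can be sketched in plain Ruby. `rename_model` is a hypothetical helper on a Hash of columns, not RedAmber's implementation; it raises a plain `ArgumentError` where RedAmber raises `DataFrameArgumentError`.

```ruby
# Plain-Ruby model of key renaming (hypothetical helper; not RedAmber's implementation).
# `key_pairs` may be a Hash or an Array of [existing_key, new_key] pairs.
def rename_model(columns, key_pairs)
  mapping = key_pairs.to_h
  missing = mapping.keys - columns.keys
  # RedAmber raises DataFrameArgumentError here; this model uses ArgumentError.
  raise ArgumentError, "keys not found: #{missing.inspect}" unless missing.empty?

  columns.transform_keys { |key| mapping.fetch(key, key) }
end

columns = { name: %w[Yasuko Rui Hinata], age: [68, 49, 28] }
by_hash  = rename_model(columns, { age: :age_in_1993 })
by_array = rename_model(columns, [%i[age age_in_1993]])
puts by_hash.keys.inspect # prints [:name, :age_in_1993]
puts by_hash == by_array  # prints true
```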

### `assign`

  Assign new or updated variables (columns) and create an updated DataFrame.

  - Variables with new keys are appended as new columns from the right.
  - Variables with existing keys update the corresponding vectors.

    ![assign method image](doc/../image/dataframe/assign.png)

- Variables as arguments

    `assign(key_value_pairs)` accepts pairs of key and values as parameters. `key_value_pairs` should be a Hash of `{key => array_like}` or an Array of Arrays like `[[key, array_like], ... ]`. `array_like` is either a `Vector`, an `Array` or an `Arrow::Array`.

    ```ruby
    df = RedAmber::DataFrame.new(
      name: %w[Yasuko Rui Hinata],
      age: [68, 49, 28])
    df
    
    # =>
    #<RedAmber::DataFrame : 3 x 2 Vectors, 0x0000000000062804>
      name         age
      <string> <uint8>
    0 Yasuko        68
    1 Rui           49
    2 Hinata        28

    # update :age and add :brother
    df.assign(
      {
        age: df[:age] + 29,
        brother: ['Santa', nil, 'Momotaro']
      }
    )

    # =>
    #<RedAmber::DataFrame : 3 x 3 Vectors, 0x00000000000658b0>
      name         age brother
      <string> <uint8> <string>
    0 Yasuko        97 Santa
    1 Rui           78 (nil)
    2 Hinata        57 Momotaro
    ```

- Key pairs by a block

    `assign {block}` is also acceptable. We can't use both arguments and a block at the same time. The block should return pairs of key and values as a Hash of `{key => array_like}` or an Array of Arrays like `[[key, array_like], ... ]`. `array_like` is either a `Vector`, an `Array` or an `Arrow::Array`. The block is called in the context of self.

    ```ruby
    df = RedAmber::DataFrame.new(
      index: [0, 1, 2, 3, nil],
      float: [0.0, 1.1,  2.2, Float::NAN, nil],
      string: ['A', 'B', 'C', 'D', nil]
    )
    df

    # =>
    #<RedAmber::DataFrame : 5 x 3 Vectors, 0x0000000000069e60>
        index    float string
      <uint8> <double> <string>
    0       0      0.0 A
    1       1      1.1 B
    2       2      2.2 C
    3       3      NaN D
    4   (nil)    (nil) (nil)

    # update :float
    # assigner by an Array
    df.assign do
      vectors.select(&:float?)
             .map { |v| [v.key, -v] }
    end

    # =>
    #<RedAmber::DataFrame : 5 x 3 Vectors, 0x00000000000dfffc>
        index    float string
      <uint8> <double> <string>
    0       0     -0.0 A
    1       1     -1.1 B
    2       2     -2.2 C
    3       3      NaN D
    4   (nil)    (nil) (nil)

    # Or we can use assigner by a Hash
    df.assign do
      vectors.select.with_object({}) do |v, assigner|
        assigner[v.key] = -v if v.float?
      end
    end

    # => same as above
    ```

- Key type

  Symbol keys and String keys are considered the same key.

- Empty assignment
  
  If the assigner is empty or nil, `assign` returns self.

- Append from left

  The `assign_left` method accepts the same parameters and block as `assign`, but appends new columns from the left.

  ```ruby
  df.assign_left(new_index: df.indices(1))
  
  # => 
  #<RedAmber::DataFrame : 5 x 4 Vectors, 0x000000000001787c>
    new_index   index    float string
      <uint8> <uint8> <double> <string>
  0         1       0      0.0 A
  1         2       1      1.1 B
  2         3       2      2.2 C
  3         4       3      NaN D
  4         5   (nil)    (nil) (nil)
  ```
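
The column ordering rules of `assign` and `assign_left` can be pictured with plain Ruby Hashes. The helpers below (`normalize_assigner`, `assign_model`, `assign_left_model`) are hypothetical and do not call RedAmber; in particular, updating existing keys in place for `assign_left` is an assumption of this sketch.

```ruby
# Plain-Ruby model of `assign` and `assign_left` column ordering
# (hypothetical helpers; not RedAmber's implementation).
def normalize_assigner(assigner)
  # Accept a Hash or [[key, values], ...]; treat String and Symbol keys alike.
  assigner.to_h.transform_keys(&:to_sym)
end

def assign_model(columns, assigner)
  return columns if assigner.nil? || assigner.empty?

  # Existing keys are updated in place; new keys are appended on the right.
  columns.merge(normalize_assigner(assigner))
end

def assign_left_model(columns, assigner)
  return columns if assigner.nil? || assigner.empty?

  updates = normalize_assigner(assigner)
  new_columns = updates.reject { |key, _| columns.key?(key) }
  # New keys go to the left; existing keys keep their position
  # (updating existing keys in place is an assumption of this sketch).
  new_columns.merge(columns).merge(updates)
end

columns = { name: %w[Yasuko Rui Hinata], age: [68, 49, 28] }
updated = assign_model(columns, { 'age' => [97, 78, 57], brother: ['Santa', nil, 'Momotaro'] })
indexed = assign_left_model(columns, { new_index: [1, 2, 3] })
puts updated.keys.inspect # prints [:name, :age, :brother]
puts indexed.keys.inspect # prints [:new_index, :name, :age]
```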

### `slice_by(key, keep_key: false) { block }`

`slice_by` accepts a key and a block to select rows.

(Since 0.2.1)

  ```ruby
  df = RedAmber::DataFrame.new(
    index: [0, 1, 2, 3, nil],
    float: [0.0, 1.1,  2.2, Float::NAN, nil],
    string: ['A', 'B', 'C', 'D', nil]
  )
  df

  # =>
  #<RedAmber::DataFrame : 5 x 3 Vectors, 0x0000000000069e60>
      index    float string
    <uint8> <double> <string>
  0       0      0.0 A
  1       1      1.1 B
  2       2      2.2 C
  3       3      NaN D
  4   (nil)    (nil) (nil)

  df.slice_by(:string) { ["A", "C"] }

  # =>
  #<RedAmber::DataFrame : 2 x 2 Vectors, 0x000000000001b1ac>
      index    float
    <uint8> <double>
  0       0      0.0
  1       2      2.2
  ```

This behaves the same as:

  ```ruby
  df.slice { [string.index("A"), string.index("C")] }.drop(:string)
  ```

The block of `slice_by` can also return a Range.

  ```ruby
  df.slice_by(:string) { "A".."C" }

  # =>
  #<RedAmber::DataFrame : 3 x 2 Vectors, 0x0000000000069668>
      index    float
    <uint8> <double>
  0       0      0.0
  1       1      1.1
  2       2      2.2
  ```

When the option `keep_key: true` is used, the column specified by `key` is preserved.

  ```ruby
  df.slice_by(:string, keep_key: true) { "A".."C" }

  # =>
  #<RedAmber::DataFrame : 3 x 3 Vectors, 0x0000000000073c44>
      index    float string
    <uint8> <double> <string>
  0       0      0.0 A
  1       1      1.1 B
  2       2      2.2 C
  ```
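
To make the selection rule concrete, the behavior shown above can be sketched in plain Ruby on a Hash of Arrays. This is only a model, not RedAmber's implementation: the helper name `slice_by_model` is hypothetical, and a Range is assumed to cover the rows from the one holding its first endpoint to the one holding its last endpoint, as in the example above.

  ```ruby
  # Plain-Ruby model of slice_by (hypothetical helper, not RedAmber's code).
  # columns: Hash of key => Array, all Arrays having the same length.
  def slice_by_model(columns, key, selector, keep_key: false)
    labels = columns.fetch(key)
    positions =
      case selector
      when Range
        # Assumption: a Range spans the rows between its two endpoints.
        (labels.index(selector.begin)..labels.index(selector.end)).to_a
      else
        Array(selector).map { |label| labels.index(label) }
      end
    kept = keep_key ? columns : columns.reject { |k, _| k == key }
    kept.transform_values { |values| values.values_at(*positions) }
  end

  columns = {
    index: [0, 1, 2, 3, nil],
    float: [0.0, 1.1, 2.2, Float::NAN, nil],
    string: ['A', 'B', 'C', 'D', nil]
  }

  picked = slice_by_model(columns, :string, %w[A C])
  ranged = slice_by_model(columns, :string, 'A'..'C', keep_key: true)
  ```

Here `picked` holds rows 0 and 2 without the `:string` column, and `ranged` holds rows 0 to 2 with `:string` kept.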

## Updating

### `sort`

  `sort` accepts sort keys as parameters, relying on a Red Arrow feature.

  - `:key`, `"key"` or `"+key"` denotes ascending order
  - `"-key"` denotes descending order

  ```ruby
  df = RedAmber::DataFrame.new(
        index:  [1, 1, 0, nil, 0],
        string: ['C', 'B', nil, 'A', 'B'],
        bool:   [nil, true, false, true, false],
      )
  df.sort(:index, '-bool')
  
  # =>
  #<RedAmber::DataFrame : 5 x 3 Vectors, 0x000000000009b03c>
      index string   bool
    <uint8> <string> <boolean>
  0       0 (nil)    false
  1       0 B        false
  2       1 B        true
  3       1 C        (nil)
  4   (nil) A        true
  ```
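
  The sort key notation can be read as a column key plus a direction. The sketch below models that reading in plain Ruby; `parse_sort_key` is a hypothetical helper for illustration and is not part of RedAmber or Red Arrow.

  ```ruby
  # Hypothetical helper: split a sort key into [column key, direction].
  def parse_sort_key(sort_key)
    text = sort_key.to_s
    if text.start_with?('-')
      [text[1..].to_sym, :descending]
    else
      # A leading "+" is optional for ascending order.
      [text.delete_prefix('+').to_sym, :ascending]
    end
  end

  parsed = [:index, 'index', '+index', '-bool'].map { |k| parse_sort_key(k) }
  ```

  The first three forms all resolve to `:index` in ascending order, and `'-bool'` resolves to `:bool` in descending order.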

- [ ] Clamp

- [ ] Clear data

## Treat na data

### `remove_nil`

  Remove any records containing nil.
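
  As a rough illustration of what "any records containing nil" means, the plain-Ruby sketch below models the operation on a Hash of Arrays. It is not RedAmber's implementation, and the sample data is made up for this example; with a DataFrame, the corresponding call is `remove_nil`.

  ```ruby
  # Plain-Ruby model: drop every record (row) that contains at least one nil.
  columns = {
    index: [0, 1, 2, 3, nil],
    float: [0.0, 1.1, 2.2, 3.3, nil],
    string: ['A', 'B', nil, 'D', nil]
  }

  keys = columns.keys
  records = columns.values.transpose            # one Array per record
  complete = records.reject { |record| record.any?(&:nil?) }
  cleaned = keys.zip(complete.transpose).to_h   # back to key => Array
  ```

  Rows 2 and 4 are dropped because each holds a nil in at least one column.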

## Grouping

### `group(group_keys)`

  `group` creates an instance of class `Group`. `Group` accepts the functions below as methods.
  Each method accepts `summary_keys` as its arguments.

  Available functions are:

  - [ ] all
  - [ ] any
  - [ ] approximate_median
  - ✓ count
  - [ ] count_distinct
  - [ ] distinct
  - ✓ max
  - ✓ mean
  - ✓ min
  - [ ] min_max
  - ✓ product
  - ✓ stddev
  - ✓ sum
  - [ ] tdigest
  - ✓ variance

  For each group defined by `group_keys`, the aggregation `function` is applied, and a new DataFrame is returned whose aggregated keys follow `summary_keys`.
  Summary key names take the form `function(summary_keys)`.
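
  The naming rule can be illustrated with a plain-Ruby sketch that groups records and computes a mean. This is a model of the idea only, not RedAmber's `Group` class, and the four sample records are chosen just for this example.

  ```ruby
  # Plain-Ruby model of group(:species) { mean(:height) }.
  records = [
    { species: 'Human', height: 172 },
    { species: 'Droid', height: 167 },
    { species: 'Droid', height: 96 },
    { species: 'Human', height: 202 }
  ]

  summary = records.group_by { |r| r[:species] }.map do |species, rows|
    heights = rows.map { |r| r[:height] }
    # The summary key is named in function(summary_key) style.
    { species: species, 'mean(height)': heights.sum.fdiv(heights.size) }
  end
  ```

  Each group yields one record, with the group key first and the aggregated value stored under `:"mean(height)"`.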

  This is an example of grouping the well-known Star Wars dataset.

  ```ruby
  uri = URI("https://vincentarelbundock.github.io/Rdatasets/csv/dplyr/starwars.csv")
  starwars = RedAmber::DataFrame.load(uri)
  
  # =>
  #<RedAmber::DataFrame : 87 x 12 Vectors, 0x0000000000005a50>
     unnamed1 name            height     mass hair_color skin_color  eye_color ... species
      <int64> <string>       <int64> <double> <string>   <string>    <string>  ... <string>
   0        1 Luke Skywalker     172     77.0 blond      fair        blue      ... Human
   1        2 C-3PO              167     75.0 NA         gold        yellow    ... Droid
   2        3 R2-D2               96     32.0 NA         white, blue red       ... Droid
   3        4 Darth Vader        202    136.0 none       white       yellow    ... Human
   4        5 Leia Organa        150     49.0 brown      light       brown     ... Human
   :        : :                    :        : :          :           :         ... :
  84       85 BB8              (nil)    (nil) none       none        black     ... Droid
  85       86 Captain Phasma   (nil)    (nil) unknown    unknown     unknown   ... NA
  86       87 Padmé Amidala      165     45.0 brown      light       brown     ... Human

  starwars.tdr(12)

  # =>
  RedAmber::DataFrame : 87 x 12 Vectors
  Vectors : 4 numeric, 8 strings
  #  key         type   level data_preview
  0  :unnamed1   int64     87 [1, 2, 3, 4, 5, ... ]
  1  :name       string    87 ["Luke Skywalker", "C-3PO", "R2-D2", "Darth Vader", "Leia Organa", ... ]
  2  :height     int64     46 [172, 167, 96, 202, 150, ... ], 6 nils
  3  :mass       double    39 [77.0, 75.0, 32.0, 136.0, 49.0, ... ], 28 nils
  4  :hair_color string    13 ["blond", "NA", "NA", "none", "brown", ... ]
  5  :skin_color string    31 ["fair", "gold", "white, blue", "white", "light", ... ]
  6  :eye_color  string    15 ["blue", "yellow", "red", "yellow", "brown", ... ]
  7  :birth_year double    37 [19.0, 112.0, 33.0, 41.9, 19.0, ... ], 44 nils
  8  :sex        string     5 {"male"=>60, "none"=>6, "female"=>16, "hermaphroditic"=>1, "NA"=>4}
  9  :gender     string     3 {"masculine"=>66, "feminine"=>17, "NA"=>4}
  10 :homeworld  string    49 ["Tatooine", "Tatooine", "Naboo", "Tatooine", "Alderaan", ... ]
  11 :species    string    38 ["Human", "Droid", "Droid", "Human", "Human", ... ]
  ```

  We can group by `:species` and calculate the count.

  ```ruby
  starwars.remove { species == "NA" }
          .group(:species).count(:species)

  # =>
  #<RedAmber::DataFrame : 37 x 2 Vectors, 0x000000000000ffa0>
     species    count
     <string> <int64>
   0 Human         35
   1 Droid          6
   2 Wookiee        2
   3 Rodian         1
   4 Hutt           1
   : :              :
  34 Kaleesh        1
  35 Pau'an         1
  36 Kel Dor        1
  ```

  We can also calculate the mean of `:mass` and `:height` together.

  ```ruby
  grouped = starwars.remove { species == "NA" }
                    .group(:species) { [count(:species), mean(:height, :mass)] }

  # =>
  #<RedAmber::DataFrame : 37 x 4 Vectors, 0x000000000000fff0>
     species    count mean(height) mean(mass)
     <string> <int64>     <double>   <double>
   0 Human         35       176.65      82.78
   1 Droid          6        131.2      69.75
   2 Wookiee        2        231.0      124.0
   3 Rodian         1        173.0       74.0
   4 Hutt           1        175.0     1358.0
   : :              :            :          :
  34 Kaleesh        1        216.0      159.0
  35 Pau'an         1        206.0       80.0
  36 Kel Dor        1        188.0       80.0
  ```

  Select rows where `count` > 1.
  
  ```ruby
  grouped.slice(grouped[:count] > 1)

  # =>
  #<RedAmber::DataFrame : 8 x 4 Vectors, 0x000000000001002c>
    species    count mean(height) mean(mass)
    <string> <int64>     <double>   <double>
  0 Human         35       176.65      82.78
  1 Droid          6        131.2      69.75
  2 Wookiee        2        231.0      124.0
  3 Gungan         3       208.67       74.0
  4 Zabrak         2        173.0       80.0
  5 Twi'lek        2        179.0       55.0
  6 Mirialan       2        168.0       53.1
  7 Kaminoan       2        221.0       88.0
  ```

## Reshape

![dataframe reshaping image](doc/../image/reshaping_dataframe.png)

### `transpose`

  Creates a transposed DataFrame from a wide (messy) DataFrame.

  ```ruby
  import_cars = RedAmber::DataFrame.load('test/entity/import_cars.tsv')

  # =>
  #<RedAmber::DataFrame : 5 x 6 Vectors, 0x000000000000d520>
       Year    Audi     BMW BMW_MINI Mercedes-Benz      VW
    <int64> <int64> <int64>  <int64>       <int64> <int64>
  0    2017   28336   52527    25427         68221   49040
  1    2018   26473   50982    25984         67554   51961
  2    2019   24222   46814    23813         66553   46794
  3    2020   22304   35712    20196         57041   36576
  4    2021   22535   35905    18211         51722   35215

  import_cars.transpose(name: :Manufacturer)

  # =>
  #<RedAmber::DataFrame : 5 x 6 Vectors, 0x0000000000010a2c>
    Manufacturer      2017     2018     2019     2020     2021
    <string>      <uint32> <uint32> <uint32> <uint16> <uint16>
  0 Audi             28336    26473    24222    22304    22535
  1 BMW              52527    50982    46814    35712    35905
  2 BMW_MINI         25427    25984    23813    20196    18211
  3 Mercedes-Benz    68221    67554    66553    57041    51722
  4 VW               49040    51961    46794    36576    35215
  ```
  
  The leftmost column is created from the original keys. Its key is set by the
  `:name` parameter. If `:name` is not specified, `:NAME` is used for the key.
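
  The reshaping can be modelled in plain Ruby on a Hash of Arrays. The sketch below is not RedAmber's implementation: `transpose_model` is a hypothetical helper, and it assumes, as in the example above, that the values of the leftmost column become the new keys.

  ```ruby
  # Plain-Ruby model of transpose(name: :Manufacturer).
  def transpose_model(columns, name: :NAME)
    first_key, *value_keys = columns.keys
    # Assumption: values of the leftmost column become the new keys.
    new_keys = columns[first_key].map { |v| v.to_s.to_sym }
    rows = value_keys.map { |k| columns[k] }    # one row per original key
    { name => value_keys.map(&:to_s) }.merge(new_keys.zip(rows.transpose).to_h)
  end

  wide = {
    Year: [2017, 2018],
    Audi: [28336, 26473],
    BMW: [52527, 50982]
  }

  transposed = transpose_model(wide, name: :Manufacturer)
  ```

  The original keys `Audi` and `BMW` end up as values of `:Manufacturer`, and each year becomes a key.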

### `to_long(*keep_keys)`

  Creates a 'long' (possibly tidy) DataFrame from a 'wide' DataFrame.

  - Parameter `keep_keys` specifies the key names to keep.

  ```ruby
  import_cars.to_long(:Year)

  # =>
  #<RedAmber::DataFrame : 25 x 3 Vectors, 0x0000000000011864>
         Year NAME             VALUE
     <uint16> <string>      <uint32>
   0     2017 Audi             28336
   1     2017 BMW              52527
   2     2017 BMW_MINI         25427
   3     2017 Mercedes-Benz    68221
   4     2017 VW               49040
   :        : :                    :
  22     2021 BMW_MINI         18211
  23     2021 Mercedes-Benz    51722
  24     2021 VW               35215
  ```

  - Option `:name` is the key of the column which came **from key names**.
    The default value is `:NAME` if it is not specified.
  - Option `:value` is the key of the column which came **from values**.
    The default value is `:VALUE` if it is not specified.

  ```ruby
  import_cars.to_long(:Year, name: :Manufacturer, value: :Num_of_imported)

  # =>
  #<RedAmber::DataFrame : 25 x 3 Vectors, 0x000000000001359c>
         Year Manufacturer  Num_of_imported
     <uint16> <string>             <uint32>
   0     2017 Audi                    28336
   1     2017 BMW                     52527
   2     2017 BMW_MINI                25427
   3     2017 Mercedes-Benz           68221
   4     2017 VW                      49040
   :        : :                           :
  22     2021 BMW_MINI                18211
  23     2021 Mercedes-Benz           51722
  24     2021 VW                      35215
  ```
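
  How the rows are generated can be sketched in plain Ruby. This models the result shown above on a Hash of Arrays; `to_long_model` is a hypothetical helper, not RedAmber's implementation, and the two-year sample is a shortened version of `import_cars`.

  ```ruby
  # Plain-Ruby model of to_long(*keep_keys, name:, value:).
  def to_long_model(columns, *keep_keys, name: :NAME, value: :VALUE)
    melt_keys = columns.keys - keep_keys
    n_rows = columns.values.first.size
    long = (keep_keys + [name, value]).to_h { |k| [k, []] }
    n_rows.times do |i|
      melt_keys.each do |k|
        keep_keys.each { |kk| long[kk] << columns[kk][i] }
        long[name] << k.to_s          # the key name becomes a value
        long[value] << columns[k][i]  # the cell becomes a value
      end
    end
    long
  end

  wide = { Year: [2017, 2018], Audi: [28336, 26473], BMW: [52527, 50982] }
  long = to_long_model(wide, :Year, name: :Manufacturer, value: :Num_of_imported)
  ```

  Every kept record is repeated once per melted key, so 2 records and 2 melted keys give 4 rows.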

### `to_wide`

  Creates a 'wide' (possibly messy) DataFrame from a 'long' DataFrame.

  - Option `:name` is the key of the column which will be expanded **to key names**.
    The default value is `:NAME` if it is not specified.
  - Option `:value` is the key of the column which will be expanded **to values**.
    The default value is `:VALUE` if it is not specified.

  ```ruby
  import_cars.to_long(:Year).to_wide
  # import_cars.to_long(:Year).to_wide(name: :NAME, value: :VALUE)
  # is also OK

  # =>
  #<RedAmber::DataFrame : 5 x 6 Vectors, 0x000000000000f0f0>
        Year     Audi      BMW BMW_MINI Mercedes-Benz       VW
    <uint16> <uint16> <uint16> <uint16>      <uint32> <uint16>
  0     2017    28336    52527    25427         68221    49040
  1     2018    26473    50982    25984         67554    51961
  2     2019    24222    46814    23813         66553    46794
  3     2020    22304    35712    20196         57041    36576
  4     2021    22535    35905    18211         51722    35215
  ```
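
  The inverse direction can be sketched the same way. The code below is a plain-Ruby model on a Hash of Arrays, not RedAmber's implementation; `to_wide_model` is a hypothetical helper, and combinations missing from the long data are assumed to become nil.

  ```ruby
  # Plain-Ruby model of to_wide(name:, value:).
  def to_wide_model(columns, name: :NAME, value: :VALUE)
    keep_keys = columns.keys - [name, value]
    cells_by_record = Hash.new { |h, k| h[k] = {} }
    columns[name].each_index do |i|
      id = keep_keys.map { |k| columns[k][i] }   # identifies one wide record
      cells_by_record[id][columns[name][i].to_sym] = columns[value][i]
    end
    new_keys = columns[name].uniq.map(&:to_sym)
    result = (keep_keys + new_keys).to_h { |k| [k, []] }
    cells_by_record.each do |id, cells|
      keep_keys.zip(id) { |k, v| result[k] << v }
      new_keys.each { |k| result[k] << cells[k] }  # nil if the cell is missing
    end
    result
  end

  long = {
    Year: [2017, 2017, 2018, 2018],
    NAME: %w[Audi BMW Audi BMW],
    VALUE: [28336, 52527, 26473, 50982]
  }
  wide = to_wide_model(long)
  ```

  The distinct values of `:NAME` become keys, and the rows sharing a `:Year` collapse into one record.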

## Combine

### `join`
![dataframe joining image](doc/../image/dataframe/join.png)

  Use the specific `*_join` methods below.

  - `other` is a DataFrame or an Arrow::Table.
  - `join_keys` are the keys shared by self and other that are used for matching.
  - If `join_keys` is empty, the common keys of self and other are used (natural join).
  - If there are more common keys than `join_keys`, the duplicated keys are renamed with `suffix`.
  - If you want to match columns with different names,
    use a Hash for `join_keys` such as `{ left: :KEY1, right: :KEY2 }`.
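
  The key rules in this list can be modelled with a small plain-Ruby helper. It is a sketch for illustration only: `resolve_join_keys` is hypothetical and not part of RedAmber, and it assumes that the suffix is appended to the colliding keys coming from `other`.

  ```ruby
  # Hypothetical helper modelling how join keys and renamed keys are chosen.
  def resolve_join_keys(self_keys, other_keys, join_keys = nil, suffix: '.1')
    common = self_keys & other_keys
    join_keys = Array(join_keys)
    join_keys = common if join_keys.empty?   # natural join
    # Common keys not used for matching would collide in the result.
    # Assumption: the colliding keys of `other` receive the suffix.
    renamed = (common - join_keys).to_h { |k| [k, :"#{k}#{suffix}"] }
    { join_keys: join_keys, renamed: renamed }
  end

  natural = resolve_join_keys(%i[KEY X1], %i[KEY X2])
  partial = resolve_join_keys(%i[KEY1 KEY2 X], %i[KEY1 KEY2 Y], :KEY1)
  ```

  In `natural` the single common key `:KEY` is used and nothing is renamed. In `partial` only `:KEY1` is used for matching, so the other common key is renamed to `:"KEY2.1"`.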

  The following DataFrames are used in the join examples.
  ```ruby
  df = DataFrame.new(
    KEY: %w[A B C],
    X1: [1, 2, 3]
  )
  #=>
  #<RedAmber::DataFrame : 3 x 2 Vectors, 0x0000000000012a70>
    KEY           X1
    <string> <uint8>
  0 A              1
  1 B              2
  2 C              3

  other = DataFrame.new(
    KEY: %w[A B D],
    X2: [true, false, nil]
  )
  #=>
  #<RedAmber::DataFrame : 3 x 2 Vectors, 0x0000000000017034>
    KEY      X2
    <string> <boolean>
  0 A        true
  1 B        false
  2 D        (nil)
  ```

#### Mutating joins

##### `inner_join(other, join_keys = nil, suffix: '.1')`

  Join data, leaving only the matching records.

  ```ruby
  df.inner_join(other, :KEY)
  #=>
  #<RedAmber::DataFrame : 2 x 3 Vectors, 0x000000000001e2bc>     
    KEY           X1 X2
    <string> <uint8> <boolean>
  0 A              1 true
  1 B              2 false
  ```

##### `full_join(other, join_keys = nil, suffix: '.1')`

  Join data, leaving all records.

  ```ruby
  df.full_join(other, :KEY)
  #=>
  #<RedAmber::DataFrame : 4 x 3 Vectors, 0x0000000000029fcc>
    KEY           X1 X2
    <string> <uint8> <boolean>
  0 A              1 true
  1 B              2 false
  2 C              3 (nil)
  3 D          (nil) (nil)
  ```

##### `left_join(other, join_keys = nil, suffix: '.1')`

  Join matching values to self from other.

  ```ruby
  df.left_join(other, :KEY)
  #=>
  #<RedAmber::DataFrame : 3 x 3 Vectors, 0x0000000000029fcc>
    KEY           X1 X2
    <string> <uint8> <boolean>
  0 A              1 true
  1 B              2 false
  2 C              3 (nil)
  ```

##### `right_join(other, join_keys = nil, suffix: '.1')`

  Join matching values from self to other.

  ```ruby
  df.right_join(other, :KEY)
  #=>
  #<RedAmber::DataFrame : 3 x 3 Vectors, 0x0000000000029fcc>
    KEY           X1 X2
    <string> <uint8> <boolean>
  0 A              1 true
  1 B              2 false
  2 D          (nil) (nil)
  ```

#### Filtering joins

##### `semi_join(other, join_keys = nil, suffix: '.1')`

  Return records of self that have a match in other.

  ```ruby
  df.semi_join(other, :KEY)
  #=>
  #<RedAmber::DataFrame : 2 x 2 Vectors, 0x0000000000029fcc>
    KEY           X1
    <string> <uint8>
  0 A              1
  1 B              2
  ```

##### `anti_join(other, join_keys = nil, suffix: '.1')`

  Return records of self that do not have a match in other.

  ```ruby
  df.anti_join(other, :KEY)
  #=>
  #<RedAmber::DataFrame : 1 x 2 Vectors, 0x0000000000029fcc>
    KEY           X1
    <string> <uint8>
  0 C              3
  ```
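
  The six join types above can be compared side by side with a plain-Ruby model on Arrays of Hashes. This sketches the semantics only and is not RedAmber's implementation; it assumes that `KEY` values are unique on each side.

  ```ruby
  # Plain-Ruby model of the join types; each Hash is one record.
  df_rows = [{ KEY: 'A', X1: 1 }, { KEY: 'B', X1: 2 }, { KEY: 'C', X1: 3 }]
  other_rows = [{ KEY: 'A', X2: true }, { KEY: 'B', X2: false }, { KEY: 'D', X2: nil }]

  # Lookup tables by key (assumes unique KEY values on each side).
  df_by_key = df_rows.to_h { |r| [r[:KEY], r] }
  other_by_key = other_rows.to_h { |r| [r[:KEY], r] }

  # Mutating joins add columns; unmatched cells become nil.
  left = df_rows.map { |r| r.merge(X2: other_by_key.dig(r[:KEY], :X2)) }
  right = other_rows.map do |r|
    { KEY: r[:KEY], X1: df_by_key.dig(r[:KEY], :X1), X2: r[:X2] }
  end
  inner = left.select { |r| other_by_key.key?(r[:KEY]) }
  full = left + right.reject { |r| df_by_key.key?(r[:KEY]) }

  # Filtering joins only select records of self; no columns are added.
  semi = df_rows.select { |r| other_by_key.key?(r[:KEY]) }
  anti = df_rows.reject { |r| other_by_key.key?(r[:KEY]) }
  ```

  The record counts are 2 for `inner`, 4 for `full`, 3 for `left` and `right`, 2 for `semi` and 1 for `anti`, in line with the outputs above.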

## Set operations
![dataframe set and binding image](doc/../image/dataframe/set_and_bind.png)

  Keys in self and other must be the same in set operations.

  ```ruby
  df = DataFrame.new(
    KEY1: %w[A B C],
    KEY2: [1, 2, 3]
  )
  #=>
  #<RedAmber::DataFrame : 3 x 2 Vectors, 0x0000000000012a70>
    KEY1        KEY2
    <string> <uint8>
  0 A              1
  1 B              2
  2 C              3

  other = DataFrame.new(
    KEY1: %w[A B D],
    KEY2: [1, 4, 5]
  )
  #=>
  #<RedAmber::DataFrame : 3 x 2 Vectors, 0x0000000000017034>
    KEY1        KEY2
    <string> <uint8>
  0 A              1
  1 B              4
  2 D              5
  ```

##### `set_operable?(other)`

  Checks whether the `types` of self and other are the same.

##### `intersect(other)`

  Select records appearing in both self and other.

  ```ruby
  df.intersect(other)
  #=>
  #<RedAmber::DataFrame : 1 x 2 Vectors, 0x0000000000029fcc>
    KEY1        KEY2
    <string> <uint8>
  0 A              1
  ```

##### `union(other)`

  Select records appearing in self or other.

  ```ruby
  df.union(other)
  #=>
  #<RedAmber::DataFrame : 5 x 2 Vectors, 0x0000000000029fcc>
    KEY1        KEY2
    <string> <uint8>
  0 A              1
  1 B              2
  2 C              3
  3 B              4
  4 D              5
  ```

##### `difference(other)`

  Select records appearing in self but not in other.

  It has an alias `setdiff`.

  ```ruby
  df.difference(other)
  #=>
  #<RedAmber::DataFrame : 2 x 2 Vectors, 0x0000000000029fcc>
    KEY1        KEY2
    <string> <uint8>
  0 B              2
  1 C              3

  other.difference(df)
  #=>
  #<RedAmber::DataFrame : 2 x 2 Vectors, 0x0000000000040e0c>
    KEY1        KEY2
    <string> <uint8>
  0 B              4
  1 D              5
  ```
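
  The three set operations correspond to Ruby's Array set operators when each record is written as an Array. The sketch below is a plain-Ruby model of the examples above, not RedAmber's implementation; how RedAmber treats duplicated records may differ from `Array`.

  ```ruby
  # Records as [KEY1, KEY2] pairs; Array set operators model the methods.
  df_rows = [['A', 1], ['B', 2], ['C', 3]]
  other_rows = [['A', 1], ['B', 4], ['D', 5]]

  intersect = df_rows & other_rows    # records in both
  union = df_rows | other_rows        # records in either, without repeats
  difference = df_rows - other_rows   # records only in df_rows
  reverse = other_rows - df_rows      # records only in other_rows
  ```

  A record matches only when all of its values match, which is why `['B', 2]` and `['B', 4]` are treated as different records.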

## Binding

### `concatenate(other)`

  Concatenates another DataFrame or Table onto the bottom of self. The types of other must be the same as those of self.

  The aliases are `concat` and `bind_rows`.

  An array of DataFrames or Tables is also acceptable as other.

  ```ruby
  df
  #=>
  #<RedAmber::DataFrame : 2 x 2 Vectors, 0x0000000000022cb8>
          x y
    <uint8> <string>
  0       1 A
  1       2 B
  
  other
  #=>
  #<RedAmber::DataFrame : 2 x 2 Vectors, 0x000000000001f6d0>
          x y
    <uint8> <string>
  0       3 C
  1       4 D

  df.concatenate(other)
  #=>
  #<RedAmber::DataFrame : 4 x 2 Vectors, 0x0000000000022574>
          x y
    <uint8> <string>
  0       1 A
  1       2 B
  2       3 C
  3       4 D
  ```
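
  Row binding, including the case where an Array of DataFrames is passed, can be modelled in plain Ruby on Hashes of Arrays. `concatenate_model` is a hypothetical helper, not RedAmber's implementation; it only checks that the keys agree and does not check data types, and raising `ArgumentError` is an assumption made for this sketch.

  ```ruby
  # Plain-Ruby model of concatenate (bind_rows).
  def concatenate_model(columns, *others)
    # flatten lets callers pass either several Hashes or one Array of them.
    others.flatten.reduce(columns) do |acc, other|
      raise ArgumentError, 'keys differ' unless acc.keys == other.keys

      acc.to_h { |k, v| [k, v + other[k]] }
    end
  end

  df = { x: [1, 2], y: %w[A B] }
  other = { x: [3, 4], y: %w[C D] }

  stacked = concatenate_model(df, other)
  stacked_twice = concatenate_model(df, [other, other])
  ```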

### `merge(*other)`

  Binds another DataFrame or Table to the right of self as additional columns. The size of other must be the same as self. Self and other must not share the same key.

  The alias is `bind_cols`.

  ```ruby
  df
  #=>
  #<RedAmber::DataFrame : 2 x 2 Vectors, 0x0000000000009150>
          x       y
    <uint8> <uint8>
  0       1       3
  1       2       4

  other
  #=>
  #<RedAmber::DataFrame : 2 x 2 Vectors, 0x0000000000008a0c>
    a        b
    <string> <string>
  0 A        C
  1 B        D

  df.merge(other)
  #=>
  #<RedAmber::DataFrame : 2 x 4 Vectors, 0x000000000000cb70>
          x       y a        b
    <uint8> <uint8> <string> <string>
  0       1       3 A        C
  1       2       4 B        D
  ```
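
  The two preconditions, equal size and no shared keys, can be made explicit with a plain-Ruby model on Hashes of Arrays. `merge_model` is a hypothetical helper and not RedAmber's implementation; raising `ArgumentError` is an assumption made for this sketch.

  ```ruby
  # Plain-Ruby model of merge (bind_cols).
  def merge_model(columns, other)
    shared = columns.keys & other.keys
    raise ArgumentError, "shared keys: #{shared}" unless shared.empty?

    sizes = (columns.values + other.values).map(&:size).uniq
    raise ArgumentError, 'sizes differ' unless sizes.size == 1

    columns.merge(other)   # columns of other are appended on the right
  end

  df = { x: [1, 2], y: [3, 4] }
  other = { a: %w[A B], b: %w[C D] }
  merged = merge_model(df, other)
  ```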

## Encoding

- [ ] One-hot encoding